All of lore.kernel.org
* [PATCH 1/9] drm/i915: Make ptr_unpack_bits() more function-like
@ 2017-05-03 11:37 Chris Wilson
  2017-05-03 11:37 ` [PATCH 2/9] drm/i915: Redefine ptr_pack_bits() and friends Chris Wilson
                   ` (8 more replies)
  0 siblings, 9 replies; 33+ messages in thread
From: Chris Wilson @ 2017-05-03 11:37 UTC (permalink / raw)
  To: intel-gfx

ptr_unpack_bits() is a function-like macro, as such it is meant to be
replaceable by a function. In this case, we should be passing in the
out-param as a pointer.

Bizarrely this does affect code generation:

function                                     old     new   delta
i915_gem_object_pin_map                      409     389     -20

An improvement(?) in this case, but one can't help but wonder what
strict-aliasing optimisations we are preventing.

The generated code for ptr_unpack_bits itself looks identical (no extra
moves to the stack; the pointer and bits appear to be kept in
registers); the difference appears to be code ordering, and with the
reorder the compiler is able to use smaller forward jumps.
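
The packing trick relies on a page-aligned pointer leaving its low
PAGE_SHIFT bits free for flags, and making the unpack take its out-param
by pointer is what lets the macro behave like a real function. A minimal
user-space sketch (hypothetical: PAGE_SHIFT hard-coded to 12, void *
instead of the kernel's typeof(ptr), using GCC statement expressions as
the kernel does):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's page constants. */
#define PAGE_SHIFT 12
#define PAGE_MASK  (~((1UL << PAGE_SHIFT) - 1))

/* Pack flag bits into the low (page-offset) bits of an aligned pointer;
 * the caller guarantees bits fit below PAGE_SHIFT. */
#define ptr_pack_bits(ptr, bits) \
	((void *)((uintptr_t)(ptr) | (bits)))

/* Function-like unpack: the out-param is passed as a pointer, exactly as
 * a real function would require. Returns the masked pointer. */
#define ptr_unpack_bits(ptr, bits) ({		\
	uintptr_t __v = (uintptr_t)(ptr);	\
	*(bits) = __v & ~PAGE_MASK;		\
	(void *)(__v & PAGE_MASK);		\
})
```

A caller can then treat it as an ordinary function returning the pointer
and writing the flags through the second argument.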

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c   | 2 +-
 drivers/gpu/drm/i915/i915_utils.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0f8046e0a63c..a0186a5df05b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2631,7 +2631,7 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
 	}
 	GEM_BUG_ON(!obj->mm.pages);
 
-	ptr = ptr_unpack_bits(obj->mm.mapping, has_type);
+	ptr = ptr_unpack_bits(obj->mm.mapping, &has_type);
 	if (ptr && has_type != type) {
 		if (pinned) {
 			ret = -EBUSY;
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index c5455d36b617..aca11aad5da7 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -77,7 +77,7 @@
 
 #define ptr_unpack_bits(ptr, bits) ({					\
 	unsigned long __v = (unsigned long)(ptr);			\
-	(bits) = __v & ~PAGE_MASK;					\
+	*(bits) = __v & ~PAGE_MASK;					\
 	(typeof(ptr))(__v & PAGE_MASK);					\
 })
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/9] drm/i915: Redefine ptr_pack_bits() and friends
  2017-05-03 11:37 [PATCH 1/9] drm/i915: Make ptr_unpack_bits() more function-like Chris Wilson
@ 2017-05-03 11:37 ` Chris Wilson
  2017-05-03 11:37 ` [PATCH 3/9] drm/i915/execlists: Pack the count into the low bits of the port.request Chris Wilson
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2017-05-03 11:37 UTC (permalink / raw)
  To: intel-gfx

Rebrand the current (pointer | bits) pack/unpack utility macros as
explicit bit twiddling for PAGE_SIZE so that we can use the more
flexible underlying macros for different bits.
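
The generalised forms parameterise the split point: `-BIT(n)` is the
two's-complement trick for "all bits from n upwards". A user-space
sketch (hypothetical: a local BIT() and void * in place of the kernel's
definitions):

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (1UL << (n))

/* Drop the low n bits, recovering the aligned pointer.
 * -BIT(n) == ~(BIT(n) - 1), i.e. a mask of all bits >= n. */
#define ptr_mask_bits(ptr, n)   ((void *)((uintptr_t)(ptr) & -BIT(n)))

/* Extract just the low n bits. */
#define ptr_unmask_bits(ptr, n) ((uintptr_t)(ptr) & (BIT(n) - 1))

/* Pack bits into the low n bits; the caller guarantees bits < BIT(n). */
#define ptr_pack_bits(ptr, bits, n) ((void *)((uintptr_t)(ptr) | (bits)))
```

With these in place, the page-sized variants reduce to passing
PAGE_SHIFT for n, which is exactly what the page_*_bits() wrappers do.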

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c |  2 +-
 drivers/gpu/drm/i915/i915_gem.c        |  6 +++---
 drivers/gpu/drm/i915/i915_utils.h      | 19 +++++++++++++------
 3 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 2a1a3347495a..f0cb22cc0dd6 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -1284,7 +1284,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 
 		if (*cmd == MI_BATCH_BUFFER_END) {
 			if (needs_clflush_after) {
-				void *ptr = ptr_mask_bits(shadow_batch_obj->mm.mapping);
+				void *ptr = page_mask_bits(shadow_batch_obj->mm.mapping);
 				drm_clflush_virt_range(ptr,
 						       (void *)(cmd + 1) - ptr);
 			}
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a0186a5df05b..f79079f2e265 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2299,7 +2299,7 @@ void __i915_gem_object_put_pages(struct drm_i915_gem_object *obj,
 	if (obj->mm.mapping) {
 		void *ptr;
 
-		ptr = ptr_mask_bits(obj->mm.mapping);
+		ptr = page_mask_bits(obj->mm.mapping);
 		if (is_vmalloc_addr(ptr))
 			vunmap(ptr);
 		else
@@ -2631,7 +2631,7 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
 	}
 	GEM_BUG_ON(!obj->mm.pages);
 
-	ptr = ptr_unpack_bits(obj->mm.mapping, &has_type);
+	ptr = page_unpack_bits(obj->mm.mapping, &has_type);
 	if (ptr && has_type != type) {
 		if (pinned) {
 			ret = -EBUSY;
@@ -2653,7 +2653,7 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
 			goto err_unpin;
 		}
 
-		obj->mm.mapping = ptr_pack_bits(ptr, type);
+		obj->mm.mapping = page_pack_bits(ptr, type);
 	}
 
 out_unlock:
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index aca11aad5da7..f0500c65726d 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -70,20 +70,27 @@
 #define overflows_type(x, T) \
 	(sizeof(x) > sizeof(T) && (x) >> (sizeof(T) * BITS_PER_BYTE))
 
-#define ptr_mask_bits(ptr) ({						\
+#define ptr_mask_bits(ptr, n) ({					\
 	unsigned long __v = (unsigned long)(ptr);			\
-	(typeof(ptr))(__v & PAGE_MASK);					\
+	(typeof(ptr))(__v & -BIT(n));					\
 })
 
-#define ptr_unpack_bits(ptr, bits) ({					\
+#define ptr_unmask_bits(ptr, n) ((unsigned long)(ptr) & (BIT(n) - 1))
+
+#define ptr_unpack_bits(ptr, bits, n) ({				\
 	unsigned long __v = (unsigned long)(ptr);			\
-	*(bits) = __v & ~PAGE_MASK;					\
-	(typeof(ptr))(__v & PAGE_MASK);					\
+	*(bits) = __v & (BIT(n) - 1);					\
+	(typeof(ptr))(__v & -BIT(n));					\
 })
 
-#define ptr_pack_bits(ptr, bits)					\
+#define ptr_pack_bits(ptr, bits, n)					\
 	((typeof(ptr))((unsigned long)(ptr) | (bits)))
 
+#define page_mask_bits(ptr) ptr_mask_bits(ptr, PAGE_SHIFT)
+#define page_unmask_bits(ptr) ptr_unmask_bits(ptr, PAGE_SHIFT)
+#define page_pack_bits(ptr, bits) ptr_pack_bits(ptr, bits, PAGE_SHIFT)
+#define page_unpack_bits(ptr, bits) ptr_unpack_bits(ptr, bits, PAGE_SHIFT)
+
 #define ptr_offset(ptr, member) offsetof(typeof(*(ptr)), member)
 
 #define fetch_and_zero(ptr) ({						\
-- 
2.11.0


* [PATCH 3/9] drm/i915/execlists: Pack the count into the low bits of the port.request
  2017-05-03 11:37 [PATCH 1/9] drm/i915: Make ptr_unpack_bits() more function-like Chris Wilson
  2017-05-03 11:37 ` [PATCH 2/9] drm/i915: Redefine ptr_pack_bits() and friends Chris Wilson
@ 2017-05-03 11:37 ` Chris Wilson
  2017-05-05 10:49   ` Tvrtko Ursulin
  2017-05-03 11:37 ` [PATCH 4/9] drm/i915: Don't mark an execlists context-switch when idle Chris Wilson
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2017-05-03 11:37 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187)
function                                     old     new   delta
execlists_submit_ports                       262     471    +209
port_assign.isra                               -     136    +136
capture                                     6344    6359     +15
reset_common_ring                            438     452     +14
execlists_submit_request                     228     238     +10
gen8_init_common_ring                        334     341      +7
intel_engine_is_idle                         106     105      -1
i915_engine_info                            2314    2290     -24
__i915_gem_set_wedged_BKL                    485     411     -74
intel_lrc_irq_handler                       1789    1604    -185
execlists_update_context                     294       -    -294

The most important change is the improvement to intel_lrc_irq_handler
and execlists_submit_ports (a net improvement, since
execlists_update_context is now inlined).

v2: Use the port accessors for guc as well (even though we do not
currently pack any counters in there) and hide all port->request_count
accesses inside the helpers.
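
The scheme the patch introduces can be sketched outside the driver: the
submission count rides in the two low bits of the (at least 4-byte
aligned) request pointer, and port_request()/port_count()/port_set() are
the only ways to touch the combined word. A hypothetical user-space
sketch:

```c
#include <assert.h>
#include <stdint.h>

#define EXECLIST_COUNT_BITS 2
#define COUNT_MASK ((1UL << EXECLIST_COUNT_BITS) - 1)

struct request { int seqno; };

/* One word stores both the request pointer and its inflight count,
 * relying on the pointer being at least 4-byte aligned. */
struct port { uintptr_t request_count; };

static struct request *port_request(const struct port *p)
{
	/* Mask off the count to recover the pointer. */
	return (struct request *)(p->request_count & ~COUNT_MASK);
}

static unsigned int port_count(const struct port *p)
{
	/* The low bits are the count. */
	return (unsigned int)(p->request_count & COUNT_MASK);
}

static void port_set(struct port *p, struct request *rq, unsigned int count)
{
	/* Pack; the caller guarantees count fits in COUNT_BITS. */
	p->request_count = (uintptr_t)rq | count;
}
```

Resubmitting a port then becomes a read-modify-write of the count while
the pointer bits stay put, which is the pattern execlists_submit_ports()
uses in the patch.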

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  32 ++++----
 drivers/gpu/drm/i915/i915_gem.c            |   6 +-
 drivers/gpu/drm/i915/i915_gpu_error.c      |  13 +++-
 drivers/gpu/drm/i915/i915_guc_submission.c |  32 +++++---
 drivers/gpu/drm/i915/intel_engine_cs.c     |   2 +-
 drivers/gpu/drm/i915/intel_lrc.c           | 117 ++++++++++++++++-------------
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  10 ++-
 7 files changed, 122 insertions(+), 90 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 870c470177b5..0b5d7142d8d9 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3315,6 +3315,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 		if (i915.enable_execlists) {
 			u32 ptr, read, write;
 			struct rb_node *rb;
+			unsigned int idx;
 
 			seq_printf(m, "\tExeclist status: 0x%08x %08x\n",
 				   I915_READ(RING_EXECLIST_STATUS_LO(engine)),
@@ -3332,8 +3333,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 			if (read > write)
 				write += GEN8_CSB_ENTRIES;
 			while (read < write) {
-				unsigned int idx = ++read % GEN8_CSB_ENTRIES;
-
+				idx = ++read % GEN8_CSB_ENTRIES;
 				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x, context: %d\n",
 					   idx,
 					   I915_READ(RING_CONTEXT_STATUS_BUF_LO(engine, idx)),
@@ -3341,21 +3341,19 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 			}
 
 			rcu_read_lock();
-			rq = READ_ONCE(engine->execlist_port[0].request);
-			if (rq) {
-				seq_printf(m, "\t\tELSP[0] count=%d, ",
-					   engine->execlist_port[0].count);
-				print_request(m, rq, "rq: ");
-			} else {
-				seq_printf(m, "\t\tELSP[0] idle\n");
-			}
-			rq = READ_ONCE(engine->execlist_port[1].request);
-			if (rq) {
-				seq_printf(m, "\t\tELSP[1] count=%d, ",
-					   engine->execlist_port[1].count);
-				print_request(m, rq, "rq: ");
-			} else {
-				seq_printf(m, "\t\tELSP[1] idle\n");
+			for (idx = 0; idx < ARRAY_SIZE(engine->execlist_port); idx++) {
+				unsigned int count;
+
+				rq = port_unpack(&engine->execlist_port[idx],
+						 &count);
+				if (rq) {
+					seq_printf(m, "\t\tELSP[%d] count=%d, ",
+						   idx, count);
+					print_request(m, rq, "rq: ");
+				} else {
+					seq_printf(m, "\t\tELSP[%d] idle\n",
+						   idx);
+				}
 			}
 			rcu_read_unlock();
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f79079f2e265..c1df4b9d2661 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3038,12 +3038,14 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
 	 */
 
 	if (i915.enable_execlists) {
+		struct execlist_port *port = engine->execlist_port;
 		unsigned long flags;
+		unsigned int n;
 
 		spin_lock_irqsave(&engine->timeline->lock, flags);
 
-		i915_gem_request_put(engine->execlist_port[0].request);
-		i915_gem_request_put(engine->execlist_port[1].request);
+		for (n = 0; n < ARRAY_SIZE(engine->execlist_port); n++)
+			i915_gem_request_put(port_request(&port[n]));
 		memset(engine->execlist_port, 0, sizeof(engine->execlist_port));
 		engine->execlist_queue = RB_ROOT;
 		engine->execlist_first = NULL;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index ec526d92f908..e18f350bc364 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1324,12 +1324,17 @@ static void engine_record_requests(struct intel_engine_cs *engine,
 static void error_record_engine_execlists(struct intel_engine_cs *engine,
 					  struct drm_i915_error_engine *ee)
 {
+	const struct execlist_port *port = engine->execlist_port;
 	unsigned int n;
 
-	for (n = 0; n < ARRAY_SIZE(engine->execlist_port); n++)
-		if (engine->execlist_port[n].request)
-			record_request(engine->execlist_port[n].request,
-				       &ee->execlist[n]);
+	for (n = 0; n < ARRAY_SIZE(engine->execlist_port); n++) {
+		struct drm_i915_gem_request *rq = port_request(&port[n]);
+
+		if (!rq)
+			break;
+
+		record_request(rq, &ee->execlist[n]);
+	}
 }
 
 static void record_context(struct drm_i915_error_context *e,
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 7e85b5ab8ae2..62d3831a8a0d 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -653,10 +653,22 @@ static void nested_enable_signaling(struct drm_i915_gem_request *rq)
 	spin_unlock(&rq->lock);
 }
 
+static void port_assign(struct execlist_port *port,
+			struct drm_i915_gem_request *rq)
+{
+	GEM_BUG_ON(rq == port_request(port));
+
+	if (port_isset(port))
+		i915_gem_request_put(port_request(port));
+
+	port_set(port, i915_gem_request_get(rq));
+	nested_enable_signaling(rq);
+}
+
 static bool i915_guc_dequeue(struct intel_engine_cs *engine)
 {
 	struct execlist_port *port = engine->execlist_port;
-	struct drm_i915_gem_request *last = port[0].request;
+	struct drm_i915_gem_request *last = port_request(&port[0]);
 	struct rb_node *rb;
 	bool submit = false;
 
@@ -670,8 +682,7 @@ static bool i915_guc_dequeue(struct intel_engine_cs *engine)
 			if (port != engine->execlist_port)
 				break;
 
-			i915_gem_request_assign(&port->request, last);
-			nested_enable_signaling(last);
+			port_assign(port, last);
 			port++;
 		}
 
@@ -686,8 +697,7 @@ static bool i915_guc_dequeue(struct intel_engine_cs *engine)
 		submit = true;
 	}
 	if (submit) {
-		i915_gem_request_assign(&port->request, last);
-		nested_enable_signaling(last);
+		port_assign(port, last);
 		engine->execlist_first = rb;
 	}
 	spin_unlock_irq(&engine->timeline->lock);
@@ -703,17 +713,19 @@ static void i915_guc_irq_handler(unsigned long data)
 	bool submit;
 
 	do {
-		rq = port[0].request;
+		rq = port_request(&port[0]);
 		while (rq && i915_gem_request_completed(rq)) {
 			trace_i915_gem_request_out(rq);
 			i915_gem_request_put(rq);
-			port[0].request = port[1].request;
-			port[1].request = NULL;
-			rq = port[0].request;
+
+			port[0] = port[1];
+			memset(&port[1], 0, sizeof(port[1]));
+
+			rq = port_request(&port[0]);
 		}
 
 		submit = false;
-		if (!port[1].request)
+		if (!port_count(&port[1]))
 			submit = i915_guc_dequeue(engine);
 	} while (submit);
 }
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 6d3d83876da9..24659788e7a3 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1230,7 +1230,7 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
 		return false;
 
 	/* Both ports drained, no more ELSP submission? */
-	if (engine->execlist_port[0].request)
+	if (port_request(&engine->execlist_port[0]))
 		return false;
 
 	/* Ring stopped? */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0909549ad320..9f7b6112c53a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -337,39 +337,32 @@ static u64 execlists_update_context(struct drm_i915_gem_request *rq)
 
 static void execlists_submit_ports(struct intel_engine_cs *engine)
 {
-	struct drm_i915_private *dev_priv = engine->i915;
 	struct execlist_port *port = engine->execlist_port;
 	u32 __iomem *elsp =
-		dev_priv->regs + i915_mmio_reg_offset(RING_ELSP(engine));
-	u64 desc[2];
-
-	GEM_BUG_ON(port[0].count > 1);
-	if (!port[0].count)
-		execlists_context_status_change(port[0].request,
-						INTEL_CONTEXT_SCHEDULE_IN);
-	desc[0] = execlists_update_context(port[0].request);
-	GEM_DEBUG_EXEC(port[0].context_id = upper_32_bits(desc[0]));
-	port[0].count++;
-
-	if (port[1].request) {
-		GEM_BUG_ON(port[1].count);
-		execlists_context_status_change(port[1].request,
-						INTEL_CONTEXT_SCHEDULE_IN);
-		desc[1] = execlists_update_context(port[1].request);
-		GEM_DEBUG_EXEC(port[1].context_id = upper_32_bits(desc[1]));
-		port[1].count = 1;
-	} else {
-		desc[1] = 0;
-	}
-	GEM_BUG_ON(desc[0] == desc[1]);
+		engine->i915->regs + i915_mmio_reg_offset(RING_ELSP(engine));
+	unsigned int n;
 
-	/* You must always write both descriptors in the order below. */
-	writel(upper_32_bits(desc[1]), elsp);
-	writel(lower_32_bits(desc[1]), elsp);
+	for (n = ARRAY_SIZE(engine->execlist_port); n--; ) {
+		struct drm_i915_gem_request *rq;
+		unsigned int count;
+		u64 desc;
+
+		rq = port_unpack(&port[n], &count);
+		if (rq) {
+			GEM_BUG_ON(count > !n);
+			if (!count++)
+				execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
+			port_set(&port[n], port_pack(rq, count));
+			desc = execlists_update_context(rq);
+			GEM_DEBUG_EXEC(port[n].context_id = upper_32_bits(desc));
+		} else {
+			GEM_BUG_ON(!n);
+			desc = 0;
+		}
 
-	writel(upper_32_bits(desc[0]), elsp);
-	/* The context is automatically loaded after the following */
-	writel(lower_32_bits(desc[0]), elsp);
+		writel(upper_32_bits(desc), elsp);
+		writel(lower_32_bits(desc), elsp);
+	}
 }
 
 static bool ctx_single_port_submission(const struct i915_gem_context *ctx)
@@ -390,6 +383,17 @@ static bool can_merge_ctx(const struct i915_gem_context *prev,
 	return true;
 }
 
+static void port_assign(struct execlist_port *port,
+			struct drm_i915_gem_request *rq)
+{
+	GEM_BUG_ON(rq == port_request(port));
+
+	if (port_isset(port))
+		i915_gem_request_put(port_request(port));
+
+	port_set(port, port_pack(i915_gem_request_get(rq), port_count(port)));
+}
+
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_request *last;
@@ -397,7 +401,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	struct rb_node *rb;
 	bool submit = false;
 
-	last = port->request;
+	last = port_request(port);
 	if (last)
 		/* WaIdleLiteRestore:bdw,skl
 		 * Apply the wa NOOPs to prevent ring:HEAD == req:TAIL
@@ -407,7 +411,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		 */
 		last->tail = last->wa_tail;
 
-	GEM_BUG_ON(port[1].request);
+	GEM_BUG_ON(port_isset(&port[1]));
 
 	/* Hardware submission is through 2 ports. Conceptually each port
 	 * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is
@@ -464,7 +468,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 			GEM_BUG_ON(last->ctx == cursor->ctx);
 
-			i915_gem_request_assign(&port->request, last);
+			if (submit)
+				port_assign(port, last);
 			port++;
 		}
 
@@ -479,7 +484,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		submit = true;
 	}
 	if (submit) {
-		i915_gem_request_assign(&port->request, last);
+		port_assign(port, last);
 		engine->execlist_first = rb;
 	}
 	spin_unlock_irq(&engine->timeline->lock);
@@ -488,16 +493,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		execlists_submit_ports(engine);
 }
 
-static bool execlists_elsp_idle(struct intel_engine_cs *engine)
-{
-	return !engine->execlist_port[0].request;
-}
-
 static bool execlists_elsp_ready(const struct intel_engine_cs *engine)
 {
 	const struct execlist_port *port = engine->execlist_port;
 
-	return port[0].count + port[1].count < 2;
+	return port_count(&port[0]) + port_count(&port[1]) < 2;
 }
 
 /*
@@ -547,7 +547,9 @@ static void intel_lrc_irq_handler(unsigned long data)
 		tail = GEN8_CSB_WRITE_PTR(head);
 		head = GEN8_CSB_READ_PTR(head);
 		while (head != tail) {
+			struct drm_i915_gem_request *rq;
 			unsigned int status;
+			unsigned int count;
 
 			if (++head == GEN8_CSB_ENTRIES)
 				head = 0;
@@ -577,20 +579,24 @@ static void intel_lrc_irq_handler(unsigned long data)
 			GEM_DEBUG_BUG_ON(readl(buf + 2 * head + 1) !=
 					 port[0].context_id);
 
-			GEM_BUG_ON(port[0].count == 0);
-			if (--port[0].count == 0) {
+			rq = port_unpack(&port[0], &count);
+			GEM_BUG_ON(count == 0);
+			if (--count == 0) {
 				GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
-				GEM_BUG_ON(!i915_gem_request_completed(port[0].request));
-				execlists_context_status_change(port[0].request,
-								INTEL_CONTEXT_SCHEDULE_OUT);
+				GEM_BUG_ON(!i915_gem_request_completed(rq));
+				execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
+
+				trace_i915_gem_request_out(rq);
+				i915_gem_request_put(rq);
 
-				trace_i915_gem_request_out(port[0].request);
-				i915_gem_request_put(port[0].request);
 				port[0] = port[1];
 				memset(&port[1], 0, sizeof(port[1]));
+			} else {
+				port_set(&port[0], port_pack(rq, count));
 			}
 
-			GEM_BUG_ON(port[0].count == 0 &&
+			/* After the final element, the hw should be idle */
+			GEM_BUG_ON(port_count(&port[0]) == 0 &&
 				   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
 		}
 
@@ -1148,6 +1154,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
 	struct drm_i915_private *dev_priv = engine->i915;
 	struct execlist_port *port = engine->execlist_port;
 	unsigned int n;
+	bool submit;
 	int ret;
 
 	ret = intel_mocs_init_engine(engine);
@@ -1169,19 +1176,21 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
 	/* After a GPU reset, we may have requests to replay */
 	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
 
+	submit = false;
 	for (n = 0; n < ARRAY_SIZE(engine->execlist_port); n++) {
-		if (!port[n].request)
+		if (!port_isset(&port[n]))
 			break;
 
 		DRM_DEBUG_DRIVER("Restarting %s:%d from 0x%x\n",
 				 engine->name, n,
-				 port[n].request->global_seqno);
+				 port_request(&port[n])->global_seqno);
 
 		/* Discard the current inflight count */
-		port[n].count = 0;
+		port_set(&port[n], port_request(&port[n]));
+		submit = true;
 	}
 
-	if (!i915.enable_guc_submission && !execlists_elsp_idle(engine))
+	if (submit && !i915.enable_guc_submission)
 		execlists_submit_ports(engine);
 
 	return 0;
@@ -1259,13 +1268,13 @@ static void reset_common_ring(struct intel_engine_cs *engine,
 	intel_ring_update_space(request->ring);
 
 	/* Catch up with any missed context-switch interrupts */
-	if (request->ctx != port[0].request->ctx) {
-		i915_gem_request_put(port[0].request);
+	if (request->ctx != port_request(&port[0])->ctx) {
+		i915_gem_request_put(port_request(&port[0]));
 		port[0] = port[1];
 		memset(&port[1], 0, sizeof(port[1]));
 	}
 
-	GEM_BUG_ON(request->ctx != port[0].request->ctx);
+	GEM_BUG_ON(request->ctx != port_request(&port[0])->ctx);
 
 	/* Reset WaIdleLiteRestore:bdw,skl as well */
 	request->tail =
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 4a599e3480a9..68765d45116b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -368,8 +368,14 @@ struct intel_engine_cs {
 	/* Execlists */
 	struct tasklet_struct irq_tasklet;
 	struct execlist_port {
-		struct drm_i915_gem_request *request;
-		unsigned int count;
+		struct drm_i915_gem_request *request_count;
+#define EXECLIST_COUNT_BITS 2
+#define port_request(p) ptr_mask_bits((p)->request_count, EXECLIST_COUNT_BITS)
+#define port_count(p) ptr_unmask_bits((p)->request_count, EXECLIST_COUNT_BITS)
+#define port_pack(rq, count) ptr_pack_bits(rq, count, EXECLIST_COUNT_BITS)
+#define port_unpack(p, count) ptr_unpack_bits((p)->request_count, count, EXECLIST_COUNT_BITS)
+#define port_set(p, packed) ((p)->request_count = (packed))
+#define port_isset(p) ((p)->request_count)
 		GEM_DEBUG_DECL(u32 context_id);
 	} execlist_port[2];
 	struct rb_root execlist_queue;
-- 
2.11.0


* [PATCH 4/9] drm/i915: Don't mark an execlists context-switch when idle
  2017-05-03 11:37 [PATCH 1/9] drm/i915: Make ptr_unpack_bits() more function-like Chris Wilson
  2017-05-03 11:37 ` [PATCH 2/9] drm/i915: Redefine ptr_pack_bits() and friends Chris Wilson
  2017-05-03 11:37 ` [PATCH 3/9] drm/i915/execlists: Pack the count into the low bits of the port.request Chris Wilson
@ 2017-05-03 11:37 ` Chris Wilson
  2017-05-03 11:37 ` [PATCH 5/9] drm/i915: Use a define for the default priority [0] Chris Wilson
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2017-05-03 11:37 UTC (permalink / raw)
  To: intel-gfx

If we *know* that the engine is idle, i.e. we have no more contexts in
flight, we can skip any spurious CSB idle interrupts. These spurious
interrupts seem to arrive long after we assert that the engines are
completely idle, triggering later assertions:

[  178.896646] intel_engine_is_idle(bcs): interrupt not handled, irq_posted=2
[  178.896655] ------------[ cut here ]------------
[  178.896658] kernel BUG at drivers/gpu/drm/i915/intel_engine_cs.c:226!
[  178.896661] invalid opcode: 0000 [#1] SMP
[  178.896663] Modules linked in: i915(E) x86_pkg_temp_thermal(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) intel_gtt(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) aesni_intel(E) prime_numbers(E) evdev(E) aes_x86_64(E) drm(E) crypto_simd(E) cryptd(E) glue_helper(E) mei_me(E) mei(E) lpc_ich(E) efivars(E) mfd_core(E) battery(E) video(E) acpi_pad(E) button(E) tpm_tis(E) tpm_tis_core(E) tpm(E) autofs4(E) i2c_i801(E) fan(E) thermal(E) i2c_designware_platform(E) i2c_designware_core(E)
[  178.896694] CPU: 1 PID: 522 Comm: gem_exec_whispe Tainted: G            E   4.11.0-rc5+ #14
[  178.896702] task: ffff88040aba8d40 task.stack: ffffc900003f0000
[  178.896722] RIP: 0010:intel_engine_init_global_seqno+0x1db/0x1f0 [i915]
[  178.896725] RSP: 0018:ffffc900003f3ab0 EFLAGS: 00010246
[  178.896728] RAX: 0000000000000000 RBX: ffff88040af54000 RCX: 0000000000000000
[  178.896731] RDX: ffff88041ec933e0 RSI: ffff88041ec8cc48 RDI: ffff88041ec8cc48
[  178.896734] RBP: ffffc900003f3ac8 R08: 0000000000000000 R09: 000000000000047d
[  178.896736] R10: 0000000000000040 R11: ffff88040b344f80 R12: 0000000000000000
[  178.896739] R13: ffff88040bce0000 R14: ffff88040bce52d8 R15: ffff88040bce0000
[  178.896742] FS:  00007f2cccc2d8c0(0000) GS:ffff88041ec80000(0000) knlGS:0000000000000000
[  178.896746] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  178.896749] CR2: 00007f41ddd8f000 CR3: 000000040bb03000 CR4: 00000000001406e0
[  178.896752] Call Trace:
[  178.896768]  reset_all_global_seqno.part.33+0x4e/0xd0 [i915]
[  178.896782]  i915_gem_request_alloc+0x304/0x330 [i915]
[  178.896795]  i915_gem_do_execbuffer+0x8a1/0x17d0 [i915]
[  178.896799]  ? remove_wait_queue+0x48/0x50
[  178.896812]  ? i915_wait_request+0x300/0x590 [i915]
[  178.896816]  ? wake_up_q+0x70/0x70
[  178.896819]  ? refcount_dec_and_test+0x11/0x20
[  178.896823]  ? reservation_object_add_excl_fence+0xa5/0x100
[  178.896835]  i915_gem_execbuffer2+0xab/0x1f0 [i915]
[  178.896844]  drm_ioctl+0x1e6/0x460 [drm]
[  178.896858]  ? i915_gem_execbuffer+0x260/0x260 [i915]
[  178.896862]  ? dput+0xcf/0x250
[  178.896866]  ? full_proxy_release+0x66/0x80
[  178.896869]  ? mntput+0x1f/0x30
[  178.896872]  do_vfs_ioctl+0x8f/0x5b0
[  178.896875]  ? ____fput+0x9/0x10
[  178.896878]  ? task_work_run+0x80/0xa0
[  178.896881]  SyS_ioctl+0x3c/0x70
[  178.896885]  entry_SYSCALL_64_fastpath+0x17/0x98
[  178.896888] RIP: 0033:0x7f2ccb455ca7
[  178.896890] RSP: 002b:00007ffcabec72d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  178.896894] RAX: ffffffffffffffda RBX: 000055f897a44b90 RCX: 00007f2ccb455ca7
[  178.896897] RDX: 00007ffcabec74a0 RSI: 0000000040406469 RDI: 0000000000000003
[  178.896900] RBP: 00007f2ccb70a440 R08: 00007f2ccb70d0a4 R09: 0000000000000000
[  178.896903] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[  178.896905] R13: 000055f89782d71a R14: 00007ffcabecf838 R15: 0000000000000003
[  178.896908] Code: 00 31 d2 4c 89 ef 8d 70 48 41 ff 95 f8 06 00 00 e9 68 fe ff ff be 0f 00 00 00 48 c7 c7 48 dc 37 a0 e8 fa 33 d6 e0 e9 0b ff ff ff <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00

On the other hand, by ignoring the interrupt do we risk running out of
space in the CSB ring? Testing for a few hours suggests not, i.e. that we
only seem to get the odd delayed CSB idle notification.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 0e4dcbeb4d06..86ede88daaab 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1359,8 +1359,10 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir, int test_shift)
 	bool tasklet = false;
 
 	if (iir & (GT_CONTEXT_SWITCH_INTERRUPT << test_shift)) {
-		set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
-		tasklet = true;
+		if (port_count(&engine->execlist_port[0])) {
+			set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+			tasklet = true;
+		}
 	}
 
 	if (iir & (GT_RENDER_USER_INTERRUPT << test_shift)) {
-- 
2.11.0


* [PATCH 5/9] drm/i915: Use a define for the default priority [0]
  2017-05-03 11:37 [PATCH 1/9] drm/i915: Make ptr_unpack_bits() more function-like Chris Wilson
                   ` (2 preceding siblings ...)
  2017-05-03 11:37 ` [PATCH 4/9] drm/i915: Don't mark an execlists context-switch when idle Chris Wilson
@ 2017-05-03 11:37 ` Chris Wilson
  2017-05-04 13:32   ` Joonas Lahtinen
  2017-05-03 11:37 ` [PATCH 6/9] drm/i915: Split execlist priority queue into rbtree + linked list Chris Wilson
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2017-05-03 11:37 UTC (permalink / raw)
  To: intel-gfx

Explicitly assign the default priority, and give it a name (macro).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 1 +
 drivers/gpu/drm/i915/i915_gem_request.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 31a73c39239f..c0ae5cd3cf77 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -199,6 +199,7 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	kref_init(&ctx->ref);
 	list_add_tail(&ctx->link, &dev_priv->context_list);
 	ctx->i915 = dev_priv;
+	ctx->priority = I915_PRIORITY_DFL;
 
 	/* Default context will never have a file_priv */
 	ret = DEFAULT_CONTEXT_HANDLE;
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 4ccab5affd3c..feb81463abc9 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -70,6 +70,7 @@ struct i915_priotree {
 	struct rb_node node;
 	int priority;
 #define I915_PRIORITY_MAX 1024
+#define I915_PRIORITY_DFL 0
 #define I915_PRIORITY_MIN (-I915_PRIORITY_MAX)
 };
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 6/9] drm/i915: Split execlist priority queue into rbtree + linked list
  2017-05-03 11:37 [PATCH 1/9] drm/i915: Make ptr_unpack_bits() more function-like Chris Wilson
                   ` (3 preceding siblings ...)
  2017-05-03 11:37 ` [PATCH 5/9] drm/i915: Use a define for the default priority [0] Chris Wilson
@ 2017-05-03 11:37 ` Chris Wilson
  2017-05-05 13:19   ` Tvrtko Ursulin
  2017-05-05 13:50   ` Tvrtko Ursulin
  2017-05-03 11:37 ` [PATCH 7/9] drm/i915/execlists: Reduce lock context between schedule/submit_request Chris Wilson
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 33+ messages in thread
From: Chris Wilson @ 2017-05-03 11:37 UTC (permalink / raw)
  To: intel-gfx

All the requests at the same priority are executed in FIFO order. They
do not need to be stored in the rbtree themselves, as they are a simple
list within a level. If we move the requests at one priority into a list,
we can then reduce the rbtree to the set of priorities. This should keep
the height of the rbtree small, as the number of active priorities cannot
exceed the number of active requests and should typically be only a few.

Currently, we have ~2k possible different priority levels, which may
increase to allow even finer-grained selection. Allocating those in
advance seems a waste (and may be impossible), so we opt for allocating
upon first use, and freeing after its requests are depleted. To avoid
the possibility of an allocation failure causing us to lose a request,
we preallocate the default priority (0) and bump any request to that
priority if we fail to allocate it the appropriate plist. Having a
request (that is ready to run, so not leading to corruption) execute
out-of-order is better than leaking the request (and its dependency
tree) entirely.

There should be a benefit to reducing execlists_dequeue() to principally
using a simple list (and reducing the frequency of both rbtree iteration
and balancing on erase) but for typical workloads, request coalescing
should be small enough that we don't notice any change. The main gain
is from cheaper priority-inheritance (PI) calls into schedule, and the
explicit list within a level should make request unwinding simpler (we
just need to insert at the head of the list rather than the tail, and
not have to make the rbtree search more complicated).

v2: Avoid use-after-free when deleting a depleted priolist

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  12 ++-
 drivers/gpu/drm/i915/i915_gem_request.c    |   4 +-
 drivers/gpu/drm/i915/i915_gem_request.h    |   2 +-
 drivers/gpu/drm/i915/i915_guc_submission.c |  53 ++++++----
 drivers/gpu/drm/i915/i915_utils.h          |   9 ++
 drivers/gpu/drm/i915/intel_lrc.c           | 159 +++++++++++++++++++----------
 drivers/gpu/drm/i915/intel_ringbuffer.h    |   7 ++
 7 files changed, 163 insertions(+), 83 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 0b5d7142d8d9..a8c7788d986e 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3314,7 +3314,6 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 
 		if (i915.enable_execlists) {
 			u32 ptr, read, write;
-			struct rb_node *rb;
 			unsigned int idx;
 
 			seq_printf(m, "\tExeclist status: 0x%08x %08x\n",
@@ -3358,9 +3357,14 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 			rcu_read_unlock();
 
 			spin_lock_irq(&engine->timeline->lock);
-			for (rb = engine->execlist_first; rb; rb = rb_next(rb)) {
-				rq = rb_entry(rb, typeof(*rq), priotree.node);
-				print_request(m, rq, "\t\tQ ");
+			for (rb = engine->execlist_first; rb; rb = rb_next(rb)) {
+				struct execlist_priolist *plist =
+					rb_entry(rb, typeof(*plist), node);
+
+				list_for_each_entry(rq,
+						    &plist->requests,
+						    priotree.link)
+					print_request(m, rq, "\t\tQ ");
 			}
 			spin_unlock_irq(&engine->timeline->lock);
 		} else if (INTEL_GEN(dev_priv) > 6) {
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 8d2a5c8e5005..ad2be26923fb 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -159,7 +159,7 @@ i915_priotree_fini(struct drm_i915_private *i915, struct i915_priotree *pt)
 {
 	struct i915_dependency *dep, *next;
 
-	GEM_BUG_ON(!RB_EMPTY_NODE(&pt->node));
+	GEM_BUG_ON(!list_empty(&pt->link));
 
 	/* Everyone we depended upon (the fences we wait to be signaled)
 	 * should retire before us and remove themselves from our list.
@@ -185,7 +185,7 @@ i915_priotree_init(struct i915_priotree *pt)
 {
 	INIT_LIST_HEAD(&pt->signalers_list);
 	INIT_LIST_HEAD(&pt->waiters_list);
-	RB_CLEAR_NODE(&pt->node);
+	INIT_LIST_HEAD(&pt->link);
 	pt->priority = INT_MIN;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index feb81463abc9..6c58f7d87746 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -67,7 +67,7 @@ struct i915_dependency {
 struct i915_priotree {
 	struct list_head signalers_list; /* those before us, we depend upon */
 	struct list_head waiters_list; /* those after us, they depend upon us */
-	struct rb_node node;
+	struct list_head link;
 	int priority;
 #define I915_PRIORITY_MAX 1024
 #define I915_PRIORITY_DFL 0
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 62d3831a8a0d..f0d48bd508c9 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -674,32 +674,45 @@ static bool i915_guc_dequeue(struct intel_engine_cs *engine)
 
 	spin_lock_irq(&engine->timeline->lock);
 	rb = engine->execlist_first;
+	GEM_BUG_ON(rb_first(&engine->execlist_queue) != rb);
 	while (rb) {
-		struct drm_i915_gem_request *rq =
-			rb_entry(rb, typeof(*rq), priotree.node);
-
-		if (last && rq->ctx != last->ctx) {
-			if (port != engine->execlist_port)
-				break;
-
-			port_assign(port, last);
-			port++;
+		struct execlist_priolist *plist =
+			rb_entry(rb, typeof(*plist), node);
+		struct drm_i915_gem_request *rq, *rn;
+
+		list_for_each_entry_safe(rq, rn,
+					 &plist->requests, priotree.link) {
+			if (last && rq->ctx != last->ctx) {
+				if (port != engine->execlist_port) {
+					__list_del_many(&plist->requests,
+							&rq->priotree.link);
+					goto done;
+				}
+
+				port_assign(port, last);
+				port++;
+			}
+
+			INIT_LIST_HEAD(&rq->priotree.link);
+			rq->priotree.priority = INT_MAX;
+
+			i915_guc_submit(rq);
+			trace_i915_gem_request_in(rq, port - engine->execlist_port);
+			last = rq;
+			submit = true;
 		}
 
 		rb = rb_next(rb);
-		rb_erase(&rq->priotree.node, &engine->execlist_queue);
-		RB_CLEAR_NODE(&rq->priotree.node);
-		rq->priotree.priority = INT_MAX;
-
-		i915_guc_submit(rq);
-		trace_i915_gem_request_in(rq, port - engine->execlist_port);
-		last = rq;
-		submit = true;
-	}
-	if (submit) {
-		port_assign(port, last);
 		engine->execlist_first = rb;
+
+		rb_erase(&plist->node, &engine->execlist_queue);
+		INIT_LIST_HEAD(&plist->requests);
+		if (plist->priority != I915_PRIORITY_DFL)
+			kfree(plist);
 	}
+done:
+	if (submit)
+		port_assign(port, last);
 	spin_unlock_irq(&engine->timeline->lock);
 
 	return submit;
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index f0500c65726d..9c65f9ae2e97 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -99,4 +99,13 @@
 	__T;								\
 })
 
+#include <linux/list.h>
+
+static inline void __list_del_many(struct list_head *head,
+				   struct list_head *first)
+{
+	head->next = first;
+	first->prev = head;
+}
+
 #endif /* !__I915_UTILS_H */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9f7b6112c53a..fb0025627676 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -436,57 +436,79 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 	spin_lock_irq(&engine->timeline->lock);
 	rb = engine->execlist_first;
+	GEM_BUG_ON(rb_first(&engine->execlist_queue) != rb);
 	while (rb) {
-		struct drm_i915_gem_request *cursor =
-			rb_entry(rb, typeof(*cursor), priotree.node);
-
-		/* Can we combine this request with the current port? It has to
-		 * be the same context/ringbuffer and not have any exceptions
-		 * (e.g. GVT saying never to combine contexts).
-		 *
-		 * If we can combine the requests, we can execute both by
-		 * updating the RING_TAIL to point to the end of the second
-		 * request, and so we never need to tell the hardware about
-		 * the first.
-		 */
-		if (last && !can_merge_ctx(cursor->ctx, last->ctx)) {
-			/* If we are on the second port and cannot combine
-			 * this request with the last, then we are done.
-			 */
-			if (port != engine->execlist_port)
-				break;
-
-			/* If GVT overrides us we only ever submit port[0],
-			 * leaving port[1] empty. Note that we also have
-			 * to be careful that we don't queue the same
-			 * context (even though a different request) to
-			 * the second port.
+		struct execlist_priolist *plist =
+			rb_entry(rb, typeof(*plist), node);
+		struct drm_i915_gem_request *rq, *rn;
+
+		list_for_each_entry_safe(rq, rn,
+					 &plist->requests, priotree.link) {
+			/*
+			 * Can we combine this request with the current port?
+			 * It has to be the same context/ringbuffer and not
+			 * have any exceptions (e.g. GVT saying never to
+			 * combine contexts).
+			 *
+			 * If we can combine the requests, we can execute both
+			 * by updating the RING_TAIL to point to the end of the
+			 * second request, and so we never need to tell the
+			 * hardware about the first.
 			 */
-			if (ctx_single_port_submission(last->ctx) ||
-			    ctx_single_port_submission(cursor->ctx))
-				break;
+			if (last && !can_merge_ctx(rq->ctx, last->ctx)) {
+				/*
+				 * If we are on the second port and cannot
+				 * combine this request with the last, then we
+				 * are done.
+				 */
+				if (port != engine->execlist_port) {
+					__list_del_many(&plist->requests,
+							&rq->priotree.link);
+					goto done;
+				}
+
+				/*
+				 * If GVT overrides us we only ever submit
+				 * port[0], leaving port[1] empty. Note that we
+				 * also have to be careful that we don't queue
+				 * the same context (even though a different
+				 * request) to the second port.
+				 */
+				if (ctx_single_port_submission(last->ctx) ||
+				    ctx_single_port_submission(rq->ctx)) {
+					__list_del_many(&plist->requests,
+							&rq->priotree.link);
+					goto done;
+				}
+
+				GEM_BUG_ON(last->ctx == rq->ctx);
+
+				if (submit)
+					port_assign(port, last);
+				port++;
+			}
 
-			GEM_BUG_ON(last->ctx == cursor->ctx);
+			INIT_LIST_HEAD(&rq->priotree.link);
+			rq->priotree.priority = INT_MAX;
 
-			if (submit)
-				port_assign(port, last);
-			port++;
+			__i915_gem_request_submit(rq);
+			trace_i915_gem_request_in(rq,
+						  port - engine->execlist_port);
+			last = rq;
+			submit = true;
 		}
 
 		rb = rb_next(rb);
-		rb_erase(&cursor->priotree.node, &engine->execlist_queue);
-		RB_CLEAR_NODE(&cursor->priotree.node);
-		cursor->priotree.priority = INT_MAX;
+		engine->execlist_first = rb;
 
-		__i915_gem_request_submit(cursor);
-		trace_i915_gem_request_in(cursor, port - engine->execlist_port);
-		last = cursor;
-		submit = true;
+		rb_erase(&plist->node, &engine->execlist_queue);
+		INIT_LIST_HEAD(&plist->requests);
+		if (plist->priority != I915_PRIORITY_DFL)
+			kfree(plist);
 	}
-	if (submit) {
+done:
+	if (submit)
 		port_assign(port, last);
-		engine->execlist_first = rb;
-	}
 	spin_unlock_irq(&engine->timeline->lock);
 
 	if (submit)
@@ -610,28 +632,53 @@ static void intel_lrc_irq_handler(unsigned long data)
 	intel_uncore_forcewake_put(dev_priv, engine->fw_domains);
 }
 
-static bool insert_request(struct i915_priotree *pt, struct rb_root *root)
+static bool
+insert_request(struct intel_engine_cs *engine,
+	       struct i915_priotree *pt,
+	       int prio)
 {
+	struct execlist_priolist *plist;
 	struct rb_node **p, *rb;
 	bool first = true;
 
+find_plist:
 	/* most positive priority is scheduled first, equal priorities fifo */
 	rb = NULL;
-	p = &root->rb_node;
+	p = &engine->execlist_queue.rb_node;
 	while (*p) {
-		struct i915_priotree *pos;
-
 		rb = *p;
-		pos = rb_entry(rb, typeof(*pos), node);
-		if (pt->priority > pos->priority) {
+		plist = rb_entry(rb, typeof(*plist), node);
+		if (prio > plist->priority) {
 			p = &rb->rb_left;
-		} else {
+		} else if (prio < plist->priority) {
 			p = &rb->rb_right;
 			first = false;
+		} else {
+			list_add_tail(&pt->link, &plist->requests);
+			return false;
 		}
 	}
-	rb_link_node(&pt->node, rb, p);
-	rb_insert_color(&pt->node, root);
+
+	if (prio == I915_PRIORITY_DFL) {
+		plist = &engine->default_priolist;
+	} else {
+		plist = kmalloc(sizeof(*plist), GFP_ATOMIC);
+		/* Convert an allocation failure to a priority bump */
+		if (unlikely(!plist)) {
+			prio = I915_PRIORITY_DFL; /* recurses just once */
+			goto find_plist;
+		}
+	}
+
+	plist->priority = prio;
+	rb_link_node(&plist->node, rb, p);
+	rb_insert_color(&plist->node, &engine->execlist_queue);
+
+	INIT_LIST_HEAD(&plist->requests);
+	list_add_tail(&pt->link, &plist->requests);
+
+	if (first)
+		engine->execlist_first = &plist->node;
 
 	return first;
 }
@@ -644,8 +691,9 @@ static void execlists_submit_request(struct drm_i915_gem_request *request)
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&engine->timeline->lock, flags);
 
-	if (insert_request(&request->priotree, &engine->execlist_queue)) {
-		engine->execlist_first = &request->priotree.node;
+	if (insert_request(engine,
+			   &request->priotree,
+			   request->priotree.priority)) {
 		if (execlists_elsp_ready(engine))
 			tasklet_hi_schedule(&engine->irq_tasklet);
 	}
@@ -734,10 +782,9 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
 			continue;
 
 		pt->priority = prio;
-		if (!RB_EMPTY_NODE(&pt->node)) {
-			rb_erase(&pt->node, &engine->execlist_queue);
-			if (insert_request(pt, &engine->execlist_queue))
-				engine->execlist_first = &pt->node;
+		if (!list_empty(&pt->link)) {
+			__list_del_entry(&pt->link);
+			insert_request(engine, pt, prio);
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 68765d45116b..b8e01a9e2311 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -177,6 +177,12 @@ enum intel_engine_id {
 	VECS
 };
 
+struct execlist_priolist {
+	struct rb_node node;
+	struct list_head requests;
+	int priority;
+};
+
 #define INTEL_ENGINE_CS_MAX_NAME 8
 
 struct intel_engine_cs {
@@ -367,6 +373,7 @@ struct intel_engine_cs {
 
 	/* Execlists */
 	struct tasklet_struct irq_tasklet;
+	struct execlist_priolist default_priolist;
 	struct execlist_port {
 		struct drm_i915_gem_request *request_count;
 #define EXECLIST_COUNT_BITS 2
-- 
2.11.0


* [PATCH 7/9] drm/i915/execlists: Reduce lock context between schedule/submit_request
  2017-05-03 11:37 [PATCH 1/9] drm/i915: Make ptr_unpack_bits() more function-like Chris Wilson
                   ` (4 preceding siblings ...)
  2017-05-03 11:37 ` [PATCH 6/9] drm/i915: Split execlist priority queue into rbtree + linked list Chris Wilson
@ 2017-05-03 11:37 ` Chris Wilson
  2017-05-05 12:13   ` Chris Wilson
  2017-05-05 13:30   ` Tvrtko Ursulin
  2017-05-03 11:37 ` [PATCH 8/9] drm/i915: Stop inlining the execlists IRQ handler Chris Wilson
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 33+ messages in thread
From: Chris Wilson @ 2017-05-03 11:37 UTC (permalink / raw)
  To: intel-gfx

If we do not need to perform priority bumping, and we haven't yet
submitted the request, we can update its priority in situ and skip
acquiring the engine locks -- thus avoiding any contention between us
and submit/execute.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index fb0025627676..ca7f28795e2d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -767,6 +767,17 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
 		list_safe_reset_next(dep, p, dfs_link);
 	}
 
+	/* If we didn't need to bump any existing priorities, and we haven't
+	 * yet submitted this request (i.e. there is no potential race with
+	 * execlists_submit_request()), we can set our own priority and skip
+	 * acquiring the engine locks.
+	 */
+	if (request->priotree.priority == INT_MIN) {
+		request->priotree.priority = prio;
+		if (stack.dfs_link.next == stack.dfs_link.prev)
+			return;
+	}
+
 	engine = request->engine;
 	spin_lock_irq(&engine->timeline->lock);
 
-- 
2.11.0


* [PATCH 8/9] drm/i915: Stop inlining the execlists IRQ handler
  2017-05-03 11:37 [PATCH 1/9] drm/i915: Make ptr_unpack_bits() more function-like Chris Wilson
                   ` (5 preceding siblings ...)
  2017-05-03 11:37 ` [PATCH 7/9] drm/i915/execlists: Reduce lock context between schedule/submit_request Chris Wilson
@ 2017-05-03 11:37 ` Chris Wilson
  2017-05-04 13:55   ` Mika Kuoppala
  2017-05-03 11:37 ` [PATCH 9/9] drm/i915: Don't force serialisation on marking up execlists irq posted Chris Wilson
  2017-05-03 12:25 ` ✓ Fi.CI.BAT: success for series starting with [1/9] drm/i915: Make ptr_unpack_bits() more function-like Patchwork
  8 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2017-05-03 11:37 UTC (permalink / raw)
  To: intel-gfx

As the handler is now quite complex, involving a few atomics, the cost
of the function preamble is negligible in comparison and so we should
leave the function out-of-line for better I$.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 86ede88daaab..8f60c8045b3e 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1353,7 +1353,7 @@ static void snb_gt_irq_handler(struct drm_i915_private *dev_priv,
 		ivybridge_parity_error_irq_handler(dev_priv, gt_iir);
 }
 
-static __always_inline void
+static void
 gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir, int test_shift)
 {
 	bool tasklet = false;
-- 
2.11.0


* [PATCH 9/9] drm/i915: Don't force serialisation on marking up execlists irq posted
  2017-05-03 11:37 [PATCH 1/9] drm/i915: Make ptr_unpack_bits() more function-like Chris Wilson
                   ` (6 preceding siblings ...)
  2017-05-03 11:37 ` [PATCH 8/9] drm/i915: Stop inlining the execlists IRQ handler Chris Wilson
@ 2017-05-03 11:37 ` Chris Wilson
  2017-05-05 13:34   ` Tvrtko Ursulin
  2017-05-03 12:25 ` ✓ Fi.CI.BAT: success for series starting with [1/9] drm/i915: Make ptr_unpack_bits() more function-like Patchwork
  8 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2017-05-03 11:37 UTC (permalink / raw)
  To: intel-gfx

Since we coordinate with the execlists tasklet using a locked schedule
operation that ensures that after we set the engine->irq_posted we
always have an invocation of the tasklet, we do not need to use a locked
operation to set the engine->irq_posted itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 8f60c8045b3e..951a7705949e 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1360,7 +1360,7 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir, int test_shift)
 
 	if (iir & (GT_CONTEXT_SWITCH_INTERRUPT << test_shift)) {
 		if (port_count(&engine->execlist_port[0])) {
-			set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+			__set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
 			tasklet = true;
 		}
 	}
-- 
2.11.0


* ✓ Fi.CI.BAT: success for series starting with [1/9] drm/i915: Make ptr_unpack_bits() more function-like
  2017-05-03 11:37 [PATCH 1/9] drm/i915: Make ptr_unpack_bits() more function-like Chris Wilson
                   ` (7 preceding siblings ...)
  2017-05-03 11:37 ` [PATCH 9/9] drm/i915: Don't force serialisation on marking up execlists irq posted Chris Wilson
@ 2017-05-03 12:25 ` Patchwork
  8 siblings, 0 replies; 33+ messages in thread
From: Patchwork @ 2017-05-03 12:25 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/9] drm/i915: Make ptr_unpack_bits() more function-like
URL   : https://patchwork.freedesktop.org/series/23881/
State : success

== Summary ==

Series 23881v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/23881/revisions/1/mbox/

fi-bdw-5557u     total:278  pass:267  dwarn:0   dfail:0   fail:0   skip:11  time:433s
fi-bdw-gvtdvm    total:278  pass:256  dwarn:8   dfail:0   fail:0   skip:14  time:427s
fi-bsw-n3050     total:278  pass:242  dwarn:0   dfail:0   fail:0   skip:36  time:576s
fi-bxt-j4205     total:278  pass:259  dwarn:0   dfail:0   fail:0   skip:19  time:517s
fi-bxt-t5700     total:278  pass:258  dwarn:0   dfail:0   fail:0   skip:20  time:542s
fi-byt-j1900     total:278  pass:254  dwarn:0   dfail:0   fail:0   skip:24  time:489s
fi-byt-n2820     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time:483s
fi-hsw-4770      total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time:413s
fi-hsw-4770r     total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time:406s
fi-ilk-650       total:278  pass:228  dwarn:0   dfail:0   fail:0   skip:50  time:419s
fi-ivb-3520m     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:490s
fi-ivb-3770      total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:464s
fi-kbl-7500u     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:455s
fi-kbl-7560u     total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time:620s
fi-skl-6260u     total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time:456s
fi-skl-6700hq    total:278  pass:261  dwarn:0   dfail:0   fail:0   skip:17  time:567s
fi-skl-6700k     total:278  pass:256  dwarn:4   dfail:0   fail:0   skip:18  time:459s
fi-skl-6770hq    total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time:495s
fi-skl-gvtdvm    total:278  pass:265  dwarn:0   dfail:0   fail:0   skip:13  time:428s
fi-snb-2520m     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time:526s
fi-snb-2600      total:278  pass:249  dwarn:0   dfail:0   fail:0   skip:29  time:408s

26ef9256ab34901a953bd5338a3c94388c745c99 drm-tip: 2017y-05m-03d-10h-11m-12s UTC integration manifest
b50f7cd drm/i915: Don't force serialisation on marking up execlists irq posted
f9caf94 drm/i915: Stop inlining the execlists IRQ handler
1649139 drm/i915/execlists: Reduce lock context between schedule/submit_request
b1b3f8f drm/i915: Split execlist priority queue into rbtree + linked list
e46c698 drm/i915: Use a define for the default priority [0]
e8ad67c drm/i915: Don't mark an execlists context-switch when idle
014e2fd drm/i915/execlists: Pack the count into the low bits of the port.request
0fafe36 drm/i915: Redefine ptr_pack_bits() and friends
2274910 drm/i915: Make ptr_unpack_bits() more function-like

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4610/

* Re: [PATCH 5/9] drm/i915: Use a define for the default priority [0]
  2017-05-03 11:37 ` [PATCH 5/9] drm/i915: Use a define for the default priority [0] Chris Wilson
@ 2017-05-04 13:32   ` Joonas Lahtinen
  2017-05-04 13:58     ` Chris Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Joonas Lahtinen @ 2017-05-04 13:32 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ke, 2017-05-03 at 12:37 +0100, Chris Wilson wrote:
> Explicitly assign the default priority, and give it a name (macro).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

<SNIP>

>  	kref_init(&ctx->ref);
>  	list_add_tail(&ctx->link, &dev_priv->context_list);
>  	ctx->i915 = dev_priv;
> +	ctx->priority = I915_PRIORITY_DFL;

I915_PRIORITY_DEFAULT would work better.

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 8/9] drm/i915: Stop inlining the execlists IRQ handler
  2017-05-03 11:37 ` [PATCH 8/9] drm/i915: Stop inlining the execlists IRQ handler Chris Wilson
@ 2017-05-04 13:55   ` Mika Kuoppala
  0 siblings, 0 replies; 33+ messages in thread
From: Mika Kuoppala @ 2017-05-04 13:55 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> As the handler is now quite complex, involving a few atomics, the cost
> of the function preamble is negligible in comparison and so we should
> leave the function out-of-line for better I$.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_irq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 86ede88daaab..8f60c8045b3e 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1353,7 +1353,7 @@ static void snb_gt_irq_handler(struct drm_i915_private *dev_priv,
>  		ivybridge_parity_error_irq_handler(dev_priv, gt_iir);
>  }
>  
> -static __always_inline void
> +static void
>  gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir, int test_shift)
>  {
>  	bool tasklet = false;
> -- 
> 2.11.0
>

* Re: [PATCH 5/9] drm/i915: Use a define for the default priority [0]
  2017-05-04 13:32   ` Joonas Lahtinen
@ 2017-05-04 13:58     ` Chris Wilson
  2017-05-05  8:31       ` Mika Kuoppala
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2017-05-04 13:58 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Thu, May 04, 2017 at 04:32:34PM +0300, Joonas Lahtinen wrote:
> On ke, 2017-05-03 at 12:37 +0100, Chris Wilson wrote:
> > Explicitly assign the default priority, and give it a name (macro).
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> <SNIP>
> 
> >  	kref_init(&ctx->ref);
> >  	list_add_tail(&ctx->link, &dev_priv->context_list);
> >  	ctx->i915 = dev_priv;
> > +	ctx->priority = I915_PRIORITY_DFL;
> 
> I915_PRIORITY_DEFAULT would work better.

On the one hand I have the symmetry with MIN, DFL, MAX, on the other
hand DFL is plain bizarre.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 5/9] drm/i915: Use a define for the default priority [0]
  2017-05-04 13:58     ` Chris Wilson
@ 2017-05-05  8:31       ` Mika Kuoppala
  2017-05-05  9:13         ` Chris Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Mika Kuoppala @ 2017-05-05  8:31 UTC (permalink / raw)
  To: Chris Wilson, Joonas Lahtinen; +Cc: intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Thu, May 04, 2017 at 04:32:34PM +0300, Joonas Lahtinen wrote:
>> On ke, 2017-05-03 at 12:37 +0100, Chris Wilson wrote:
>> > Explicitly assign the default priority, and give it a name (macro).
>> > 
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> 
>> <SNIP>
>> 
>> >  	kref_init(&ctx->ref);
>> >  	list_add_tail(&ctx->link, &dev_priv->context_list);
>> >  	ctx->i915 = dev_priv;
>> > +	ctx->priority = I915_PRIORITY_DFL;
>> 
>> I915_PRIORITY_DEFAULT would work better.
>
> On the one hand I have the symmetry with MIN, DFL, MAX, on the other
> hand DFL is plain bizarre.

DEF?

[1]: http://www.abbreviations.com/abbreviation/default

-Mika

* Re: [PATCH 5/9] drm/i915: Use a define for the default priority [0]
  2017-05-05  8:31       ` Mika Kuoppala
@ 2017-05-05  9:13         ` Chris Wilson
  2017-05-05  9:21           ` Tvrtko Ursulin
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2017-05-05  9:13 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

On Fri, May 05, 2017 at 11:31:14AM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > On Thu, May 04, 2017 at 04:32:34PM +0300, Joonas Lahtinen wrote:
> >> On ke, 2017-05-03 at 12:37 +0100, Chris Wilson wrote:
> >> > Explicitly assign the default priority, and give it a name (macro).
> >> > 
> >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> 
> >> <SNIP>
> >> 
> >> >  	kref_init(&ctx->ref);
> >> >  	list_add_tail(&ctx->link, &dev_priv->context_list);
> >> >  	ctx->i915 = dev_priv;
> >> > +	ctx->priority = I915_PRIORITY_DFL;
> >> 
> >> I915_PRIORITY_DEFAULT would work better.
> >
> > On the one hand I have the symmetry with MIN, DFL, MAX, on the other
> > hand DFL is plain bizarre.
> 
> DEF?

I915_PRIORITY_DEFEAT. I'm perfectly happy just to use 0, pesky Tvrtko.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 5/9] drm/i915: Use a define for the default priority [0]
  2017-05-05  9:13         ` Chris Wilson
@ 2017-05-05  9:21           ` Tvrtko Ursulin
  2017-05-05  9:50             ` Chris Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2017-05-05  9:21 UTC (permalink / raw)
  To: Chris Wilson, Mika Kuoppala, Joonas Lahtinen, intel-gfx


On 05/05/2017 10:13, Chris Wilson wrote:
> On Fri, May 05, 2017 at 11:31:14AM +0300, Mika Kuoppala wrote:
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>>
>>> On Thu, May 04, 2017 at 04:32:34PM +0300, Joonas Lahtinen wrote:
>>>> On ke, 2017-05-03 at 12:37 +0100, Chris Wilson wrote:
>>>>> Explicitly assign the default priority, and give it a name (macro).
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>
>>>> <SNIP>
>>>>
>>>>>  	kref_init(&ctx->ref);
>>>>>  	list_add_tail(&ctx->link, &dev_priv->context_list);
>>>>>  	ctx->i915 = dev_priv;
>>>>> +	ctx->priority = I915_PRIORITY_DFL;
>>>>
>>>> I915_PRIORITY_DEFAULT would work better.
>>>
>>> On the one hand I have the symmetry with MIN, DFL, MAX, on the other
>>> hand DFL is plain bizarre.
>>
>> DEF?
>
> I915_PRIORITY_DEFEAT. I'm perfectly happy to just use 0, pesky Tvrtko.

Will to argue deflated. :) I suggested it for the benefit of one of the 
later patches which explicitly compares against zero, if < 0 && 
!cap_sys_admin or something. I thought being explicit about what zero 
means there would be a good thing.

DEFAULT or DEF both sound good to me. Or NORMAL. DFL is not entirely 
new (SIG_DFL) but it does look very weird.

Regards,

Tvrtko

* Re: [PATCH 5/9] drm/i915: Use a define for the default priority [0]
  2017-05-05  9:21           ` Tvrtko Ursulin
@ 2017-05-05  9:50             ` Chris Wilson
  0 siblings, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2017-05-05  9:50 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, May 05, 2017 at 10:21:32AM +0100, Tvrtko Ursulin wrote:
> 
> On 05/05/2017 10:13, Chris Wilson wrote:
> >On Fri, May 05, 2017 at 11:31:14AM +0300, Mika Kuoppala wrote:
> >>Chris Wilson <chris@chris-wilson.co.uk> writes:
> >>
> >>>On Thu, May 04, 2017 at 04:32:34PM +0300, Joonas Lahtinen wrote:
> >>>>On ke, 2017-05-03 at 12:37 +0100, Chris Wilson wrote:
> >>>>>Explicitly assign the default priority, and give it a name (macro).
> >>>>>
> >>>>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>
> >>>><SNIP>
> >>>>
> >>>>> 	kref_init(&ctx->ref);
> >>>>> 	list_add_tail(&ctx->link, &dev_priv->context_list);
> >>>>> 	ctx->i915 = dev_priv;
> >>>>>+	ctx->priority = I915_PRIORITY_DFL;
> >>>>
> >>>>I915_PRIORITY_DEFAULT would work better.
> >>>
> >>>On the one hand I have the symmetry with MIN, DFL, MAX, on the other
> >>>hand DFL is plain bizarre.
> >>
> >>DEF?
> >
> >I915_PRIORITY_DEFEAT. I'm perfectly happy to just use 0, pesky Tvrtko.
> 
> Will to argue deflated. :) I suggested it for benefit in one of the
> later patches which explicitly compared against zero. if < 0 &&
> !cap_sys_admin or something.. I thought being explicit what zero
> means there would be a good thing.
> 
> DEFAULT or DEF both sound good to me. Or NORMAL. DFL is not
> entirely new (SIG_DFL) but it does look very weird.

I like I915_PRIORITY_NORMAL.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 3/9] drm/i915/execlists: Pack the count into the low bits of the port.request
  2017-05-03 11:37 ` [PATCH 3/9] drm/i915/execlists: Pack the count into the low bits of the port.request Chris Wilson
@ 2017-05-05 10:49   ` Tvrtko Ursulin
  2017-05-05 11:16     ` Chris Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2017-05-05 10:49 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Mika Kuoppala


On 03/05/2017 12:37, Chris Wilson wrote:
> add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187)
> function                                     old     new   delta
> execlists_submit_ports                       262     471    +209
> port_assign.isra                               -     136    +136
> capture                                     6344    6359     +15
> reset_common_ring                            438     452     +14
> execlists_submit_request                     228     238     +10
> gen8_init_common_ring                        334     341      +7
> intel_engine_is_idle                         106     105      -1
> i915_engine_info                            2314    2290     -24
> __i915_gem_set_wedged_BKL                    485     411     -74
> intel_lrc_irq_handler                       1789    1604    -185
> execlists_update_context                     294       -    -294
>
> The most important change there is the improvement to the
> intel_lrc_irq_handler and execlists_submit_ports (a net improvement since
> execlists_update_context is now inlined).
>
> v2: Use the port_api() for guc as well (even though currently we do not
> pack any counters in there, yet) and hide all port->request_count inside
> the helpers.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c        |  32 ++++----
>  drivers/gpu/drm/i915/i915_gem.c            |   6 +-
>  drivers/gpu/drm/i915/i915_gpu_error.c      |  13 +++-
>  drivers/gpu/drm/i915/i915_guc_submission.c |  32 +++++---
>  drivers/gpu/drm/i915/intel_engine_cs.c     |   2 +-
>  drivers/gpu/drm/i915/intel_lrc.c           | 117 ++++++++++++++++-------------
>  drivers/gpu/drm/i915/intel_ringbuffer.h    |  10 ++-
>  7 files changed, 122 insertions(+), 90 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 870c470177b5..0b5d7142d8d9 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -3315,6 +3315,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>  		if (i915.enable_execlists) {
>  			u32 ptr, read, write;
>  			struct rb_node *rb;
> +			unsigned int idx;
>
>  			seq_printf(m, "\tExeclist status: 0x%08x %08x\n",
>  				   I915_READ(RING_EXECLIST_STATUS_LO(engine)),
> @@ -3332,8 +3333,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>  			if (read > write)
>  				write += GEN8_CSB_ENTRIES;
>  			while (read < write) {
> -				unsigned int idx = ++read % GEN8_CSB_ENTRIES;
> -
> +				idx = ++read % GEN8_CSB_ENTRIES;
>  				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x, context: %d\n",
>  					   idx,
>  					   I915_READ(RING_CONTEXT_STATUS_BUF_LO(engine, idx)),
> @@ -3341,21 +3341,19 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>  			}
>
>  			rcu_read_lock();
> -			rq = READ_ONCE(engine->execlist_port[0].request);
> -			if (rq) {
> -				seq_printf(m, "\t\tELSP[0] count=%d, ",
> -					   engine->execlist_port[0].count);
> -				print_request(m, rq, "rq: ");
> -			} else {
> -				seq_printf(m, "\t\tELSP[0] idle\n");
> -			}
> -			rq = READ_ONCE(engine->execlist_port[1].request);
> -			if (rq) {
> -				seq_printf(m, "\t\tELSP[1] count=%d, ",
> -					   engine->execlist_port[1].count);
> -				print_request(m, rq, "rq: ");
> -			} else {
> -				seq_printf(m, "\t\tELSP[1] idle\n");
> +			for (idx = 0; idx < ARRAY_SIZE(engine->execlist_port); idx++) {
> +				unsigned int count;
> +
> +				rq = port_unpack(&engine->execlist_port[idx],
> +						 &count);
> +				if (rq) {
> +					seq_printf(m, "\t\tELSP[%d] count=%d, ",
> +						   idx, count);
> +					print_request(m, rq, "rq: ");
> +				} else {
> +					seq_printf(m, "\t\tELSP[%d] idle\n",
> +						   idx);
> +				}
>  			}
>  			rcu_read_unlock();
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index f79079f2e265..c1df4b9d2661 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3038,12 +3038,14 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
>  	 */
>
>  	if (i915.enable_execlists) {
> +		struct execlist_port *port = engine->execlist_port;
>  		unsigned long flags;
> +		unsigned int n;
>
>  		spin_lock_irqsave(&engine->timeline->lock, flags);
>
> -		i915_gem_request_put(engine->execlist_port[0].request);
> -		i915_gem_request_put(engine->execlist_port[1].request);
> +		for (n = 0; n < ARRAY_SIZE(engine->execlist_port); n++)
> +			i915_gem_request_put(port_request(&port[n]));
>  		memset(engine->execlist_port, 0, sizeof(engine->execlist_port));
>  		engine->execlist_queue = RB_ROOT;
>  		engine->execlist_first = NULL;
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index ec526d92f908..e18f350bc364 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1324,12 +1324,17 @@ static void engine_record_requests(struct intel_engine_cs *engine,
>  static void error_record_engine_execlists(struct intel_engine_cs *engine,
>  					  struct drm_i915_error_engine *ee)
>  {
> +	const struct execlist_port *port = engine->execlist_port;
>  	unsigned int n;
>
> -	for (n = 0; n < ARRAY_SIZE(engine->execlist_port); n++)
> -		if (engine->execlist_port[n].request)
> -			record_request(engine->execlist_port[n].request,
> -				       &ee->execlist[n]);
> +	for (n = 0; n < ARRAY_SIZE(engine->execlist_port); n++) {
> +		struct drm_i915_gem_request *rq = port_request(&port[n]);
> +
> +		if (!rq)
> +			break;
> +
> +		record_request(rq, &ee->execlist[n]);
> +	}
>  }
>
>  static void record_context(struct drm_i915_error_context *e,
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 7e85b5ab8ae2..62d3831a8a0d 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -653,10 +653,22 @@ static void nested_enable_signaling(struct drm_i915_gem_request *rq)
>  	spin_unlock(&rq->lock);
>  }
>
> +static void port_assign(struct execlist_port *port,
> +			struct drm_i915_gem_request *rq)
> +{
> +	GEM_BUG_ON(rq == port_request(port));
> +
> +	if (port_isset(port))
> +		i915_gem_request_put(port_request(port));
> +
> +	port_set(port, i915_gem_request_get(rq));
> +	nested_enable_signaling(rq);
> +}
> +
>  static bool i915_guc_dequeue(struct intel_engine_cs *engine)
>  {
>  	struct execlist_port *port = engine->execlist_port;
> -	struct drm_i915_gem_request *last = port[0].request;
> +	struct drm_i915_gem_request *last = port_request(&port[0]);
>  	struct rb_node *rb;
>  	bool submit = false;
>
> @@ -670,8 +682,7 @@ static bool i915_guc_dequeue(struct intel_engine_cs *engine)
>  			if (port != engine->execlist_port)
>  				break;
>
> -			i915_gem_request_assign(&port->request, last);
> -			nested_enable_signaling(last);
> +			port_assign(port, last);
>  			port++;
>  		}
>
> @@ -686,8 +697,7 @@ static bool i915_guc_dequeue(struct intel_engine_cs *engine)
>  		submit = true;
>  	}
>  	if (submit) {
> -		i915_gem_request_assign(&port->request, last);
> -		nested_enable_signaling(last);
> +		port_assign(port, last);
>  		engine->execlist_first = rb;
>  	}
>  	spin_unlock_irq(&engine->timeline->lock);
> @@ -703,17 +713,19 @@ static void i915_guc_irq_handler(unsigned long data)
>  	bool submit;
>
>  	do {
> -		rq = port[0].request;
> +		rq = port_request(&port[0]);
>  		while (rq && i915_gem_request_completed(rq)) {
>  			trace_i915_gem_request_out(rq);
>  			i915_gem_request_put(rq);
> -			port[0].request = port[1].request;
> -			port[1].request = NULL;
> -			rq = port[0].request;
> +
> +			port[0] = port[1];
> +			memset(&port[1], 0, sizeof(port[1]));
> +
> +			rq = port_request(&port[0]);
>  		}
>
>  		submit = false;
> -		if (!port[1].request)
> +		if (!port_count(&port[1]))
>  			submit = i915_guc_dequeue(engine);
>  	} while (submit);
>  }
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 6d3d83876da9..24659788e7a3 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1230,7 +1230,7 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
>  		return false;
>
>  	/* Both ports drained, no more ELSP submission? */
> -	if (engine->execlist_port[0].request)
> +	if (port_request(&engine->execlist_port[0]))
>  		return false;
>
>  	/* Ring stopped? */
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 0909549ad320..9f7b6112c53a 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -337,39 +337,32 @@ static u64 execlists_update_context(struct drm_i915_gem_request *rq)
>
>  static void execlists_submit_ports(struct intel_engine_cs *engine)
>  {
> -	struct drm_i915_private *dev_priv = engine->i915;
>  	struct execlist_port *port = engine->execlist_port;
>  	u32 __iomem *elsp =
> -		dev_priv->regs + i915_mmio_reg_offset(RING_ELSP(engine));
> -	u64 desc[2];
> -
> -	GEM_BUG_ON(port[0].count > 1);
> -	if (!port[0].count)
> -		execlists_context_status_change(port[0].request,
> -						INTEL_CONTEXT_SCHEDULE_IN);
> -	desc[0] = execlists_update_context(port[0].request);
> -	GEM_DEBUG_EXEC(port[0].context_id = upper_32_bits(desc[0]));
> -	port[0].count++;
> -
> -	if (port[1].request) {
> -		GEM_BUG_ON(port[1].count);
> -		execlists_context_status_change(port[1].request,
> -						INTEL_CONTEXT_SCHEDULE_IN);
> -		desc[1] = execlists_update_context(port[1].request);
> -		GEM_DEBUG_EXEC(port[1].context_id = upper_32_bits(desc[1]));
> -		port[1].count = 1;
> -	} else {
> -		desc[1] = 0;
> -	}
> -	GEM_BUG_ON(desc[0] == desc[1]);
> +		engine->i915->regs + i915_mmio_reg_offset(RING_ELSP(engine));
> +	unsigned int n;
>
> -	/* You must always write both descriptors in the order below. */
> -	writel(upper_32_bits(desc[1]), elsp);
> -	writel(lower_32_bits(desc[1]), elsp);
> +	for (n = ARRAY_SIZE(engine->execlist_port); n--; ) {
> +		struct drm_i915_gem_request *rq;
> +		unsigned int count;
> +		u64 desc;
> +
> +		rq = port_unpack(&port[n], &count);
> +		if (rq) {
> +			GEM_BUG_ON(count > !n);
> +			if (!count++)
> +				execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
> +			port_set(&port[n], port_pack(rq, count));
> +			desc = execlists_update_context(rq);
> +			GEM_DEBUG_EXEC(port[n].context_id = upper_32_bits(desc));
> +		} else {
> +			GEM_BUG_ON(!n);
> +			desc = 0;
> +		}
>
> -	writel(upper_32_bits(desc[0]), elsp);
> -	/* The context is automatically loaded after the following */
> -	writel(lower_32_bits(desc[0]), elsp);
> +		writel(upper_32_bits(desc), elsp);
> +		writel(lower_32_bits(desc), elsp);
> +	}
>  }
>
>  static bool ctx_single_port_submission(const struct i915_gem_context *ctx)
> @@ -390,6 +383,17 @@ static bool can_merge_ctx(const struct i915_gem_context *prev,
>  	return true;
>  }
>
> +static void port_assign(struct execlist_port *port,
> +			struct drm_i915_gem_request *rq)
> +{
> +	GEM_BUG_ON(rq == port_request(port));
> +
> +	if (port_isset(port))
> +		i915_gem_request_put(port_request(port));
> +
> +	port_set(port, port_pack(i915_gem_request_get(rq), port_count(port)));
> +}

Reason for not having one implementation of port_assign with 
nested_enable_signaling outside it in the guc case?

> +
>  static void execlists_dequeue(struct intel_engine_cs *engine)
>  {
>  	struct drm_i915_gem_request *last;
> @@ -397,7 +401,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  	struct rb_node *rb;
>  	bool submit = false;
>
> -	last = port->request;
> +	last = port_request(port);
>  	if (last)
>  		/* WaIdleLiteRestore:bdw,skl
>  		 * Apply the wa NOOPs to prevent ring:HEAD == req:TAIL
> @@ -407,7 +411,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  		 */
>  		last->tail = last->wa_tail;
>
> -	GEM_BUG_ON(port[1].request);
> +	GEM_BUG_ON(port_isset(&port[1]));
>
>  	/* Hardware submission is through 2 ports. Conceptually each port
>  	 * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is
> @@ -464,7 +468,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>
>  			GEM_BUG_ON(last->ctx == cursor->ctx);
>
> -			i915_gem_request_assign(&port->request, last);
> +			if (submit)
> +				port_assign(port, last);

Optimisation? GEM_BUG_ON(last != port_request(port))?

>  			port++;
>  		}
>
> @@ -479,7 +484,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  		submit = true;
>  	}
>  	if (submit) {
> -		i915_gem_request_assign(&port->request, last);
> +		port_assign(port, last);
>  		engine->execlist_first = rb;
>  	}
>  	spin_unlock_irq(&engine->timeline->lock);
> @@ -488,16 +493,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  		execlists_submit_ports(engine);
>  }
>
> -static bool execlists_elsp_idle(struct intel_engine_cs *engine)
> -{
> -	return !engine->execlist_port[0].request;
> -}
> -
>  static bool execlists_elsp_ready(const struct intel_engine_cs *engine)
>  {
>  	const struct execlist_port *port = engine->execlist_port;
>
> -	return port[0].count + port[1].count < 2;
> +	return port_count(&port[0]) + port_count(&port[1]) < 2;
>  }
>
>  /*
> @@ -547,7 +547,9 @@ static void intel_lrc_irq_handler(unsigned long data)
>  		tail = GEN8_CSB_WRITE_PTR(head);
>  		head = GEN8_CSB_READ_PTR(head);
>  		while (head != tail) {
> +			struct drm_i915_gem_request *rq;
>  			unsigned int status;
> +			unsigned int count;
>
>  			if (++head == GEN8_CSB_ENTRIES)
>  				head = 0;
> @@ -577,20 +579,24 @@ static void intel_lrc_irq_handler(unsigned long data)
>  			GEM_DEBUG_BUG_ON(readl(buf + 2 * head + 1) !=
>  					 port[0].context_id);
>
> -			GEM_BUG_ON(port[0].count == 0);
> -			if (--port[0].count == 0) {
> +			rq = port_unpack(&port[0], &count);
> +			GEM_BUG_ON(count == 0);
> +			if (--count == 0) {

If you changed this to be:

count--;
port_set(...);
if (count > 0)
	continue;

It would be perhaps a bit more readable, but a potentially useless 
port_set on the other hand. Not sure.

>  				GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
> -				GEM_BUG_ON(!i915_gem_request_completed(port[0].request));
> -				execlists_context_status_change(port[0].request,
> -								INTEL_CONTEXT_SCHEDULE_OUT);
> +				GEM_BUG_ON(!i915_gem_request_completed(rq));
> +				execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
> +
> +				trace_i915_gem_request_out(rq);
> +				i915_gem_request_put(rq);
>
> -				trace_i915_gem_request_out(port[0].request);
> -				i915_gem_request_put(port[0].request);
>  				port[0] = port[1];
>  				memset(&port[1], 0, sizeof(port[1]));
> +			} else {
> +				port_set(&port[0], port_pack(rq, count));
>  			}
>
> -			GEM_BUG_ON(port[0].count == 0 &&
> +			/* After the final element, the hw should be idle */
> +			GEM_BUG_ON(port_count(&port[0]) == 0 &&
>  				   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
>  		}
>
> @@ -1148,6 +1154,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
>  	struct drm_i915_private *dev_priv = engine->i915;
>  	struct execlist_port *port = engine->execlist_port;
>  	unsigned int n;
> +	bool submit;
>  	int ret;
>
>  	ret = intel_mocs_init_engine(engine);
> @@ -1169,19 +1176,21 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
>  	/* After a GPU reset, we may have requests to replay */
>  	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>
> +	submit = false;
>  	for (n = 0; n < ARRAY_SIZE(engine->execlist_port); n++) {
> -		if (!port[n].request)
> +		if (!port_isset(&port[n]))
>  			break;
>
>  		DRM_DEBUG_DRIVER("Restarting %s:%d from 0x%x\n",
>  				 engine->name, n,
> -				 port[n].request->global_seqno);
> +				 port_request(&port[n])->global_seqno);
>
>  		/* Discard the current inflight count */
> -		port[n].count = 0;
> +		port_set(&port[n], port_request(&port[n]));
> +		submit = true;
>  	}
>
> -	if (!i915.enable_guc_submission && !execlists_elsp_idle(engine))
> +	if (submit && !i915.enable_guc_submission)
>  		execlists_submit_ports(engine);
>
>  	return 0;
> @@ -1259,13 +1268,13 @@ static void reset_common_ring(struct intel_engine_cs *engine,
>  	intel_ring_update_space(request->ring);
>
>  	/* Catch up with any missed context-switch interrupts */
> -	if (request->ctx != port[0].request->ctx) {
> -		i915_gem_request_put(port[0].request);
> +	if (request->ctx != port_request(&port[0])->ctx) {
> +		i915_gem_request_put(port_request(&port[0]));
>  		port[0] = port[1];
>  		memset(&port[1], 0, sizeof(port[1]));
>  	}
>
> -	GEM_BUG_ON(request->ctx != port[0].request->ctx);
> +	GEM_BUG_ON(request->ctx != port_request(&port[0])->ctx);
>
>  	/* Reset WaIdleLiteRestore:bdw,skl as well */
>  	request->tail =
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 4a599e3480a9..68765d45116b 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -368,8 +368,14 @@ struct intel_engine_cs {
>  	/* Execlists */
>  	struct tasklet_struct irq_tasklet;
>  	struct execlist_port {
> -		struct drm_i915_gem_request *request;
> -		unsigned int count;
> +		struct drm_i915_gem_request *request_count;
> +#define EXECLIST_COUNT_BITS 2
> +#define port_request(p) ptr_mask_bits((p)->request_count, EXECLIST_COUNT_BITS)
> +#define port_count(p) ptr_unmask_bits((p)->request_count, EXECLIST_COUNT_BITS)
> +#define port_pack(rq, count) ptr_pack_bits(rq, count, EXECLIST_COUNT_BITS)
> +#define port_unpack(p, count) ptr_unpack_bits((p)->request_count, count, EXECLIST_COUNT_BITS)
> +#define port_set(p, packed) ((p)->request_count = (packed))
> +#define port_isset(p) ((p)->request_count)
>  		GEM_DEBUG_DECL(u32 context_id);
>  	} execlist_port[2];
>  	struct rb_root execlist_queue;
>

Looks correct but I am still having trouble accepting the structure 
shrink and code savings are worth having less readable code.

Regards,

Tvrtko

* Re: [PATCH 3/9] drm/i915/execlists: Pack the count into the low bits of the port.request
  2017-05-05 10:49   ` Tvrtko Ursulin
@ 2017-05-05 11:16     ` Chris Wilson
  2017-05-05 11:58       ` Tvrtko Ursulin
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2017-05-05 11:16 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx, Mika Kuoppala

On Fri, May 05, 2017 at 11:49:21AM +0100, Tvrtko Ursulin wrote:
> 
> On 03/05/2017 12:37, Chris Wilson wrote:
> >+static void port_assign(struct execlist_port *port,
> >+			struct drm_i915_gem_request *rq)
> >+{
> >+	GEM_BUG_ON(rq == port_request(port));
> >+
> >+	if (port_isset(port))
> >+		i915_gem_request_put(port_request(port));
> >+
> >+	port_set(port, port_pack(i915_gem_request_get(rq), port_count(port)));
> >+}
> 
> Reason for not having one implementation of port_assign with
> nested_enable_signaling outside it in the guc case?

The handling of port.count is a big difference between guc and
ordinary execlists. For execlists we count contexts, for guc we count
requests.

> > static void execlists_dequeue(struct intel_engine_cs *engine)
> > {
> > 	struct drm_i915_gem_request *last;
> >@@ -397,7 +401,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> > 	struct rb_node *rb;
> > 	bool submit = false;
> >
> >-	last = port->request;
> >+	last = port_request(port);
> > 	if (last)
> > 		/* WaIdleLiteRestore:bdw,skl
> > 		 * Apply the wa NOOPs to prevent ring:HEAD == req:TAIL
> >@@ -407,7 +411,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> > 		 */
> > 		last->tail = last->wa_tail;
> >
> >-	GEM_BUG_ON(port[1].request);
> >+	GEM_BUG_ON(port_isset(&port[1]));
> >
> > 	/* Hardware submission is through 2 ports. Conceptually each port
> > 	 * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is
> >@@ -464,7 +468,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >
> > 			GEM_BUG_ON(last->ctx == cursor->ctx);
> >
> >-			i915_gem_request_assign(&port->request, last);
> >+			if (submit)
> >+				port_assign(port, last);
> 
> Optimisation? GEM_BUG_ON(last != port_request(port))?

Kind of; it is so both paths look identical, and yes, so we do meet the
criterion that last != port_request(port) (because it is silly to do
the request_count dance if the goal is not to change request_count).

> >@@ -479,7 +484,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> > 		submit = true;
> > 	}
> > 	if (submit) {
> >-		i915_gem_request_assign(&port->request, last);
> >+		port_assign(port, last);
> > 		engine->execlist_first = rb;
> > 	}
> > 	spin_unlock_irq(&engine->timeline->lock);
> >@@ -488,16 +493,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> > 		execlists_submit_ports(engine);
> > }
> >
> >-static bool execlists_elsp_idle(struct intel_engine_cs *engine)
> >-{
> >-	return !engine->execlist_port[0].request;
> >-}
> >-
> > static bool execlists_elsp_ready(const struct intel_engine_cs *engine)
> > {
> > 	const struct execlist_port *port = engine->execlist_port;
> >
> >-	return port[0].count + port[1].count < 2;
> >+	return port_count(&port[0]) + port_count(&port[1]) < 2;
> > }
> >
> > /*
> >@@ -547,7 +547,9 @@ static void intel_lrc_irq_handler(unsigned long data)
> > 		tail = GEN8_CSB_WRITE_PTR(head);
> > 		head = GEN8_CSB_READ_PTR(head);
> > 		while (head != tail) {
> >+			struct drm_i915_gem_request *rq;
> > 			unsigned int status;
> >+			unsigned int count;
> >
> > 			if (++head == GEN8_CSB_ENTRIES)
> > 				head = 0;
> >@@ -577,20 +579,24 @@ static void intel_lrc_irq_handler(unsigned long data)
> > 			GEM_DEBUG_BUG_ON(readl(buf + 2 * head + 1) !=
> > 					 port[0].context_id);
> >
> >-			GEM_BUG_ON(port[0].count == 0);
> >-			if (--port[0].count == 0) {
> >+			rq = port_unpack(&port[0], &count);
> >+			GEM_BUG_ON(count == 0);
> >+			if (--count == 0) {
> 
> If you changed this to be:
> 
> count--;
> port_set(...);
> if (count > 0)
> 	continue;
> 
> It would be perhaps a bit more readable, but a potentially useless
> port_set on the other hand. Not sure.

We expect to overwrite port in the common path, or at least that's my
expectation. Decrementing count for lite-restore should be the
exception. Part of the intention is to do the minimal amount of work
(especially avoiding the appearance of setting port.count = 0
prematurely) and pass the later port.count == 0 assert.

> > 				GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
> >-				GEM_BUG_ON(!i915_gem_request_completed(port[0].request));
> >-				execlists_context_status_change(port[0].request,
> >-								INTEL_CONTEXT_SCHEDULE_OUT);
> >+				GEM_BUG_ON(!i915_gem_request_completed(rq));
> >+				execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
> >+
> >+				trace_i915_gem_request_out(rq);
> >+				i915_gem_request_put(rq);
> >
> >-				trace_i915_gem_request_out(port[0].request);
> >-				i915_gem_request_put(port[0].request);
> > 				port[0] = port[1];
> > 				memset(&port[1], 0, sizeof(port[1]));
> >+			} else {
> >+				port_set(&port[0], port_pack(rq, count));
> > 			}
> >
> >-			GEM_BUG_ON(port[0].count == 0 &&
> >+			/* After the final element, the hw should be idle */
> >+			GEM_BUG_ON(port_count(&port[0]) == 0 &&
> > 				   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
> > 		}

> >diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> >index 4a599e3480a9..68765d45116b 100644
> >--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> >+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> >@@ -368,8 +368,14 @@ struct intel_engine_cs {
> > 	/* Execlists */
> > 	struct tasklet_struct irq_tasklet;
> > 	struct execlist_port {
> >-		struct drm_i915_gem_request *request;
> >-		unsigned int count;
> >+		struct drm_i915_gem_request *request_count;
> >+#define EXECLIST_COUNT_BITS 2
> >+#define port_request(p) ptr_mask_bits((p)->request_count, EXECLIST_COUNT_BITS)
> >+#define port_count(p) ptr_unmask_bits((p)->request_count, EXECLIST_COUNT_BITS)
> >+#define port_pack(rq, count) ptr_pack_bits(rq, count, EXECLIST_COUNT_BITS)
> >+#define port_unpack(p, count) ptr_unpack_bits((p)->request_count, count, EXECLIST_COUNT_BITS)
> >+#define port_set(p, packed) ((p)->request_count = (packed))
> >+#define port_isset(p) ((p)->request_count)
> > 		GEM_DEBUG_DECL(u32 context_id);
> > 	} execlist_port[2];
> > 	struct rb_root execlist_queue;
> >
> 
> Looks correct but I am still having trouble accepting the structure
> shrink and code savings are worth having less readable code.

Excluding port_unpack, I think it's a reasonable tidy.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 3/9] drm/i915/execlists: Pack the count into the low bits of the port.request
  2017-05-05 11:16     ` Chris Wilson
@ 2017-05-05 11:58       ` Tvrtko Ursulin
  0 siblings, 0 replies; 33+ messages in thread
From: Tvrtko Ursulin @ 2017-05-05 11:58 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Mika Kuoppala


On 05/05/2017 12:16, Chris Wilson wrote:
> On Fri, May 05, 2017 at 11:49:21AM +0100, Tvrtko Ursulin wrote:
>>
>> On 03/05/2017 12:37, Chris Wilson wrote:
>>> +static void port_assign(struct execlist_port *port,
>>> +			struct drm_i915_gem_request *rq)
>>> +{
>>> +	GEM_BUG_ON(rq == port_request(port));
>>> +
>>> +	if (port_isset(port))
>>> +		i915_gem_request_put(port_request(port));
>>> +
>>> +	port_set(port, port_pack(i915_gem_request_get(rq), port_count(port)));
>>> +}
>>
>> Reason for not having one implementation of port_assign with
>> nested_enable_signaling outside it in the guc case?
>
> The handling of port.count is a big difference between guc and
> ordinary execlists. For execlists we count contexts, for guc we count
> requests.

Bah, missed it; scrolling up and down and relying on memory is not a 
substitute for a split screen..

>>> static void execlists_dequeue(struct intel_engine_cs *engine)
>>> {
>>> 	struct drm_i915_gem_request *last;
>>> @@ -397,7 +401,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>> 	struct rb_node *rb;
>>> 	bool submit = false;
>>>
>>> -	last = port->request;
>>> +	last = port_request(port);
>>> 	if (last)
>>> 		/* WaIdleLiteRestore:bdw,skl
>>> 		 * Apply the wa NOOPs to prevent ring:HEAD == req:TAIL
>>> @@ -407,7 +411,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>> 		 */
>>> 		last->tail = last->wa_tail;
>>>
>>> -	GEM_BUG_ON(port[1].request);
>>> +	GEM_BUG_ON(port_isset(&port[1]));
>>>
>>> 	/* Hardware submission is through 2 ports. Conceptually each port
>>> 	 * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is
>>> @@ -464,7 +468,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>>
>>> 			GEM_BUG_ON(last->ctx == cursor->ctx);
>>>
>>> -			i915_gem_request_assign(&port->request, last);
>>> +			if (submit)
>>> +				port_assign(port, last);
>>
>> Optimisation? GEM_BUG_ON(last != port_request(port))?
>
> Kind of; it is so both paths look identical, and yes, so we do meet the
> criterion that last != port_request(port) (because it is silly to do
> the request_count dance if the goal is not to change request_count).

Ok.

>>> @@ -479,7 +484,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>> 		submit = true;
>>> 	}
>>> 	if (submit) {
>>> -		i915_gem_request_assign(&port->request, last);
>>> +		port_assign(port, last);
>>> 		engine->execlist_first = rb;
>>> 	}
>>> 	spin_unlock_irq(&engine->timeline->lock);
>>> @@ -488,16 +493,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>> 		execlists_submit_ports(engine);
>>> }
>>>
>>> -static bool execlists_elsp_idle(struct intel_engine_cs *engine)
>>> -{
>>> -	return !engine->execlist_port[0].request;
>>> -}
>>> -
>>> static bool execlists_elsp_ready(const struct intel_engine_cs *engine)
>>> {
>>> 	const struct execlist_port *port = engine->execlist_port;
>>>
>>> -	return port[0].count + port[1].count < 2;
>>> +	return port_count(&port[0]) + port_count(&port[1]) < 2;
>>> }
>>>
>>> /*
>>> @@ -547,7 +547,9 @@ static void intel_lrc_irq_handler(unsigned long data)
>>> 		tail = GEN8_CSB_WRITE_PTR(head);
>>> 		head = GEN8_CSB_READ_PTR(head);
>>> 		while (head != tail) {
>>> +			struct drm_i915_gem_request *rq;
>>> 			unsigned int status;
>>> +			unsigned int count;
>>>
>>> 			if (++head == GEN8_CSB_ENTRIES)
>>> 				head = 0;
>>> @@ -577,20 +579,24 @@ static void intel_lrc_irq_handler(unsigned long data)
>>> 			GEM_DEBUG_BUG_ON(readl(buf + 2 * head + 1) !=
>>> 					 port[0].context_id);
>>>
>>> -			GEM_BUG_ON(port[0].count == 0);
>>> -			if (--port[0].count == 0) {
>>> +			rq = port_unpack(&port[0], &count);
>>> +			GEM_BUG_ON(count == 0);
>>> +			if (--count == 0) {
>>
>> If you changed this to be:
>>
>> count--;
>> port_set(...);
>> if (count > 0)
>> 	continue;
>>
>> It would be perhaps a bit more readable, but a potentially useless
>> port_set on the other hand. Not sure.
>
> We expect to overwrite port in the common path, or at least that's my
> expectation. Decrementing count for lite-restore should be the
> exception. Part of the intention is to do the minimal amount of work
> (especially avoiding the appearance of setting port.count = 0
> prematurely) and pass the later port.count == 0 assert.

I've seen the mode where we just append and append and append with no 
requests coming out for a while :), but I agree it is not the typical 
case. So as I said I am fine with this as it is.

>
>>> 				GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
>>> -				GEM_BUG_ON(!i915_gem_request_completed(port[0].request));
>>> -				execlists_context_status_change(port[0].request,
>>> -								INTEL_CONTEXT_SCHEDULE_OUT);
>>> +				GEM_BUG_ON(!i915_gem_request_completed(rq));
>>> +				execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
>>> +
>>> +				trace_i915_gem_request_out(rq);
>>> +				i915_gem_request_put(rq);
>>>
>>> -				trace_i915_gem_request_out(port[0].request);
>>> -				i915_gem_request_put(port[0].request);
>>> 				port[0] = port[1];
>>> 				memset(&port[1], 0, sizeof(port[1]));
>>> +			} else {
>>> +				port_set(&port[0], port_pack(rq, count));
>>> 			}
>>>
>>> -			GEM_BUG_ON(port[0].count == 0 &&
>>> +			/* After the final element, the hw should be idle */
>>> +			GEM_BUG_ON(port_count(&port[0]) == 0 &&
>>> 				   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
>>> 		}
>
>>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
>>> index 4a599e3480a9..68765d45116b 100644
>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>>> @@ -368,8 +368,14 @@ struct intel_engine_cs {
>>> 	/* Execlists */
>>> 	struct tasklet_struct irq_tasklet;
>>> 	struct execlist_port {
>>> -		struct drm_i915_gem_request *request;
>>> -		unsigned int count;
>>> +		struct drm_i915_gem_request *request_count;
>>> +#define EXECLIST_COUNT_BITS 2
>>> +#define port_request(p) ptr_mask_bits((p)->request_count, EXECLIST_COUNT_BITS)
>>> +#define port_count(p) ptr_unmask_bits((p)->request_count, EXECLIST_COUNT_BITS)
>>> +#define port_pack(rq, count) ptr_pack_bits(rq, count, EXECLIST_COUNT_BITS)
>>> +#define port_unpack(p, count) ptr_unpack_bits((p)->request_count, count, EXECLIST_COUNT_BITS)
>>> +#define port_set(p, packed) ((p)->request_count = (packed))
>>> +#define port_isset(p) ((p)->request_count)
>>> 		GEM_DEBUG_DECL(u32 context_id);
>>> 	} execlist_port[2];
>>> 	struct rb_root execlist_queue;
>>>
>>
>> Looks correct but I am still having trouble accepting the structure
>> shrink and code savings are worth having less readable code.
>
> Excluding port_unpack, I think it's a reasonable tidy.

I think the tidy is separate from the packing. But ok, let's go with it 
and see what happens.
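As an aside for readers skimming the archive: the trick behind port_pack()/port_unpack() can be sketched in a few lines of standalone C. This is an illustrative mock, not the i915 helpers (pack_ptr_bits/unpack_ptr_bits are made-up names); it only assumes the stored pointer is aligned enough that its low EXECLIST_COUNT_BITS bits are free.

```c
#include <assert.h>
#include <stdint.h>

#define COUNT_BITS 2                            /* mirrors EXECLIST_COUNT_BITS */
#define COUNT_MASK ((uintptr_t)((1 << COUNT_BITS) - 1))

/* Stash a small counter in the low alignment bits of a pointer. */
static void *pack_ptr_bits(void *ptr, unsigned int bits)
{
	uintptr_t v = (uintptr_t)ptr;

	assert((v & COUNT_MASK) == 0);          /* pointer must be aligned */
	assert(bits <= COUNT_MASK);
	return (void *)(v | bits);
}

/* Recover both the original pointer and the stashed counter. */
static void *unpack_ptr_bits(void *packed, unsigned int *bits)
{
	uintptr_t v = (uintptr_t)packed;

	*bits = (unsigned int)(v & COUNT_MASK);
	return (void *)(v & ~COUNT_MASK);
}
```

So each port stores one word that answers both port_request() (mask the count bits off) and port_count() (keep only the count bits), which is exactly the structure shrink being debated above.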

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] drm/i915/execlists: Reduce lock context between schedule/submit_request
  2017-05-03 11:37 ` [PATCH 7/9] drm/i915/execlists: Reduce lock context between schedule/submit_request Chris Wilson
@ 2017-05-05 12:13   ` Chris Wilson
  2017-05-05 13:30   ` Tvrtko Ursulin
  1 sibling, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2017-05-05 12:13 UTC (permalink / raw)
  To: intel-gfx

s/context/contention/ in subject

On Wed, May 03, 2017 at 12:37:57PM +0100, Chris Wilson wrote:
> If we do not require to perform priority bumping, and we haven't yet
> submitted the request, we can update its priority in situ and skip
> acquiring the engine locks -- thus avoiding any contention between us
> and submit/execute.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index fb0025627676..ca7f28795e2d 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -767,6 +767,17 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
>  		list_safe_reset_next(dep, p, dfs_link);
>  	}
>  
> +	/* If we didn't need to bump any existing priorites, and we haven't
> +	 * yet submitted this request (i..e there is no porential race with
> +	 * execlists_submit_request()), we can set our own priority and skip
> +	 * acquiring the engine locks.
> +	 */
> +	if (request->priotree.priority == INT_MIN) {
> +		request->priotree.priority = prio;
> +		if (stack.dfs_link.next == stack.dfs_link.prev)
> +			return;
> +	}
> +
>  	engine = request->engine;
>  	spin_lock_irq(&engine->timeline->lock);
>  
> -- 
> 2.11.0
> 

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 6/9] drm/i915: Split execlist priority queue into rbtree + linked list
  2017-05-03 11:37 ` [PATCH 6/9] drm/i915: Split execlist priority queue into rbtree + linked list Chris Wilson
@ 2017-05-05 13:19   ` Tvrtko Ursulin
  2017-05-05 13:32     ` Chris Wilson
  2017-05-05 13:50   ` Tvrtko Ursulin
  1 sibling, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2017-05-05 13:19 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/05/2017 12:37, Chris Wilson wrote:
> All the requests at the same priority are executed in FIFO order. They
> do not need to be stored in the rbtree themselves, as they are a simple
> list within a level. If we move the requests at one priority into a list,
> we can then reduce the rbtree to the set of priorities. This should keep
> the height of the rbtree small, as the number of active priorities can not
> exceed the number of active requests and should be typically only a few.
>
> Currently, we have ~2k possible different priority levels, that may
> increase to allow even more fine grained selection. Allocating those in
> advance seems a waste (and may be impossible), so we opt for allocating
> upon first use, and freeing after its requests are depleted. To avoid
> the possibility of an allocation failure causing us to lose a request,
> we preallocate the default priority (0) and bump any request to that
> priority if we fail to allocate it the appropriate plist. Having a
> request (that is ready to run, so not leading to corruption) execute
> out-of-order is better than leaking the request (and its dependency
> tree) entirely.
>
> There should be a benefit to reducing execlists_dequeue() to principally
> using a simple list (and reducing the frequency of both rbtree iteration
> and balancing on erase) but for typical workloads, request coalescing
> should be small enough that we don't notice any change. The main gain is
> from improving PI calls to schedule, and the explicit list within a
> level should make request unwinding simpler (we just need to insert at
> the head of the list rather than the tail and not have to make the
> rbtree search more complicated).
>
> v2: Avoid use-after-free when deleting a depleted priolist
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c        |  12 ++-
>  drivers/gpu/drm/i915/i915_gem_request.c    |   4 +-
>  drivers/gpu/drm/i915/i915_gem_request.h    |   2 +-
>  drivers/gpu/drm/i915/i915_guc_submission.c |  53 ++++++----
>  drivers/gpu/drm/i915/i915_utils.h          |   9 ++
>  drivers/gpu/drm/i915/intel_lrc.c           | 159 +++++++++++++++++++----------
>  drivers/gpu/drm/i915/intel_ringbuffer.h    |   7 ++
>  7 files changed, 163 insertions(+), 83 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 0b5d7142d8d9..a8c7788d986e 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -3314,7 +3314,6 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>
>  		if (i915.enable_execlists) {
>  			u32 ptr, read, write;
> -			struct rb_node *rb;
>  			unsigned int idx;
>
>  			seq_printf(m, "\tExeclist status: 0x%08x %08x\n",
> @@ -3358,9 +3357,14 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>  			rcu_read_unlock();
>
>  			spin_lock_irq(&engine->timeline->lock);
> -			for (rb = engine->execlist_first; rb; rb = rb_next(rb)) {
> -				rq = rb_entry(rb, typeof(*rq), priotree.node);
> -				print_request(m, rq, "\t\tQ ");
> +			for (rb = engine->execlist_first; rb; rb = rb_next(rb)){
> +				struct execlist_priolist *plist =
> +					rb_entry(rb, typeof(*plist), node);
> +
> +				list_for_each_entry(rq,
> +						    &plist->requests,
> +						    priotree.link)
> +					print_request(m, rq, "\t\tQ ");
>  			}
>  			spin_unlock_irq(&engine->timeline->lock);
>  		} else if (INTEL_GEN(dev_priv) > 6) {
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 8d2a5c8e5005..ad2be26923fb 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -159,7 +159,7 @@ i915_priotree_fini(struct drm_i915_private *i915, struct i915_priotree *pt)
>  {
>  	struct i915_dependency *dep, *next;
>
> -	GEM_BUG_ON(!RB_EMPTY_NODE(&pt->node));
> +	GEM_BUG_ON(!list_empty(&pt->link));
>
>  	/* Everyone we depended upon (the fences we wait to be signaled)
>  	 * should retire before us and remove themselves from our list.
> @@ -185,7 +185,7 @@ i915_priotree_init(struct i915_priotree *pt)
>  {
>  	INIT_LIST_HEAD(&pt->signalers_list);
>  	INIT_LIST_HEAD(&pt->waiters_list);
> -	RB_CLEAR_NODE(&pt->node);
> +	INIT_LIST_HEAD(&pt->link);
>  	pt->priority = INT_MIN;
>  }
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index feb81463abc9..6c58f7d87746 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -67,7 +67,7 @@ struct i915_dependency {
>  struct i915_priotree {
>  	struct list_head signalers_list; /* those before us, we depend upon */
>  	struct list_head waiters_list; /* those after us, they depend upon us */
> -	struct rb_node node;
> +	struct list_head link;
>  	int priority;
>  #define I915_PRIORITY_MAX 1024
>  #define I915_PRIORITY_DFL 0
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 62d3831a8a0d..f0d48bd508c9 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -674,32 +674,45 @@ static bool i915_guc_dequeue(struct intel_engine_cs *engine)
>
>  	spin_lock_irq(&engine->timeline->lock);
>  	rb = engine->execlist_first;
> +	GEM_BUG_ON(rb_first(&engine->execlist_queue) != rb);
>  	while (rb) {
> -		struct drm_i915_gem_request *rq =
> -			rb_entry(rb, typeof(*rq), priotree.node);
> -
> -		if (last && rq->ctx != last->ctx) {
> -			if (port != engine->execlist_port)
> -				break;
> -
> -			port_assign(port, last);
> -			port++;
> +		struct execlist_priolist *plist =
> +			rb_entry(rb, typeof(*plist), node);
> +		struct drm_i915_gem_request *rq, *rn;
> +
> +		list_for_each_entry_safe(rq, rn,
> +					 &plist->requests, priotree.link) {
> +			if (last && rq->ctx != last->ctx) {
> +				if (port != engine->execlist_port) {
> +					__list_del_many(&plist->requests,
> +							&rq->priotree.link);
> +					goto done;
> +				}
> +
> +				port_assign(port, last);
> +				port++;
> +			}
> +
> +			INIT_LIST_HEAD(&rq->priotree.link);
> +			rq->priotree.priority = INT_MAX;
> +
> +			i915_guc_submit(rq);
> +			trace_i915_gem_request_in(rq, port - engine->execlist_port);
> +			last = rq;
> +			submit = true;
>  		}
>
>  		rb = rb_next(rb);
> -		rb_erase(&rq->priotree.node, &engine->execlist_queue);
> -		RB_CLEAR_NODE(&rq->priotree.node);
> -		rq->priotree.priority = INT_MAX;
> -
> -		i915_guc_submit(rq);
> -		trace_i915_gem_request_in(rq, port - engine->execlist_port);
> -		last = rq;
> -		submit = true;
> -	}
> -	if (submit) {
> -		port_assign(port, last);
>  		engine->execlist_first = rb;
> +
> +		rb_erase(&plist->node, &engine->execlist_queue);
> +		INIT_LIST_HEAD(&plist->requests);
> +		if (plist->priority != I915_PRIORITY_DFL)
> +			kfree(plist);
>  	}
> +done:
> +	if (submit)
> +		port_assign(port, last);
>  	spin_unlock_irq(&engine->timeline->lock);
>
>  	return submit;
> diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
> index f0500c65726d..9c65f9ae2e97 100644
> --- a/drivers/gpu/drm/i915/i915_utils.h
> +++ b/drivers/gpu/drm/i915/i915_utils.h
> @@ -99,4 +99,13 @@
>  	__T;								\
>  })
>
> +#include <linux/list.h>
> +
> +static inline void __list_del_many(struct list_head *head,
> +				   struct list_head *first)
> +{
> +	head->next = first;
> +	first->prev = head;
> +}

Hm hm hmm, this is a bit unintuitive since the first element is not 
deleted. Perhaps a more accurate name would be __list_set_head? I was 
thinking __list_del_upto_prev but it was too long. :)
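To make the semantics under discussion concrete, here is a minimal standalone re-creation (a mock struct list_head and helpers, not the kernel's <linux/list.h>): every entry between head and first is unlinked in O(1), while first itself stays on the list, which is why the name reads oddly.

```c
#include <assert.h>

/* Mock of the kernel's circular doubly-linked list, for illustration only. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/*
 * Unlink every entry between head and first; first itself remains on
 * the list. The detached entries keep stale links and are left for the
 * caller to dispose of, matching the usage in the patch.
 */
static void __list_del_many(struct list_head *head, struct list_head *first)
{
	head->next = first;
	first->prev = head;
}
```

With a list head -> a -> b -> c, calling __list_del_many(&head, &c) detaches a and b and leaves only c on the list.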

> +
>  #endif /* !__I915_UTILS_H */
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 9f7b6112c53a..fb0025627676 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -436,57 +436,79 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>
>  	spin_lock_irq(&engine->timeline->lock);
>  	rb = engine->execlist_first;
> +	GEM_BUG_ON(rb_first(&engine->execlist_queue) != rb);
>  	while (rb) {
> -		struct drm_i915_gem_request *cursor =
> -			rb_entry(rb, typeof(*cursor), priotree.node);
> -
> -		/* Can we combine this request with the current port? It has to
> -		 * be the same context/ringbuffer and not have any exceptions
> -		 * (e.g. GVT saying never to combine contexts).
> -		 *
> -		 * If we can combine the requests, we can execute both by
> -		 * updating the RING_TAIL to point to the end of the second
> -		 * request, and so we never need to tell the hardware about
> -		 * the first.
> -		 */
> -		if (last && !can_merge_ctx(cursor->ctx, last->ctx)) {
> -			/* If we are on the second port and cannot combine
> -			 * this request with the last, then we are done.
> -			 */
> -			if (port != engine->execlist_port)
> -				break;
> -
> -			/* If GVT overrides us we only ever submit port[0],
> -			 * leaving port[1] empty. Note that we also have
> -			 * to be careful that we don't queue the same
> -			 * context (even though a different request) to
> -			 * the second port.
> +		struct execlist_priolist *plist =
> +			rb_entry(rb, typeof(*plist), node);
> +		struct drm_i915_gem_request *rq, *rn;
> +
> +		list_for_each_entry_safe(rq, rn,
> +					 &plist->requests, priotree.link) {
> +			/*
> +			 * Can we combine this request with the current port?
> +			 * It has to be the same context/ringbuffer and not
> +			 * have any exceptions (e.g. GVT saying never to
> +			 * combine contexts).
> +			 *
> +			 * If we can combine the requests, we can execute both
> +			 * by updating the RING_TAIL to point to the end of the
> +			 * second request, and so we never need to tell the
> +			 * hardware about the first.
>  			 */
> -			if (ctx_single_port_submission(last->ctx) ||
> -			    ctx_single_port_submission(cursor->ctx))
> -				break;
> +			if (last && !can_merge_ctx(rq->ctx, last->ctx)) {
> +				/*
> +				 * If we are on the second port and cannot
> +				 * combine this request with the last, then we
> +				 * are done.
> +				 */
> +				if (port != engine->execlist_port) {
> +					__list_del_many(&plist->requests,
> +							&rq->priotree.link);
> +					goto done;
> +				}
> +
> +				/*
> +				 * If GVT overrides us we only ever submit
> +				 * port[0], leaving port[1] empty. Note that we
> +				 * also have to be careful that we don't queue
> +				 * the same context (even though a different
> +				 * request) to the second port.
> +				 */
> +				if (ctx_single_port_submission(last->ctx) ||
> +				    ctx_single_port_submission(rq->ctx)) {
> +					__list_del_many(&plist->requests,
> +							&rq->priotree.link);
> +					goto done;
> +				}
> +
> +				GEM_BUG_ON(last->ctx == rq->ctx);
> +
> +				if (submit)
> +					port_assign(port, last);
> +				port++;
> +			}
>
> -			GEM_BUG_ON(last->ctx == cursor->ctx);
> +			INIT_LIST_HEAD(&rq->priotree.link);
> +			rq->priotree.priority = INT_MAX;
>
> -			if (submit)
> -				port_assign(port, last);
> -			port++;
> +			__i915_gem_request_submit(rq);
> +			trace_i915_gem_request_in(rq,
> +						  port - engine->execlist_port);
> +			last = rq;
> +			submit = true;
>  		}
>
>  		rb = rb_next(rb);
> -		rb_erase(&cursor->priotree.node, &engine->execlist_queue);
> -		RB_CLEAR_NODE(&cursor->priotree.node);
> -		cursor->priotree.priority = INT_MAX;
> +		engine->execlist_first = rb;
>
> -		__i915_gem_request_submit(cursor);
> -		trace_i915_gem_request_in(cursor, port - engine->execlist_port);
> -		last = cursor;
> -		submit = true;
> +		rb_erase(&plist->node, &engine->execlist_queue);
> +		INIT_LIST_HEAD(&plist->requests);
> +		if (plist->priority != I915_PRIORITY_DFL)
> +			kfree(plist);
>  	}
> -	if (submit) {
> +done:
> +	if (submit)
>  		port_assign(port, last);
> -		engine->execlist_first = rb;
> -	}
>  	spin_unlock_irq(&engine->timeline->lock);
>
>  	if (submit)
> @@ -610,28 +632,53 @@ static void intel_lrc_irq_handler(unsigned long data)
>  	intel_uncore_forcewake_put(dev_priv, engine->fw_domains);
>  }
>
> -static bool insert_request(struct i915_priotree *pt, struct rb_root *root)
> +static bool
> +insert_request(struct intel_engine_cs *engine,
> +	       struct i915_priotree *pt,
> +	       int prio)
>  {
> +	struct execlist_priolist *plist;
>  	struct rb_node **p, *rb;
>  	bool first = true;
>
> +find_plist:
>  	/* most positive priority is scheduled first, equal priorities fifo */
>  	rb = NULL;
> -	p = &root->rb_node;
> +	p = &engine->execlist_queue.rb_node;
>  	while (*p) {
> -		struct i915_priotree *pos;
> -
>  		rb = *p;
> -		pos = rb_entry(rb, typeof(*pos), node);
> -		if (pt->priority > pos->priority) {
> +		plist = rb_entry(rb, typeof(*plist), node);
> +		if (prio > plist->priority) {
>  			p = &rb->rb_left;
> -		} else {
> +		} else if (prio < plist->priority) {
>  			p = &rb->rb_right;
>  			first = false;
> +		} else {
> +			list_add_tail(&pt->link, &plist->requests);
> +			return false;
>  		}
>  	}
> -	rb_link_node(&pt->node, rb, p);
> -	rb_insert_color(&pt->node, root);
> +
> +	if (prio == I915_PRIORITY_DFL) {
> +		plist = &engine->default_priolist;
> +	} else {
> +		plist = kmalloc(sizeof(*plist), GFP_ATOMIC);
> +		/* Convert an allocation failure to a priority bump */
> +		if (unlikely(!plist)) {
> +			prio = I915_PRIORITY_DFL; /* recurses just once */
> +			goto find_plist;
> +		}
> +	}
> +
> +	plist->priority = prio;
> +	rb_link_node(&plist->node, rb, p);
> +	rb_insert_color(&plist->node, &engine->execlist_queue);
> +
> +	INIT_LIST_HEAD(&plist->requests);
> +	list_add_tail(&pt->link, &plist->requests);
> +
> +	if (first)
> +		engine->execlist_first = &plist->node;
>
>  	return first;
>  }
> @@ -644,8 +691,9 @@ static void execlists_submit_request(struct drm_i915_gem_request *request)
>  	/* Will be called from irq-context when using foreign fences. */
>  	spin_lock_irqsave(&engine->timeline->lock, flags);
>
> -	if (insert_request(&request->priotree, &engine->execlist_queue)) {
> -		engine->execlist_first = &request->priotree.node;
> +	if (insert_request(engine,
> +			   &request->priotree,
> +			   request->priotree.priority)) {
>  		if (execlists_elsp_ready(engine))
>  			tasklet_hi_schedule(&engine->irq_tasklet);
>  	}
> @@ -734,10 +782,9 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
>  			continue;
>
>  		pt->priority = prio;
> -		if (!RB_EMPTY_NODE(&pt->node)) {
> -			rb_erase(&pt->node, &engine->execlist_queue);
> -			if (insert_request(pt, &engine->execlist_queue))
> -				engine->execlist_first = &pt->node;
> +		if (!list_empty(&pt->link)) {
> +			__list_del_entry(&pt->link);
> +			insert_request(engine, pt, prio);
>  		}
>  	}
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 68765d45116b..b8e01a9e2311 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -177,6 +177,12 @@ enum intel_engine_id {
>  	VECS
>  };
>
> +struct execlist_priolist {
> +	struct rb_node node;
> +	struct list_head requests;
> +	int priority;
> +};
> +
>  #define INTEL_ENGINE_CS_MAX_NAME 8
>
>  struct intel_engine_cs {
> @@ -367,6 +373,7 @@ struct intel_engine_cs {
>
>  	/* Execlists */
>  	struct tasklet_struct irq_tasklet;
> +	struct execlist_priolist default_priolist;
>  	struct execlist_port {
>  		struct drm_i915_gem_request *request_count;
>  #define EXECLIST_COUNT_BITS 2
>

I've pulled your tree to look at the control flow more easily. 
Cross-checked back and forth a bit and it looks fine to me.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH 7/9] drm/i915/execlists: Reduce lock context between schedule/submit_request
  2017-05-03 11:37 ` [PATCH 7/9] drm/i915/execlists: Reduce lock context between schedule/submit_request Chris Wilson
  2017-05-05 12:13   ` Chris Wilson
@ 2017-05-05 13:30   ` Tvrtko Ursulin
  2017-05-05 13:38     ` Chris Wilson
  1 sibling, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2017-05-05 13:30 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/05/2017 12:37, Chris Wilson wrote:
> If we do not require to perform priority bumping, and we haven't yet
> submitted the request, we can update its priority in situ and skip
> acquiring the engine locks -- thus avoiding any contention between us
> and submit/execute.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index fb0025627676..ca7f28795e2d 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -767,6 +767,17 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
>  		list_safe_reset_next(dep, p, dfs_link);
>  	}
>
> +	/* If we didn't need to bump any existing priorites, and we haven't

priorities

> +	 * yet submitted this request (i..e there is no porential race with

potential

> +	 * execlists_submit_request()), we can set our own priority and skip
> +	 * acquiring the engine locks.
> +	 */
> +	if (request->priotree.priority == INT_MIN) {
> +		request->priotree.priority = prio;
> +		if (stack.dfs_link.next == stack.dfs_link.prev)
> +			return;

Move the assignment of the priority under the if?

> +	}
> +
>  	engine = request->engine;
>  	spin_lock_irq(&engine->timeline->lock);
>
>

Regards,

Tvrtko

* Re: [PATCH 6/9] drm/i915: Split execlist priority queue into rbtree + linked list
  2017-05-05 13:19   ` Tvrtko Ursulin
@ 2017-05-05 13:32     ` Chris Wilson
  2017-05-05 13:37       ` Tvrtko Ursulin
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2017-05-05 13:32 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, May 05, 2017 at 02:19:07PM +0100, Tvrtko Ursulin wrote:
> 
> On 03/05/2017 12:37, Chris Wilson wrote:
> > struct intel_engine_cs {
> >@@ -367,6 +373,7 @@ struct intel_engine_cs {
> >
> > 	/* Execlists */
> > 	struct tasklet_struct irq_tasklet;
> >+	struct execlist_priolist default_priolist;
> > 	struct execlist_port {
> > 		struct drm_i915_gem_request *request_count;
> > #define EXECLIST_COUNT_BITS 2
> >

Just a small bikeshed to consider. Having switched to
I915_PRIORITY_NORMAL, do we have a better name for default_priolist? I
still prefer default_priolist over normal_priolist. Go back to
I915_PRIORITY_DEFAULT?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 9/9] drm/i915: Don't force serialisation on marking up execlists irq posted
  2017-05-03 11:37 ` [PATCH 9/9] drm/i915: Don't force serialisation on marking up execlists irq posted Chris Wilson
@ 2017-05-05 13:34   ` Tvrtko Ursulin
  0 siblings, 0 replies; 33+ messages in thread
From: Tvrtko Ursulin @ 2017-05-05 13:34 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/05/2017 12:37, Chris Wilson wrote:
> Since we coordinate with the execlists tasklet using a locked schedule
> operation that ensures that after we set the engine->irq_posted we
> always have an invocation of the tasklet, we do not need to use a locked
> operation to set the engine->irq_posted itself.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 8f60c8045b3e..951a7705949e 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1360,7 +1360,7 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir, int test_shift)
>
>  	if (iir & (GT_CONTEXT_SWITCH_INTERRUPT << test_shift)) {
>  		if (port_count(&engine->execlist_port[0])) {
> -			set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> +			__set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>  			tasklet = true;
>  		}
>  	}
>

True :)

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH 6/9] drm/i915: Split execlist priority queue into rbtree + linked list
  2017-05-05 13:32     ` Chris Wilson
@ 2017-05-05 13:37       ` Tvrtko Ursulin
  2017-05-05 13:51         ` Chris Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2017-05-05 13:37 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 05/05/2017 14:32, Chris Wilson wrote:
> On Fri, May 05, 2017 at 02:19:07PM +0100, Tvrtko Ursulin wrote:
>>
>> On 03/05/2017 12:37, Chris Wilson wrote:
>>> struct intel_engine_cs {
>>> @@ -367,6 +373,7 @@ struct intel_engine_cs {
>>>
>>> 	/* Execlists */
>>> 	struct tasklet_struct irq_tasklet;
>>> +	struct execlist_priolist default_priolist;
>>> 	struct execlist_port {
>>> 		struct drm_i915_gem_request *request_count;
>>> #define EXECLIST_COUNT_BITS 2
>>>
>
> Just a small bikeshed to consider. Having switched to
> I915_PRIORITY_NORMAL, do we have a better name for default_priolist? I
> still prefer default_priolist over normal_priolist. Go back to
> I915_PRIORITY_DEFAULT?

default_priolist is fine I think since it is dual purpose; its primary 
purpose is to avoid allocations, as you said.

Although I am still a bit dejected that some userspace could decide one 
day to send everything at I915_PRIORITY_NORMAL - n, in order to use 
I915_PRIORITY_NORMAL as the high priority not requiring cap_sys_admin, 
and in doing so completely defeat the atomic kmalloc avoidance. :(

Regards,

Tvrtko


* Re: [PATCH 7/9] drm/i915/execlists: Reduce lock context between schedule/submit_request
  2017-05-05 13:30   ` Tvrtko Ursulin
@ 2017-05-05 13:38     ` Chris Wilson
  0 siblings, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2017-05-05 13:38 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, May 05, 2017 at 02:30:08PM +0100, Tvrtko Ursulin wrote:
> 
> On 03/05/2017 12:37, Chris Wilson wrote:
> >If we do not require to perform priority bumping, and we haven't yet
> >submitted the request, we can update its priority in situ and skip
> >acquiring the engine locks -- thus avoiding any contention between us
> >and submit/execute.
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >---
> > drivers/gpu/drm/i915/intel_lrc.c | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> >diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> >index fb0025627676..ca7f28795e2d 100644
> >--- a/drivers/gpu/drm/i915/intel_lrc.c
> >+++ b/drivers/gpu/drm/i915/intel_lrc.c
> >@@ -767,6 +767,17 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
> > 		list_safe_reset_next(dep, p, dfs_link);
> > 	}
> >
> >+	/* If we didn't need to bump any existing priorites, and we haven't
> 
> priorities
> 
> >+	 * yet submitted this request (i..e there is no porential race with
> 
> potential
> 
> >+	 * execlists_submit_request()), we can set our own priority and skip
> >+	 * acquiring the engine locks.
> >+	 */
> >+	if (request->priotree.priority == INT_MIN) {
> >+		request->priotree.priority = prio;
> >+		if (stack.dfs_link.next == stack.dfs_link.prev)
> >+			return;
> 
> Move the assignment of the priority under the if?

The assignment always works. I just liked the look of this code more :)
Skipping the assignment is only a minor benefit. For bonus points, we
could do a list_del_entry(&stack.dfs_link) after the return.

Sold.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 6/9] drm/i915: Split execlist priority queue into rbtree + linked list
  2017-05-03 11:37 ` [PATCH 6/9] drm/i915: Split execlist priority queue into rbtree + linked list Chris Wilson
  2017-05-05 13:19   ` Tvrtko Ursulin
@ 2017-05-05 13:50   ` Tvrtko Ursulin
  2017-05-05 14:04     ` Chris Wilson
  1 sibling, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2017-05-05 13:50 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/05/2017 12:37, Chris Wilson wrote:

[snip]

> +#include <linux/list.h>
> +
> +static inline void __list_del_many(struct list_head *head,
> +				   struct list_head *first)
> +{
> +	head->next = first;
> +	first->prev = head;
> +}

Btw it is similar to __list_cut_position, but without the second list,
which you don't need, and without deleting the first entry, which you do
not want.

So I am just wondering whether consistent naming with the core function
would be beneficial, like __list_cut_position_before/exclusive?

Regards,

Tvrtko


* Re: [PATCH 6/9] drm/i915: Split execlist priority queue into rbtree + linked list
  2017-05-05 13:37       ` Tvrtko Ursulin
@ 2017-05-05 13:51         ` Chris Wilson
  2017-05-05 14:20           ` Tvrtko Ursulin
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2017-05-05 13:51 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, May 05, 2017 at 02:37:41PM +0100, Tvrtko Ursulin wrote:
> 
> On 05/05/2017 14:32, Chris Wilson wrote:
> >On Fri, May 05, 2017 at 02:19:07PM +0100, Tvrtko Ursulin wrote:
> >>
> >>On 03/05/2017 12:37, Chris Wilson wrote:
> >>>struct intel_engine_cs {
> >>>@@ -367,6 +373,7 @@ struct intel_engine_cs {
> >>>
> >>>	/* Execlists */
> >>>	struct tasklet_struct irq_tasklet;
> >>>+	struct execlist_priolist default_priolist;
> >>>	struct execlist_port {
> >>>		struct drm_i915_gem_request *request_count;
> >>>#define EXECLIST_COUNT_BITS 2
> >>>
> >
> >Just a small bikeshed to consider. Having switched to
> >I915_PRIORITY_NORMAL, do we have a better name for default_priolist? I
> >still prefer default_priolist over normal_priolist. Go back to
> >I915_PRIORITY_DEFAULT?
> 
> default_priolist is fine I think since it is dual purpose. Primary
> purpose to avoid allocations as you said.
> 
> Although I am still a bit dejected how some userspace could decide
> one day to send everything at I915_PRIORITY_NORMAL - n, in order to
> use I915_PRIORITY_NORMAL as the high prio not requiring
> cap_sys_admin, and in doing so completely defeat the atomic kmalloc
> avoidance. :(

Should we just bite the bullet and install a kmem_cache here?
It didn't solve the kmalloc error handling, but it does at least give us
a freelist. There is a reasonable argument that as soon as userspace
starts using non-default priorities, we may see many different levels
justifying allocating a whole slab upfront.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
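
[Editorial aside: a hedged sketch of what installing a kmem_cache could
look like -- the struct name follows the patch, but the cache name, call
sites and flags are assumptions, not the actual i915 change:]

```c
#include <linux/slab.h>

/* Hypothetical: one driver-global slab cache for priority levels. */
static struct kmem_cache *priolist_cache;

/* At driver load: */
priolist_cache = kmem_cache_create("i915_priolist",
				   sizeof(struct execlist_priolist),
				   0, 0, NULL);

/* When opening a new priority level; GFP_ATOMIC since, per the thread,
 * the allocation may happen in atomic context. */
p = kmem_cache_alloc(priolist_cache, GFP_ATOMIC);

/* When the level drains, and at driver unload respectively: */
kmem_cache_free(priolist_cache, p);
kmem_cache_destroy(priolist_cache);
```

The win over plain kmalloc is the per-cache freelist, which is what the
message above is after.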

* Re: [PATCH 6/9] drm/i915: Split execlist priority queue into rbtree + linked list
  2017-05-05 13:50   ` Tvrtko Ursulin
@ 2017-05-05 14:04     ` Chris Wilson
  2017-05-05 14:24       ` Tvrtko Ursulin
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2017-05-05 14:04 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, May 05, 2017 at 02:50:46PM +0100, Tvrtko Ursulin wrote:
> 
> On 03/05/2017 12:37, Chris Wilson wrote:
> 
> [snip]
> 
> >+#include <linux/list.h>
> >+
> >+static inline void __list_del_many(struct list_head *head,
> >+				   struct list_head *first)
> >+{
> >+	head->next = first;
> >+	first->prev = head;
> >+}
> 
> Btw it is similar to __list_cut_position, but without the second
> list, which you don't need, and without deleting the first entry,
> which you do not want.
> 
> So I am just wondering whether consistent naming with the core
> function would be beneficial, like __list_cut_position_before/exclusive?

It's not a cut, as in we don't create two separate lists. It is more of a
splice with itself, though I did look at cut and wondered if it would
magically eliminate the unwanted paths, but no..

Still favouring a list_del prefix. Once upon a time I just opencoded
this; it's far easier than trying to work out a good name :)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 6/9] drm/i915: Split execlist priority queue into rbtree + linked list
  2017-05-05 13:51         ` Chris Wilson
@ 2017-05-05 14:20           ` Tvrtko Ursulin
  2017-05-05 14:26             ` Chris Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2017-05-05 14:20 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 05/05/2017 14:51, Chris Wilson wrote:
> On Fri, May 05, 2017 at 02:37:41PM +0100, Tvrtko Ursulin wrote:
>>
>> On 05/05/2017 14:32, Chris Wilson wrote:
>>> On Fri, May 05, 2017 at 02:19:07PM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 03/05/2017 12:37, Chris Wilson wrote:
>>>>> struct intel_engine_cs {
>>>>> @@ -367,6 +373,7 @@ struct intel_engine_cs {
>>>>>
>>>>> 	/* Execlists */
>>>>> 	struct tasklet_struct irq_tasklet;
>>>>> +	struct execlist_priolist default_priolist;
>>>>> 	struct execlist_port {
>>>>> 		struct drm_i915_gem_request *request_count;
>>>>> #define EXECLIST_COUNT_BITS 2
>>>>>
>>>
>>> Just a small bikeshed to consider. Having switched to
>>> I915_PRIORITY_NORMAL, do we have a better name for default_priolist? I
>>> still prefer default_priolist over normal_priolist. Go back to
>>> I915_PRIORITY_DEFAULT?
>>
>> default_priolist is fine I think since it is dual purpose. Primary
>> purpose to avoid allocations as you said.
>>
>> Although I am still a bit dejected how some userspace could decide
>> one day to send everything at I915_PRIORITY_NORMAL - n, in order to
>> use I915_PRIORITY_NORMAL as the high prio not requiring
>> cap_sys_admin, and in doing so completely defeat the atomic kmalloc
>> avoidance. :(
>
> Should we just bite the bullet and install a kmem_cache here?
> It didn't solve the kmalloc error handling, but it does at least give us
> a freelist. There is a reasonable argument that as soon as userspace
> starts using non-default priorities, we may see many different levels
> justifying allocating a whole slab upfront.

Actually my argument is pants since allocations only happen with 
"opening" a new priority level. So I think leave it as it is until there 
is a need.

Regards,

Tvrtko

* Re: [PATCH 6/9] drm/i915: Split execlist priority queue into rbtree + linked list
  2017-05-05 14:04     ` Chris Wilson
@ 2017-05-05 14:24       ` Tvrtko Ursulin
  0 siblings, 0 replies; 33+ messages in thread
From: Tvrtko Ursulin @ 2017-05-05 14:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 05/05/2017 15:04, Chris Wilson wrote:
> On Fri, May 05, 2017 at 02:50:46PM +0100, Tvrtko Ursulin wrote:
>>
>> On 03/05/2017 12:37, Chris Wilson wrote:
>>
>> [snip]
>>
>>> +#include <linux/list.h>
>>> +
>>> +static inline void __list_del_many(struct list_head *head,
>>> +				   struct list_head *first)
>>> +{
>>> +	head->next = first;
>>> +	first->prev = head;
>>> +}
>>
>> Btw it is similar to __list_cut_position, but without the second
>> list, which you don't need, and without deleting the first entry,
>> which you do not want.
>>
>> So I am just wondering whether consistent naming with the core
>> function would be beneficial, like __list_cut_position_before/exclusive?
>
> It's not a cut, as in we don't create two separate lists. It is more of
> a splice with itself, though I did look at cut and wondered if it would
> magically eliminate the unwanted paths, but no..
>
> Still favouring a list_del prefix. Once upon a time I just opencoded
> this; it's far easier than trying to work out a good name :)

It's not __list_del_many either since it does not unlink the ones in the 
middle. ;) Doesn't matter, as you say, it is quite hard to find the 
ideal name.

Regards,

Tvrtko

* Re: [PATCH 6/9] drm/i915: Split execlist priority queue into rbtree + linked list
  2017-05-05 14:20           ` Tvrtko Ursulin
@ 2017-05-05 14:26             ` Chris Wilson
  0 siblings, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2017-05-05 14:26 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, May 05, 2017 at 03:20:08PM +0100, Tvrtko Ursulin wrote:
> 
> On 05/05/2017 14:51, Chris Wilson wrote:
> >On Fri, May 05, 2017 at 02:37:41PM +0100, Tvrtko Ursulin wrote:
> >>
> >>On 05/05/2017 14:32, Chris Wilson wrote:
> >>>On Fri, May 05, 2017 at 02:19:07PM +0100, Tvrtko Ursulin wrote:
> >>>>
> >>>>On 03/05/2017 12:37, Chris Wilson wrote:
> >>>>>struct intel_engine_cs {
> >>>>>@@ -367,6 +373,7 @@ struct intel_engine_cs {
> >>>>>
> >>>>>	/* Execlists */
> >>>>>	struct tasklet_struct irq_tasklet;
> >>>>>+	struct execlist_priolist default_priolist;
> >>>>>	struct execlist_port {
> >>>>>		struct drm_i915_gem_request *request_count;
> >>>>>#define EXECLIST_COUNT_BITS 2
> >>>>>
> >>>
> >>>Just a small bikeshed to consider. Having switched to
> >>>I915_PRIORITY_NORMAL, do we have a better name for default_priolist? I
> >>>still prefer default_priolist over normal_priolist. Go back to
> >>>I915_PRIORITY_DEFAULT?
> >>
> >>default_priolist is fine I think since it is dual purpose. Primary
> >>purpose to avoid allocations as you said.
> >>
> >>Although I am still a bit dejected how some userspace could decide
> >>one day to send everything at I915_PRIORITY_NORMAL - n, in order to
> >>use I915_PRIORITY_NORMAL as the high prio not requiring
> >>cap_sys_admin, and in doing so completely defeat the atomic kmalloc
> >>avoidance. :(
> >
> >Should we just bite the bullet and install a kmem_cache here?
> >It didn't solve the kmalloc error handling, but it does at least give us
> >a freelist. There is a reasonable argument that as soon as userspace
> >starts using non-default priorities, we may see many different levels
> >justifying allocating a whole slab upfront.
> 
> Actually my argument is pants since allocations only happen with
> "opening" a new priority level. So I think leave it as it is until
> there is a need.

Patch is ready for kmalloc showing up as a latency source.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

end of thread, other threads:[~2017-05-05 14:26 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-03 11:37 [PATCH 1/9] drm/i915: Make ptr_unpack_bits() more function-like Chris Wilson
2017-05-03 11:37 ` [PATCH 2/9] drm/i915: Redefine ptr_pack_bits() and friends Chris Wilson
2017-05-03 11:37 ` [PATCH 3/9] drm/i915/execlists: Pack the count into the low bits of the port.request Chris Wilson
2017-05-05 10:49   ` Tvrtko Ursulin
2017-05-05 11:16     ` Chris Wilson
2017-05-05 11:58       ` Tvrtko Ursulin
2017-05-03 11:37 ` [PATCH 4/9] drm/i915: Don't mark an execlists context-switch when idle Chris Wilson
2017-05-03 11:37 ` [PATCH 5/9] drm/i915: Use a define for the default priority [0] Chris Wilson
2017-05-04 13:32   ` Joonas Lahtinen
2017-05-04 13:58     ` Chris Wilson
2017-05-05  8:31       ` Mika Kuoppala
2017-05-05  9:13         ` Chris Wilson
2017-05-05  9:21           ` Tvrtko Ursulin
2017-05-05  9:50             ` Chris Wilson
2017-05-03 11:37 ` [PATCH 6/9] drm/i915: Split execlist priority queue into rbtree + linked list Chris Wilson
2017-05-05 13:19   ` Tvrtko Ursulin
2017-05-05 13:32     ` Chris Wilson
2017-05-05 13:37       ` Tvrtko Ursulin
2017-05-05 13:51         ` Chris Wilson
2017-05-05 14:20           ` Tvrtko Ursulin
2017-05-05 14:26             ` Chris Wilson
2017-05-05 13:50   ` Tvrtko Ursulin
2017-05-05 14:04     ` Chris Wilson
2017-05-05 14:24       ` Tvrtko Ursulin
2017-05-03 11:37 ` [PATCH 7/9] drm/i915/execlists: Reduce lock context between schedule/submit_request Chris Wilson
2017-05-05 12:13   ` Chris Wilson
2017-05-05 13:30   ` Tvrtko Ursulin
2017-05-05 13:38     ` Chris Wilson
2017-05-03 11:37 ` [PATCH 8/9] drm/i915: Stop inlining the execlists IRQ handler Chris Wilson
2017-05-04 13:55   ` Mika Kuoppala
2017-05-03 11:37 ` [PATCH 9/9] drm/i915: Don't force serialisation on marking up execlists irq posted Chris Wilson
2017-05-05 13:34   ` Tvrtko Ursulin
2017-05-03 12:25 ` ✓ Fi.CI.BAT: success for series starting with [1/9] drm/i915: Make ptr_unpack_bits() more function-like Patchwork
