* Direct execlists submission
@ 2018-05-14  9:37 Chris Wilson
  2018-05-14  9:37 ` [PATCH 01/10] drm/i915: Mark up nested spinlocks Chris Wilson
                   ` (14 more replies)
  0 siblings, 15 replies; 21+ messages in thread
From: Chris Wilson @ 2018-05-14  9:37 UTC (permalink / raw)
  To: intel-gfx

Continuing the discussion with the latest refactorings. This time, however,
I ran some tests to measure the impact on system (!i915) latency,
using igt/benchmarks/gem_syslatency -t 120:

drm-tip:
        latency mean=1.211us max=10us (no load)
        latency mean=2.611us max=83us (i915)

        latency mean=1.720us max=833us (no load, bg writeout)
        latency mean=3.294us max=607us (i915, bg writeout)

this series:
        latency mean=1.280us max=15us (no load)
        latency mean=9.688us max=1271us (i915)

        latency mean=1.712us max=1026us (no load, bg writeout)
        latency mean=14.347us max=489850us (i915, bg writeout)

That certainly takes the shine off directly using the tasklet for
submission from the irq handler. Being selfish, I still think we can't
allow the GPU to stall waiting for ksoftirqd, but at the same time we
need to solve the latency issues introduced elsewhere.
-Chris



_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 01/10] drm/i915: Mark up nested spinlocks
  2018-05-14  9:37 Direct execlists submission Chris Wilson
@ 2018-05-14  9:37 ` Chris Wilson
  2018-05-14 10:06   ` Tvrtko Ursulin
  2018-05-14  9:37 ` [PATCH 02/10] drm/i915: Remove tasklet flush before disable Chris Wilson
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 21+ messages in thread
From: Chris Wilson @ 2018-05-14  9:37 UTC (permalink / raw)
  To: intel-gfx

When we process the outstanding requests upon banning a context, we need
to acquire both the engine's and the client's timeline locks, nesting one
inside the other. This requires explicit markup as the two timelines are
now of the same lock class, since commit a89d1f921c15 ("drm/i915: Split
i915_gem_timeline into individual timelines").

Testcase: igt/gem_eio/banned
Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 89bf5d67cb74..0a2070112b66 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3119,7 +3119,7 @@ static void engine_skip_context(struct i915_request *request)
 	GEM_BUG_ON(timeline == &engine->timeline);
 
 	spin_lock_irqsave(&engine->timeline.lock, flags);
-	spin_lock(&timeline->lock);
+	spin_lock_nested(&timeline->lock, SINGLE_DEPTH_NESTING);
 
 	list_for_each_entry_continue(request, &engine->timeline.requests, link)
 		if (request->ctx == hung_ctx)
-- 
2.17.0
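
To make the nesting annotation concrete, here is a minimal userspace sketch (not kernel code: pthread mutexes and a hand-maintained depth counter stand in for the spinlocks and lockdep, and the `timeline`/`skip_context` names are invented for illustration) of the lock order the patch annotates:

```c
#include <pthread.h>

/* Userspace model: after commit a89d1f921c15 both timelines share one
 * lock class, so acquiring the client timeline while holding the engine
 * timeline is same-class nesting. In the kernel, lockdep flags that
 * unless the inner acquisition is marked with
 * spin_lock_nested(..., SINGLE_DEPTH_NESTING). */
struct timeline {
	pthread_mutex_t lock;
};

static int nesting_depth;

/* Model of the annotated nesting in engine_skip_context(): outer lock is
 * the engine timeline, inner lock is the banned client's timeline. */
static int skip_context(struct timeline *engine_tl, struct timeline *client_tl)
{
	int depth;

	pthread_mutex_lock(&engine_tl->lock);	/* spin_lock_irqsave() */
	nesting_depth++;
	pthread_mutex_lock(&client_tl->lock);	/* spin_lock_nested() */
	nesting_depth++;

	depth = nesting_depth;			/* same-class depth == 2 */

	pthread_mutex_unlock(&client_tl->lock);
	nesting_depth--;
	pthread_mutex_unlock(&engine_tl->lock);
	nesting_depth--;
	return depth;
}
```

The annotation changes nothing at runtime; it only tells lockdep that a single level of same-class nesting here is intentional.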


* [PATCH 02/10] drm/i915: Remove tasklet flush before disable
  2018-05-14  9:37 Direct execlists submission Chris Wilson
  2018-05-14  9:37 ` [PATCH 01/10] drm/i915: Mark up nested spinlocks Chris Wilson
@ 2018-05-14  9:37 ` Chris Wilson
  2018-05-14  9:37 ` [PATCH 03/10] drm/i915: Wrap tasklet_struct for abuse Chris Wilson
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2018-05-14  9:37 UTC (permalink / raw)
  To: intel-gfx

The idea was to try to let the existing tasklet run to completion
before we began the reset, but doing so involves a racy check against
anything else that tries to run the tasklet. Rather than acknowledge and
then ignore the race, let it be and don't try to be too clever.

The tasklet will resume execution after reset (after spinning a bit
during reset), but before we allow it to resume we will have cleared all
the pending state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0a2070112b66..0dc369a9ec4d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3035,16 +3035,7 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
 	 * calling engine->init_hw() and also writing the ELSP.
 	 * Turning off the execlists->tasklet until the reset is over
 	 * prevents the race.
-	 *
-	 * Note that this needs to be a single atomic operation on the
-	 * tasklet (flush existing tasks, prevent new tasks) to prevent
-	 * a race between reset and set-wedged. It is not, so we do the best
-	 * we can atm and make sure we don't lock the machine up in the more
-	 * common case of recursively being called from set-wedged from inside
-	 * i915_reset.
 	 */
-	if (!atomic_read(&engine->execlists.tasklet.count))
-		tasklet_kill(&engine->execlists.tasklet);
 	tasklet_disable(&engine->execlists.tasklet);
 
 	/*
-- 
2.17.0


* [PATCH 03/10] drm/i915: Wrap tasklet_struct for abuse
  2018-05-14  9:37 Direct execlists submission Chris Wilson
  2018-05-14  9:37 ` [PATCH 01/10] drm/i915: Mark up nested spinlocks Chris Wilson
  2018-05-14  9:37 ` [PATCH 02/10] drm/i915: Remove tasklet flush before disable Chris Wilson
@ 2018-05-14  9:37 ` Chris Wilson
  2018-05-14  9:37 ` [PATCH 04/10] drm/i915: Only sync tasklets once for recursive reset preparation Chris Wilson
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2018-05-14  9:37 UTC (permalink / raw)
  To: intel-gfx

In the next few patches, we want to abuse the tasklet to avoid ksoftirqd
latency along critical paths. To make that abuse easier to swallow,
first coat the tasklet in a little syntactic sugar.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c             |  4 +-
 drivers/gpu/drm/i915/i915_irq.c             |  2 +-
 drivers/gpu/drm/i915/i915_tasklet.h         | 78 +++++++++++++++++++++
 drivers/gpu/drm/i915/intel_engine_cs.c      | 11 ++-
 drivers/gpu/drm/i915/intel_guc_submission.c |  8 +--
 drivers/gpu/drm/i915/intel_lrc.c            | 18 ++---
 drivers/gpu/drm/i915/intel_ringbuffer.h     |  3 +-
 7 files changed, 102 insertions(+), 22 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_tasklet.h

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0dc369a9ec4d..75668b50c81e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3036,7 +3036,7 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
 	 * Turning off the execlists->tasklet until the reset is over
 	 * prevents the race.
 	 */
-	tasklet_disable(&engine->execlists.tasklet);
+	i915_tasklet_disable(&engine->execlists.tasklet);
 
 	/*
 	 * We're using worker to queue preemption requests from the tasklet in
@@ -3256,7 +3256,7 @@ void i915_gem_reset(struct drm_i915_private *dev_priv,
 
 void i915_gem_reset_finish_engine(struct intel_engine_cs *engine)
 {
-	tasklet_enable(&engine->execlists.tasklet);
+	i915_tasklet_enable(&engine->execlists.tasklet);
 	kthread_unpark(engine->breadcrumbs.signaler);
 
 	intel_uncore_forcewake_put(engine->i915, FORCEWAKE_ALL);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f9bc3aaa90d0..f8aff5a5aa83 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1477,7 +1477,7 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 	}
 
 	if (tasklet)
-		tasklet_hi_schedule(&execlists->tasklet);
+		i915_tasklet_schedule(&execlists->tasklet);
 }
 
 static void gen8_gt_irq_ack(struct drm_i915_private *i915,
diff --git a/drivers/gpu/drm/i915/i915_tasklet.h b/drivers/gpu/drm/i915/i915_tasklet.h
new file mode 100644
index 000000000000..e24e4f77fe8e
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_tasklet.h
@@ -0,0 +1,78 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright © 2018 Intel Corporation
+ */
+
+#ifndef _I915_TASKLET_H_
+#define _I915_TASKLET_H_
+
+#include <linux/atomic.h>
+#include <linux/interrupt.h>
+
+/**
+ * struct i915_tasklet - wrapper around tasklet_struct
+ *
+ * We want to abuse tasklets slightly, such as calling them directly. In some
+ * cases, this requires some auxiliary tracking so subclass the tasklet_struct
+ * so that we have a central place and helpers.
+ */
+struct i915_tasklet {
+	struct tasklet_struct base;
+};
+
+static inline void i915_tasklet_init(struct i915_tasklet *t,
+				     void (*func)(unsigned long),
+				     unsigned long data)
+{
+	tasklet_init(&t->base, func, data);
+}
+
+static inline bool i915_tasklet_is_scheduled(const struct i915_tasklet *t)
+{
+	return test_bit(TASKLET_STATE_SCHED, &t->base.state);
+}
+
+static inline bool i915_tasklet_is_running(const struct i915_tasklet *t)
+{
+	return test_bit(TASKLET_STATE_RUN, &t->base.state);
+}
+
+static inline bool i915_tasklet_is_enabled(const struct i915_tasklet *t)
+{
+	return likely(!atomic_read(&t->base.count));
+}
+
+static inline void i915_tasklet_schedule(struct i915_tasklet *t)
+{
+	tasklet_hi_schedule(&t->base);
+}
+
+static inline void i915_tasklet_flush(struct i915_tasklet *t)
+{
+	tasklet_kill(&t->base);
+}
+
+static inline void i915_tasklet_disable(struct i915_tasklet *t)
+{
+	tasklet_disable(&t->base);
+}
+
+static inline void i915_tasklet_enable(struct i915_tasklet *t)
+{
+	tasklet_enable(&t->base);
+}
+
+static inline void i915_tasklet_set_func(struct i915_tasklet *t,
+					 void (*func)(unsigned long data),
+					 unsigned long data)
+{
+	i915_tasklet_disable(t);
+
+	t->base.func = func;
+	t->base.data = data;
+
+	i915_tasklet_enable(t);
+}
+
+#endif /* _I915_TASKLET_H_ */
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 6bfd7e3ed152..70595e995c45 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1021,7 +1021,7 @@ void intel_engines_park(struct drm_i915_private *i915)
 	for_each_engine(engine, i915, id) {
 		/* Flush the residual irq tasklets first. */
 		intel_engine_disarm_breadcrumbs(engine);
-		tasklet_kill(&engine->execlists.tasklet);
+		i915_tasklet_flush(&engine->execlists.tasklet);
 
 		/*
 		 * We are committed now to parking the engines, make sure there
@@ -1238,9 +1238,8 @@ static void intel_engine_print_registers(const struct intel_engine_cs *engine,
 			   intel_read_status_page(engine, intel_hws_csb_write_index(engine->i915)),
 			   yesno(test_bit(ENGINE_IRQ_EXECLIST,
 					  &engine->irq_posted)),
-			   yesno(test_bit(TASKLET_STATE_SCHED,
-					  &engine->execlists.tasklet.state)),
-			   enableddisabled(!atomic_read(&engine->execlists.tasklet.count)));
+			   yesno(i915_tasklet_is_scheduled(&engine->execlists.tasklet)),
+			   enableddisabled(i915_tasklet_is_enabled(&engine->execlists.tasklet)));
 		if (read >= GEN8_CSB_ENTRIES)
 			read = 0;
 		if (write >= GEN8_CSB_ENTRIES)
@@ -1468,7 +1467,7 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
 	if (!intel_engine_supports_stats(engine))
 		return -ENODEV;
 
-	tasklet_disable(&execlists->tasklet);
+	i915_tasklet_disable(&execlists->tasklet);
 	write_seqlock_irqsave(&engine->stats.lock, flags);
 
 	if (unlikely(engine->stats.enabled == ~0)) {
@@ -1494,7 +1493,7 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
 
 unlock:
 	write_sequnlock_irqrestore(&engine->stats.lock, flags);
-	tasklet_enable(&execlists->tasklet);
+	i915_tasklet_enable(&execlists->tasklet);
 
 	return err;
 }
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 2feb65096966..a7afc976c3b9 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -591,7 +591,7 @@ static void inject_preempt_context(struct work_struct *work)
 	if (WARN_ON(intel_guc_send(guc, data, ARRAY_SIZE(data)))) {
 		execlists_clear_active(&engine->execlists,
 				       EXECLISTS_ACTIVE_PREEMPT);
-		tasklet_schedule(&engine->execlists.tasklet);
+		i915_tasklet_schedule(&engine->execlists.tasklet);
 	}
 }
 
@@ -1263,10 +1263,10 @@ int intel_guc_submission_enable(struct intel_guc *guc)
 	guc_interrupts_capture(dev_priv);
 
 	for_each_engine(engine, dev_priv, id) {
-		struct intel_engine_execlists * const execlists =
-			&engine->execlists;
+		i915_tasklet_set_func(&engine->execlists.tasklet,
+				      guc_submission_tasklet,
+				      (unsigned long)engine);
 
-		execlists->tasklet.func = guc_submission_tasklet;
 		engine->park = guc_submission_park;
 		engine->unpark = guc_submission_unpark;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 15434cad5430..0c3dad5c9709 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1166,7 +1166,7 @@ static void queue_request(struct intel_engine_cs *engine,
 static void __submit_queue(struct intel_engine_cs *engine, int prio)
 {
 	engine->execlists.queue_priority = prio;
-	tasklet_hi_schedule(&engine->execlists.tasklet);
+	i915_tasklet_schedule(&engine->execlists.tasklet);
 }
 
 static void submit_queue(struct intel_engine_cs *engine, int prio)
@@ -1781,7 +1781,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
 
 	/* After a GPU reset, we may have requests to replay */
 	if (execlists->first)
-		tasklet_schedule(&execlists->tasklet);
+		i915_tasklet_schedule(&execlists->tasklet);
 
 	return 0;
 }
@@ -2185,9 +2185,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
 	 * Tasklet cannot be active at this point due intel_mark_active/idle
 	 * so this is just for documentation.
 	 */
-	if (WARN_ON(test_bit(TASKLET_STATE_SCHED,
-			     &engine->execlists.tasklet.state)))
-		tasklet_kill(&engine->execlists.tasklet);
+	if (WARN_ON(i915_tasklet_is_scheduled(&engine->execlists.tasklet)))
+		i915_tasklet_flush(&engine->execlists.tasklet);
 
 	dev_priv = engine->i915;
 
@@ -2212,7 +2211,10 @@ static void execlists_set_default_submission(struct intel_engine_cs *engine)
 	engine->submit_request = execlists_submit_request;
 	engine->cancel_requests = execlists_cancel_requests;
 	engine->schedule = execlists_schedule;
-	engine->execlists.tasklet.func = execlists_submission_tasklet;
+
+	i915_tasklet_set_func(&engine->execlists.tasklet,
+			      execlists_submission_tasklet,
+			      (unsigned long)engine);
 
 	engine->park = NULL;
 	engine->unpark = NULL;
@@ -2306,8 +2308,8 @@ logical_ring_setup(struct intel_engine_cs *engine)
 
 	engine->execlists.fw_domains = fw_domains;
 
-	tasklet_init(&engine->execlists.tasklet,
-		     execlists_submission_tasklet, (unsigned long)engine);
+	i915_tasklet_init(&engine->execlists.tasklet,
+			  execlists_submission_tasklet, (unsigned long)engine);
 
 	logical_ring_default_vfuncs(engine);
 	logical_ring_default_irqs(engine);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 010750e8ee44..f6ba354faf89 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -11,6 +11,7 @@
 #include "i915_pmu.h"
 #include "i915_request.h"
 #include "i915_selftest.h"
+#include "i915_tasklet.h"
 #include "i915_timeline.h"
 #include "intel_gpu_commands.h"
 
@@ -202,7 +203,7 @@ struct intel_engine_execlists {
 	/**
 	 * @tasklet: softirq tasklet for bottom handler
 	 */
-	struct tasklet_struct tasklet;
+	struct i915_tasklet tasklet;
 
 	/**
 	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
-- 
2.17.0
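
As a rough model of what the wrapper encapsulates, consider this userspace sketch (an illustration only: C11 atomics stand in for the kernel's atomic_t, the `model_*` names are invented, and tasklet_disable()/tasklet_enable() are reduced to their count-based core):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* tasklet_disable() in the kernel increments tasklet_struct.count and
 * the tasklet only runs while that count is zero; the i915_tasklet
 * helpers wrap exactly that state. */
struct model_tasklet {
	atomic_int count;		/* 0 == enabled */
	void (*func)(unsigned long);
	unsigned long data;
};

static bool model_tasklet_is_enabled(struct model_tasklet *t)
{
	return atomic_load(&t->count) == 0;
}

static void model_tasklet_disable(struct model_tasklet *t)
{
	atomic_fetch_add(&t->count, 1);
}

static void model_tasklet_enable(struct model_tasklet *t)
{
	atomic_fetch_sub(&t->count, 1);
}

/* Model of i915_tasklet_set_func(): swap the callback only while
 * disabled, so a concurrent run cannot observe a half-updated
 * func/data pair. */
static void model_tasklet_set_func(struct model_tasklet *t,
				   void (*func)(unsigned long),
				   unsigned long data)
{
	model_tasklet_disable(t);
	t->func = func;
	t->data = data;
	model_tasklet_enable(t);
}
```

Note that disable/enable pairs nest: the tasklet is enabled again only when the count returns to zero.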


* [PATCH 04/10] drm/i915: Only sync tasklets once for recursive reset preparation
  2018-05-14  9:37 Direct execlists submission Chris Wilson
                   ` (2 preceding siblings ...)
  2018-05-14  9:37 ` [PATCH 03/10] drm/i915: Wrap tasklet_struct for abuse Chris Wilson
@ 2018-05-14  9:37 ` Chris Wilson
  2018-05-14  9:37 ` [PATCH 05/10] drm/i915/execlists: Direct submit onto idle engines Chris Wilson
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2018-05-14  9:37 UTC (permalink / raw)
  To: intel-gfx

When setting up reset, we may need to recursively prepare an engine, in
which case we should only synchronously flush the tasklets on the
outermost call; the inner calls will then be inside an atomic section
where the tasklet can never be run (and so the sync would never
complete).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c     | 2 +-
 drivers/gpu/drm/i915/i915_tasklet.h | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 75668b50c81e..3f7ecbff1179 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3036,7 +3036,7 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
 	 * Turning off the execlists->tasklet until the reset is over
 	 * prevents the race.
 	 */
-	i915_tasklet_disable(&engine->execlists.tasklet);
+	i915_tasklet_disable_once(&engine->execlists.tasklet);
 
 	/*
 	 * We're using worker to queue preemption requests from the tasklet in
diff --git a/drivers/gpu/drm/i915/i915_tasklet.h b/drivers/gpu/drm/i915/i915_tasklet.h
index e24e4f77fe8e..b7cbbc2f8f69 100644
--- a/drivers/gpu/drm/i915/i915_tasklet.h
+++ b/drivers/gpu/drm/i915/i915_tasklet.h
@@ -58,6 +58,12 @@ static inline void i915_tasklet_disable(struct i915_tasklet *t)
 	tasklet_disable(&t->base);
 }
 
+static inline void i915_tasklet_disable_once(struct i915_tasklet *t)
+{
+	if (atomic_inc_return(&t->base.count) == 1)
+		tasklet_unlock_wait(&t->base);
+}
+
 static inline void i915_tasklet_enable(struct i915_tasklet *t)
 {
 	tasklet_enable(&t->base);
-- 
2.17.0
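
The disable-once semantics can be modelled in userspace as follows (a sketch under stated assumptions: `reset_tasklet` and the `sync_waits` counter are invented stand-ins; in the kernel the count lives in tasklet_struct and the wait is tasklet_unlock_wait()):

```c
#include <stdatomic.h>

/* Only the first, outermost disable synchronously waits for a running
 * tasklet; recursive disables just bump the count and return, since
 * waiting inside an atomic section could never complete. */
struct reset_tasklet {
	atomic_int count;	/* non-zero == disabled */
	int sync_waits;		/* times we "waited", for illustration */
};

static void reset_tasklet_disable_once(struct reset_tasklet *t)
{
	/* Models atomic_inc_return(&t->base.count) == 1. */
	if (atomic_fetch_add(&t->count, 1) == 0)
		t->sync_waits++;	/* stands in for tasklet_unlock_wait() */
}
```

Every call still increments the count, so the matching enables balance out regardless of recursion depth; only the wait is skipped on inner calls.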


* [PATCH 05/10] drm/i915/execlists: Direct submit onto idle engines
  2018-05-14  9:37 Direct execlists submission Chris Wilson
                   ` (3 preceding siblings ...)
  2018-05-14  9:37 ` [PATCH 04/10] drm/i915: Only sync tasklets once for recursive reset preparation Chris Wilson
@ 2018-05-14  9:37 ` Chris Wilson
  2018-05-14  9:37 ` [PATCH 06/10] drm/i915/execlists: Direct submission from irq handler Chris Wilson
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2018-05-14  9:37 UTC (permalink / raw)
  To: intel-gfx

Bypass using the tasklet to submit the first request to HW, as the
tasklet may be deferred to ksoftirqd and at a minimum will add in
excess of 10us (and maybe tens of milliseconds) to our execution
latency. This latency reduction is most notable when execution flows
between engines.

v2: Beware handling preemption completion from the direct submit path as
well.
v3: Make the abuse clear and track our extra state inside i915_tasklet.
v4: s/wakeup_queue/update_queue/

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_tasklet.h         | 24 +++++++
 drivers/gpu/drm/i915/intel_guc_submission.c | 10 ++-
 drivers/gpu/drm/i915/intel_lrc.c            | 71 +++++++++++++++++----
 3 files changed, 89 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_tasklet.h b/drivers/gpu/drm/i915/i915_tasklet.h
index b7cbbc2f8f69..6f3b2fedc65b 100644
--- a/drivers/gpu/drm/i915/i915_tasklet.h
+++ b/drivers/gpu/drm/i915/i915_tasklet.h
@@ -8,8 +8,11 @@
 #define _I915_TASKLET_H_
 
 #include <linux/atomic.h>
+#include <linux/bitops.h>
 #include <linux/interrupt.h>
 
+#include "i915_gem.h"
+
 /**
  * struct i915_tasklet - wrapper around tasklet_struct
  *
@@ -19,6 +22,8 @@
  */
 struct i915_tasklet {
 	struct tasklet_struct base;
+	unsigned long flags;
+#define I915_TASKLET_DIRECT_SUBMIT BIT(0)
 };
 
 static inline void i915_tasklet_init(struct i915_tasklet *t,
@@ -43,6 +48,14 @@ static inline bool i915_tasklet_is_enabled(const struct i915_tasklet *t)
 	return likely(!atomic_read(&t->base.count));
 }
 
+static inline bool i915_tasklet_is_direct_submit(const struct i915_tasklet *t)
+{
+	/* Only legal to be checked from inside the tasklet. */
+	GEM_BUG_ON(!i915_tasklet_is_running(t));
+
+	return t->flags & I915_TASKLET_DIRECT_SUBMIT;
+}
+
 static inline void i915_tasklet_schedule(struct i915_tasklet *t)
 {
 	tasklet_hi_schedule(&t->base);
@@ -81,4 +94,15 @@ static inline void i915_tasklet_set_func(struct i915_tasklet *t,
 	i915_tasklet_enable(t);
 }
 
+static inline void __i915_tasklet_run(const struct i915_tasklet *t)
+{
+	t->base.func(t->base.data);
+}
+
+static inline void i915_tasklet_run(const struct i915_tasklet *t)
+{
+	GEM_BUG_ON(!i915_tasklet_is_running(t));
+	__i915_tasklet_run(t);
+}
+
 #endif /* _I915_TASKLET_H_ */
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index a7afc976c3b9..f2ded1796523 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -754,14 +754,18 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
 
 static void guc_dequeue(struct intel_engine_cs *engine)
 {
-	unsigned long flags;
+	unsigned long uninitialized_var(flags);
 	bool submit;
 
 	local_irq_save(flags);
 
-	spin_lock(&engine->timeline.lock);
+	if (!i915_tasklet_is_direct_submit(&engine->execlists.tasklet))
+		spin_lock(&engine->timeline.lock);
+
 	submit = __guc_dequeue(engine);
-	spin_unlock(&engine->timeline.lock);
+
+	if (!i915_tasklet_is_direct_submit(&engine->execlists.tasklet))
+		spin_unlock(&engine->timeline.lock);
 
 	if (submit)
 		guc_submit(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0c3dad5c9709..70f1b5dd492a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -356,13 +356,15 @@ execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists)
 {
 	struct intel_engine_cs *engine =
 		container_of(execlists, typeof(*engine), execlists);
-	unsigned long flags;
+	unsigned long uninitialized_var(flags);
 
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	if (!i915_tasklet_is_direct_submit(&execlists->tasklet))
+		spin_lock_irqsave(&engine->timeline.lock, flags);
 
 	__unwind_incomplete_requests(engine);
 
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	if (!i915_tasklet_is_direct_submit(&execlists->tasklet))
+		spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
 static inline void
@@ -601,6 +603,8 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
 		 */
 		GEM_BUG_ON(!execlists_is_active(execlists,
 						EXECLISTS_ACTIVE_USER));
+		GEM_BUG_ON(execlists_is_active(execlists,
+					       EXECLISTS_ACTIVE_PREEMPT));
 		GEM_BUG_ON(!port_count(&port[0]));
 		if (port_count(&port[0]) > 1)
 			return false;
@@ -757,12 +761,16 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
-	unsigned long flags;
+	unsigned long uninitialized_var(flags);
 	bool submit;
 
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	if (!i915_tasklet_is_direct_submit(&execlists->tasklet))
+		spin_lock_irqsave(&engine->timeline.lock, flags);
+
 	submit = __execlists_dequeue(engine);
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+
+	if (!i915_tasklet_is_direct_submit(&execlists->tasklet))
+		spin_unlock_irqrestore(&engine->timeline.lock, flags);
 
 	if (submit)
 		execlists_submit_ports(engine);
@@ -1163,16 +1171,52 @@ static void queue_request(struct intel_engine_cs *engine,
 		      &lookup_priolist(engine, prio)->requests);
 }
 
-static void __submit_queue(struct intel_engine_cs *engine, int prio)
+static void __update_queue(struct intel_engine_cs *engine, int prio)
 {
 	engine->execlists.queue_priority = prio;
+}
+
+static void __schedule_queue(struct intel_engine_cs *engine)
+{
 	i915_tasklet_schedule(&engine->execlists.tasklet);
 }
 
+static bool __direct_submit(struct intel_engine_execlists *const execlists)
+{
+	struct i915_tasklet * const t = &execlists->tasklet;
+
+	if (!tasklet_trylock(&t->base))
+		return false;
+
+	t->flags |= I915_TASKLET_DIRECT_SUBMIT;
+	i915_tasklet_run(t);
+	t->flags &= ~I915_TASKLET_DIRECT_SUBMIT;
+
+	tasklet_unlock(&t->base);
+	return true;
+}
+
+static void __submit_queue(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const execlists = &engine->execlists;
+
+	GEM_BUG_ON(!engine->i915->gt.awake);
+
+	/* If inside GPU reset, the tasklet will be queued later. */
+	if (!i915_tasklet_is_enabled(&execlists->tasklet))
+		return;
+
+	/* Directly submit the first request to reduce the initial latency */
+	if (port_isset(execlists->port) || !__direct_submit(execlists))
+		__schedule_queue(engine);
+}
+
 static void submit_queue(struct intel_engine_cs *engine, int prio)
 {
-	if (prio > engine->execlists.queue_priority)
-		__submit_queue(engine, prio);
+	if (prio > engine->execlists.queue_priority) {
+		__update_queue(engine, prio);
+		__submit_queue(engine);
+	}
 }
 
 static void execlists_submit_request(struct i915_request *request)
@@ -1184,10 +1228,9 @@ static void execlists_submit_request(struct i915_request *request)
 	spin_lock_irqsave(&engine->timeline.lock, flags);
 
 	queue_request(engine, &request->sched, rq_prio(request));
-	submit_queue(engine, rq_prio(request));
-
 	GEM_BUG_ON(!engine->execlists.first);
 	GEM_BUG_ON(list_empty(&request->sched.link));
+	submit_queue(engine, rq_prio(request));
 
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
@@ -1315,8 +1358,10 @@ static void execlists_schedule(struct i915_request *request,
 		}
 
 		if (prio > engine->execlists.queue_priority &&
-		    i915_sw_fence_done(&sched_to_request(node)->submit))
-			__submit_queue(engine, prio);
+		    i915_sw_fence_done(&sched_to_request(node)->submit)) {
+			__update_queue(engine, prio);
+			__schedule_queue(engine);
+		}
 	}
 
 	spin_unlock_irq(&engine->timeline.lock);
-- 
2.17.0
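
The trylock-or-schedule decision in __submit_queue() can be sketched in userspace like so (pthread_mutex_trylock() stands in for tasklet_trylock(); the `model_engine` type and its counters are invented for illustration):

```c
#include <pthread.h>
#include <stdbool.h>

/* Try to run the submission callback inline; if the tasklet is already
 * running elsewhere (the trylock fails), fall back to scheduling the
 * softirq as before. */
struct model_engine {
	pthread_mutex_t run_lock;	/* models TASKLET_STATE_RUN */
	int direct_runs;
	int scheduled;
};

static bool model_direct_submit(struct model_engine *e)
{
	if (pthread_mutex_trylock(&e->run_lock) != 0)
		return false;		/* tasklet_trylock() failed */
	e->direct_runs++;		/* i915_tasklet_run() inline */
	pthread_mutex_unlock(&e->run_lock);
	return true;
}

static void model_submit_queue(struct model_engine *e)
{
	if (!model_direct_submit(e))
		e->scheduled++;		/* i915_tasklet_schedule() fallback */
}
```

Either path makes progress: the request is submitted inline at low latency, or the already-running/scheduled tasklet will pick it up.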


* [PATCH 06/10] drm/i915/execlists: Direct submission from irq handler
  2018-05-14  9:37 Direct execlists submission Chris Wilson
                   ` (4 preceding siblings ...)
  2018-05-14  9:37 ` [PATCH 05/10] drm/i915/execlists: Direct submit onto idle engines Chris Wilson
@ 2018-05-14  9:37 ` Chris Wilson
  2018-05-14  9:37 ` [PATCH 07/10] drm/i915: Rearrange gen8_cs_irq_handler Chris Wilson
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2018-05-14  9:37 UTC (permalink / raw)
  To: intel-gfx

Continuing the theme of bypassing ksoftirqd latency, also first try to
submit directly from the CS interrupt handler to clear the ELSP and
queue the next request.

In the past, we have been hesitant to do this as the context switch
processing was quite heavy, requiring forcewaked mmio. However, as
we can now read the GPU state from the cacheable HWSP, it is relatively
cheap!

v2: Explain why we test_bit(IRQ_EXECLIST) after doing notify_ring (it's
because the notify_ring() may itself trigger direct submission clearing
the bit)

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c             | 21 ++++++++++++++-------
 drivers/gpu/drm/i915/i915_tasklet.h         | 21 +++++++++++++++++++++
 drivers/gpu/drm/i915/intel_guc_submission.c |  2 ++
 3 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f8aff5a5aa83..e1b3a7575fe7 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1465,19 +1465,26 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	bool tasklet = false;
 
-	if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
-		if (READ_ONCE(engine->execlists.active))
-			tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
-						    &engine->irq_posted);
-	}
+	if (iir & GT_CONTEXT_SWITCH_INTERRUPT && READ_ONCE(execlists->active))
+		tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
+					    &engine->irq_posted);
 
 	if (iir & GT_RENDER_USER_INTERRUPT) {
 		notify_ring(engine);
-		tasklet |= USES_GUC_SUBMISSION(engine->i915);
+		/*
+		 * notify_ring() may trigger direct submission onto this
+		 * engine, clearing the ENGINE_IRQ_EXECLIST bit. In that
+		 * case, we don't want to resubmit and so clear the tasklet
+		 * boolean. GuC never sets the ENGINE_IRQ_EXECLIST bit and
+		 * so when using the GuC this equates to an unconditional
+		 * setting of tasklet to true.
+		 */
+		if (!test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
+			tasklet = USES_GUC_SUBMISSION(engine->i915);
 	}
 
 	if (tasklet)
-		i915_tasklet_schedule(&execlists->tasklet);
+		i915_tasklet(&execlists->tasklet);
 }
 
 static void gen8_gt_irq_ack(struct drm_i915_private *i915,
diff --git a/drivers/gpu/drm/i915/i915_tasklet.h b/drivers/gpu/drm/i915/i915_tasklet.h
index 6f3b2fedc65b..0aff77caa346 100644
--- a/drivers/gpu/drm/i915/i915_tasklet.h
+++ b/drivers/gpu/drm/i915/i915_tasklet.h
@@ -105,4 +105,25 @@ static inline void i915_tasklet_run(const struct i915_tasklet *t)
 	__i915_tasklet_run(t);
 }
 
+static inline bool i915_tasklet_try(struct i915_tasklet *t)
+{
+	if (unlikely(!tasklet_trylock(&t->base)))
+		return false;
+
+	if (i915_tasklet_is_enabled(t))
+		i915_tasklet_run(t);
+
+	tasklet_unlock(&t->base);
+	return true;
+}
+
+static inline void i915_tasklet(struct i915_tasklet *t)
+{
+	if (!i915_tasklet_is_enabled(t)) /* GPU reset active */
+		return;
+
+	if (!i915_tasklet_try(t))
+		i915_tasklet_schedule(t);
+}
+
 #endif /* _I915_TASKLET_H_ */
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index f2ded1796523..4e09abf7e206 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -780,6 +780,8 @@ static void guc_submission_tasklet(unsigned long data)
 	struct execlist_port *port = execlists->port;
 	struct i915_request *rq;
 
+	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+
 	rq = port_request(port);
 	while (rq && i915_request_completed(rq)) {
 		trace_i915_request_out(rq);
-- 
2.17.0
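
The combined enabled-check/try/schedule fallback of i915_tasklet() can be modelled as follows (a userspace sketch with C11 atomics; `irq_tasklet_submit` and the other names are invented, and an atomic exchange stands in for tasklet_trylock()):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* From the irq handler: skip entirely while disabled (GPU reset in
 * progress), otherwise run inline if we can claim the run bit, else
 * punt to the softirq. */
struct irq_tasklet {
	atomic_int count;	/* non-zero == disabled */
	atomic_int running;	/* models TASKLET_STATE_RUN */
	int inline_runs;
	int scheduled;
};

static bool irq_tasklet_try(struct irq_tasklet *t)
{
	if (atomic_exchange(&t->running, 1))
		return false;			/* tasklet_trylock() failed */
	if (atomic_load(&t->count) == 0)
		t->inline_runs++;		/* i915_tasklet_run() */
	atomic_store(&t->running, 0);		/* tasklet_unlock() */
	return true;
}

static void irq_tasklet_submit(struct irq_tasklet *t)
{
	if (atomic_load(&t->count) != 0)	/* GPU reset active */
		return;
	if (!irq_tasklet_try(t))
		t->scheduled++;			/* i915_tasklet_schedule() */
}
```

When the trylock succeeds but the tasklet was disabled in between, nothing runs and nothing is scheduled; the reset path is responsible for resubmitting pending work.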


* [PATCH 07/10] drm/i915: Rearrange gen8_cs_irq_handler
  2018-05-14  9:37 Direct execlists submission Chris Wilson
                   ` (5 preceding siblings ...)
  2018-05-14  9:37 ` [PATCH 06/10] drm/i915/execlists: Direct submission from irq handler Chris Wilson
@ 2018-05-14  9:37 ` Chris Wilson
  2018-05-14  9:37 ` [PATCH 08/10] drm/i915: Remove USES_GUC_SUBMISSION() pointer chasing from gen8_cs_irq_handler Chris Wilson
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2018-05-14  9:37 UTC (permalink / raw)
  To: intel-gfx

After using direct submission from the irq handler, it is very likely
that the ENGINE_IRQ_EXECLIST bit is unset and so we do not need to
test it first. It also follows that we then want to do a direct
submission every time the irq_posted bit is set, and can use that as
our boolean rather than a separate local.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index e1b3a7575fe7..034c603867e6 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1459,31 +1459,21 @@ static void snb_gt_irq_handler(struct drm_i915_private *dev_priv,
 		ivybridge_parity_error_irq_handler(dev_priv, gt_iir);
 }
 
-static void
-gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
+static void gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
-	bool tasklet = false;
 
 	if (iir & GT_CONTEXT_SWITCH_INTERRUPT && READ_ONCE(execlists->active))
-		tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
-					    &engine->irq_posted);
+		set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
 
 	if (iir & GT_RENDER_USER_INTERRUPT) {
+		if (USES_GUC_SUBMISSION(engine->i915))
+			set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+
 		notify_ring(engine);
-		/*
-		 * notify_ring() may trigger direct submission onto this
-		 * engine, clearing the ENGINE_IRQ_EXECLIST bit. In that
-		 * case, we don't want to resubmit and so clear the tasklet
-		 * boolean. GuC never sets the ENGINE_IRQ_EXECLIST bit and
-		 * so when using the GuC this equates to an unconditional
-		 * setting of tasklet to true.
-		 */
-		if (!test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
-			tasklet = USES_GUC_SUBMISSION(engine->i915);
 	}
 
-	if (tasklet)
+	if (engine && test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
 		i915_tasklet(&execlists->tasklet);
 }
 
-- 
2.17.0


* [PATCH 08/10] drm/i915: Remove USES_GUC_SUBMISSION() pointer chasing from gen8_cs_irq_handler
  2018-05-14  9:37 Direct execlists submission Chris Wilson
                   ` (6 preceding siblings ...)
  2018-05-14  9:37 ` [PATCH 07/10] drm/i915: Rearrange gen8_cs_irq_handler Chris Wilson
@ 2018-05-14  9:37 ` Chris Wilson
  2018-05-14 10:27   ` Tvrtko Ursulin
  2018-05-14  9:37 ` [PATCH 09/10] drm/i915: Speed up idle detection by kicking the tasklets Chris Wilson
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 21+ messages in thread
From: Chris Wilson @ 2018-05-14  9:37 UTC (permalink / raw)
  To: intel-gfx

Store whether or not we need to kick the guc's execlists emulation on
the engine itself to avoid chasing the device info.

gen8_cs_irq_handler                          512     428     -84

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c             | 2 +-
 drivers/gpu/drm/i915/intel_guc_submission.c | 1 +
 drivers/gpu/drm/i915/intel_lrc.c            | 1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h     | 7 +++++++
 4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 034c603867e6..f8c39e5b5cc6 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1467,7 +1467,7 @@ static void gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 		set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
 
 	if (iir & GT_RENDER_USER_INTERRUPT) {
-		if (USES_GUC_SUBMISSION(engine->i915))
+		if (intel_engine_uses_guc(engine))
 			set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
 
 		notify_ring(engine);
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 4e09abf7e206..eb2a25d84a82 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -1277,6 +1277,7 @@ int intel_guc_submission_enable(struct intel_guc *guc)
 		engine->unpark = guc_submission_unpark;
 
 		engine->flags &= ~I915_ENGINE_SUPPORTS_STATS;
+		engine->flags |= I915_ENGINE_USES_GUC;
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 70f1b5dd492a..90b534098d00 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2264,6 +2264,7 @@ static void execlists_set_default_submission(struct intel_engine_cs *engine)
 	engine->park = NULL;
 	engine->unpark = NULL;
 
+	engine->flags &= ~I915_ENGINE_USES_GUC;
 	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
 	if (engine->i915->preempt_context)
 		engine->flags |= I915_ENGINE_HAS_PREEMPTION;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index f6ba354faf89..90d1ca2ca232 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -570,6 +570,7 @@ struct intel_engine_cs {
 #define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
 #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
 #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
+#define I915_ENGINE_USES_GUC         BIT(3)
 	unsigned int flags;
 
 	/*
@@ -647,6 +648,12 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine)
 	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
 }
 
+static inline bool
+intel_engine_uses_guc(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_USES_GUC;
+}
+
 static inline bool __execlists_need_preempt(int prio, int last)
 {
 	return prio > max(0, last);
-- 
2.17.0


* [PATCH 09/10] drm/i915: Speed up idle detection by kicking the tasklets
  2018-05-14  9:37 Direct execlists submission Chris Wilson
                   ` (7 preceding siblings ...)
  2018-05-14  9:37 ` [PATCH 08/10] drm/i915: Remove USES_GUC_SUBMISSION() pointer chasing from gen8_cs_irq_handler Chris Wilson
@ 2018-05-14  9:37 ` Chris Wilson
  2018-05-14  9:37 ` [PATCH 10/10] drm/i915: Detect if we missed kicking the execlists tasklet Chris Wilson
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2018-05-14  9:37 UTC (permalink / raw)
  To: intel-gfx

We rely on ksoftirqd to run in a timely fashion in order to drain the
execlists queue. Quite frequently, it does not. In some cases we may see
latencies of over 200ms triggering our idle timeouts and forcing us to
declare the driver wedged!

Thus we can speed up idle detection by bypassing ksoftirqd in these
cases and flush our tasklet to confirm if we are indeed still waiting
for the ELSP to drain.

v2: Put the execlists.first check back; it is required for handling
reset!
v3: Follow Mika's suggestion to try and limit kicking the tasklet to
only when we expect it to make a difference, i.e. in catch up after a CS
interrupt, and not just execute it every time as that is likely just to
cover over our own bugs.

References: https://bugs.freedesktop.org/show_bug.cgi?id=106373
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 70595e995c45..f7d67b96220b 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -933,11 +933,20 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
 	if (I915_SELFTEST_ONLY(engine->breadcrumbs.mock))
 		return true;
 
+	/*
+	 * ksoftirqd has notorious latency that may cause us to
+	 * timeout while waiting for the engine to idle as we wait for
+	 * ksoftirqd to run the execlists tasklet to drain the ELSP.
+	 * If we are expecting a context switch from the GPU, check now.
+	 */
+	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
+		i915_tasklet_try(&engine->execlists.tasklet);
+
 	/* Waiting to drain ELSP? */
 	if (READ_ONCE(engine->execlists.active))
 		return false;
 
-	/* ELSP is empty, but there are ready requests? */
+	/* ELSP is empty, but there are ready requests? E.g. after reset */
 	if (READ_ONCE(engine->execlists.first))
 		return false;
 
-- 
2.17.0


* [PATCH 10/10] drm/i915: Detect if we missed kicking the execlists tasklet
  2018-05-14  9:37 Direct execlists submission Chris Wilson
                   ` (8 preceding siblings ...)
  2018-05-14  9:37 ` [PATCH 09/10] drm/i915: Speed up idle detection by kicking the tasklets Chris Wilson
@ 2018-05-14  9:37 ` Chris Wilson
  2018-05-14  9:42 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/10] drm/i915: Mark up nested spinlocks Patchwork
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2018-05-14  9:37 UTC (permalink / raw)
  To: intel-gfx

If inside hangcheck we see that the engine has paused, but there is an
execlists interrupt still pending, we know that the tasklet did not
fire. Dump the GEM trace along with the current engine state, and kick
the tasklet to recover without having to go through a GPU reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_hangcheck.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index d47e346bd49e..4b0745d94ab8 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -267,6 +267,21 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
 		}
 	}
 
+	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
+		if (GEM_SHOW_DEBUG()) {
+			struct drm_printer p = drm_debug_printer("hangcheck");
+
+			GEM_TRACE_DUMP();
+			intel_engine_dump(engine, &p,
+					  "%s CS stuck\n", engine->name);
+		}
+
+		if (i915_tasklet_try(&engine->execlists.tasklet))
+			return ENGINE_WAIT_KICK;
+
+		return ENGINE_WAIT;
+	}
+
 	return ENGINE_DEAD;
 }
 
-- 
2.17.0


* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/10] drm/i915: Mark up nested spinlocks
  2018-05-14  9:37 Direct execlists submission Chris Wilson
                   ` (9 preceding siblings ...)
  2018-05-14  9:37 ` [PATCH 10/10] drm/i915: Detect if we missed kicking the execlists tasklet Chris Wilson
@ 2018-05-14  9:42 ` Patchwork
  2018-05-14  9:45 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Patchwork @ 2018-05-14  9:42 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/10] drm/i915: Mark up nested spinlocks
URL   : https://patchwork.freedesktop.org/series/43123/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
c01ec3ff8d8e drm/i915: Mark up nested spinlocks
677445fc5fa2 drm/i915: Remove tasklet flush before disable
1fc1419f16a1 drm/i915: Wrap tasklet_struct for abuse
-:49: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#49: 
new file mode 100644

-:54: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#54: FILE: drivers/gpu/drm/i915/i915_tasklet.h:1:
+/*

total: 0 errors, 2 warnings, 0 checks, 221 lines checked
96fc7eacdd83 drm/i915: Only sync tasklets once for recursive reset preparation
2407b8567c91 drm/i915/execlists: Direct submit onto idle engines
-:86: WARNING:FUNCTION_ARGUMENTS: function definition argument 'flags' should also have an identifier name
#86: FILE: drivers/gpu/drm/i915/intel_guc_submission.c:757:
+	unsigned long uninitialized_var(flags);

-:112: WARNING:FUNCTION_ARGUMENTS: function definition argument 'flags' should also have an identifier name
#112: FILE: drivers/gpu/drm/i915/intel_lrc.c:359:
+	unsigned long uninitialized_var(flags);

-:140: WARNING:FUNCTION_ARGUMENTS: function definition argument 'flags' should also have an identifier name
#140: FILE: drivers/gpu/drm/i915/intel_lrc.c:764:
+	unsigned long uninitialized_var(flags);

total: 0 errors, 3 warnings, 0 checks, 192 lines checked
0ccb45ec4488 drm/i915/execlists: Direct submission from irq handler
d7c2e84cc0e1 drm/i915: Rearrange gen8_cs_irq_handler
c4d4c78bc081 drm/i915: Remove USES_GUC_SUBMISSION() pointer chasing from gen8_cs_irq_handler
73c47ae8e31f drm/i915: Speed up idle detection by kicking the tasklets
10ad08a8e244 drm/i915: Detect if we missed kicking the execlists tasklet


* ✗ Fi.CI.SPARSE: warning for series starting with [01/10] drm/i915: Mark up nested spinlocks
  2018-05-14  9:37 Direct execlists submission Chris Wilson
                   ` (10 preceding siblings ...)
  2018-05-14  9:42 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/10] drm/i915: Mark up nested spinlocks Patchwork
@ 2018-05-14  9:45 ` Patchwork
  2018-05-14  9:59 ` ✓ Fi.CI.BAT: success " Patchwork
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Patchwork @ 2018-05-14  9:45 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/10] drm/i915: Mark up nested spinlocks
URL   : https://patchwork.freedesktop.org/series/43123/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Commit: drm/i915: Mark up nested spinlocks
Okay!

Commit: drm/i915: Remove tasklet flush before disable
Okay!

Commit: drm/i915: Wrap tasklet_struct for abuse
Okay!

Commit: drm/i915: Only sync tasklets once for recursive reset preparation
Okay!

Commit: drm/i915/execlists: Direct submit onto idle engines
+drivers/gpu/drm/i915/intel_guc_submission.c:768:28: warning: context imbalance in 'guc_dequeue' - unexpected unlock
+drivers/gpu/drm/i915/intel_lrc.c:367:39: warning: context imbalance in 'execlists_unwind_incomplete_requests' - unexpected unlock
+drivers/gpu/drm/i915/intel_lrc.c:773:39: warning: context imbalance in 'execlists_dequeue' - unexpected unlock

Commit: drm/i915/execlists: Direct submission from irq handler
Okay!

Commit: drm/i915: Rearrange gen8_cs_irq_handler
Okay!

Commit: drm/i915: Remove USES_GUC_SUBMISSION() pointer chasing from gen8_cs_irq_handler
-O:drivers/gpu/drm/i915/intel_ringbuffer.h:652:23: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/intel_ringbuffer.h:652:23: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/intel_ringbuffer.h:659:23: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/intel_ringbuffer.h:659:23: warning: expression using sizeof(void)

Commit: drm/i915: Speed up idle detection by kicking the tasklets
Okay!

Commit: drm/i915: Detect if we missed kicking the execlists tasklet
Okay!


* ✓ Fi.CI.BAT: success for series starting with [01/10] drm/i915: Mark up nested spinlocks
  2018-05-14  9:37 Direct execlists submission Chris Wilson
                   ` (11 preceding siblings ...)
  2018-05-14  9:45 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2018-05-14  9:59 ` Patchwork
  2018-05-14 10:11 ` Direct execlists submission Tvrtko Ursulin
  2018-05-14 12:45 ` ✓ Fi.CI.IGT: success for series starting with [01/10] drm/i915: Mark up nested spinlocks Patchwork
  14 siblings, 0 replies; 21+ messages in thread
From: Patchwork @ 2018-05-14  9:59 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/10] drm/i915: Mark up nested spinlocks
URL   : https://patchwork.freedesktop.org/series/43123/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4175 -> Patchwork_8993 =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_8993 need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_8993, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/43123/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_8993:

  === IGT changes ===

    ==== Warnings ====

    igt@gem_exec_gttfill@basic:
      fi-pnv-d510:        PASS -> SKIP

    
== Known issues ==

  Here are the changes found in Patchwork_8993 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_module_reload@basic-reload:
      fi-glk-j4005:       PASS -> DMESG-WARN (fdo#106248)

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c:
      fi-skl-guc:         PASS -> FAIL (fdo#103191, fdo#104724)

    
    ==== Possible fixes ====

    igt@gem_exec_flush@basic-uc-set-default:
      fi-byt-j1900:       INCOMPLETE (fdo#102657) -> PASS

    
  fdo#102657 https://bugs.freedesktop.org/show_bug.cgi?id=102657
  fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
  fdo#104724 https://bugs.freedesktop.org/show_bug.cgi?id=104724
  fdo#106248 https://bugs.freedesktop.org/show_bug.cgi?id=106248


== Participating hosts (41 -> 36) ==

  Missing    (5): fi-ctg-p8600 fi-byt-squawks fi-ilk-m540 fi-bxt-dsi fi-skl-6700hq 


== Build changes ==

    * Linux: CI_DRM_4175 -> Patchwork_8993

  CI_DRM_4175: a57366b6029ac86436ad36bbf8b9a31549ef2905 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4476: 03a62cf055481f66b4f58e6228bc45f8ca454216 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_8993: 10ad08a8e244b440e8cdda390ed2da29bbd021ea @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4476: 3ba0657bff4216d1ec7179935590261855f1651e @ git://anongit.freedesktop.org/piglit


== Linux commits ==

10ad08a8e244 drm/i915: Detect if we missed kicking the execlists tasklet
73c47ae8e31f drm/i915: Speed up idle detection by kicking the tasklets
c4d4c78bc081 drm/i915: Remove USES_GUC_SUBMISSION() pointer chasing from gen8_cs_irq_handler
d7c2e84cc0e1 drm/i915: Rearrange gen8_cs_irq_handler
0ccb45ec4488 drm/i915/execlists: Direct submission from irq handler
2407b8567c91 drm/i915/execlists: Direct submit onto idle engines
96fc7eacdd83 drm/i915: Only sync tasklets once for recursive reset preparation
1fc1419f16a1 drm/i915: Wrap tasklet_struct for abuse
677445fc5fa2 drm/i915: Remove tasklet flush before disable
c01ec3ff8d8e drm/i915: Mark up nested spinlocks

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8993/issues.html

* Re: [PATCH 01/10] drm/i915: Mark up nested spinlocks
  2018-05-14  9:37 ` [PATCH 01/10] drm/i915: Mark up nested spinlocks Chris Wilson
@ 2018-05-14 10:06   ` Tvrtko Ursulin
  0 siblings, 0 replies; 21+ messages in thread
From: Tvrtko Ursulin @ 2018-05-14 10:06 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 14/05/2018 10:37, Chris Wilson wrote:
> When we process the outstanding requests upon banning a context, we need
> to acquire both the engine and the client's timeline, nesting the locks.
> This requires explicit markup as the two timelines are now of the same
> class, since commit a89d1f921c15 ("drm/i915: Split i915_gem_timeline into
> individual timelines").
> 
> Testcase: igt/gem_eio/banned
> Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 89bf5d67cb74..0a2070112b66 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3119,7 +3119,7 @@ static void engine_skip_context(struct i915_request *request)
>   	GEM_BUG_ON(timeline == &engine->timeline);
>   
>   	spin_lock_irqsave(&engine->timeline.lock, flags);
> -	spin_lock(&timeline->lock);
> +	spin_lock_nested(&timeline->lock, SINGLE_DEPTH_NESTING);
>   
>   	list_for_each_entry_continue(request, &engine->timeline.requests, link)
>   		if (request->ctx == hung_ctx)
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: Direct execlists submission
  2018-05-14  9:37 Direct execlists submission Chris Wilson
                   ` (12 preceding siblings ...)
  2018-05-14  9:59 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-05-14 10:11 ` Tvrtko Ursulin
  2018-05-14 10:25   ` Chris Wilson
  2018-05-14 12:45 ` ✓ Fi.CI.IGT: success for series starting with [01/10] drm/i915: Mark up nested spinlocks Patchwork
  14 siblings, 1 reply; 21+ messages in thread
From: Tvrtko Ursulin @ 2018-05-14 10:11 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 14/05/2018 10:37, Chris Wilson wrote:
> Continuing the discussion with the latest refactorings, however I ran
> some tests to measure the impact on system (!i915) latency,
> using igt/benchmarks/gem_syslatency -t 120
> 
> drm-tip:
> 	latency mean=1.211us max=10us (no load)
> 	latency mean=2.611us max=83us (i915)
> 
>          latency mean=1.720us max=833us (no load, bg writeout)
>          latency mean=3.294us max=607us (i915, bg writeout)
> 
> this series:
>          latency mean=1.280us max=15us (no load)
>          latency mean=9.688us max=1271us (i915)
> 
>          latency mean=1.712us max=1026us (no load, bg writeout)
>          latency mean=14.347us max=489850us (i915, bg writeout)
> 
> That certainly takes the shine off directly using the tasklet for
> submission from the irq handler. Being selfish, I still think we can't
> allow the GPU to stall waiting for ksoftirqd, but at the same time we
> need to solve the latency issues introduced elsewhere.

You dropped direct submit on idle from this incarnation, why?

Before the above data my concern was that i915_tasklet in its current 
form does not buy us anything and adds boilerplate code. I was 
suggesting two alternatives: either no i915_tasklet at all, or a 
different, more functional and self-contained version, which you said 
wouldn't work with some future code.

But now with this data it looks like a quite significant regression even 
if it fixes the rthog test case. So I don't know where this leaves us. :I

Regards,

Tvrtko



* Re: Direct execlists submission
  2018-05-14 10:11 ` Direct execlists submission Tvrtko Ursulin
@ 2018-05-14 10:25   ` Chris Wilson
  2018-05-14 14:54     ` Chris Wilson
  0 siblings, 1 reply; 21+ messages in thread
From: Chris Wilson @ 2018-05-14 10:25 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-14 11:11:54)
> 
> On 14/05/2018 10:37, Chris Wilson wrote:
> > Continuing the discussion with the latest refactorings, however I ran
> > some tests to measure the impact on system (!i915) latency,
> > using igt/benchmarks/gem_syslatency -t 120
> > 
> > drm-tip:
> >       latency mean=1.211us max=10us (no load)
> >       latency mean=2.611us max=83us (i915)
> > 
> >          latency mean=1.720us max=833us (no load, bg writeout)
> >          latency mean=3.294us max=607us (i915, bg writeout)
> > 
> > this series:
> >          latency mean=1.280us max=15us (no load)
> >          latency mean=9.688us max=1271us (i915)
> > 
> >          latency mean=1.712us max=1026us (no load, bg writeout)
> >          latency mean=14.347us max=489850us (i915, bg writeout)
> > 
> > That certainly takes the shine off directly using the tasklet for
> > submission from the irq handler. Being selfish, I still think we can't
> > allow the GPU to stall waiting for ksoftirqd, but at the same time we
> > need to solve the latency issues introduced elsewhere.
> 
> You dropped direct submit on idle from this incarnation, why?

It's still there, right? I just haven't made any changes towards making
it more generic.
 
> Before the above data my concern was that i915_tasklet in its current 
> form does not buy us anything and adds boilerplate code. I was 
> suggesting two alternatives, either no i915_tasklet at all, or 
> different, more functional and self-contained version which you said 
> wouldn't work with some future code.

We need struct tasklet_struct, I don't think we can easily replace that
locally. (And you are cc'ed on the code that truly abuses tasklet, where
we mix preemption and reset from timers.)

> But now with this data it looks like a quite significant regression even 
> if it fixes the rthog test case. So I don't know where this leaves us. :I

With a challenge to solve. The nasty part is that the timings are still
so small that they sit in the tail of the perf profile, so we'll
need to look at the latency tracers instead. My first instinct is that
it is engine->timeline.lock contention. And that does seem like it is
doing the trick...
-Chris

* Re: [PATCH 08/10] drm/i915: Remove USES_GUC_SUBMISSION() pointer chasing from gen8_cs_irq_handler
  2018-05-14  9:37 ` [PATCH 08/10] drm/i915: Remove USES_GUC_SUBMISSION() pointer chasing from gen8_cs_irq_handler Chris Wilson
@ 2018-05-14 10:27   ` Tvrtko Ursulin
  2018-05-14 11:45     ` Chris Wilson
  0 siblings, 1 reply; 21+ messages in thread
From: Tvrtko Ursulin @ 2018-05-14 10:27 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 14/05/2018 10:37, Chris Wilson wrote:
> Store whether or not we need to kick the guc's execlists emulation on
> the engine itself to avoid chasing the device info.
> 
> gen8_cs_irq_handler                          512     428     -84

Impressive, or not, depends on whether you look at the saving or the 
code generation.

Hm, actually, is it just GEM_BUG_ON in intel_uc_is_using_guc_submission?

But blimey, didn't we pencil in a long time ago to stop using modparams 
at runtime in favour of caching the values? This one looks like one to 
confuse the driver easily if fiddled with at runtime.

Regards,

Tvrtko

> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_irq.c             | 2 +-
>   drivers/gpu/drm/i915/intel_guc_submission.c | 1 +
>   drivers/gpu/drm/i915/intel_lrc.c            | 1 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h     | 7 +++++++
>   4 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 034c603867e6..f8c39e5b5cc6 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1467,7 +1467,7 @@ static void gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
>   		set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>   
>   	if (iir & GT_RENDER_USER_INTERRUPT) {
> -		if (USES_GUC_SUBMISSION(engine->i915))
> +		if (intel_engine_uses_guc(engine))
>   			set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>   
>   		notify_ring(engine);
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index 4e09abf7e206..eb2a25d84a82 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -1277,6 +1277,7 @@ int intel_guc_submission_enable(struct intel_guc *guc)
>   		engine->unpark = guc_submission_unpark;
>   
>   		engine->flags &= ~I915_ENGINE_SUPPORTS_STATS;
> +		engine->flags |= I915_ENGINE_USES_GUC;
>   	}
>   
>   	return 0;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 70f1b5dd492a..90b534098d00 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2264,6 +2264,7 @@ static void execlists_set_default_submission(struct intel_engine_cs *engine)
>   	engine->park = NULL;
>   	engine->unpark = NULL;
>   
> +	engine->flags &= ~I915_ENGINE_USES_GUC;
>   	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
>   	if (engine->i915->preempt_context)
>   		engine->flags |= I915_ENGINE_HAS_PREEMPTION;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index f6ba354faf89..90d1ca2ca232 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -570,6 +570,7 @@ struct intel_engine_cs {
>   #define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
>   #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
>   #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
> +#define I915_ENGINE_USES_GUC         BIT(3)
>   	unsigned int flags;
>   
>   	/*
> @@ -647,6 +648,12 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine)
>   	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
>   }
>   
> +static inline bool
> +intel_engine_uses_guc(const struct intel_engine_cs *engine)
> +{
> +	return engine->flags & I915_ENGINE_USES_GUC;
> +}
> +
>   static inline bool __execlists_need_preempt(int prio, int last)
>   {
>   	return prio > max(0, last);
> 

* Re: [PATCH 08/10] drm/i915: Remove USES_GUC_SUBMISSION() pointer chasing from gen8_cs_irq_handler
  2018-05-14 10:27   ` Tvrtko Ursulin
@ 2018-05-14 11:45     ` Chris Wilson
  0 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2018-05-14 11:45 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-14 11:27:56)
> 
> On 14/05/2018 10:37, Chris Wilson wrote:
> > Store whether or not we need to kick the guc's execlists emulation on
> > the engine itself to avoid chasing the device info.
> > 
> > gen8_cs_irq_handler                          512     428     -84
> 
> Impressive, or not, depends whether you look at the saving or code 
> generation.

The code generation also looked better, at least in my eyes. Fewer
chained movs and simpler test?

> Hm, actually, is it just GEM_BUG_ON in intel_uc_is_using_guc_submission?
> 
> But blimey, didn't we pencil in, a long time ago, to stop using modparams at 
> runtime in favour of caching the values? This one looks like one to 
> confuse the driver easily if fiddled with at runtime.

Yes, this is one where once upon a time we had an engine->guc_client to
inspect that we traded in for a common i915->guc_client, but that will
still require a pointer dance.
-Chris


* ✓ Fi.CI.IGT: success for series starting with [01/10] drm/i915: Mark up nested spinlocks
  2018-05-14  9:37 Direct execlists submission Chris Wilson
                   ` (13 preceding siblings ...)
  2018-05-14 10:11 ` Direct execlists submission Tvrtko Ursulin
@ 2018-05-14 12:45 ` Patchwork
  14 siblings, 0 replies; 21+ messages in thread
From: Patchwork @ 2018-05-14 12:45 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/10] drm/i915: Mark up nested spinlocks
URL   : https://patchwork.freedesktop.org/series/43123/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4175_full -> Patchwork_8993_full =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_8993_full need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_8993_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/43123/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_8993_full:

  === IGT changes ===

    ==== Warnings ====

    igt@gem_exec_schedule@deep-vebox:
      shard-kbl:          PASS -> SKIP +1

    igt@kms_frontbuffer_tracking@fbc-tilingchange:
      shard-snb:          SKIP -> PASS

    
== Known issues ==

  Here are the changes found in Patchwork_8993_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_cursor_legacy@2x-long-cursor-vs-flip-atomic:
      shard-hsw:          PASS -> FAIL (fdo#105767)

    igt@kms_flip@2x-flip-vs-absolute-wf_vblank-interruptible:
      shard-glk:          PASS -> FAIL (fdo#105312)

    igt@kms_flip@2x-flip-vs-expired-vblank:
      shard-glk:          PASS -> FAIL (fdo#105363)

    igt@kms_flip@2x-modeset-vs-vblank-race-interruptible:
      shard-glk:          PASS -> FAIL (fdo#103060)

    igt@kms_flip@plain-flip-fb-recreate-interruptible:
      shard-glk:          PASS -> FAIL (fdo#100368) +3

    igt@kms_flip_tiling@flip-to-y-tiled:
      shard-glk:          PASS -> FAIL (fdo#104724)

    igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-msflip-blt:
      shard-glk:          PASS -> FAIL (fdo#104724, fdo#103167)

    igt@kms_setmode@basic:
      shard-apl:          NOTRUN -> FAIL (fdo#99912)

    igt@perf_pmu@enable-race-vcs0:
      shard-glk:          PASS -> INCOMPLETE (fdo#103359, k.org#198133)
      shard-apl:          PASS -> INCOMPLETE (fdo#103927)

    
    ==== Possible fixes ====

    igt@kms_atomic_transition@1x-modeset-transitions-nonblocking:
      shard-glk:          FAIL -> PASS

    igt@kms_busy@basic-flip-c:
      shard-apl:          FAIL (fdo#103182) -> PASS

    igt@kms_flip@2x-absolute-wf_vblank-interruptible:
      shard-glk:          FAIL (fdo#106087) -> PASS

    igt@kms_flip@2x-dpms-vs-vblank-race-interruptible:
      shard-hsw:          FAIL (fdo#103060) -> PASS

    igt@kms_flip@2x-flip-vs-expired-vblank-interruptible:
      shard-hsw:          FAIL (fdo#105707) -> PASS

    igt@kms_flip@flip-vs-wf_vblank-interruptible:
      shard-glk:          FAIL (fdo#105312) -> PASS

    igt@kms_flip@plain-flip-ts-check:
      shard-glk:          FAIL (fdo#100368) -> PASS +1

    igt@kms_plane_lowres@pipe-b-tiling-y:
      shard-kbl:          DMESG-WARN (fdo#103558, fdo#105602) -> PASS +10

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#103060 https://bugs.freedesktop.org/show_bug.cgi?id=103060
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#103182 https://bugs.freedesktop.org/show_bug.cgi?id=103182
  fdo#103359 https://bugs.freedesktop.org/show_bug.cgi?id=103359
  fdo#103558 https://bugs.freedesktop.org/show_bug.cgi?id=103558
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
  fdo#104724 https://bugs.freedesktop.org/show_bug.cgi?id=104724
  fdo#105312 https://bugs.freedesktop.org/show_bug.cgi?id=105312
  fdo#105363 https://bugs.freedesktop.org/show_bug.cgi?id=105363
  fdo#105602 https://bugs.freedesktop.org/show_bug.cgi?id=105602
  fdo#105707 https://bugs.freedesktop.org/show_bug.cgi?id=105707
  fdo#105767 https://bugs.freedesktop.org/show_bug.cgi?id=105767
  fdo#106087 https://bugs.freedesktop.org/show_bug.cgi?id=106087
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
  k.org#198133 https://bugzilla.kernel.org/show_bug.cgi?id=198133


== Participating hosts (5 -> 5) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4175 -> Patchwork_8993

  CI_DRM_4175: a57366b6029ac86436ad36bbf8b9a31549ef2905 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4476: 03a62cf055481f66b4f58e6228bc45f8ca454216 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_8993: 10ad08a8e244b440e8cdda390ed2da29bbd021ea @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4476: 3ba0657bff4216d1ec7179935590261855f1651e @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8993/shards.html


* Re: Direct execlists submission
  2018-05-14 10:25   ` Chris Wilson
@ 2018-05-14 14:54     ` Chris Wilson
  0 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2018-05-14 14:54 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Chris Wilson (2018-05-14 11:25:45)
> Quoting Tvrtko Ursulin (2018-05-14 11:11:54)
> > 
> > On 14/05/2018 10:37, Chris Wilson wrote:
> > > Continuing the discussion with the latest refactorings, however I ran
> > > some tests to measure the impact on system (!i915) latency,
> > > using igt/benchmarks/gem_syslatency -t 120
> > > 
> > > drm-tip:
> > >       latency mean=1.211us max=10us (no load)
> > >       latency mean=2.611us max=83us (i915)
> > > 
> > >          latency mean=1.720us max=833us (no load, bg writeout)
> > >          latency mean=3.294us max=607us (i915, bg writeout)
> > > 
> > > this series:
> > >          latency mean=1.280us max=15us (no load)
> > >          latency mean=9.688us max=1271us (i915)
> > > 
> > >          latency mean=1.712us max=1026us (no load, bg writeout)
> > >          latency mean=14.347us max=489850us (i915, bg writeout)
> > > 
> > > That certainly takes the shine off directly using the tasklet for
> > > submission from the irq handler. Being selfish, I still think we can't
> > > allow the GPU to stall waiting for ksoftirqd, but at the same time we
> > > need to solve the latency issues introduced elsewhere.

> > But now with this data it looks like quite a significant regression even 
> > if it fixes the rthog test case. So I don't know where this leaves us. :I

The ginormous writeout latency is a bit inconsistent. What remains
consistent is that with direct submission, latency with i915 loaded
remains around 10us versus 1us unloaded. The difference for direct
submission is that the latency and i915 throughput remain the same even
with bg writeout (with ksoftirqd there is a very noticeable fall off).

I've still no clue as to what causes the latency to spike so badly with
bg writeout.

As for the 10us latency, I think that is just i915 hogging the cpu to
handle the flow onto GPU, so the "fairness" issue that ksoftirqd seeks
to prevent (rather than a side-effect of irqoff, as the irqoff periods
do not change that much due to execlists_dequeue being the brunt of the
work).

I no longer think the sky is falling, or that these bad results are
anything more than something we can and should improve. I think the
protection from ksoftirqd is definitely more important for us.
-Chris


end of thread, other threads:[~2018-05-14 14:54 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-14  9:37 Direct execlists submission Chris Wilson
2018-05-14  9:37 ` [PATCH 01/10] drm/i915: Mark up nested spinlocks Chris Wilson
2018-05-14 10:06   ` Tvrtko Ursulin
2018-05-14  9:37 ` [PATCH 02/10] drm/i915: Remove tasklet flush before disable Chris Wilson
2018-05-14  9:37 ` [PATCH 03/10] drm/i915: Wrap tasklet_struct for abuse Chris Wilson
2018-05-14  9:37 ` [PATCH 04/10] drm/i915: Only sync tasklets once for recursive reset preparation Chris Wilson
2018-05-14  9:37 ` [PATCH 05/10] drm/i915/execlists: Direct submit onto idle engines Chris Wilson
2018-05-14  9:37 ` [PATCH 06/10] drm/i915/execlists: Direct submission from irq handler Chris Wilson
2018-05-14  9:37 ` [PATCH 07/10] drm/i915: Rearrange gen8_cs_irq_handler Chris Wilson
2018-05-14  9:37 ` [PATCH 08/10] drm/i915: Remove USES_GUC_SUBMISSION() pointer chasing from gen8_cs_irq_handler Chris Wilson
2018-05-14 10:27   ` Tvrtko Ursulin
2018-05-14 11:45     ` Chris Wilson
2018-05-14  9:37 ` [PATCH 09/10] drm/i915: Speed up idle detection by kicking the tasklets Chris Wilson
2018-05-14  9:37 ` [PATCH 10/10] drm/i915: Detect if we missed kicking the execlists tasklet Chris Wilson
2018-05-14  9:42 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/10] drm/i915: Mark up nested spinlocks Patchwork
2018-05-14  9:45 ` ✗ Fi.CI.SPARSE: " Patchwork
2018-05-14  9:59 ` ✓ Fi.CI.BAT: success " Patchwork
2018-05-14 10:11 ` Direct execlists submission Tvrtko Ursulin
2018-05-14 10:25   ` Chris Wilson
2018-05-14 14:54     ` Chris Wilson
2018-05-14 12:45 ` ✓ Fi.CI.IGT: success for series starting with [01/10] drm/i915: Mark up nested spinlocks Patchwork
