* [PATCH 1/5] drm/i915: Remove tasklet flush before disable
@ 2018-05-09 14:27 Chris Wilson
  2018-05-09 14:27 ` [PATCH 2/5] drm/i915: Wrap tasklet_struct for abuse Chris Wilson
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Chris Wilson @ 2018-05-09 14:27 UTC (permalink / raw)
  To: intel-gfx

The idea was to try to let the existing tasklet run to completion
before we begin the reset, but that involves a racy check against anything
else that tries to run the tasklet. Rather than acknowledge and ignore
the race, let it be and don't try to be too clever.

The tasklet will resume execution after reset (after spinning a bit
during reset), but before we allow it to resume we will have cleared all
the pending state.
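
For reference, the reset bracket then reduces to the shape below; this is a
simplified sketch of i915_gem_reset_prepare_engine() and
i915_gem_reset_finish_engine() (names shortened, breadcrumb/forcewake
handling elided):

static void reset_prepare(struct intel_engine_cs *engine)
{
        /*
         * A concurrently scheduled tasklet just spins (keeps being
         * requeued) until reset_finish() re-enables it; by then the
         * pending ELSP state has been cleared.
         */
        tasklet_disable(&engine->execlists.tasklet);
}

static void reset_finish(struct intel_engine_cs *engine)
{
        /* Resume the tasklet; it only sees the post-reset state. */
        tasklet_enable(&engine->execlists.tasklet);
}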

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 89bf5d67cb74..63c96c2b8fcf 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3035,16 +3035,7 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
 	 * calling engine->init_hw() and also writing the ELSP.
 	 * Turning off the execlists->tasklet until the reset is over
 	 * prevents the race.
-	 *
-	 * Note that this needs to be a single atomic operation on the
-	 * tasklet (flush existing tasks, prevent new tasks) to prevent
-	 * a race between reset and set-wedged. It is not, so we do the best
-	 * we can atm and make sure we don't lock the machine up in the more
-	 * common case of recursively being called from set-wedged from inside
-	 * i915_reset.
 	 */
-	if (!atomic_read(&engine->execlists.tasklet.count))
-		tasklet_kill(&engine->execlists.tasklet);
 	tasklet_disable(&engine->execlists.tasklet);
 
 	/*
-- 
2.17.0


* [PATCH 2/5] drm/i915: Wrap tasklet_struct for abuse
  2018-05-09 14:27 [PATCH 1/5] drm/i915: Remove tasklet flush before disable Chris Wilson
@ 2018-05-09 14:27 ` Chris Wilson
  2018-05-09 14:58   ` Chris Wilson
  2018-05-10 15:49   ` Tvrtko Ursulin
  2018-05-09 14:27 ` [PATCH 3/5] drm/i915/execlists: Direct submit onto idle engines Chris Wilson
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Chris Wilson @ 2018-05-09 14:27 UTC (permalink / raw)
  To: intel-gfx

In the next few patches, we want to abuse the tasklet to avoid ksoftirqd
latency along critical paths. To make that abuse easier to swallow,
first coat the tasklet in a little syntactic sugar.
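
In short, the wrapper is a mechanical rename of the existing call sites; a
quick sketch of the mapping introduced by the diff below, all operating on
the wrapped engine->execlists.tasklet:

/* engine setup (cf. logical_ring_setup()) */
i915_tasklet_init(&engine->execlists.tasklet,
                  execlists_submission_tasklet, (unsigned long)engine);

/* bottom-half kick (cf. gen8_cs_irq_handler()) */
i915_tasklet_schedule(&engine->execlists.tasklet);

/* parking (cf. intel_engines_park()) */
i915_tasklet_flush(&engine->execlists.tasklet);

/* around a GPU reset (cf. i915_gem_reset_{prepare,finish}_engine()) */
i915_tasklet_lock(&engine->execlists.tasklet);
i915_tasklet_unlock(&engine->execlists.tasklet);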

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c             |  4 +-
 drivers/gpu/drm/i915/i915_irq.c             |  2 +-
 drivers/gpu/drm/i915/i915_tasklet.h         | 78 +++++++++++++++++++++
 drivers/gpu/drm/i915/intel_engine_cs.c      | 11 ++-
 drivers/gpu/drm/i915/intel_guc_submission.c |  8 +--
 drivers/gpu/drm/i915/intel_lrc.c            | 18 ++---
 drivers/gpu/drm/i915/intel_ringbuffer.h     |  3 +-
 7 files changed, 102 insertions(+), 22 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_tasklet.h

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 63c96c2b8fcf..59e04387a27c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3036,7 +3036,7 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
 	 * Turning off the execlists->tasklet until the reset is over
 	 * prevents the race.
 	 */
-	tasklet_disable(&engine->execlists.tasklet);
+	i915_tasklet_lock(&engine->execlists.tasklet);
 
 	/*
 	 * We're using worker to queue preemption requests from the tasklet in
@@ -3256,7 +3256,7 @@ void i915_gem_reset(struct drm_i915_private *dev_priv,
 
 void i915_gem_reset_finish_engine(struct intel_engine_cs *engine)
 {
-	tasklet_enable(&engine->execlists.tasklet);
+	i915_tasklet_unlock(&engine->execlists.tasklet);
 	kthread_unpark(engine->breadcrumbs.signaler);
 
 	intel_uncore_forcewake_put(engine->i915, FORCEWAKE_ALL);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f9bc3aaa90d0..f8aff5a5aa83 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1477,7 +1477,7 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 	}
 
 	if (tasklet)
-		tasklet_hi_schedule(&execlists->tasklet);
+		i915_tasklet_schedule(&execlists->tasklet);
 }
 
 static void gen8_gt_irq_ack(struct drm_i915_private *i915,
diff --git a/drivers/gpu/drm/i915/i915_tasklet.h b/drivers/gpu/drm/i915/i915_tasklet.h
new file mode 100644
index 000000000000..42b002b88edb
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_tasklet.h
@@ -0,0 +1,78 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright © 2018 Intel Corporation
+ */
+
+#ifndef _I915_TASKLET_H_
+#define _I915_TASKLET_H_
+
+#include <linux/atomic.h>
+#include <linux/interrupt.h>
+
+/**
+ * struct i915_tasklet - wrapper around tasklet_struct
+ *
+ * We want to abuse tasklets slightly, such as calling them directly. In some
+ * cases, this requires some auxiliary tracking so subclass the tasklet_struct
+ * so that we have a central place and helpers.
+ */
+struct i915_tasklet {
+	struct tasklet_struct base;
+};
+
+static inline void i915_tasklet_init(struct i915_tasklet *t,
+				     void (*func)(unsigned long),
+				     unsigned long data)
+{
+	tasklet_init(&t->base, func, data);
+}
+
+static inline bool i915_tasklet_is_scheduled(const struct i915_tasklet *t)
+{
+	return test_bit(TASKLET_STATE_SCHED, &t->base.state);
+}
+
+static inline bool i915_tasklet_is_running(const struct i915_tasklet *t)
+{
+	return test_bit(TASKLET_STATE_RUN, &t->base.state);
+}
+
+static inline bool i915_tasklet_is_enabled(const struct i915_tasklet *t)
+{
+	return likely(!atomic_read(&t->base.count));
+}
+
+static inline void i915_tasklet_schedule(struct i915_tasklet *t)
+{
+	tasklet_hi_schedule(&t->base);
+}
+
+static inline void i915_tasklet_flush(struct i915_tasklet *t)
+{
+	tasklet_kill(&t->base);
+}
+
+static inline void i915_tasklet_lock(struct i915_tasklet *t)
+{
+	tasklet_disable(&t->base);
+}
+
+static inline void i915_tasklet_unlock(struct i915_tasklet *t)
+{
+	tasklet_enable(&t->base);
+}
+
+static inline void i915_tasklet_set_func(struct i915_tasklet *t,
+					 void (*func)(unsigned long data),
+					 unsigned long data)
+{
+	i915_tasklet_lock(t);
+
+	t->base.func = func;
+	t->base.data = data;
+
+	i915_tasklet_unlock(t);
+}
+
+#endif /* _I915_TASKLET_H_ */
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 70325e0824e3..3c8a0c3245f3 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1032,7 +1032,7 @@ void intel_engines_park(struct drm_i915_private *i915)
 	for_each_engine(engine, i915, id) {
 		/* Flush the residual irq tasklets first. */
 		intel_engine_disarm_breadcrumbs(engine);
-		tasklet_kill(&engine->execlists.tasklet);
+		i915_tasklet_flush(&engine->execlists.tasklet);
 
 		/*
 		 * We are committed now to parking the engines, make sure there
@@ -1249,9 +1249,8 @@ static void intel_engine_print_registers(const struct intel_engine_cs *engine,
 			   intel_read_status_page(engine, intel_hws_csb_write_index(engine->i915)),
 			   yesno(test_bit(ENGINE_IRQ_EXECLIST,
 					  &engine->irq_posted)),
-			   yesno(test_bit(TASKLET_STATE_SCHED,
-					  &engine->execlists.tasklet.state)),
-			   enableddisabled(!atomic_read(&engine->execlists.tasklet.count)));
+			   yesno(i915_tasklet_is_scheduled(&engine->execlists.tasklet)),
+			   enableddisabled(i915_tasklet_is_enabled(&engine->execlists.tasklet)));
 		if (read >= GEN8_CSB_ENTRIES)
 			read = 0;
 		if (write >= GEN8_CSB_ENTRIES)
@@ -1479,7 +1478,7 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
 	if (!intel_engine_supports_stats(engine))
 		return -ENODEV;
 
-	tasklet_disable(&execlists->tasklet);
+	i915_tasklet_lock(&execlists->tasklet);
 	write_seqlock_irqsave(&engine->stats.lock, flags);
 
 	if (unlikely(engine->stats.enabled == ~0)) {
@@ -1505,7 +1504,7 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
 
 unlock:
 	write_sequnlock_irqrestore(&engine->stats.lock, flags);
-	tasklet_enable(&execlists->tasklet);
+	i915_tasklet_unlock(&execlists->tasklet);
 
 	return err;
 }
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 2feb65096966..a7afc976c3b9 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -591,7 +591,7 @@ static void inject_preempt_context(struct work_struct *work)
 	if (WARN_ON(intel_guc_send(guc, data, ARRAY_SIZE(data)))) {
 		execlists_clear_active(&engine->execlists,
 				       EXECLISTS_ACTIVE_PREEMPT);
-		tasklet_schedule(&engine->execlists.tasklet);
+		i915_tasklet_schedule(&engine->execlists.tasklet);
 	}
 }
 
@@ -1263,10 +1263,10 @@ int intel_guc_submission_enable(struct intel_guc *guc)
 	guc_interrupts_capture(dev_priv);
 
 	for_each_engine(engine, dev_priv, id) {
-		struct intel_engine_execlists * const execlists =
-			&engine->execlists;
+		i915_tasklet_set_func(&engine->execlists.tasklet,
+				      guc_submission_tasklet,
+				      (unsigned long)engine);
 
-		execlists->tasklet.func = guc_submission_tasklet;
 		engine->park = guc_submission_park;
 		engine->unpark = guc_submission_unpark;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7f98dda3c929..539fa03d7600 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1165,7 +1165,7 @@ static void queue_request(struct intel_engine_cs *engine,
 static void __submit_queue(struct intel_engine_cs *engine, int prio)
 {
 	engine->execlists.queue_priority = prio;
-	tasklet_hi_schedule(&engine->execlists.tasklet);
+	i915_tasklet_schedule(&engine->execlists.tasklet);
 }
 
 static void submit_queue(struct intel_engine_cs *engine, int prio)
@@ -1778,7 +1778,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
 
 	/* After a GPU reset, we may have requests to replay */
 	if (execlists->first)
-		tasklet_schedule(&execlists->tasklet);
+		i915_tasklet_schedule(&execlists->tasklet);
 
 	return 0;
 }
@@ -2182,9 +2182,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
 	 * Tasklet cannot be active at this point due intel_mark_active/idle
 	 * so this is just for documentation.
 	 */
-	if (WARN_ON(test_bit(TASKLET_STATE_SCHED,
-			     &engine->execlists.tasklet.state)))
-		tasklet_kill(&engine->execlists.tasklet);
+	if (WARN_ON(i915_tasklet_is_scheduled(&engine->execlists.tasklet)))
+		i915_tasklet_flush(&engine->execlists.tasklet);
 
 	dev_priv = engine->i915;
 
@@ -2209,7 +2208,10 @@ static void execlists_set_default_submission(struct intel_engine_cs *engine)
 	engine->submit_request = execlists_submit_request;
 	engine->cancel_requests = execlists_cancel_requests;
 	engine->schedule = execlists_schedule;
-	engine->execlists.tasklet.func = execlists_submission_tasklet;
+
+	i915_tasklet_set_func(&engine->execlists.tasklet,
+			      execlists_submission_tasklet,
+			      (unsigned long)engine);
 
 	engine->park = NULL;
 	engine->unpark = NULL;
@@ -2303,8 +2305,8 @@ logical_ring_setup(struct intel_engine_cs *engine)
 
 	engine->execlists.fw_domains = fw_domains;
 
-	tasklet_init(&engine->execlists.tasklet,
-		     execlists_submission_tasklet, (unsigned long)engine);
+	i915_tasklet_init(&engine->execlists.tasklet,
+			  execlists_submission_tasklet, (unsigned long)engine);
 
 	logical_ring_default_vfuncs(engine);
 	logical_ring_default_irqs(engine);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 010750e8ee44..f6ba354faf89 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -11,6 +11,7 @@
 #include "i915_pmu.h"
 #include "i915_request.h"
 #include "i915_selftest.h"
+#include "i915_tasklet.h"
 #include "i915_timeline.h"
 #include "intel_gpu_commands.h"
 
@@ -202,7 +203,7 @@ struct intel_engine_execlists {
 	/**
 	 * @tasklet: softirq tasklet for bottom handler
 	 */
-	struct tasklet_struct tasklet;
+	struct i915_tasklet tasklet;
 
 	/**
 	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
-- 
2.17.0


* [PATCH 3/5] drm/i915/execlists: Direct submit onto idle engines
  2018-05-09 14:27 [PATCH 1/5] drm/i915: Remove tasklet flush before disable Chris Wilson
  2018-05-09 14:27 ` [PATCH 2/5] drm/i915: Wrap tasklet_struct for abuse Chris Wilson
@ 2018-05-09 14:27 ` Chris Wilson
  2018-05-10 16:09   ` Tvrtko Ursulin
  2018-05-09 14:28 ` [PATCH 4/5] drm/i915/execlists: Direct submission from irq handler Chris Wilson
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2018-05-09 14:27 UTC (permalink / raw)
  To: intel-gfx

Bypass using the tasklet to submit the first request to HW, as the
tasklet may be deferred unto ksoftirqd and at a minimum will add in
excess of 10us (and maybe tens of milliseconds) to our execution
latency. This latency reduction is most notable when execution flows
between engines.

v2: Beware handling preemption completion from the direct submit path as
well.
v3: Make the abuse clear and track our extra state inside i915_tasklet.
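
Condensed to a sketch, the new submit path (__submit_queue() plus
__direct_submit() in the diff below) is roughly:

static void __submit_queue(struct intel_engine_cs *engine)
{
        struct i915_tasklet *t = &engine->execlists.tasklet;

        if (!i915_tasklet_is_enabled(t))        /* inside a GPU reset */
                return;

        /* Submit directly only if the ELSP is idle and we win the trylock */
        if (!port_isset(engine->execlists.port) && tasklet_trylock(&t->base)) {
                /* tell dequeue the caller already holds the timeline lock */
                t->flags |= I915_TASKLET_DIRECT_SUBMIT;
                i915_tasklet_run(t);
                t->flags &= ~I915_TASKLET_DIRECT_SUBMIT;
                tasklet_unlock(&t->base);
        } else {
                i915_tasklet_schedule(t);
        }
}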

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_tasklet.h         | 24 +++++++
 drivers/gpu/drm/i915/intel_guc_submission.c | 10 ++-
 drivers/gpu/drm/i915/intel_lrc.c            | 71 +++++++++++++++++----
 3 files changed, 89 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_tasklet.h b/drivers/gpu/drm/i915/i915_tasklet.h
index 42b002b88edb..99e2fa2241ba 100644
--- a/drivers/gpu/drm/i915/i915_tasklet.h
+++ b/drivers/gpu/drm/i915/i915_tasklet.h
@@ -8,8 +8,11 @@
 #define _I915_TASKLET_H_
 
 #include <linux/atomic.h>
+#include <linux/bitops.h>
 #include <linux/interrupt.h>
 
+#include "i915_gem.h"
+
 /**
  * struct i915_tasklet - wrapper around tasklet_struct
  *
@@ -19,6 +22,8 @@
  */
 struct i915_tasklet {
 	struct tasklet_struct base;
+	unsigned long flags;
+#define I915_TASKLET_DIRECT_SUBMIT BIT(0)
 };
 
 static inline void i915_tasklet_init(struct i915_tasklet *t,
@@ -43,6 +48,14 @@ static inline bool i915_tasklet_is_enabled(const struct i915_tasklet *t)
 	return likely(!atomic_read(&t->base.count));
 }
 
+static inline bool i915_tasklet_is_direct_submit(const struct i915_tasklet *t)
+{
+	/* Only legal to be checked from inside the tasklet. */
+	GEM_BUG_ON(!i915_tasklet_is_running(t));
+
+	return t->flags & I915_TASKLET_DIRECT_SUBMIT;
+}
+
 static inline void i915_tasklet_schedule(struct i915_tasklet *t)
 {
 	tasklet_hi_schedule(&t->base);
@@ -75,4 +88,15 @@ static inline void i915_tasklet_set_func(struct i915_tasklet *t,
 	i915_tasklet_unlock(t);
 }
 
+static inline void __i915_tasklet_run(const struct i915_tasklet *t)
+{
+	t->base.func(t->base.data);
+}
+
+static inline void i915_tasklet_run(const struct i915_tasklet *t)
+{
+	GEM_BUG_ON(!i915_tasklet_is_running(t));
+	__i915_tasklet_run(t);
+}
+
 #endif /* _I915_TASKLET_H_ */
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index a7afc976c3b9..f2ded1796523 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -754,14 +754,18 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
 
 static void guc_dequeue(struct intel_engine_cs *engine)
 {
-	unsigned long flags;
+	unsigned long uninitialized_var(flags);
 	bool submit;
 
 	local_irq_save(flags);
 
-	spin_lock(&engine->timeline.lock);
+	if (!i915_tasklet_is_direct_submit(&engine->execlists.tasklet))
+		spin_lock(&engine->timeline.lock);
+
 	submit = __guc_dequeue(engine);
-	spin_unlock(&engine->timeline.lock);
+
+	if (!i915_tasklet_is_direct_submit(&engine->execlists.tasklet))
+		spin_unlock(&engine->timeline.lock);
 
 	if (submit)
 		guc_submit(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 539fa03d7600..09fded9d409f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -356,13 +356,15 @@ execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists)
 {
 	struct intel_engine_cs *engine =
 		container_of(execlists, typeof(*engine), execlists);
-	unsigned long flags;
+	unsigned long uninitialized_var(flags);
 
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	if (!i915_tasklet_is_direct_submit(&execlists->tasklet))
+		spin_lock_irqsave(&engine->timeline.lock, flags);
 
 	__unwind_incomplete_requests(engine);
 
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	if (!i915_tasklet_is_direct_submit(&execlists->tasklet))
+		spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
 static inline void
@@ -601,6 +603,8 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
 		 */
 		GEM_BUG_ON(!execlists_is_active(execlists,
 						EXECLISTS_ACTIVE_USER));
+		GEM_BUG_ON(execlists_is_active(execlists,
+					       EXECLISTS_ACTIVE_PREEMPT));
 		GEM_BUG_ON(!port_count(&port[0]));
 		if (port_count(&port[0]) > 1)
 			return false;
@@ -757,12 +761,16 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
-	unsigned long flags;
+	unsigned long uninitialized_var(flags);
 	bool submit;
 
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	if (!i915_tasklet_is_direct_submit(&execlists->tasklet))
+		spin_lock_irqsave(&engine->timeline.lock, flags);
+
 	submit = __execlists_dequeue(engine);
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+
+	if (!i915_tasklet_is_direct_submit(&execlists->tasklet))
+		spin_unlock_irqrestore(&engine->timeline.lock, flags);
 
 	if (submit)
 		execlists_submit_ports(engine);
@@ -1162,16 +1170,52 @@ static void queue_request(struct intel_engine_cs *engine,
 		      &lookup_priolist(engine, prio)->requests);
 }
 
-static void __submit_queue(struct intel_engine_cs *engine, int prio)
+static void __wakeup_queue(struct intel_engine_cs *engine, int prio)
 {
 	engine->execlists.queue_priority = prio;
+}
+
+static void __schedule_queue(struct intel_engine_cs *engine)
+{
 	i915_tasklet_schedule(&engine->execlists.tasklet);
 }
 
+static bool __direct_submit(struct intel_engine_execlists *const execlists)
+{
+	struct i915_tasklet * const t = &execlists->tasklet;
+
+	if (!tasklet_trylock(&t->base))
+		return false;
+
+	t->flags |= I915_TASKLET_DIRECT_SUBMIT;
+	i915_tasklet_run(t);
+	t->flags &= ~I915_TASKLET_DIRECT_SUBMIT;
+
+	tasklet_unlock(&t->base);
+	return true;
+}
+
+static void __submit_queue(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const execlists = &engine->execlists;
+
+	GEM_BUG_ON(!engine->i915->gt.awake);
+
+	/* If inside GPU reset, the tasklet will be queued later. */
+	if (!i915_tasklet_is_enabled(&execlists->tasklet))
+		return;
+
+	/* Directly submit the first request to reduce the initial latency */
+	if (port_isset(execlists->port) || !__direct_submit(execlists))
+		__schedule_queue(engine);
+}
+
 static void submit_queue(struct intel_engine_cs *engine, int prio)
 {
-	if (prio > engine->execlists.queue_priority)
-		__submit_queue(engine, prio);
+	if (prio > engine->execlists.queue_priority) {
+		__wakeup_queue(engine, prio);
+		__submit_queue(engine);
+	}
 }
 
 static void execlists_submit_request(struct i915_request *request)
@@ -1183,10 +1227,9 @@ static void execlists_submit_request(struct i915_request *request)
 	spin_lock_irqsave(&engine->timeline.lock, flags);
 
 	queue_request(engine, &request->sched, rq_prio(request));
-	submit_queue(engine, rq_prio(request));
-
 	GEM_BUG_ON(!engine->execlists.first);
 	GEM_BUG_ON(list_empty(&request->sched.link));
+	submit_queue(engine, rq_prio(request));
 
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
@@ -1314,8 +1357,10 @@ static void execlists_schedule(struct i915_request *request,
 		}
 
 		if (prio > engine->execlists.queue_priority &&
-		    i915_sw_fence_done(&sched_to_request(node)->submit))
-			__submit_queue(engine, prio);
+		    i915_sw_fence_done(&sched_to_request(node)->submit)) {
+			__wakeup_queue(engine, prio);
+			__schedule_queue(engine);
+		}
 	}
 
 	spin_unlock_irq(&engine->timeline.lock);
-- 
2.17.0


* [PATCH 4/5] drm/i915/execlists: Direct submission from irq handler
  2018-05-09 14:27 [PATCH 1/5] drm/i915: Remove tasklet flush before disable Chris Wilson
  2018-05-09 14:27 ` [PATCH 2/5] drm/i915: Wrap tasklet_struct for abuse Chris Wilson
  2018-05-09 14:27 ` [PATCH 3/5] drm/i915/execlists: Direct submit onto idle engines Chris Wilson
@ 2018-05-09 14:28 ` Chris Wilson
  2018-05-10 12:02   ` Chris Wilson
  2018-05-09 14:28 ` [PATCH 5/5] drm/i915: Speed up idle detection by kicking the tasklets Chris Wilson
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2018-05-09 14:28 UTC (permalink / raw)
  To: intel-gfx

Continuing the theme of bypassing ksoftirqd latency, also first try to
directly submit from the CS interrupt handler to clear the ELSP and
queue the next.

In the past, we have been hesitant to do this as the context switch
processing has been quite heavy, requiring forcewaked mmio. However, as
we now can read the GPU state from the cacheable HWSP, it is relatively
cheap!

v2: Explain why we test_bit(IRQ_EXECLIST) after doing notify_ring (it's
because the notify_ring() may itself trigger direct submission clearing
the bit)
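
Folded into one function for clarity, the helper pair added below
(i915_tasklet() built on i915_tasklet_try()) amounts to roughly:

static inline void i915_tasklet(struct i915_tasklet *t)
{
        if (!i915_tasklet_is_enabled(t))        /* GPU reset in progress */
                return;

        if (tasklet_trylock(&t->base)) {
                if (i915_tasklet_is_enabled(t))
                        i915_tasklet_run(t);    /* process the CSB inline */
                tasklet_unlock(&t->base);
        } else {
                i915_tasklet_schedule(t);       /* defer to softirq/ksoftirqd */
        }
}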

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c             | 21 ++++++++++++++-------
 drivers/gpu/drm/i915/i915_tasklet.h         | 21 +++++++++++++++++++++
 drivers/gpu/drm/i915/intel_guc_submission.c |  2 ++
 3 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f8aff5a5aa83..e1b3a7575fe7 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1465,19 +1465,26 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	bool tasklet = false;
 
-	if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
-		if (READ_ONCE(engine->execlists.active))
-			tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
-						    &engine->irq_posted);
-	}
+	if (iir & GT_CONTEXT_SWITCH_INTERRUPT && READ_ONCE(execlists->active))
+		tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
+					    &engine->irq_posted);
 
 	if (iir & GT_RENDER_USER_INTERRUPT) {
 		notify_ring(engine);
-		tasklet |= USES_GUC_SUBMISSION(engine->i915);
+		/*
+		 * notify_ring() may trigger direct submission onto this
+		 * engine, clearing the ENGINE_IRQ_EXECLIST bit. In that
+		 * case, we don't want to resubmit and so clear the tasklet
+		 * boolean. GuC never sets the ENGINE_IRQ_EXECLIST bit and
+		 * so when using the GuC this equates to an unconditional
+		 * setting of tasklet to true.
+		 */
+		if (!test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
+			tasklet = USES_GUC_SUBMISSION(engine->i915);
 	}
 
 	if (tasklet)
-		i915_tasklet_schedule(&execlists->tasklet);
+		i915_tasklet(&execlists->tasklet);
 }
 
 static void gen8_gt_irq_ack(struct drm_i915_private *i915,
diff --git a/drivers/gpu/drm/i915/i915_tasklet.h b/drivers/gpu/drm/i915/i915_tasklet.h
index 99e2fa2241ba..c532282551b7 100644
--- a/drivers/gpu/drm/i915/i915_tasklet.h
+++ b/drivers/gpu/drm/i915/i915_tasklet.h
@@ -99,4 +99,25 @@ static inline void i915_tasklet_run(const struct i915_tasklet *t)
 	__i915_tasklet_run(t);
 }
 
+static inline bool i915_tasklet_try(struct i915_tasklet *t)
+{
+	if (unlikely(!tasklet_trylock(&t->base)))
+		return false;
+
+	if (i915_tasklet_is_enabled(t))
+		i915_tasklet_run(t);
+
+	tasklet_unlock(&t->base);
+	return true;
+}
+
+static inline void i915_tasklet(struct i915_tasklet *t)
+{
+	if (!i915_tasklet_is_enabled(t)) /* GPU reset active */
+		return;
+
+	if (!i915_tasklet_try(t))
+		i915_tasklet_schedule(t);
+}
+
 #endif /* _I915_TASKLET_H_ */
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index f2ded1796523..4e09abf7e206 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -780,6 +780,8 @@ static void guc_submission_tasklet(unsigned long data)
 	struct execlist_port *port = execlists->port;
 	struct i915_request *rq;
 
+	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+
 	rq = port_request(port);
 	while (rq && i915_request_completed(rq)) {
 		trace_i915_request_out(rq);
-- 
2.17.0


* [PATCH 5/5] drm/i915: Speed up idle detection by kicking the tasklets
  2018-05-09 14:27 [PATCH 1/5] drm/i915: Remove tasklet flush before disable Chris Wilson
                   ` (2 preceding siblings ...)
  2018-05-09 14:28 ` [PATCH 4/5] drm/i915/execlists: Direct submission from irq handler Chris Wilson
@ 2018-05-09 14:28 ` Chris Wilson
  2018-05-09 14:57 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/5] drm/i915: Remove tasklet flush before disable Patchwork
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2018-05-09 14:28 UTC (permalink / raw)
  To: intel-gfx

We rely on ksoftirqd to run in a timely fashion in order to drain the
execlists queue. Quite frequently, it does not. In some cases we may see
latencies of over 200ms triggering our idle timeouts and forcing us to
declare the driver wedged!

Thus we can speed up idle detection by bypassing ksoftirqd in these
cases and flushing our tasklet to confirm whether we are indeed still
waiting for the ELSP to drain.

v2: Put the execlists.first check back; it is required for handling
reset!
v3: Follow Mika's suggestion to try to limit kicking the tasklet to
only when we expect it to make a difference, i.e. when catching up after
a CS interrupt, and not just execute it every time, as that is likely
just to cover over our own bugs.

References: https://bugs.freedesktop.org/show_bug.cgi?id=106373
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 3c8a0c3245f3..f975091c5498 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -944,11 +944,20 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
 	if (I915_SELFTEST_ONLY(engine->breadcrumbs.mock))
 		return true;
 
+	/*
+	 * ksoftirqd has notorious latency that may cause us to
+	 * timeout while waiting for the engine to idle as we wait for
+	 * ksoftirqd to run the execlists tasklet to drain the ELSP.
+	 * If we are expecting a context switch from the GPU, check now.
+	 */
+	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
+		i915_tasklet_try(&engine->execlists.tasklet);
+
 	/* Waiting to drain ELSP? */
 	if (READ_ONCE(engine->execlists.active))
 		return false;
 
-	/* ELSP is empty, but there are ready requests? */
+	/* ELSP is empty, but there are ready requests? E.g. after reset */
 	if (READ_ONCE(engine->execlists.first))
 		return false;
 
-- 
2.17.0


* ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/5] drm/i915: Remove tasklet flush before disable
  2018-05-09 14:27 [PATCH 1/5] drm/i915: Remove tasklet flush before disable Chris Wilson
                   ` (3 preceding siblings ...)
  2018-05-09 14:28 ` [PATCH 5/5] drm/i915: Speed up idle detection by kicking the tasklets Chris Wilson
@ 2018-05-09 14:57 ` Patchwork
  2018-05-09 14:59 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2018-05-09 14:57 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/5] drm/i915: Remove tasklet flush before disable
URL   : https://patchwork.freedesktop.org/series/42950/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
8083271a8a5b drm/i915: Remove tasklet flush before disable
23f6b88c5a19 drm/i915: Wrap tasklet_struct for abuse
-:49: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#49: 
new file mode 100644

-:54: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#54: FILE: drivers/gpu/drm/i915/i915_tasklet.h:1:
+/*

total: 0 errors, 2 warnings, 0 checks, 221 lines checked
236c6eeffbad drm/i915/execlists: Direct submit onto idle engines
-:85: WARNING:FUNCTION_ARGUMENTS: function definition argument 'flags' should also have an identifier name
#85: FILE: drivers/gpu/drm/i915/intel_guc_submission.c:757:
+	unsigned long uninitialized_var(flags);

-:111: WARNING:FUNCTION_ARGUMENTS: function definition argument 'flags' should also have an identifier name
#111: FILE: drivers/gpu/drm/i915/intel_lrc.c:359:
+	unsigned long uninitialized_var(flags);

-:139: WARNING:FUNCTION_ARGUMENTS: function definition argument 'flags' should also have an identifier name
#139: FILE: drivers/gpu/drm/i915/intel_lrc.c:764:
+	unsigned long uninitialized_var(flags);

total: 0 errors, 3 warnings, 0 checks, 192 lines checked
d351b279d678 drm/i915/execlists: Direct submission from irq handler
5eb66ebc88e5 drm/i915: Speed up idle detection by kicking the tasklets


* Re: [PATCH 2/5] drm/i915: Wrap tasklet_struct for abuse
  2018-05-09 14:27 ` [PATCH 2/5] drm/i915: Wrap tasklet_struct for abuse Chris Wilson
@ 2018-05-09 14:58   ` Chris Wilson
  2018-05-10 15:49   ` Tvrtko Ursulin
  1 sibling, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2018-05-09 14:58 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2018-05-09 15:27:58)
> In the next few patches, we want to abuse the tasklet to avoid ksoftirqd
> latency along critical paths. To make that abuse easier to swallow,
> first coat the tasklet in a little syntactic sugar.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c             |  4 +-
>  drivers/gpu/drm/i915/i915_irq.c             |  2 +-
>  drivers/gpu/drm/i915/i915_tasklet.h         | 78 +++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_engine_cs.c      | 11 ++-
>  drivers/gpu/drm/i915/intel_guc_submission.c |  8 +--
>  drivers/gpu/drm/i915/intel_lrc.c            | 18 ++---
>  drivers/gpu/drm/i915/intel_ringbuffer.h     |  3 +-
>  7 files changed, 102 insertions(+), 22 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_tasklet.h
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 63c96c2b8fcf..59e04387a27c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3036,7 +3036,7 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
>          * Turning off the execlists->tasklet until the reset is over
>          * prevents the race.
>          */
> -       tasklet_disable(&engine->execlists.tasklet);
> +       i915_tasklet_lock(&engine->execlists.tasklet);

Hmm, probably sensible to stick to disable/enable:

 - better match to the tasklet interface (no arbitrary impedance mismatch)

 - they are counting locks rather than a mutex, which is what we commonly
   think of for lock/unlock (more like a semaphore).

After dropping the custom flush+disable wrapper, there's no good reason
to have a custom name. Thoughts?
-Chris

* ✗ Fi.CI.SPARSE: warning for series starting with [1/5] drm/i915: Remove tasklet flush before disable
  2018-05-09 14:27 [PATCH 1/5] drm/i915: Remove tasklet flush before disable Chris Wilson
                   ` (4 preceding siblings ...)
  2018-05-09 14:57 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/5] drm/i915: Remove tasklet flush before disable Patchwork
@ 2018-05-09 14:59 ` Patchwork
  2018-05-09 15:14 ` ✓ Fi.CI.BAT: success " Patchwork
  2018-05-09 17:50 ` ✓ Fi.CI.IGT: " Patchwork
  7 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2018-05-09 14:59 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/5] drm/i915: Remove tasklet flush before disable
URL   : https://patchwork.freedesktop.org/series/42950/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Commit: drm/i915: Remove tasklet flush before disable
Okay!

Commit: drm/i915: Wrap tasklet_struct for abuse
Okay!

Commit: drm/i915/execlists: Direct submit onto idle engines
+drivers/gpu/drm/i915/intel_guc_submission.c:768:28: warning: context imbalance in 'guc_dequeue' - unexpected unlock
+drivers/gpu/drm/i915/intel_lrc.c:367:39: warning: context imbalance in 'execlists_unwind_incomplete_requests' - unexpected unlock
+drivers/gpu/drm/i915/intel_lrc.c:773:39: warning: context imbalance in 'execlists_dequeue' - unexpected unlock

Commit: drm/i915/execlists: Direct submission from irq handler
Okay!

Commit: drm/i915: Speed up idle detection by kicking the tasklets
Okay!


* ✓ Fi.CI.BAT: success for series starting with [1/5] drm/i915: Remove tasklet flush before disable
  2018-05-09 14:27 [PATCH 1/5] drm/i915: Remove tasklet flush before disable Chris Wilson
                   ` (5 preceding siblings ...)
  2018-05-09 14:59 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2018-05-09 15:14 ` Patchwork
  2018-05-09 17:50 ` ✓ Fi.CI.IGT: " Patchwork
  7 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2018-05-09 15:14 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/5] drm/i915: Remove tasklet flush before disable
URL   : https://patchwork.freedesktop.org/series/42950/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4163 -> Patchwork_8963 =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_8963 need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_8963, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/42950/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_8963:

  === IGT changes ===

    ==== Warnings ====

    igt@gem_exec_gttfill@basic:
      fi-pnv-d510:        PASS -> SKIP

    
== Known issues ==

  Here are the changes found in Patchwork_8963 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
      fi-cnl-y3:          PASS -> DMESG-WARN (fdo#104951)

    
    ==== Possible fixes ====

    igt@drv_module_reload@basic-reload-inject:
      fi-glk-j4005:       DMESG-WARN (fdo#106248) -> PASS

    igt@kms_flip@basic-flip-vs-wf_vblank:
      fi-hsw-4770r:       FAIL (fdo#100368) -> PASS

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#104951 https://bugs.freedesktop.org/show_bug.cgi?id=104951
  fdo#106248 https://bugs.freedesktop.org/show_bug.cgi?id=106248


== Participating hosts (41 -> 37) ==

  Missing    (4): fi-ctg-p8600 fi-ilk-m540 fi-byt-squawks fi-skl-6700hq 


== Build changes ==

    * Linux: CI_DRM_4163 -> Patchwork_8963

  CI_DRM_4163: 8e1dab6e913be7d014eb9bc355ec65b6b56dcd56 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4468: 548a894dc904c4628522dbbc77cb179a4c965ebc @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_8963: 5eb66ebc88e5235c97725e9d0dc08d1f6820aed0 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4468: 1e60f1499e5b71b6d5a747189d7c28f57359a87f @ git://anongit.freedesktop.org/piglit


== Linux commits ==

5eb66ebc88e5 drm/i915: Speed up idle detection by kicking the tasklets
d351b279d678 drm/i915/execlists: Direct submission from irq handler
236c6eeffbad drm/i915/execlists: Direct submit onto idle engines
23f6b88c5a19 drm/i915: Wrap tasklet_struct for abuse
8083271a8a5b drm/i915: Remove tasklet flush before disable

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8963/issues.html

* ✓ Fi.CI.IGT: success for series starting with [1/5] drm/i915: Remove tasklet flush before disable
  2018-05-09 14:27 [PATCH 1/5] drm/i915: Remove tasklet flush before disable Chris Wilson
                   ` (6 preceding siblings ...)
  2018-05-09 15:14 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-05-09 17:50 ` Patchwork
  7 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2018-05-09 17:50 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/5] drm/i915: Remove tasklet flush before disable
URL   : https://patchwork.freedesktop.org/series/42950/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4163_full -> Patchwork_8963_full =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_8963_full need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_8963_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/42950/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_8963_full:

  === IGT changes ===

    ==== Warnings ====

    igt@gem_exec_schedule@deep-bsd2:
      shard-kbl:          SKIP -> PASS +3

    igt@gem_mocs_settings@mocs-rc6-vebox:
      shard-kbl:          PASS -> SKIP +1

    igt@kms_force_connector_basic@force-connector-state:
      shard-snb:          PASS -> SKIP

    igt@kms_properties@plane-properties-legacy:
      shard-snb:          SKIP -> PASS +4

    
== Known issues ==

  Here are the changes found in Patchwork_8963_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_flip@flip-vs-absolute-wf_vblank-interruptible:
      shard-glk:          PASS -> FAIL (fdo#100368)

    igt@perf_pmu@enable-race-vcs0:
      shard-glk:          PASS -> INCOMPLETE (fdo#103359, k.org#198133) +1

    igt@perf_pmu@enable-race-vecs0:
      shard-apl:          PASS -> INCOMPLETE (fdo#103927) +1
      shard-kbl:          PASS -> INCOMPLETE (fdo#103665) +1

    
    ==== Possible fixes ====

    igt@kms_flip@flip-vs-expired-vblank-interruptible:
      shard-apl:          FAIL (fdo#105363, fdo#102887) -> PASS

    igt@kms_flip@wf_vblank-ts-check-interruptible:
      shard-glk:          FAIL (fdo#100368) -> PASS
      shard-apl:          FAIL (fdo#100368) -> PASS

    igt@kms_setmode@basic:
      shard-apl:          FAIL (fdo#99912) -> PASS

    igt@kms_sysfs_edid_timing:
      shard-apl:          WARN (fdo#100047) -> PASS

    
  fdo#100047 https://bugs.freedesktop.org/show_bug.cgi?id=100047
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#102887 https://bugs.freedesktop.org/show_bug.cgi?id=102887
  fdo#103359 https://bugs.freedesktop.org/show_bug.cgi?id=103359
  fdo#103665 https://bugs.freedesktop.org/show_bug.cgi?id=103665
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
  fdo#105363 https://bugs.freedesktop.org/show_bug.cgi?id=105363
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
  k.org#198133 https://bugzilla.kernel.org/show_bug.cgi?id=198133


== Participating hosts (9 -> 9) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4163 -> Patchwork_8963

  CI_DRM_4163: 8e1dab6e913be7d014eb9bc355ec65b6b56dcd56 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4468: 548a894dc904c4628522dbbc77cb179a4c965ebc @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_8963: 5eb66ebc88e5235c97725e9d0dc08d1f6820aed0 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4468: 1e60f1499e5b71b6d5a747189d7c28f57359a87f @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8963/shards.html

* Re: [PATCH 4/5] drm/i915/execlists: Direct submission from irq handler
  2018-05-09 14:28 ` [PATCH 4/5] drm/i915/execlists: Direct submission from irq handler Chris Wilson
@ 2018-05-10 12:02   ` Chris Wilson
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2018-05-10 12:02 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2018-05-09 15:28:00)
> Continuing the theme of bypassing ksoftirqd latency, also first try to
> directly submit from the CS interrupt handler to clear the ELSP and
> queue the next.
> 
> In the past, we have been hesitant to do this as the context switch
> processing has been quite heavy, requiring forcewaked mmio. However, as
> we now can read the GPU state from the cacheable HWSP, it is relatively
> cheap!
> 
> v2: Explain why we test_bit(IRQ_EXECLIST) after doing notify_ring (it's
> because the notify_ring() may itself trigger direct submission clearing
> the bit)
> 
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Tvrtko has some extra tests for igt/gem_exec_latency that show the
impact that triggering ksoftirqd has on submission latency. It's not
pretty. Fortunately, this pair of patches makes a huge difference.
-Chris

* Re: [PATCH 2/5] drm/i915: Wrap tasklet_struct for abuse
  2018-05-09 14:27 ` [PATCH 2/5] drm/i915: Wrap tasklet_struct for abuse Chris Wilson
  2018-05-09 14:58   ` Chris Wilson
@ 2018-05-10 15:49   ` Tvrtko Ursulin
  2018-05-10 16:03     ` Chris Wilson
  1 sibling, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2018-05-10 15:49 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 09/05/2018 15:27, Chris Wilson wrote:
> In the next few patches, we want to abuse the tasklet to avoid ksoftirqd
> latency along critical paths. To make that abuse easier to swallow,
> first coat the tasklet in a little syntactic sugar.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c             |  4 +-
>   drivers/gpu/drm/i915/i915_irq.c             |  2 +-
>   drivers/gpu/drm/i915/i915_tasklet.h         | 78 +++++++++++++++++++++
>   drivers/gpu/drm/i915/intel_engine_cs.c      | 11 ++-
>   drivers/gpu/drm/i915/intel_guc_submission.c |  8 +--
>   drivers/gpu/drm/i915/intel_lrc.c            | 18 ++---
>   drivers/gpu/drm/i915/intel_ringbuffer.h     |  3 +-
>   7 files changed, 102 insertions(+), 22 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/i915_tasklet.h
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 63c96c2b8fcf..59e04387a27c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3036,7 +3036,7 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
>   	 * Turning off the execlists->tasklet until the reset is over
>   	 * prevents the race.
>   	 */
> -	tasklet_disable(&engine->execlists.tasklet);
> +	i915_tasklet_lock(&engine->execlists.tasklet);
>   
>   	/*
>   	 * We're using worker to queue preemption requests from the tasklet in
> @@ -3256,7 +3256,7 @@ void i915_gem_reset(struct drm_i915_private *dev_priv,
>   
>   void i915_gem_reset_finish_engine(struct intel_engine_cs *engine)
>   {
> -	tasklet_enable(&engine->execlists.tasklet);
> +	i915_tasklet_unlock(&engine->execlists.tasklet);
>   	kthread_unpark(engine->breadcrumbs.signaler);
>   
>   	intel_uncore_forcewake_put(engine->i915, FORCEWAKE_ALL);
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index f9bc3aaa90d0..f8aff5a5aa83 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1477,7 +1477,7 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
>   	}
>   
>   	if (tasklet)
> -		tasklet_hi_schedule(&execlists->tasklet);
> +		i915_tasklet_schedule(&execlists->tasklet);
>   }
>   
>   static void gen8_gt_irq_ack(struct drm_i915_private *i915,
> diff --git a/drivers/gpu/drm/i915/i915_tasklet.h b/drivers/gpu/drm/i915/i915_tasklet.h
> new file mode 100644
> index 000000000000..42b002b88edb
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_tasklet.h
> @@ -0,0 +1,78 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0
> + *
> + * Copyright © 2018 Intel Corporation
> + */
> +
> +#ifndef _I915_TASKLET_H_
> +#define _I915_TASKLET_H_
> +
> +#include <linux/atomic.h>
> +#include <linux/interrupt.h>
> +
> +/**
> + * struct i915_tasklet - wrapper around tasklet_struct
> + *
> + * We want to abuse tasklets slightly, such as calling them directly. In some
> + * cases, this requires some auxiliary tracking so subclass the tasklet_struct
> + * so that we have a central place and helpers.
> + */
> +struct i915_tasklet {
> +	struct tasklet_struct base;
> +};
> +
> +static inline void i915_tasklet_init(struct i915_tasklet *t,
> +				     void (*func)(unsigned long),
> +				     unsigned long data)
> +{
> +	tasklet_init(&t->base, func, data);
> +}
> +
> +static inline bool i915_tasklet_is_scheduled(const struct i915_tasklet *t)
> +{
> +	return test_bit(TASKLET_STATE_SCHED, &t->base.state);
> +}
> +
> +static inline bool i915_tasklet_is_running(const struct i915_tasklet *t)
> +{
> +	return test_bit(TASKLET_STATE_RUN, &t->base.state);
> +}
> +
> +static inline bool i915_tasklet_is_enabled(const struct i915_tasklet *t)
> +{
> +	return likely(!atomic_read(&t->base.count));
> +}
> +
> +static inline void i915_tasklet_schedule(struct i915_tasklet *t)
> +{
> +	tasklet_hi_schedule(&t->base);
> +}
> +
> +static inline void i915_tasklet_flush(struct i915_tasklet *t)
> +{
> +	tasklet_kill(&t->base);
> +}
> +
> +static inline void i915_tasklet_lock(struct i915_tasklet *t)
> +{
> +	tasklet_disable(&t->base);
> +}
> +
> +static inline void i915_tasklet_unlock(struct i915_tasklet *t)
> +{
> +	tasklet_enable(&t->base);
> +}

I agree keeping the original naming for kill/enable/disable would be better.

> +
> +static inline void i915_tasklet_set_func(struct i915_tasklet *t,
> +					 void (*func)(unsigned long data),
> +					 unsigned long data)
> +{
> +	i915_tasklet_lock(t);
> +
> +	t->base.func = func;
> +	t->base.data = data;
> +
> +	i915_tasklet_unlock(t);

I kind of remember something about issues we had with placing
tasklet_disable in some contexts. Hm.. if only I could recall the
details. It's probably fine. I cannot come up with ideas on how to
protect against that. GEM_BUG_ON(in_irq() || in_softirq())? Too
far-fetched probably.

> +}
> +
> +#endif /* _I915_TASKLET_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 70325e0824e3..3c8a0c3245f3 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1032,7 +1032,7 @@ void intel_engines_park(struct drm_i915_private *i915)
>   	for_each_engine(engine, i915, id) {
>   		/* Flush the residual irq tasklets first. */
>   		intel_engine_disarm_breadcrumbs(engine);
> -		tasklet_kill(&engine->execlists.tasklet);
> +		i915_tasklet_flush(&engine->execlists.tasklet);
>   
>   		/*
>   		 * We are committed now to parking the engines, make sure there
> @@ -1249,9 +1249,8 @@ static void intel_engine_print_registers(const struct intel_engine_cs *engine,
>   			   intel_read_status_page(engine, intel_hws_csb_write_index(engine->i915)),
>   			   yesno(test_bit(ENGINE_IRQ_EXECLIST,
>   					  &engine->irq_posted)),
> -			   yesno(test_bit(TASKLET_STATE_SCHED,
> -					  &engine->execlists.tasklet.state)),
> -			   enableddisabled(!atomic_read(&engine->execlists.tasklet.count)));
> +			   yesno(i915_tasklet_is_scheduled(&engine->execlists.tasklet)),
> +			   enableddisabled(i915_tasklet_is_enabled(&engine->execlists.tasklet)));
>   		if (read >= GEN8_CSB_ENTRIES)
>   			read = 0;
>   		if (write >= GEN8_CSB_ENTRIES)
> @@ -1479,7 +1478,7 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
>   	if (!intel_engine_supports_stats(engine))
>   		return -ENODEV;
>   
> -	tasklet_disable(&execlists->tasklet);
> +	i915_tasklet_lock(&execlists->tasklet);
>   	write_seqlock_irqsave(&engine->stats.lock, flags);
>   
>   	if (unlikely(engine->stats.enabled == ~0)) {
> @@ -1505,7 +1504,7 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
>   
>   unlock:
>   	write_sequnlock_irqrestore(&engine->stats.lock, flags);
> -	tasklet_enable(&execlists->tasklet);
> +	i915_tasklet_unlock(&execlists->tasklet);
>   
>   	return err;
>   }
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index 2feb65096966..a7afc976c3b9 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -591,7 +591,7 @@ static void inject_preempt_context(struct work_struct *work)
>   	if (WARN_ON(intel_guc_send(guc, data, ARRAY_SIZE(data)))) {
>   		execlists_clear_active(&engine->execlists,
>   				       EXECLISTS_ACTIVE_PREEMPT);
> -		tasklet_schedule(&engine->execlists.tasklet);
> +		i915_tasklet_schedule(&engine->execlists.tasklet);
>   	}
>   }
>   
> @@ -1263,10 +1263,10 @@ int intel_guc_submission_enable(struct intel_guc *guc)
>   	guc_interrupts_capture(dev_priv);
>   
>   	for_each_engine(engine, dev_priv, id) {
> -		struct intel_engine_execlists * const execlists =
> -			&engine->execlists;
> +		i915_tasklet_set_func(&engine->execlists.tasklet,
> +				      guc_submission_tasklet,
> +				      (unsigned long)engine);
>   
> -		execlists->tasklet.func = guc_submission_tasklet;
>   		engine->park = guc_submission_park;
>   		engine->unpark = guc_submission_unpark;
>   
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 7f98dda3c929..539fa03d7600 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1165,7 +1165,7 @@ static void queue_request(struct intel_engine_cs *engine,
>   static void __submit_queue(struct intel_engine_cs *engine, int prio)
>   {
>   	engine->execlists.queue_priority = prio;
> -	tasklet_hi_schedule(&engine->execlists.tasklet);
> +	i915_tasklet_schedule(&engine->execlists.tasklet);
>   }
>   
>   static void submit_queue(struct intel_engine_cs *engine, int prio)
> @@ -1778,7 +1778,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
>   
>   	/* After a GPU reset, we may have requests to replay */
>   	if (execlists->first)
> -		tasklet_schedule(&execlists->tasklet);
> +		i915_tasklet_schedule(&execlists->tasklet);
>   
>   	return 0;
>   }
> @@ -2182,9 +2182,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
>   	 * Tasklet cannot be active at this point due intel_mark_active/idle
>   	 * so this is just for documentation.
>   	 */
> -	if (WARN_ON(test_bit(TASKLET_STATE_SCHED,
> -			     &engine->execlists.tasklet.state)))
> -		tasklet_kill(&engine->execlists.tasklet);
> +	if (WARN_ON(i915_tasklet_is_scheduled(&engine->execlists.tasklet)))
> +		i915_tasklet_flush(&engine->execlists.tasklet);
>   
>   	dev_priv = engine->i915;
>   
> @@ -2209,7 +2208,10 @@ static void execlists_set_default_submission(struct intel_engine_cs *engine)
>   	engine->submit_request = execlists_submit_request;
>   	engine->cancel_requests = execlists_cancel_requests;
>   	engine->schedule = execlists_schedule;
> -	engine->execlists.tasklet.func = execlists_submission_tasklet;
> +
> +	i915_tasklet_set_func(&engine->execlists.tasklet,
> +			      execlists_submission_tasklet,
> +			      (unsigned long)engine);

Or eliminate the above dilemma by removing the data parameter from
i915_tasklet_set_func, since it looks like it is not needed at the moment?

Regards,

Tvrtko

>   
>   	engine->park = NULL;
>   	engine->unpark = NULL;
> @@ -2303,8 +2305,8 @@ logical_ring_setup(struct intel_engine_cs *engine)
>   
>   	engine->execlists.fw_domains = fw_domains;
>   
> -	tasklet_init(&engine->execlists.tasklet,
> -		     execlists_submission_tasklet, (unsigned long)engine);
> +	i915_tasklet_init(&engine->execlists.tasklet,
> +			  execlists_submission_tasklet, (unsigned long)engine);
>   
>   	logical_ring_default_vfuncs(engine);
>   	logical_ring_default_irqs(engine);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 010750e8ee44..f6ba354faf89 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -11,6 +11,7 @@
>   #include "i915_pmu.h"
>   #include "i915_request.h"
>   #include "i915_selftest.h"
> +#include "i915_tasklet.h"
>   #include "i915_timeline.h"
>   #include "intel_gpu_commands.h"
>   
> @@ -202,7 +203,7 @@ struct intel_engine_execlists {
>   	/**
>   	 * @tasklet: softirq tasklet for bottom handler
>   	 */
> -	struct tasklet_struct tasklet;
> +	struct i915_tasklet tasklet;
>   
>   	/**
>   	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
> 

* Re: [PATCH 2/5] drm/i915: Wrap tasklet_struct for abuse
  2018-05-10 15:49   ` Tvrtko Ursulin
@ 2018-05-10 16:03     ` Chris Wilson
  2018-05-10 16:15       ` Tvrtko Ursulin
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2018-05-10 16:03 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-10 16:49:03)
> 
> On 09/05/2018 15:27, Chris Wilson wrote:
> > In the next few patches, we want to abuse the tasklet to avoid ksoftirqd
> > latency along critical paths. To make that abuse easier to swallow,
> > first coat the tasklet in a little syntactic sugar.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> > +static inline void i915_tasklet_unlock(struct i915_tasklet *t)
> > +{
> > +     tasklet_enable(&t->base);
> > +}
> 
> I agree keeping the original naming for kill/enable/disable would be better.
> 
> > +
> > +static inline void i915_tasklet_set_func(struct i915_tasklet *t,
> > +                                      void (*func)(unsigned long data),
> > +                                      unsigned long data)
> > +{
> > +     i915_tasklet_lock(t);
> > +
> > +     t->base.func = func;
> > +     t->base.data = data;
> > +
> > +     i915_tasklet_unlock(t);
> 
> I kind of remember something about issues we had with placing 
> tasklet_disable placed in some contexts. Hm.. if only I could recall the 
> details. It's probably fine. I cannot come up with ideas on how to 
> protect against that. GEM_BUG_ON(in_irq() || in_softirq())? Too far 
> fetched probably.

When we get it wrong, CI complains with a lock up. It cannot be used in
atomic context as it uses yield() to kick the tasklet (as it may be on
the local cpu).

> > @@ -2209,7 +2208,10 @@ static void execlists_set_default_submission(struct intel_engine_cs *engine)
> >       engine->submit_request = execlists_submit_request;
> >       engine->cancel_requests = execlists_cancel_requests;
> >       engine->schedule = execlists_schedule;
> > -     engine->execlists.tasklet.func = execlists_submission_tasklet;
> > +
> > +     i915_tasklet_set_func(&engine->execlists.tasklet,
> > +                           execlists_submission_tasklet,
> > +                           (unsigned long)engine);
> 
> Or eliminate the above dilemma by removing the data parameter from 
> i915_tasklet_set_func since it looks it is not needed at the moment?

It doesn't eliminate the dilemma; updating the func itself raises the 
question of what happens if the tasklet is running at that time: what 
was the caller thinking would happen?

I expect tasklets will start passing themselves to the func and
eliminate the data parameters themselves at some point in the mid term.
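
Something along these lines, deriving the engine back from the tasklet 
itself (purely illustrative; no such tasklet API exists today):

static void execlists_submission_tasklet(struct i915_tasklet *t)
{
	struct intel_engine_cs *engine =
		container_of(t, typeof(*engine), execlists.tasklet);

	/* ... the usual CSB processing and dequeue, minus the cast */
}
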
-Chris

* Re: [PATCH 3/5] drm/i915/execlists: Direct submit onto idle engines
  2018-05-09 14:27 ` [PATCH 3/5] drm/i915/execlists: Direct submit onto idle engines Chris Wilson
@ 2018-05-10 16:09   ` Tvrtko Ursulin
  2018-05-10 16:25     ` Chris Wilson
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2018-05-10 16:09 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 09/05/2018 15:27, Chris Wilson wrote:
> Bypass using the tasklet to submit the first request to HW, as the
> tasklet may be deferred unto ksoftirqd and at a minimum will add in
> excess of 10us (and maybe tens of milliseconds) to our execution
> latency. This latency reduction is most notable when execution flows
> between engines.
> 
> v2: Beware handling preemption completion from the direct submit path as
> well.
> v3: Make the abuse clear and track our extra state inside i915_tasklet.
> 
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_tasklet.h         | 24 +++++++
>   drivers/gpu/drm/i915/intel_guc_submission.c | 10 ++-
>   drivers/gpu/drm/i915/intel_lrc.c            | 71 +++++++++++++++++----
>   3 files changed, 89 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_tasklet.h b/drivers/gpu/drm/i915/i915_tasklet.h
> index 42b002b88edb..99e2fa2241ba 100644
> --- a/drivers/gpu/drm/i915/i915_tasklet.h
> +++ b/drivers/gpu/drm/i915/i915_tasklet.h
> @@ -8,8 +8,11 @@
>   #define _I915_TASKLET_H_
>   
>   #include <linux/atomic.h>
> +#include <linux/bitops.h>
>   #include <linux/interrupt.h>
>   
> +#include "i915_gem.h"
> +
>   /**
>    * struct i915_tasklet - wrapper around tasklet_struct
>    *
> @@ -19,6 +22,8 @@
>    */
>   struct i915_tasklet {
>   	struct tasklet_struct base;
> +	unsigned long flags;
> +#define I915_TASKLET_DIRECT_SUBMIT BIT(0)

I would suggest a more generic name for the bit since i915_tasklet is 
generic-ish. For instance simply I915_TASKLET_DIRECT would signify the 
callback has been invoked directly and not (necessarily) from softirq 
context. Then it is for each user to know what that means for them 
specifically.

>   };
>   
>   static inline void i915_tasklet_init(struct i915_tasklet *t,
> @@ -43,6 +48,14 @@ static inline bool i915_tasklet_is_enabled(const struct i915_tasklet *t)
>   	return likely(!atomic_read(&t->base.count));
>   }
>   
> +static inline bool i915_tasklet_is_direct_submit(const struct i915_tasklet *t)
> +{
> +	/* Only legal to be checked from inside the tasklet. */
> +	GEM_BUG_ON(!i915_tasklet_is_running(t));
> +
> +	return t->flags & I915_TASKLET_DIRECT_SUBMIT;
> +}

Or maybe i915_tasklet_direct_invocation?

> +
>   static inline void i915_tasklet_schedule(struct i915_tasklet *t)
>   {
>   	tasklet_hi_schedule(&t->base);
> @@ -75,4 +88,15 @@ static inline void i915_tasklet_set_func(struct i915_tasklet *t,
>   	i915_tasklet_unlock(t);
>   }
>   
> +static inline void __i915_tasklet_run(const struct i915_tasklet *t)
> +{
> +	t->base.func(t->base.data);
> +}
> +
> +static inline void i915_tasklet_run(const struct i915_tasklet *t)
> +{
> +	GEM_BUG_ON(!i915_tasklet_is_running(t));
> +	__i915_tasklet_run(t);
> +}
> +
>   #endif /* _I915_TASKLET_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index a7afc976c3b9..f2ded1796523 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -754,14 +754,18 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
>   
>   static void guc_dequeue(struct intel_engine_cs *engine)
>   {
> -	unsigned long flags;
> +	unsigned long uninitialized_var(flags);
>   	bool submit;
>   
>   	local_irq_save(flags);
>   
> -	spin_lock(&engine->timeline.lock);
> +	if (!i915_tasklet_is_direct_submit(&engine->execlists.tasklet))
> +		spin_lock(&engine->timeline.lock);
> +
>   	submit = __guc_dequeue(engine);
> -	spin_unlock(&engine->timeline.lock);
> +
> +	if (!i915_tasklet_is_direct_submit(&engine->execlists.tasklet))
> +		spin_unlock(&engine->timeline.lock);
>   
>   	if (submit)
>   		guc_submit(engine);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 539fa03d7600..09fded9d409f 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -356,13 +356,15 @@ execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists)
>   {
>   	struct intel_engine_cs *engine =
>   		container_of(execlists, typeof(*engine), execlists);
> -	unsigned long flags;
> +	unsigned long uninitialized_var(flags);
>   
> -	spin_lock_irqsave(&engine->timeline.lock, flags);
> +	if (!i915_tasklet_is_direct_submit(&execlists->tasklet))
> +		spin_lock_irqsave(&engine->timeline.lock, flags);
>   
>   	__unwind_incomplete_requests(engine);
>   
> -	spin_unlock_irqrestore(&engine->timeline.lock, flags);
> +	if (!i915_tasklet_is_direct_submit(&execlists->tasklet))
> +		spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   }
>   
>   static inline void
> @@ -601,6 +603,8 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
>   		 */
>   		GEM_BUG_ON(!execlists_is_active(execlists,
>   						EXECLISTS_ACTIVE_USER));
> +		GEM_BUG_ON(execlists_is_active(execlists,
> +					       EXECLISTS_ACTIVE_PREEMPT));
>   		GEM_BUG_ON(!port_count(&port[0]));
>   		if (port_count(&port[0]) > 1)
>   			return false;
> @@ -757,12 +761,16 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
>   static void execlists_dequeue(struct intel_engine_cs *engine)
>   {
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
> -	unsigned long flags;
> +	unsigned long uninitialized_var(flags);
>   	bool submit;
>   
> -	spin_lock_irqsave(&engine->timeline.lock, flags);
> +	if (!i915_tasklet_is_direct_submit(&execlists->tasklet))
> +		spin_lock_irqsave(&engine->timeline.lock, flags);
> +
>   	submit = __execlists_dequeue(engine);
> -	spin_unlock_irqrestore(&engine->timeline.lock, flags);
> +
> +	if (!i915_tasklet_is_direct_submit(&execlists->tasklet))
> +		spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   
>   	if (submit)
>   		execlists_submit_ports(engine);
> @@ -1162,16 +1170,52 @@ static void queue_request(struct intel_engine_cs *engine,
>   		      &lookup_priolist(engine, prio)->requests);
>   }
>   
> -static void __submit_queue(struct intel_engine_cs *engine, int prio)
> +static void __wakeup_queue(struct intel_engine_cs *engine, int prio)
>   {
>   	engine->execlists.queue_priority = prio;
> +}

Why is this called wakeup? Plans to add something in it later?

> +
> +static void __schedule_queue(struct intel_engine_cs *engine)
> +{
>   	i915_tasklet_schedule(&engine->execlists.tasklet);
>   }
>   
> +static bool __direct_submit(struct intel_engine_execlists *const execlists)
> +{
> +	struct i915_tasklet * const t = &execlists->tasklet;
> +
> +	if (!tasklet_trylock(&t->base))
> +		return false;
> +
> +	t->flags |= I915_TASKLET_DIRECT_SUBMIT;
> +	i915_tasklet_run(t);
> +	t->flags &= ~I915_TASKLET_DIRECT_SUBMIT;
> +
> +	tasklet_unlock(&t->base);

Feels like this whole sequence belongs to i915_tasklet since it touches 
the internals. Maybe i915_tasklet_try_run, or i915_tasklet_run_or_schedule?

> +	return true;
> +}
> +
> +static void __submit_queue(struct intel_engine_cs *engine)
> +{
> +	struct intel_engine_execlists * const execlists = &engine->execlists;
> +
> +	GEM_BUG_ON(!engine->i915->gt.awake);
> +
> +	/* If inside GPU reset, the tasklet will be queued later. */
> +	if (!i915_tasklet_is_enabled(&execlists->tasklet))
> +		return;
> +
> +	/* Directly submit the first request to reduce the initial latency */
> +	if (port_isset(execlists->port) || !__direct_submit(execlists))
> +		__schedule_queue(engine);

Hmm a bit evil to maybe invoke in the condition. Would it be acceptable to:

if (!port_isset(...))
	i915_tasklet_run_or_schedule(...);
else
	i915_tasklet_schedule(...);

It's not ideal but maybe a bit better.

> +}
> +
>   static void submit_queue(struct intel_engine_cs *engine, int prio)
>   {
> -	if (prio > engine->execlists.queue_priority)
> -		__submit_queue(engine, prio);
> +	if (prio > engine->execlists.queue_priority) {
> +		__wakeup_queue(engine, prio);
> +		__submit_queue(engine);
> +	}
>   }
>   
>   static void execlists_submit_request(struct i915_request *request)
> @@ -1183,10 +1227,9 @@ static void execlists_submit_request(struct i915_request *request)
>   	spin_lock_irqsave(&engine->timeline.lock, flags);
>   
>   	queue_request(engine, &request->sched, rq_prio(request));
> -	submit_queue(engine, rq_prio(request));
> -
>   	GEM_BUG_ON(!engine->execlists.first);
>   	GEM_BUG_ON(list_empty(&request->sched.link));
> +	submit_queue(engine, rq_prio(request));
>   
>   	spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   }
> @@ -1314,8 +1357,10 @@ static void execlists_schedule(struct i915_request *request,
>   		}
>   
>   		if (prio > engine->execlists.queue_priority &&
> -		    i915_sw_fence_done(&sched_to_request(node)->submit))
> -			__submit_queue(engine, prio);
> +		    i915_sw_fence_done(&sched_to_request(node)->submit)) {
> +			__wakeup_queue(engine, prio);
> +			__schedule_queue(engine);
> +		}
>   	}
>   
>   	spin_unlock_irq(&engine->timeline.lock);
> 

Regards,

Tvrtko

* Re: [PATCH 2/5] drm/i915: Wrap tasklet_struct for abuse
  2018-05-10 16:03     ` Chris Wilson
@ 2018-05-10 16:15       ` Tvrtko Ursulin
  2018-05-10 16:19         ` Chris Wilson
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2018-05-10 16:15 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 10/05/2018 17:03, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-10 16:49:03)
>>
>> On 09/05/2018 15:27, Chris Wilson wrote:
>>> In the next few patches, we want to abuse tasklet to avoid ksoftirqd
>>> latency along critical paths. To make that abuse easily to swallow,
>>> first coat the tasklet in a little syntactic sugar.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>> +static inline void i915_tasklet_unlock(struct i915_tasklet *t)
>>> +{
>>> +     tasklet_enable(&t->base);
>>> +}
>>
>> I agree keeping the original naming for kill/enable/disable would be better.
>>
>>> +
>>> +static inline void i915_tasklet_set_func(struct i915_tasklet *t,
>>> +                                      void (*func)(unsigned long data),
>>> +                                      unsigned long data)
>>> +{
>>> +     i915_tasklet_lock(t);
>>> +
>>> +     t->base.func = func;
>>> +     t->base.data = data;
>>> +
>>> +     i915_tasklet_unlock(t);
>>
>> I kind of remember something about issues we had with placing
>> tasklet_disable placed in some contexts. Hm.. if only I could recall the
>> details. It's probably fine. I cannot come up with ideas on how to
>> protect against that. GEM_BUG_ON(in_irq() || in_softirq())? Too far
>> fetched probably.
> 
> When we get it wrong, CI complains with a lock up. It cannot be used in
> atomic context as it uses yield() to kick the tasklet (as it may be on
> the local cpu).
> 
>>> @@ -2209,7 +2208,10 @@ static void execlists_set_default_submission(struct intel_engine_cs *engine)
>>>        engine->submit_request = execlists_submit_request;
>>>        engine->cancel_requests = execlists_cancel_requests;
>>>        engine->schedule = execlists_schedule;
>>> -     engine->execlists.tasklet.func = execlists_submission_tasklet;
>>> +
>>> +     i915_tasklet_set_func(&engine->execlists.tasklet,
>>> +                           execlists_submission_tasklet,
>>> +                           (unsigned long)engine);
>>
>> Or eliminate the above dilemma by removing the data parameter from
>> i915_tasklet_set_func since it looks it is not needed at the moment?
> 
> It doesn't eliminate the dilemma, updating the func itself raises the
> question of what if the tasklet is running at that time, what was the
> caller thinking would happen?

Well, the same as it can do now, so you could drop tasklet_disable/enable 
then. At least I assumed the purpose was atomicity of the func/data updates.

> I expect tasklets will start passing themselves to the func and
> eliminate the data parameters themselves at some point in the mid term.

Needs to be tasklet_disable_nosync then, or it will hang. But then it 
loses its purpose anyway. So I am confused.

Regards,

Tvrtko

* Re: [PATCH 2/5] drm/i915: Wrap tasklet_struct for abuse
  2018-05-10 16:15       ` Tvrtko Ursulin
@ 2018-05-10 16:19         ` Chris Wilson
  2018-05-10 17:08           ` Tvrtko Ursulin
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2018-05-10 16:19 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-10 17:15:46)
> 
> On 10/05/2018 17:03, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-05-10 16:49:03)
> >>
> >> On 09/05/2018 15:27, Chris Wilson wrote:
> >>> In the next few patches, we want to abuse tasklet to avoid ksoftirqd
> >>> latency along critical paths. To make that abuse easily to swallow,
> >>> first coat the tasklet in a little syntactic sugar.
> >>>
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>> ---
> >>> +static inline void i915_tasklet_unlock(struct i915_tasklet *t)
> >>> +{
> >>> +     tasklet_enable(&t->base);
> >>> +}
> >>
> >> I agree keeping the original naming for kill/enable/disable would be better.
> >>
> >>> +
> >>> +static inline void i915_tasklet_set_func(struct i915_tasklet *t,
> >>> +                                      void (*func)(unsigned long data),
> >>> +                                      unsigned long data)
> >>> +{
> >>> +     i915_tasklet_lock(t);
> >>> +
> >>> +     t->base.func = func;
> >>> +     t->base.data = data;
> >>> +
> >>> +     i915_tasklet_unlock(t);
> >>
> >> I kind of remember something about issues we had with placing
> >> tasklet_disable placed in some contexts. Hm.. if only I could recall the
> >> details. It's probably fine. I cannot come up with ideas on how to
> >> protect against that. GEM_BUG_ON(in_irq() || in_softirq())? Too far
> >> fetched probably.
> > 
> > When we get it wrong, CI complains with a lock up. It cannot be used in
> > atomic context as it uses yield() to kick the tasklet (as it may be on
> > the local cpu).
> > 
> >>> @@ -2209,7 +2208,10 @@ static void execlists_set_default_submission(struct intel_engine_cs *engine)
> >>>        engine->submit_request = execlists_submit_request;
> >>>        engine->cancel_requests = execlists_cancel_requests;
> >>>        engine->schedule = execlists_schedule;
> >>> -     engine->execlists.tasklet.func = execlists_submission_tasklet;
> >>> +
> >>> +     i915_tasklet_set_func(&engine->execlists.tasklet,
> >>> +                           execlists_submission_tasklet,
> >>> +                           (unsigned long)engine);
> >>
> >> Or eliminate the above dilemma by removing the data parameter from
> >> i915_tasklet_set_func since it looks it is not needed at the moment?
> > 
> > It doesn't eliminate the dilemma, updating the func itself raises the
> > question of what if the tasklet is running at that time, what was the
> > caller thinking would happen?
> 
> Well same as it can do now so you could drop tasklet_disable/enable 
> then. At least I assumed the purpose is for atomicity of func/data updates.

Sure, it's mostly there to catch using it in silly circumstances where
the tasklet may be being called as the change occurs. Should never
happen, so I bet it does.
 
> > I expect tasklets will start passing themselves to the func and
> > eliminate the data parameters themselves at some point in the mid term.
> 
> Needs to be tasklet_disable_nosync then, or it will hang. But then it 
> loses its purpose anyway. So I am confused.

I was just referring to my hope that the unsigned long data parameter
will be moved to the great dustbin of history. If timers can make the
leap, so can tasklets!
-Chris

* Re: [PATCH 3/5] drm/i915/execlists: Direct submit onto idle engines
  2018-05-10 16:09   ` Tvrtko Ursulin
@ 2018-05-10 16:25     ` Chris Wilson
  2018-05-10 17:26       ` Tvrtko Ursulin
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2018-05-10 16:25 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-10 17:09:14)
> 
> On 09/05/2018 15:27, Chris Wilson wrote:
> > Bypass using the tasklet to submit the first request to HW, as the
> > tasklet may be deferred unto ksoftirqd and at a minimum will add in
> > excess of 10us (and maybe tens of milliseconds) to our execution
> > latency. This latency reduction is most notable when execution flows
> > between engines.
> > 
> > v2: Beware handling preemption completion from the direct submit path as
> > well.
> > v3: Make the abuse clear and track our extra state inside i915_tasklet.
> > 
> > Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_tasklet.h         | 24 +++++++
> >   drivers/gpu/drm/i915/intel_guc_submission.c | 10 ++-
> >   drivers/gpu/drm/i915/intel_lrc.c            | 71 +++++++++++++++++----
> >   3 files changed, 89 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_tasklet.h b/drivers/gpu/drm/i915/i915_tasklet.h
> > index 42b002b88edb..99e2fa2241ba 100644
> > --- a/drivers/gpu/drm/i915/i915_tasklet.h
> > +++ b/drivers/gpu/drm/i915/i915_tasklet.h
> > @@ -8,8 +8,11 @@
> >   #define _I915_TASKLET_H_
> >   
> >   #include <linux/atomic.h>
> > +#include <linux/bitops.h>
> >   #include <linux/interrupt.h>
> >   
> > +#include "i915_gem.h"
> > +
> >   /**
> >    * struct i915_tasklet - wrapper around tasklet_struct
> >    *
> > @@ -19,6 +22,8 @@
> >    */
> >   struct i915_tasklet {
> >       struct tasklet_struct base;
> > +     unsigned long flags;
> > +#define I915_TASKLET_DIRECT_SUBMIT BIT(0)
> 
> I would suggest a more generic name for the bit since i915_tasklet is 
> generic-ish. For instance simply I915_TASKLET_DIRECT would signify the 
> callback has been invoked directly and not (necessarily) from softirq 
> context. Then it is for each user to know what that means for them 
> specifically.

Problem is we have two direct invocations, only one is special. It
really wants to be something like I915_TASKLET_ENGINE_IS_LOCKED - you can
see why I didn't propose that.

> > -static void __submit_queue(struct intel_engine_cs *engine, int prio)
> > +static void __wakeup_queue(struct intel_engine_cs *engine, int prio)
> >   {
> >       engine->execlists.queue_priority = prio;
> > +}
> 
> Why is this called wakeup? Plans to add something in it later?

Yes. It's called wakeup because it's setting the value that the dequeue
wakes up at. First name was kick_queue, but it doesn't kick either.

The later side-effect involves controlling timers.

__restart_queue()?

> > +static void __schedule_queue(struct intel_engine_cs *engine)
> > +{
> >       i915_tasklet_schedule(&engine->execlists.tasklet);
> >   }
> >   
> > +static bool __direct_submit(struct intel_engine_execlists *const execlists)
> > +{
> > +     struct i915_tasklet * const t = &execlists->tasklet;
> > +
> > +     if (!tasklet_trylock(&t->base))
> > +             return false;
> > +
> > +     t->flags |= I915_TASKLET_DIRECT_SUBMIT;
> > +     i915_tasklet_run(t);
> > +     t->flags &= ~I915_TASKLET_DIRECT_SUBMIT;
> > +
> > +     tasklet_unlock(&t->base);
> 
> Feels like this whole sequence belongs to i915_tasklet since it touches 
> the internals. Maybe i915_tasklet_try_run, or i915_tasklet_run_or_schedule?

Keep reading the series and you'll see just why this is so special and
confined to execlists.

> > +     return true;
> > +}
> > +
> > +static void __submit_queue(struct intel_engine_cs *engine)
> > +{
> > +     struct intel_engine_execlists * const execlists = &engine->execlists;
> > +
> > +     GEM_BUG_ON(!engine->i915->gt.awake);
> > +
> > +     /* If inside GPU reset, the tasklet will be queued later. */
> > +     if (!i915_tasklet_is_enabled(&execlists->tasklet))
> > +             return;
> > +
> > +     /* Directly submit the first request to reduce the initial latency */
> > +     if (port_isset(execlists->port) || !__direct_submit(execlists))
> > +             __schedule_queue(engine);
> 
> Hmm a bit evil to maybe invoke in the condition. Would it be acceptable to:
> 
> if (!port_isset(...))
>         i915_tasklet_run_or_schedule(...);
> else
>         i915_tasklet_schedule(...);
> 
> It's not ideal but maybe a bit better.

Beauty is in the eye of the beholder, and that ain't beautiful :)
-Chris

* Re: [PATCH 2/5] drm/i915: Wrap tasklet_struct for abuse
  2018-05-10 16:19         ` Chris Wilson
@ 2018-05-10 17:08           ` Tvrtko Ursulin
  0 siblings, 0 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2018-05-10 17:08 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 10/05/2018 17:19, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-10 17:15:46)
>>
>> On 10/05/2018 17:03, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-05-10 16:49:03)
>>>>
>>>> On 09/05/2018 15:27, Chris Wilson wrote:
>>>>> In the next few patches, we want to abuse tasklet to avoid ksoftirqd
>>>>> latency along critical paths. To make that abuse easily to swallow,
>>>>> first coat the tasklet in a little syntactic sugar.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>> ---
>>>>> +static inline void i915_tasklet_unlock(struct i915_tasklet *t)
>>>>> +{
>>>>> +     tasklet_enable(&t->base);
>>>>> +}
>>>>
>>>> I agree keeping the original naming for kill/enable/disable would be better.
>>>>
>>>>> +
>>>>> +static inline void i915_tasklet_set_func(struct i915_tasklet *t,
>>>>> +                                      void (*func)(unsigned long data),
>>>>> +                                      unsigned long data)
>>>>> +{
>>>>> +     i915_tasklet_lock(t);
>>>>> +
>>>>> +     t->base.func = func;
>>>>> +     t->base.data = data;
>>>>> +
>>>>> +     i915_tasklet_unlock(t);
>>>>
>>>> I kind of remember something about issues we had with placing
>>>> tasklet_disable placed in some contexts. Hm.. if only I could recall the
>>>> details. It's probably fine. I cannot come up with ideas on how to
>>>> protect against that. GEM_BUG_ON(in_irq() || in_softirq())? Too far
>>>> fetched probably.
>>>
>>> When we get it wrong, CI complains with a lock up. It cannot be used in
>>> atomic context as it uses yield() to kick the tasklet (as it may be on
>>> the local cpu).
>>>
>>>>> @@ -2209,7 +2208,10 @@ static void execlists_set_default_submission(struct intel_engine_cs *engine)
>>>>>         engine->submit_request = execlists_submit_request;
>>>>>         engine->cancel_requests = execlists_cancel_requests;
>>>>>         engine->schedule = execlists_schedule;
>>>>> -     engine->execlists.tasklet.func = execlists_submission_tasklet;
>>>>> +
>>>>> +     i915_tasklet_set_func(&engine->execlists.tasklet,
>>>>> +                           execlists_submission_tasklet,
>>>>> +                           (unsigned long)engine);
>>>>
>>>> Or eliminate the above dilemma by removing the data parameter from
>>>> i915_tasklet_set_func since it looks it is not needed at the moment?
>>>
>>> It doesn't eliminate the dilemma, updating the func itself raises the
>>> question of what if the tasklet is running at that time, what was the
>>> caller thinking would happen?
>>
>> Well same as it can do now so you could drop tasklet_disable/enable
>> then. At least I assumed the purpose is for atomicity of func/data updates.
> 
> Sure, it mostly there to catch using it from silly circumstances where
> the tasklet may be being called as the change occurs. Should never
> happen, so I bet it does.

You could do something similar by asserting the tasklet is either 
disabled, or at least not running.
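
Something like this, assuming the is_enabled/is_running helpers from 
the series (sketch only):

static inline void i915_tasklet_set_func(struct i915_tasklet *t,
					 void (*func)(unsigned long data),
					 unsigned long data)
{
	/* caller must guarantee the tasklet is idle or disabled */
	GEM_BUG_ON(i915_tasklet_is_enabled(t) &&
		   i915_tasklet_is_running(t));

	t->base.func = func;
	t->base.data = data;
}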

>>> I expect tasklets will start passing themselves to the func and
>>> eliminate the data parameters themselves at some point in the mid term.
>>
>> Needs to be tasklet_disable_nosync then, or it will hang. But then it
>> loses its purpose anyway. So I am confused.
> 
> I was just referring to my hope that the unsigned long data parameter
> will be moved to the great dustbin of history. If timers can make the
> leap, so can tasklets!

I just don't see the point in adding the second parameter now. I mean 
it's not a big deal, but I don't see it being useful.

Regards,

Tvrtko

* Re: [PATCH 3/5] drm/i915/execlists: Direct submit onto idle engines
  2018-05-10 16:25     ` Chris Wilson
@ 2018-05-10 17:26       ` Tvrtko Ursulin
  2018-05-10 17:40         ` Chris Wilson
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2018-05-10 17:26 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 10/05/2018 17:25, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-10 17:09:14)
>>
>> On 09/05/2018 15:27, Chris Wilson wrote:
>>> Bypass using the tasklet to submit the first request to HW, as the
>>> tasklet may be deferred unto ksoftirqd and at a minimum will add in
>>> excess of 10us (and maybe tens of milliseconds) to our execution
>>> latency. This latency reduction is most notable when execution flows
>>> between engines.
>>>
>>> v2: Beware handling preemption completion from the direct submit path as
>>> well.
>>> v3: Make the abuse clear and track our extra state inside i915_tasklet.
>>>
>>> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_tasklet.h         | 24 +++++++
>>>    drivers/gpu/drm/i915/intel_guc_submission.c | 10 ++-
>>>    drivers/gpu/drm/i915/intel_lrc.c            | 71 +++++++++++++++++----
>>>    3 files changed, 89 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_tasklet.h b/drivers/gpu/drm/i915/i915_tasklet.h
>>> index 42b002b88edb..99e2fa2241ba 100644
>>> --- a/drivers/gpu/drm/i915/i915_tasklet.h
>>> +++ b/drivers/gpu/drm/i915/i915_tasklet.h
>>> @@ -8,8 +8,11 @@
>>>    #define _I915_TASKLET_H_
>>>    
>>>    #include <linux/atomic.h>
>>> +#include <linux/bitops.h>
>>>    #include <linux/interrupt.h>
>>>    
>>> +#include "i915_gem.h"
>>> +
>>>    /**
>>>     * struct i915_tasklet - wrapper around tasklet_struct
>>>     *
>>> @@ -19,6 +22,8 @@
>>>     */
>>>    struct i915_tasklet {
>>>        struct tasklet_struct base;
>>> +     unsigned long flags;
>>> +#define I915_TASKLET_DIRECT_SUBMIT BIT(0)
>>
>> I would suggest a more generic name for the bit since i915_tasklet is
>> generic-ish. For instance simply I915_TASKLET_DIRECT would signify the
>> callback has been invoked directly and not (necessarily) from softirq
>> context. Then it is for each user to know what that means for them
>> specifically.
> 
> Problem is we have two direct invocations, only one is special. It
> really wants to be something like I915_TASKLET_ENGINE_IS_LOCKED - you can
> see why I didn't propose that.

TBC...

>>> -static void __submit_queue(struct intel_engine_cs *engine, int prio)
>>> +static void __wakeup_queue(struct intel_engine_cs *engine, int prio)
>>>    {
>>>        engine->execlists.queue_priority = prio;
>>> +}
>>
>> Why is this called wakeup? Plans to add something in it later?
> 
> Yes. It's called wakeup because it's setting the value that the dequeue
> wakes up at. First name was kick_queue, but it doesn't kick either.
> 
> The later side-effect involves controlling timers.
> 
> __restart_queue()?

__update_queue_priority? :)

> 
>>> +static void __schedule_queue(struct intel_engine_cs *engine)
>>> +{
>>>        i915_tasklet_schedule(&engine->execlists.tasklet);
>>>    }
>>>    
>>> +static bool __direct_submit(struct intel_engine_execlists *const execlists)
>>> +{
>>> +     struct i915_tasklet * const t = &execlists->tasklet;
>>> +
>>> +     if (!tasklet_trylock(&t->base))
>>> +             return false;
>>> +
>>> +     t->flags |= I915_TASKLET_DIRECT_SUBMIT;
>>> +     i915_tasklet_run(t);
>>> +     t->flags &= ~I915_TASKLET_DIRECT_SUBMIT;
>>> +
>>> +     tasklet_unlock(&t->base);
>>
>> Feels like this whole sequence belongs to i915_tasklet since it touches
>> the internals. Maybe i915_tasklet_try_run, or i915_tasklet_run_or_schedule?
> 
> Keep reading the series and you'll see just why this is so special and
> confined to execlists.

... TBC here.

Having peeked ahead, it feels not quite generic enough as it is, a bit 
too hacky.

Would it work to pass context together with the invocation? Like:

i915_tasklet_try(..., I915_TASKLET_SUBMIT_IDLE);
i915_tasklet_try(..., I915_TASKLET_SUBMIT_IRQ);

i915_tasklet.flags field namespace would then be owned by the caller 
completely. And the tasklet func itself would have more context on what 
to do.

Following from that, i915_tasklet_run_or_schedule(..., flags).

bool i915_tasklet_try(tasklet, flags)
{
	if (!trylock)
		return false;

	t->flags |= flags;
	i915_tasklet_run(...);
	t->flags &= ~flags;

	tasklet_unlock(...);

	return true;
}


void i915_tasklet_run_or_schedule(..., flags)
{
	if (!i915_tasklet_try(..., flags))
		i915_tasklet_schedule(...);
}

?
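
For example, the execlists submit path could then read (sketch only, 
reusing the token name from above):

	if (!port_isset(execlists->port))
		i915_tasklet_run_or_schedule(&execlists->tasklet,
					     I915_TASKLET_SUBMIT_IDLE);
	else
		i915_tasklet_schedule(&execlists->tasklet);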

That leaves the question of the tasklet_is_enabled check in your 
tasklet_try, which I don't quite get since that check wasn't there 
before. So why is it needed?

> 
>>> +     return true;
>>> +}
>>> +
>>> +static void __submit_queue(struct intel_engine_cs *engine)
>>> +{
>>> +     struct intel_engine_execlists * const execlists = &engine->execlists;
>>> +
>>> +     GEM_BUG_ON(!engine->i915->gt.awake);
>>> +
>>> +     /* If inside GPU reset, the tasklet will be queued later. */
>>> +     if (!i915_tasklet_is_enabled(&execlists->tasklet))
>>> +             return;
>>> +
>>> +     /* Directly submit the first request to reduce the initial latency */
>>> +     if (port_isset(execlists->port) || !__direct_submit(execlists))
>>> +             __schedule_queue(engine);
>>
>> Hmm a bit evil to maybe invoke in the condition. Would it be acceptable to:
>>
>> if (!port_isset(...))
>>          i915_tasklet_run_or_schedule(...);
>> else
>>          i915_tasklet_schedule(...);
>>
>> It's not ideal but maybe a bit better.
> 
> Beauty is in the eye of the beholder, and that ain't beautiful :)

Did not say it was, just more obvious what's happening.

Regards,

Tvrtko

* Re: [PATCH 3/5] drm/i915/execlists: Direct submit onto idle engines
  2018-05-10 17:26       ` Tvrtko Ursulin
@ 2018-05-10 17:40         ` Chris Wilson
  2018-05-11  8:25           ` Tvrtko Ursulin
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2018-05-10 17:40 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-10 18:26:31)
> 
> On 10/05/2018 17:25, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-05-10 17:09:14)
> >>
> >> On 09/05/2018 15:27, Chris Wilson wrote:
> >>> Bypass using the tasklet to submit the first request to HW, as the
> >>> tasklet may be deferred unto ksoftirqd and at a minimum will add in
> >>> excess of 10us (and maybe tens of milliseconds) to our execution
> >>> latency. This latency reduction is most notable when execution flows
> >>> between engines.
> >>>
> >>> v2: Beware handling preemption completion from the direct submit path as
> >>> well.
> >>> v3: Make the abuse clear and track our extra state inside i915_tasklet.
> >>>
> >>> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/i915_tasklet.h         | 24 +++++++
> >>>    drivers/gpu/drm/i915/intel_guc_submission.c | 10 ++-
> >>>    drivers/gpu/drm/i915/intel_lrc.c            | 71 +++++++++++++++++----
> >>>    3 files changed, 89 insertions(+), 16 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_tasklet.h b/drivers/gpu/drm/i915/i915_tasklet.h
> >>> index 42b002b88edb..99e2fa2241ba 100644
> >>> --- a/drivers/gpu/drm/i915/i915_tasklet.h
> >>> +++ b/drivers/gpu/drm/i915/i915_tasklet.h
> >>> @@ -8,8 +8,11 @@
> >>>    #define _I915_TASKLET_H_
> >>>    
> >>>    #include <linux/atomic.h>
> >>> +#include <linux/bitops.h>
> >>>    #include <linux/interrupt.h>
> >>>    
> >>> +#include "i915_gem.h"
> >>> +
> >>>    /**
> >>>     * struct i915_tasklet - wrapper around tasklet_struct
> >>>     *
> >>> @@ -19,6 +22,8 @@
> >>>     */
> >>>    struct i915_tasklet {
> >>>        struct tasklet_struct base;
> >>> +     unsigned long flags;
> >>> +#define I915_TASKLET_DIRECT_SUBMIT BIT(0)
> >>
> >> I would suggest a more generic name for the bit since i915_tasklet is
> >> generic-ish. For instance simply I915_TASKLET_DIRECT would signify the
> >> callback has been invoked directly and not (necessarily) from softirq
> >> context. Then it is for each user to know what that means for them
> >> specifically.
> > 
> > Problem is we have two direct invocations, only one is special. It
> > really wants to be something like I915_TASKLET_ENGINE_IS_LOCKED - you can
> > see why I didn't propose that.
> 
> TBC...
> 
> >>> -static void __submit_queue(struct intel_engine_cs *engine, int prio)
> >>> +static void __wakeup_queue(struct intel_engine_cs *engine, int prio)
> >>>    {
> >>>        engine->execlists.queue_priority = prio;
> >>> +}
> >>
> >> Why is this called wakeup? Plans to add something in it later?
> > 
> > Yes. It's called wakeup because it's setting the value that the dequeue
> > wakes up at. First name was kick_queue, but it doesn't kick either.
> > 
> > The later side-effect involves controlling timers.
> > 
> > __restart_queue()?
> 
> __update_queue_priority? :)

It doesn't just update the priority...

Now a choice between restart_queue and update_queue.

> >>> +static void __schedule_queue(struct intel_engine_cs *engine)
> >>> +{
> >>>        i915_tasklet_schedule(&engine->execlists.tasklet);
> >>>    }
> >>>    
> >>> +static bool __direct_submit(struct intel_engine_execlists *const execlists)
> >>> +{
> >>> +     struct i915_tasklet * const t = &execlists->tasklet;
> >>> +
> >>> +     if (!tasklet_trylock(&t->base))
> >>> +             return false;
> >>> +
> >>> +     t->flags |= I915_TASKLET_DIRECT_SUBMIT;
> >>> +     i915_tasklet_run(t);
> >>> +     t->flags &= ~I915_TASKLET_DIRECT_SUBMIT;
> >>> +
> >>> +     tasklet_unlock(&t->base);
> >>
> >> Feels like this whole sequence belongs to i915_tasklet since it touches
> >> the internals. Maybe i915_tasklet_try_run, or i915_tasklet_run_or_schedule?
> > 
> > Keep reading the series and you'll see just why this is so special and
> > confined to execlists.
> 
> ... TBC here.
> 
> Having peeked ahead, it feels a bit not generic enough as it is, a bit 
> too hacky.
> 
> Would it work to pass context together with the invocation. Like:
> 
> i915_tasklet_try(..., I915_TASKLET_SUBMIT_IDLE);
> i915_tasklet_try(..., I915_TASKLET_SUBMIT_IRQ);
> 
> i915_tasklet.flags field namespace would then be owned by the caller 
> completely. And the tasklet func itself would have more context on what 
> to do.

That doesn't apply very well to the use case either. It's not the
tasklet being called from irq/process that's significant but whether we
are calling it with the engine/data locked.

I keep wanting to use LOCKED, but that has no meaning to the tasklet,
and tasklet_trylock means something entirely different.
 
> Following form that, i915_tasklet_run_or_schedule(.., flags).
> 
> bool i915_taskle_try(tasklet, flags)
> {
>         if (!trylock)
>                 return false;
> 
>         t->flags |= flags;
>         i915_tasklet_run(...);
>         t->flags &= ~flags;
> 
>         tasklet_unlock(...);
> 
>         return true;
> }
> 
> 
> void i915_tasklet_run_or_schedule(..., flags)
> {
>         if (!i915_tasklet_try(..., flags))
>                 i915_tasklet_schedule(...);
> }
> 
> ?
> 
> Leaves a question of a tasklet_is_enabled check in your tasklet_try, 
> which I don't quite get since that check wasn't there before. So why it 
> is needed?

Concurrent reset can happen at irq time, but we know it cannot happen
from the submit path.
-Chris

* Re: [PATCH 3/5] drm/i915/execlists: Direct submit onto idle engines
  2018-05-10 17:40         ` Chris Wilson
@ 2018-05-11  8:25           ` Tvrtko Ursulin
  2018-05-11  8:31             ` Chris Wilson
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2018-05-11  8:25 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 10/05/2018 18:40, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-10 18:26:31)
>>
>> On 10/05/2018 17:25, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-05-10 17:09:14)
>>>>
>>>> On 09/05/2018 15:27, Chris Wilson wrote:
>>>>> Bypass using the tasklet to submit the first request to HW, as the
>>>>> tasklet may be deferred unto ksoftirqd and at a minimum will add in
>>>>> excess of 10us (and maybe tens of milliseconds) to our execution
>>>>> latency. This latency reduction is most notable when execution flows
>>>>> between engines.
>>>>>
>>>>> v2: Beware handling preemption completion from the direct submit path as
>>>>> well.
>>>>> v3: Make the abuse clear and track our extra state inside i915_tasklet.
>>>>>
>>>>> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>> ---
>>>>>     drivers/gpu/drm/i915/i915_tasklet.h         | 24 +++++++
>>>>>     drivers/gpu/drm/i915/intel_guc_submission.c | 10 ++-
>>>>>     drivers/gpu/drm/i915/intel_lrc.c            | 71 +++++++++++++++++----
>>>>>     3 files changed, 89 insertions(+), 16 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_tasklet.h b/drivers/gpu/drm/i915/i915_tasklet.h
>>>>> index 42b002b88edb..99e2fa2241ba 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_tasklet.h
>>>>> +++ b/drivers/gpu/drm/i915/i915_tasklet.h
>>>>> @@ -8,8 +8,11 @@
>>>>>     #define _I915_TASKLET_H_
>>>>>     
>>>>>     #include <linux/atomic.h>
>>>>> +#include <linux/bitops.h>
>>>>>     #include <linux/interrupt.h>
>>>>>     
>>>>> +#include "i915_gem.h"
>>>>> +
>>>>>     /**
>>>>>      * struct i915_tasklet - wrapper around tasklet_struct
>>>>>      *
>>>>> @@ -19,6 +22,8 @@
>>>>>      */
>>>>>     struct i915_tasklet {
>>>>>         struct tasklet_struct base;
>>>>> +     unsigned long flags;
>>>>> +#define I915_TASKLET_DIRECT_SUBMIT BIT(0)
>>>>
>>>> I would suggest a more generic name for the bit since i915_tasklet is
>>>> generic-ish. For instance simply I915_TASKLET_DIRECT would signify the
>>>> callback has been invoked directly and not (necessarily) from softirq
>>>> context. Then it is for each user to know what that means for them
>>>> specifically.
>>>
>>> Problem is we have two direct invocations, only one is special. It
>>> really wants to be something like I915_TASKLET_ENGINE_IS_LOCKED - you can
>>> see why I didn't propose that.
>>
>> TBC...
>>
>>>>> -static void __submit_queue(struct intel_engine_cs *engine, int prio)
>>>>> +static void __wakeup_queue(struct intel_engine_cs *engine, int prio)
>>>>>     {
>>>>>         engine->execlists.queue_priority = prio;
>>>>> +}
>>>>
>>>> Why is this called wakeup? Plans to add something in it later?
>>>
>>> Yes. It's called wakeup because it's setting the value that the dequeue
>>> wakes up at. First name was kick_queue, but it doesn't kick either.
>>>
>>> The later side-effect involves controlling timers.
>>>
>>> __restart_queue()?
>>
>> __update_queue_priority? :)
> 
> It doesn't just update the priority...
> 
> Now a choice between restart_queue and update_queue.

Update sounds like a better match to me.

> 
>>>>> +static void __schedule_queue(struct intel_engine_cs *engine)
>>>>> +{
>>>>>         i915_tasklet_schedule(&engine->execlists.tasklet);
>>>>>     }
>>>>>     
>>>>> +static bool __direct_submit(struct intel_engine_execlists *const execlists)
>>>>> +{
>>>>> +     struct i915_tasklet * const t = &execlists->tasklet;
>>>>> +
>>>>> +     if (!tasklet_trylock(&t->base))
>>>>> +             return false;
>>>>> +
>>>>> +     t->flags |= I915_TASKLET_DIRECT_SUBMIT;
>>>>> +     i915_tasklet_run(t);
>>>>> +     t->flags &= ~I915_TASKLET_DIRECT_SUBMIT;
>>>>> +
>>>>> +     tasklet_unlock(&t->base);
>>>>
>>>> Feels like this whole sequence belongs to i915_tasklet since it touches
>>>> the internals. Maybe i915_tasklet_try_run, or i915_tasklet_run_or_schedule?
>>>
>>> Keep reading the series and you'll see just why this is so special and
>>> confined to execlists.
>>
>> ... TBC here.
>>
>> Having peeked ahead, it feels a bit not generic enough as it is, a bit
>> too hacky.
>>
>> Would it work to pass context together with the invocation. Like:
>>
>> i915_tasklet_try(..., I915_TASKLET_SUBMIT_IDLE);
>> i915_tasklet_try(..., I915_TASKLET_SUBMIT_IRQ);
>>
>> i915_tasklet.flags field namespace would then be owned by the caller
>> completely. And the tasklet func itself would have more context on what
>> to do.
> 
> That doesn't apply very well to the use case either. It's not the
> tasklet being called from irq/process that's significant but whether we
> are calling it with the engine/data locked.

That's why I am proposing a generic mechanism to pass in a "token" to 
the API, which the API will pass down to the user when invoking the 
tasklet directly.

The user then decides how to interpret the token.

I915_TASKLET_SUBMIT_IDLE would mean "I know this is the path with 
timeline lock already taken".

I915_TASKLET_SUBMIT_IRQ token would mean "I need to take the lock and I 
need an early abort if tasklet is disabled".
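
In the dequeue paths the locking decision would then boil down to 
something like this (sketch only; same pattern as in the patch, just 
keyed off the proposed token instead):

	/* inside the execlists/guc dequeue, sketch */
	bool locked = execlists->tasklet.flags & I915_TASKLET_SUBMIT_IDLE;

	if (!locked)
		spin_lock_irqsave(&engine->timeline.lock, flags);

	submit = __execlists_dequeue(engine);

	if (!locked)
		spin_unlock_irqrestore(&engine->timeline.lock, flags);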

> 
> I keep wanting to use LOCKED, but that has no meaning to the tasklet,
> and tasklet_trylock means something entirely different.
>   
>> Following form that, i915_tasklet_run_or_schedule(.., flags).
>>
>> bool i915_taskle_try(tasklet, flags)
>> {
>>          if (!trylock)
>>                  return false;
>>
>>          t->flags |= flags;
>>          i915_tasklet_run(...);
>>          t->flags &= ~flags;
>>
>>          tasklet_unlock(...);
>>
>>          return true;
>> }
>>
>>
>> void i915_tasklet_run_or_schedule(..., flags)
>> {
>>          if (!i915_tasklet_try(..., flags))
>>                  i915_tasklet_schedule(...);
>> }
>>
>> ?
>>
>> Leaves a question of a tasklet_is_enabled check in your tasklet_try,
>> which I don't quite get since that check wasn't there before. So why it
>> is needed?
> 
> Concurrent reset can happen at irq time, but we know cannot happen from
> the submit path.

Right, yes, it is relevant for direct invocation only.

Regards,

Tvrtko

* Re: [PATCH 3/5] drm/i915/execlists: Direct submit onto idle engines
  2018-05-11  8:25           ` Tvrtko Ursulin
@ 2018-05-11  8:31             ` Chris Wilson
  2018-05-11  8:48               ` Tvrtko Ursulin
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2018-05-11  8:31 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-11 09:25:00)
> 
> On 10/05/2018 18:40, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-05-10 18:26:31)
> >>
> >> On 10/05/2018 17:25, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2018-05-10 17:09:14)
> >>>>
> >>>> On 09/05/2018 15:27, Chris Wilson wrote:
> >>>>> Bypass using the tasklet to submit the first request to HW, as the
> >>>>> tasklet may be deferred unto ksoftirqd and at a minimum will add in
> >>>>> excess of 10us (and maybe tens of milliseconds) to our execution
> >>>>> latency. This latency reduction is most notable when execution flows
> >>>>> between engines.
> >>>>>
> >>>>> v2: Beware handling preemption completion from the direct submit path as
> >>>>> well.
> >>>>> v3: Make the abuse clear and track our extra state inside i915_tasklet.
> >>>>>
> >>>>> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>> ---
> >>>>>     drivers/gpu/drm/i915/i915_tasklet.h         | 24 +++++++
> >>>>>     drivers/gpu/drm/i915/intel_guc_submission.c | 10 ++-
> >>>>>     drivers/gpu/drm/i915/intel_lrc.c            | 71 +++++++++++++++++----
> >>>>>     3 files changed, 89 insertions(+), 16 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/i915/i915_tasklet.h b/drivers/gpu/drm/i915/i915_tasklet.h
> >>>>> index 42b002b88edb..99e2fa2241ba 100644
> >>>>> --- a/drivers/gpu/drm/i915/i915_tasklet.h
> >>>>> +++ b/drivers/gpu/drm/i915/i915_tasklet.h
> >>>>> @@ -8,8 +8,11 @@
> >>>>>     #define _I915_TASKLET_H_
> >>>>>     
> >>>>>     #include <linux/atomic.h>
> >>>>> +#include <linux/bitops.h>
> >>>>>     #include <linux/interrupt.h>
> >>>>>     
> >>>>> +#include "i915_gem.h"
> >>>>> +
> >>>>>     /**
> >>>>>      * struct i915_tasklet - wrapper around tasklet_struct
> >>>>>      *
> >>>>> @@ -19,6 +22,8 @@
> >>>>>      */
> >>>>>     struct i915_tasklet {
> >>>>>         struct tasklet_struct base;
> >>>>> +     unsigned long flags;
> >>>>> +#define I915_TASKLET_DIRECT_SUBMIT BIT(0)
> >>>>
> >>>> I would suggest a more generic name for the bit since i915_tasklet is
> >>>> generic-ish. For instance simply I915_TASKLET_DIRECT would signify the
> >>>> callback has been invoked directly and not (necessarily) from softirq
> >>>> context. Then it is for each user to know what that means for them
> >>>> specifically.
> >>>
> >>> Problem is we have two direct invocations, only one is special. It
> >>> really wants to be something like I915_TASKLET_ENGINE_IS_LOCKED - you can
> >>> see why I didn't propose that.
> >>
> >> TBC...
> >>
> >>>>> -static void __submit_queue(struct intel_engine_cs *engine, int prio)
> >>>>> +static void __wakeup_queue(struct intel_engine_cs *engine, int prio)
> >>>>>     {
> >>>>>         engine->execlists.queue_priority = prio;
> >>>>> +}
> >>>>
> >>>> Why is this called wakeup? Plans to add something in it later?
> >>>
> >>> Yes. It's called wakeup because it's setting the value that the dequeue
> >>> wakes up at. First name was kick_queue, but it doesn't kick either.
> >>>
> >>> The later side-effect involves controlling timers.
> >>>
> >>> __restart_queue()?
> >>
> >> __update_queue_priority? :)
> > 
> > It doesn't just update the priority...
> > 
> > Now a choice between restart_queue and update_queue.
> 
> Update sounds better match to me.
> 
> > 
> >>>>> +static void __schedule_queue(struct intel_engine_cs *engine)
> >>>>> +{
> >>>>>         i915_tasklet_schedule(&engine->execlists.tasklet);
> >>>>>     }
> >>>>>     
> >>>>> +static bool __direct_submit(struct intel_engine_execlists *const execlists)
> >>>>> +{
> >>>>> +     struct i915_tasklet * const t = &execlists->tasklet;
> >>>>> +
> >>>>> +     if (!tasklet_trylock(&t->base))
> >>>>> +             return false;
> >>>>> +
> >>>>> +     t->flags |= I915_TASKLET_DIRECT_SUBMIT;
> >>>>> +     i915_tasklet_run(t);
> >>>>> +     t->flags &= ~I915_TASKLET_DIRECT_SUBMIT;
> >>>>> +
> >>>>> +     tasklet_unlock(&t->base);
> >>>>
> >>>> Feels like this whole sequence belongs to i915_tasklet since it touches
> >>>> the internals. Maybe i915_tasklet_try_run, or i915_tasklet_run_or_schedule?
> >>>
> >>> Keep reading the series and you'll see just why this is so special and
> >>> confined to execlists.
> >>
> >> ... TBC here.
> >>
> >> Having peeked ahead, it feels a bit not generic enough as it is, a bit
> >> too hacky.
> >>
> >> Would it work to pass context together with the invocation. Like:
> >>
> >> i915_tasklet_try(..., I915_TASKLET_SUBMIT_IDLE);
> >> i915_tasklet_try(..., I915_TASKLET_SUBMIT_IRQ);
> >>
> >> i915_tasklet.flags field namespace would then be owned by the caller
> >> completely. And the tasklet func itself would have more context on what
> >> to do.
> > 
> > That doesn't apply very well to the use case either. It's not the
> > tasklet being called from irq/process that's significant but whether we
> > are calling it with the engine/data locked.
> 
> That's why I am proposing to allow a generic mechanism to pass in a 
> "token" to the API, which API will pass down to the user if invoking the 
> tasklet directly.
> 
> The user then decides how to interpret the token.
> 
> I915_TASKLET_SUBMIT_IDLE would mean "I know this is the path with 
> timeline lock already taken".
> 
> I915_TASKLET_SUBMIT_IRQ token would mean "I need to take the lock and I 
> need an early abort if tasklet is disabled".

I don't see a reason to extend it to a generic mechanism yet.
direct-submit-onto-idle is the special case. I've 3 users for the normal
case (just calling into the tasklet directly), one for wanting to pass
along private state after claiming the tasklet for itself, and one that
does unspeakable things that doesn't match any of the above ;)
-Chris

* Re: [PATCH 3/5] drm/i915/execlists: Direct submit onto idle engines
  2018-05-11  8:31             ` Chris Wilson
@ 2018-05-11  8:48               ` Tvrtko Ursulin
  0 siblings, 0 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2018-05-11  8:48 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 11/05/2018 09:31, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-11 09:25:00)
>>
>> On 10/05/2018 18:40, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-05-10 18:26:31)
>>>>
>>>> On 10/05/2018 17:25, Chris Wilson wrote:
>>>>> Quoting Tvrtko Ursulin (2018-05-10 17:09:14)
>>>>>>
>>>>>> On 09/05/2018 15:27, Chris Wilson wrote:
>>>>>>> Bypass using the tasklet to submit the first request to HW, as the
>>>>>>> tasklet may be deferred unto ksoftirqd and at a minimum will add in
>>>>>>> excess of 10us (and maybe tens of milliseconds) to our execution
>>>>>>> latency. This latency reduction is most notable when execution flows
>>>>>>> between engines.
>>>>>>>
>>>>>>> v2: Beware handling preemption completion from the direct submit path as
>>>>>>> well.
>>>>>>> v3: Make the abuse clear and track our extra state inside i915_tasklet.
>>>>>>>
>>>>>>> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>> ---
>>>>>>>      drivers/gpu/drm/i915/i915_tasklet.h         | 24 +++++++
>>>>>>>      drivers/gpu/drm/i915/intel_guc_submission.c | 10 ++-
>>>>>>>      drivers/gpu/drm/i915/intel_lrc.c            | 71 +++++++++++++++++----
>>>>>>>      3 files changed, 89 insertions(+), 16 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_tasklet.h b/drivers/gpu/drm/i915/i915_tasklet.h
>>>>>>> index 42b002b88edb..99e2fa2241ba 100644
>>>>>>> --- a/drivers/gpu/drm/i915/i915_tasklet.h
>>>>>>> +++ b/drivers/gpu/drm/i915/i915_tasklet.h
>>>>>>> @@ -8,8 +8,11 @@
>>>>>>>      #define _I915_TASKLET_H_
>>>>>>>      
>>>>>>>      #include <linux/atomic.h>
>>>>>>> +#include <linux/bitops.h>
>>>>>>>      #include <linux/interrupt.h>
>>>>>>>      
>>>>>>> +#include "i915_gem.h"
>>>>>>> +
>>>>>>>      /**
>>>>>>>       * struct i915_tasklet - wrapper around tasklet_struct
>>>>>>>       *
>>>>>>> @@ -19,6 +22,8 @@
>>>>>>>       */
>>>>>>>      struct i915_tasklet {
>>>>>>>          struct tasklet_struct base;
>>>>>>> +     unsigned long flags;
>>>>>>> +#define I915_TASKLET_DIRECT_SUBMIT BIT(0)
>>>>>>
>>>>>> I would suggest a more generic name for the bit since i915_tasklet is
>>>>>> generic-ish. For instance simply I915_TASKLET_DIRECT would signify the
>>>>>> callback has been invoked directly and not (necessarily) from softirq
>>>>>> context. Then it is for each user to know what that means for them
>>>>>> specifically.
>>>>>
>>>>> Problem is we have two direct invocations, only one is special. It
>>>>> really wants to be something like I915_TASKLET_ENGINE_IS_LOCKED - you can
>>>>> see why I didn't propose that.
>>>>
>>>> TBC...
>>>>
>>>>>>> -static void __submit_queue(struct intel_engine_cs *engine, int prio)
>>>>>>> +static void __wakeup_queue(struct intel_engine_cs *engine, int prio)
>>>>>>>      {
>>>>>>>          engine->execlists.queue_priority = prio;
>>>>>>> +}
>>>>>>
>>>>>> Why is this called wakeup? Plans to add something in it later?
>>>>>
>>>>> Yes. It's called wakeup because it's setting the value that the dequeue
>>>>> wakes up at. First name was kick_queue, but it doesn't kick either.
>>>>>
>>>>> The later side-effect involves controlling timers.
>>>>>
>>>>> __restart_queue()?
>>>>
>>>> __update_queue_priority? :)
>>>
>>> It doesn't just update the priority...
>>>
>>> Now a choice between restart_queue and update_queue.
>>
>> Update sounds better match to me.
>>
>>>
>>>>>>> +static void __schedule_queue(struct intel_engine_cs *engine)
>>>>>>> +{
>>>>>>>          i915_tasklet_schedule(&engine->execlists.tasklet);
>>>>>>>      }
>>>>>>>      
>>>>>>> +static bool __direct_submit(struct intel_engine_execlists *const execlists)
>>>>>>> +{
>>>>>>> +     struct i915_tasklet * const t = &execlists->tasklet;
>>>>>>> +
>>>>>>> +     if (!tasklet_trylock(&t->base))
>>>>>>> +             return false;
>>>>>>> +
>>>>>>> +     t->flags |= I915_TASKLET_DIRECT_SUBMIT;
>>>>>>> +     i915_tasklet_run(t);
>>>>>>> +     t->flags &= ~I915_TASKLET_DIRECT_SUBMIT;
>>>>>>> +
>>>>>>> +     tasklet_unlock(&t->base);
>>>>>>
>>>>>> Feels like this whole sequence belongs to i915_tasklet since it touches
>>>>>> the internals. Maybe i915_tasklet_try_run, or i915_tasklet_run_or_schedule?
>>>>>
>>>>> Keep reading the series and you'll see just why this is so special and
>>>>> confined to execlists.
>>>>
>>>> ... TBC here.
>>>>
>>>> Having peeked ahead, it feels not quite generic enough as it is, a bit
>>>> too hacky.
>>>>
>>>> Would it work to pass context together with the invocation? Like:
>>>>
>>>> i915_tasklet_try(..., I915_TASKLET_SUBMIT_IDLE);
>>>> i915_tasklet_try(..., I915_TASKLET_SUBMIT_IRQ);
>>>>
>>>> i915_tasklet.flags field namespace would then be owned by the caller
>>>> completely. And the tasklet func itself would have more context on what
>>>> to do.
>>>
>>> That doesn't apply very well to the use case either. It's not the
>>> tasklet being called from irq/process that's significant but whether we
>>> are calling it with the engine/data locked.
>>
>> That's why I am proposing a generic mechanism for passing a "token"
>> into the API, which the API will then pass down to the user when
>> invoking the tasklet directly.
>>
>> The user then decides how to interpret the token.
>>
>> I915_TASKLET_SUBMIT_IDLE would mean "I know this is the path with
>> timeline lock already taken".
>>
>> I915_TASKLET_SUBMIT_IRQ token would mean "I need to take the lock and I
>> need an early abort if tasklet is disabled".
> 
> I don't see a reason to extend it to a generic mechanism yet.
> direct-submit-onto-idle is the special case. I've 3 users for the normal
> case (just calling into the tasklet directly), one for wanting to pass
> along private state after claiming the tasklet for itself, and one that
> does unspeakable things that doesn't match any of the above ;)

I don't know what that last one is, but, talking about the three call 
sites from this series, not moving more of the logic into i915_tasklet 
IMO diminishes the reason for it to exist. If it only wraps a single 
token, which the callers have to fiddle with manually in the majority 
of (special) cases, it is a harder sell for me.
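
To illustrate what I mean by moving more of it into i915_tasklet, here 
is a rough and completely untested sketch (helper names and the token 
parameter are invented, building on the i915_tasklet bits quoted above):

/*
 * Hypothetical helpers owned by i915_tasklet.h: the caller passes in a
 * context token and never touches t->flags directly.
 */
static inline bool i915_tasklet_try(struct i915_tasklet *t,
                                    unsigned long token)
{
        if (!tasklet_trylock(&t->base))
                return false;

        t->flags |= token;
        i915_tasklet_run(t);    /* invoke the callback in this context */
        t->flags &= ~token;

        tasklet_unlock(&t->base);
        return true;
}

static inline void i915_tasklet_run_or_schedule(struct i915_tasklet *t,
                                                unsigned long token)
{
        if (!i915_tasklet_try(t, token))        /* fall back to softirq */
                i915_tasklet_schedule(t);
}

With something along those lines the execlists code would just call
i915_tasklet_try(&execlists->tasklet, I915_TASKLET_DIRECT_SUBMIT) and 
the flag handling would live in one place.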

We could just as well use a data field in struct intel_execlists, as I 
suggested earlier, and write our own i915_execlists_tasklet_try wrapper 
or something along those lines. It would be the same thing and much 
less code.
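
Roughly something like the below, again untested and with the field and 
flag names made up, assuming we kept a plain tasklet_struct plus a 
flags word in struct intel_engine_execlists:

/*
 * Sketch of the execlists-local alternative: no generic i915_tasklet
 * wrapper, just a private flag next to the stock tasklet_struct.
 * EXECLISTS_DIRECT_SUBMIT and execlists->flags are invented names.
 */
static bool intel_execlists_tasklet_try(struct intel_engine_execlists *execlists)
{
        struct tasklet_struct *t = &execlists->tasklet;

        if (!tasklet_trylock(t))
                return false;

        execlists->flags |= EXECLISTS_DIRECT_SUBMIT;
        t->func(t->data);       /* run the callback in the caller's context */
        execlists->flags &= ~EXECLISTS_DIRECT_SUBMIT;

        tasklet_unlock(t);
        return true;
}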

So to me, either making it more generic or not having it at all seems 
like it would result in less code, with the same functional design.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-05-11  8:48 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-09 14:27 [PATCH 1/5] drm/i915: Remove tasklet flush before disable Chris Wilson
2018-05-09 14:27 ` [PATCH 2/5] drm/i915: Wrap tasklet_struct for abuse Chris Wilson
2018-05-09 14:58   ` Chris Wilson
2018-05-10 15:49   ` Tvrtko Ursulin
2018-05-10 16:03     ` Chris Wilson
2018-05-10 16:15       ` Tvrtko Ursulin
2018-05-10 16:19         ` Chris Wilson
2018-05-10 17:08           ` Tvrtko Ursulin
2018-05-09 14:27 ` [PATCH 3/5] drm/i915/execlists: Direct submit onto idle engines Chris Wilson
2018-05-10 16:09   ` Tvrtko Ursulin
2018-05-10 16:25     ` Chris Wilson
2018-05-10 17:26       ` Tvrtko Ursulin
2018-05-10 17:40         ` Chris Wilson
2018-05-11  8:25           ` Tvrtko Ursulin
2018-05-11  8:31             ` Chris Wilson
2018-05-11  8:48               ` Tvrtko Ursulin
2018-05-09 14:28 ` [PATCH 4/5] drm/i915/execlists: Direct submission from irq handler Chris Wilson
2018-05-10 12:02   ` Chris Wilson
2018-05-09 14:28 ` [PATCH 5/5] drm/i915: Speed up idle detection by kicking the tasklets Chris Wilson
2018-05-09 14:57 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/5] drm/i915: Remove tasklet flush before disable Patchwork
2018-05-09 14:59 ` ✗ Fi.CI.SPARSE: " Patchwork
2018-05-09 15:14 ` ✓ Fi.CI.BAT: success " Patchwork
2018-05-09 17:50 ` ✓ Fi.CI.IGT: " Patchwork
