* [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority
@ 2018-05-07 13:57 Chris Wilson
  2018-05-07 13:57 ` [PATCH v2 2/7] drm/i915: Disable tasklet scheduling across initial scheduling Chris Wilson
                   ` (14 more replies)
  0 siblings, 15 replies; 38+ messages in thread
From: Chris Wilson @ 2018-05-07 13:57 UTC (permalink / raw)
  To: intel-gfx

When called from process context, tasklet_schedule() defers itself to
ksoftirqd. From experience this may cause unacceptable latencies of over
200ms in executing the submission tasklet. Our goal is to reprioritise
the HW execution queue and trigger HW preemption immediately, so disable
bh around the call to schedule and force the tasklet to run afterwards
if it was scheduled.

v2: Keep rcu_read_lock() around for PREEMPT_RCU
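
The bh-disable trick can be modelled in plain C. A minimal userspace
sketch, assuming the deferral semantics described above; the model_*
names are hypothetical stand-ins for local_bh_disable()/local_bh_enable()
and tasklet_schedule(), not the kernel API:

```c
#include <assert.h>
#include <stdbool.h>

/* Model: while "bottom halves" are disabled, a scheduled tasklet is only
 * marked pending; re-enabling runs it immediately on this CPU instead of
 * handing it to ksoftirqd (where it could languish for milliseconds). */

static int bh_disable_count;
static bool tasklet_pending;
static int tasklet_runs;

static void model_tasklet_func(void) { tasklet_runs++; }

static void model_bh_disable(void) { bh_disable_count++; }

static void model_tasklet_schedule(void)
{
	if (bh_disable_count > 0)
		tasklet_pending = true;  /* defer until bh re-enabled */
	else
		model_tasklet_func();    /* run in softirq context now */
}

static void model_bh_enable(void)
{
	if (--bh_disable_count == 0 && tasklet_pending) {
		tasklet_pending = false;
		model_tasklet_func();    /* flushed here, not in ksoftirqd */
	}
}
```

So any tasklet kicked by engine->schedule() under the bh-disabled
section is guaranteed to run before __fence_set_priority() returns.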

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5ece6ae4bdff..89bf5d67cb74 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -578,10 +578,12 @@ static void __fence_set_priority(struct dma_fence *fence,
 	rq = to_request(fence);
 	engine = rq->engine;
 
-	rcu_read_lock();
+	local_bh_disable();
+	rcu_read_lock(); /* RCU serialisation for set-wedged protection */
 	if (engine->schedule)
 		engine->schedule(rq, attr);
 	rcu_read_unlock();
+	local_bh_enable(); /* kick the tasklets if queues were reprioritised */
 }
 
 static void fence_set_priority(struct dma_fence *fence,
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH v2 2/7] drm/i915: Disable tasklet scheduling across initial scheduling
  2018-05-07 13:57 [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority Chris Wilson
@ 2018-05-07 13:57 ` Chris Wilson
  2018-05-08 10:02   ` Tvrtko Ursulin
  2018-05-07 13:57 ` [PATCH v2 3/7] drm/i915/execlists: Make submission tasklet hardirq safe Chris Wilson
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2018-05-07 13:57 UTC (permalink / raw)
  To: intel-gfx

During request submission, we call the engine->schedule() function so
that we may reorder the active requests as required for inheriting the
new request's priority. This may schedule several tasklets to run on the
local CPU, but we will need to schedule the tasklets again for the new
request. Delay all the local tasklets until the end, so that we only
have to process the queue once.

v2: Beware PREEMPT_RCU, as local_bh_disable() is then not a superset of
rcu_read_lock().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index e4cf76ec14a6..f336942229cf 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1110,12 +1110,11 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
 	 * decide whether to preempt the entire chain so that it is ready to
 	 * run at the earliest possible convenience.
 	 */
-	rcu_read_lock();
+	local_bh_disable();
+	rcu_read_lock(); /* RCU serialisation for set-wedged protection */
 	if (engine->schedule)
 		engine->schedule(request, &request->ctx->sched);
 	rcu_read_unlock();
-
-	local_bh_disable();
 	i915_sw_fence_commit(&request->submit);
 	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
 
-- 
2.17.0


* [PATCH v2 3/7] drm/i915/execlists: Make submission tasklet hardirq safe
  2018-05-07 13:57 [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority Chris Wilson
  2018-05-07 13:57 ` [PATCH v2 2/7] drm/i915: Disable tasklet scheduling across initial scheduling Chris Wilson
@ 2018-05-07 13:57 ` Chris Wilson
  2018-05-08 10:10   ` Tvrtko Ursulin
                     ` (2 more replies)
  2018-05-07 13:57 ` [PATCH v2 4/7] drm/i915/guc: " Chris Wilson
                   ` (12 subsequent siblings)
  14 siblings, 3 replies; 38+ messages in thread
From: Chris Wilson @ 2018-05-07 13:57 UTC (permalink / raw)
  To: intel-gfx

Prepare to allow the execlists submission to be run from underneath a
hardirq timer context (and not just the current softirq context) as is
required for fast preemption resets and context switches.
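
The shape of the refactoring is: do the queue scan entirely under the
lock and merely report whether anything was queued, then perform the
(potentially slow) hardware write after the lock is dropped. A sketch
of the pattern using a pthread mutex in place of the irq-safe spinlock;
all names here are illustrative, not the driver's:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* __dequeue_locked() mirrors __execlists_dequeue(): it must be called
 * with the lock held and returns whether there is work to submit.
 * dequeue() mirrors execlists_dequeue(): take the lock, scan, drop the
 * lock, and only then touch the "hardware". Because no sleeping or
 * long-running work happens under the lock, the caller may already be
 * in hardirq context with interrupts disabled. */

static pthread_mutex_t timeline_lock = PTHREAD_MUTEX_INITIALIZER;
static int queued, submitted;

static bool __dequeue_locked(void)  /* caller must hold timeline_lock */
{
	bool submit = queued > 0;

	queued = 0;
	return submit;
}

static void dequeue(void)
{
	bool submit;

	pthread_mutex_lock(&timeline_lock);
	submit = __dequeue_locked();
	pthread_mutex_unlock(&timeline_lock);

	if (submit)
		submitted++;        /* stands in for the ELSP write */
}
```

The spin_lock_irq() -> spin_lock_irqsave() conversion in the diff serves
the same goal: the function no longer assumes interrupts were enabled on
entry, so a hardirq-context caller is safe.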

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 42 ++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f9f4064dec0e..15c373ea5b7e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -357,10 +357,13 @@ execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists)
 {
 	struct intel_engine_cs *engine =
 		container_of(execlists, typeof(*engine), execlists);
+	unsigned long flags;
+
+	spin_lock_irqsave(&engine->timeline.lock, flags);
 
-	spin_lock_irq(&engine->timeline.lock);
 	__unwind_incomplete_requests(engine);
-	spin_unlock_irq(&engine->timeline.lock);
+
+	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
 static inline void
@@ -554,7 +557,7 @@ static void inject_preempt_context(struct intel_engine_cs *engine)
 	execlists_set_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT);
 }
 
-static void execlists_dequeue(struct intel_engine_cs *engine)
+static bool __execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct execlist_port *port = execlists->port;
@@ -564,6 +567,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	struct rb_node *rb;
 	bool submit = false;
 
+	lockdep_assert_held(&engine->timeline.lock);
+
 	/* Hardware submission is through 2 ports. Conceptually each port
 	 * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is
 	 * static for a context, and unique to each, so we only execute
@@ -585,7 +590,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * and context switches) submission.
 	 */
 
-	spin_lock_irq(&engine->timeline.lock);
 	rb = execlists->first;
 	GEM_BUG_ON(rb_first(&execlists->queue) != rb);
 
@@ -600,7 +604,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 						EXECLISTS_ACTIVE_USER));
 		GEM_BUG_ON(!port_count(&port[0]));
 		if (port_count(&port[0]) > 1)
-			goto unlock;
+			return false;
 
 		/*
 		 * If we write to ELSP a second time before the HW has had
@@ -610,11 +614,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		 * the HW to indicate that it has had a chance to respond.
 		 */
 		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
-			goto unlock;
+			return false;
 
 		if (need_preempt(engine, last, execlists->queue_priority)) {
 			inject_preempt_context(engine);
-			goto unlock;
+			return false;
 		}
 
 		/*
@@ -639,7 +643,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		 * priorities of the ports haven't been switch.
 		 */
 		if (port_count(&port[1]))
-			goto unlock;
+			return false;
 
 		/*
 		 * WaIdleLiteRestore:bdw,skl
@@ -744,13 +748,25 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	/* We must always keep the beast fed if we have work piled up */
 	GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
 
-unlock:
-	spin_unlock_irq(&engine->timeline.lock);
-
-	if (submit) {
+	/* Re-evaluate the executing context setup after each preemptive kick */
+	if (last)
 		execlists_user_begin(execlists, execlists->port);
+
+	return submit;
+}
+
+static void execlists_dequeue(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const execlists = &engine->execlists;
+	unsigned long flags;
+	bool submit;
+
+	spin_lock_irqsave(&engine->timeline.lock, flags);
+	submit = __execlists_dequeue(engine);
+	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+
+	if (submit)
 		execlists_submit_ports(engine);
-	}
 
 	GEM_BUG_ON(port_isset(execlists->port) &&
 		   !execlists_is_active(execlists, EXECLISTS_ACTIVE_USER));
-- 
2.17.0


* [PATCH v2 4/7] drm/i915/guc: Make submission tasklet hardirq safe
  2018-05-07 13:57 [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority Chris Wilson
  2018-05-07 13:57 ` [PATCH v2 2/7] drm/i915: Disable tasklet scheduling across initial scheduling Chris Wilson
  2018-05-07 13:57 ` [PATCH v2 3/7] drm/i915/execlists: Make submission tasklet hardirq safe Chris Wilson
@ 2018-05-07 13:57 ` Chris Wilson
  2018-05-08 17:43   ` Tvrtko Ursulin
  2018-05-07 13:57 ` [PATCH v2 5/7] drm/i915/execlists: Direct submit onto idle engines Chris Wilson
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2018-05-07 13:57 UTC (permalink / raw)
  To: intel-gfx

Prepare to allow the GuC submission to be run from underneath a
hardirq timer context (and not just the current softirq context) as is
required for fast preemption resets and context switches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_guc_submission.c | 34 +++++++++++++++------
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 62828e39ee26..2feb65096966 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -669,7 +669,7 @@ static inline int port_prio(const struct execlist_port *port)
 	return rq_prio(port_request(port));
 }
 
-static void guc_dequeue(struct intel_engine_cs *engine)
+static bool __guc_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct execlist_port *port = execlists->port;
@@ -679,7 +679,8 @@ static void guc_dequeue(struct intel_engine_cs *engine)
 	bool submit = false;
 	struct rb_node *rb;
 
-	spin_lock_irq(&engine->timeline.lock);
+	lockdep_assert_held(&engine->timeline.lock);
+
 	rb = execlists->first;
 	GEM_BUG_ON(rb_first(&execlists->queue) != rb);
 
@@ -694,13 +695,13 @@ static void guc_dequeue(struct intel_engine_cs *engine)
 						     EXECLISTS_ACTIVE_PREEMPT);
 				queue_work(engine->i915->guc.preempt_wq,
 					   &preempt_work->work);
-				goto unlock;
+				return false;
 			}
 		}
 
 		port++;
 		if (port_isset(port))
-			goto unlock;
+			return false;
 	}
 	GEM_BUG_ON(port_isset(port));
 
@@ -738,19 +739,34 @@ static void guc_dequeue(struct intel_engine_cs *engine)
 done:
 	execlists->queue_priority = rb ? to_priolist(rb)->priority : INT_MIN;
 	execlists->first = rb;
-	if (submit) {
+	if (submit)
 		port_assign(port, last);
+	if (last)
 		execlists_user_begin(execlists, execlists->port);
-		guc_submit(engine);
-	}
 
 	/* We must always keep the beast fed if we have work piled up */
 	GEM_BUG_ON(port_isset(execlists->port) &&
 		   !execlists_is_active(execlists, EXECLISTS_ACTIVE_USER));
 	GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
 
-unlock:
-	spin_unlock_irq(&engine->timeline.lock);
+	return submit;
+}
+
+static void guc_dequeue(struct intel_engine_cs *engine)
+{
+	unsigned long flags;
+	bool submit;
+
+	local_irq_save(flags);
+
+	spin_lock(&engine->timeline.lock);
+	submit = __guc_dequeue(engine);
+	spin_unlock(&engine->timeline.lock);
+
+	if (submit)
+		guc_submit(engine);
+
+	local_irq_restore(flags);
 }
 
 static void guc_submission_tasklet(unsigned long data)
-- 
2.17.0


* [PATCH v2 5/7] drm/i915/execlists: Direct submit onto idle engines
  2018-05-07 13:57 [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority Chris Wilson
                   ` (2 preceding siblings ...)
  2018-05-07 13:57 ` [PATCH v2 4/7] drm/i915/guc: " Chris Wilson
@ 2018-05-07 13:57 ` Chris Wilson
  2018-05-08 10:23   ` Tvrtko Ursulin
  2018-05-07 13:57 ` [PATCH v2 6/7] drm/i915/execlists: Direct submission from irq handler Chris Wilson
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2018-05-07 13:57 UTC (permalink / raw)
  To: intel-gfx

Bypass using the tasklet to submit the first request to HW, as the
tasklet may be deferred to ksoftirqd and at a minimum will add in
excess of 10us (and maybe tens of milliseconds) to our execution
latency. This latency reduction is most notable when execution flows
between engines.

v2: Beware handling preemption completion from the direct submit path as
well.
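
The fast path amounts to: if the tasklet is not already running
elsewhere (tasklet_trylock() succeeds) and the ports are idle, run its
function inline; otherwise fall back to scheduling it. A userspace
model of that decision, with hypothetical model_* names standing in for
the tasklet API:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of __submit_queue(): direct-submit when we can claim the
 * tasklet and nothing is in flight, otherwise defer to the scheduler. */

static bool tasklet_locked, tasklet_scheduled, port_busy;
static int direct_runs;

static bool model_tasklet_trylock(void)
{
	if (tasklet_locked)
		return false;       /* already running on another CPU */
	tasklet_locked = true;
	return true;
}

static void model_submit_queue(void)
{
	if (!port_busy && model_tasklet_trylock()) {
		direct_runs++;             /* t->func(t->data) inline */
		tasklet_locked = false;    /* tasklet_unlock() */
		return;
	}
	tasklet_scheduled = true;          /* tasklet_hi_schedule() */
}
```

Note the inline call runs with the submitter's locks already held,
which is why the dequeue paths above grow the
intel_engine_direct_submit() checks to skip re-acquiring the timeline
lock.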

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_guc_submission.c | 12 +++-
 drivers/gpu/drm/i915/intel_lrc.c            | 66 +++++++++++++++++----
 drivers/gpu/drm/i915/intel_ringbuffer.h     |  7 +++
 3 files changed, 69 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 2feb65096966..6bfe30af7826 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -754,14 +754,20 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
 
 static void guc_dequeue(struct intel_engine_cs *engine)
 {
-	unsigned long flags;
+	unsigned long uninitialized_var(flags);
 	bool submit;
 
 	local_irq_save(flags);
 
-	spin_lock(&engine->timeline.lock);
+	GEM_BUG_ON(!test_bit(TASKLET_STATE_RUN,
+			     &engine->execlists.tasklet.state));
+	if (!intel_engine_direct_submit(engine))
+		spin_lock(&engine->timeline.lock);
+
 	submit = __guc_dequeue(engine);
-	spin_unlock(&engine->timeline.lock);
+
+	if (!intel_engine_direct_submit(engine))
+		spin_unlock(&engine->timeline.lock);
 
 	if (submit)
 		guc_submit(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 15c373ea5b7e..ac7c5edee4ee 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -357,13 +357,16 @@ execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists)
 {
 	struct intel_engine_cs *engine =
 		container_of(execlists, typeof(*engine), execlists);
-	unsigned long flags;
+	unsigned long uninitialized_var(flags);
 
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	GEM_BUG_ON(!test_bit(TASKLET_STATE_RUN, &execlists->tasklet.state));
+	if (!intel_engine_direct_submit(engine))
+		spin_lock_irqsave(&engine->timeline.lock, flags);
 
 	__unwind_incomplete_requests(engine);
 
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	if (!intel_engine_direct_submit(engine))
+		spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
 static inline void
@@ -602,6 +605,8 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
 		 */
 		GEM_BUG_ON(!execlists_is_active(execlists,
 						EXECLISTS_ACTIVE_USER));
+		GEM_BUG_ON(execlists_is_active(execlists,
+					       EXECLISTS_ACTIVE_PREEMPT));
 		GEM_BUG_ON(!port_count(&port[0]));
 		if (port_count(&port[0]) > 1)
 			return false;
@@ -758,12 +763,17 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
-	unsigned long flags;
+	unsigned long uninitialized_var(flags);
 	bool submit;
 
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	GEM_BUG_ON(!test_bit(TASKLET_STATE_RUN, &execlists->tasklet.state));
+	if (!intel_engine_direct_submit(engine))
+		spin_lock_irqsave(&engine->timeline.lock, flags);
+
 	submit = __execlists_dequeue(engine);
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+
+	if (!intel_engine_direct_submit(engine))
+		spin_unlock_irqrestore(&engine->timeline.lock, flags);
 
 	if (submit)
 		execlists_submit_ports(engine);
@@ -1163,16 +1173,45 @@ static void queue_request(struct intel_engine_cs *engine,
 		      &lookup_priolist(engine, node, prio)->requests);
 }
 
-static void __submit_queue(struct intel_engine_cs *engine, int prio)
+static void __wakeup_queue(struct intel_engine_cs *engine, int prio)
 {
 	engine->execlists.queue_priority = prio;
+}
+
+static void __schedule_queue(struct intel_engine_cs *engine)
+{
 	tasklet_hi_schedule(&engine->execlists.tasklet);
 }
 
+static void __submit_queue(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const execlists = &engine->execlists;
+	struct tasklet_struct * const t = &execlists->tasklet;
+
+	GEM_BUG_ON(!engine->i915->gt.awake);
+
+	/* If inside GPU reset, the tasklet will be queued later. */
+	if (unlikely(atomic_read(&t->count)))
+		return;
+
+	/* Directly submit the first request to reduce the initial latency */
+	if (!port_isset(execlists->port) && tasklet_trylock(t)) {
+		engine->flags |= I915_ENGINE_DIRECT_SUBMIT;
+		t->func(t->data);
+		engine->flags &= ~I915_ENGINE_DIRECT_SUBMIT;
+		tasklet_unlock(t);
+		return;
+	}
+
+	__schedule_queue(engine);
+}
+
 static void submit_queue(struct intel_engine_cs *engine, int prio)
 {
-	if (prio > engine->execlists.queue_priority)
-		__submit_queue(engine, prio);
+	if (prio > engine->execlists.queue_priority) {
+		__wakeup_queue(engine, prio);
+		__submit_queue(engine);
+	}
 }
 
 static void execlists_submit_request(struct i915_request *request)
@@ -1184,10 +1223,9 @@ static void execlists_submit_request(struct i915_request *request)
 	spin_lock_irqsave(&engine->timeline.lock, flags);
 
 	queue_request(engine, &request->sched, rq_prio(request));
-	submit_queue(engine, rq_prio(request));
-
 	GEM_BUG_ON(!engine->execlists.first);
 	GEM_BUG_ON(list_empty(&request->sched.link));
+	submit_queue(engine, rq_prio(request));
 
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
@@ -1309,8 +1347,10 @@ static void execlists_schedule(struct i915_request *request,
 		}
 
 		if (prio > engine->execlists.queue_priority &&
-		    i915_sw_fence_done(&sched_to_request(node)->submit))
-			__submit_queue(engine, prio);
+		    i915_sw_fence_done(&sched_to_request(node)->submit)) {
+			__wakeup_queue(engine, prio);
+			__schedule_queue(engine);
+		}
 	}
 
 	spin_unlock_irq(&engine->timeline.lock);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 010750e8ee44..f5545391d76a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -569,6 +569,7 @@ struct intel_engine_cs {
 #define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
 #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
 #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
+#define I915_ENGINE_DIRECT_SUBMIT    BIT(3)
 	unsigned int flags;
 
 	/*
@@ -646,6 +647,12 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine)
 	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
 }
 
+static inline bool
+intel_engine_direct_submit(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_DIRECT_SUBMIT;
+}
+
 static inline bool __execlists_need_preempt(int prio, int last)
 {
 	return prio > max(0, last);
-- 
2.17.0


* [PATCH v2 6/7] drm/i915/execlists: Direct submission from irq handler
  2018-05-07 13:57 [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority Chris Wilson
                   ` (3 preceding siblings ...)
  2018-05-07 13:57 ` [PATCH v2 5/7] drm/i915/execlists: Direct submit onto idle engines Chris Wilson
@ 2018-05-07 13:57 ` Chris Wilson
  2018-05-08 10:54   ` Tvrtko Ursulin
  2018-05-08 12:17   ` [PATCH] " Chris Wilson
  2018-05-07 13:57 ` [PATCH v2 7/7] drm/i915: Speed up idle detection by kicking the tasklets Chris Wilson
                   ` (9 subsequent siblings)
  14 siblings, 2 replies; 38+ messages in thread
From: Chris Wilson @ 2018-05-07 13:57 UTC (permalink / raw)
  To: intel-gfx

Continuing the theme of bypassing ksoftirqd latency, also first try to
directly submit from the CS interrupt handler to clear the ELSP and
queue the next request.

In the past, we have been hesitant to do this as the context switch
processing has been quite heavy, requiring forcewaked mmio. However, as
we now can read the GPU state from the cacheable HWSP, it is relatively
cheap!

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c             | 13 ++++++-------
 drivers/gpu/drm/i915/intel_guc_submission.c |  2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h     | 16 ++++++++++++++++
 3 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f9bc3aaa90d0..775cf167d938 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1465,19 +1465,18 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	bool tasklet = false;
 
-	if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
-		if (READ_ONCE(engine->execlists.active))
-			tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
-						    &engine->irq_posted);
-	}
+	if (iir & GT_CONTEXT_SWITCH_INTERRUPT && READ_ONCE(execlists->active))
+		tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
+					    &engine->irq_posted);
 
 	if (iir & GT_RENDER_USER_INTERRUPT) {
 		notify_ring(engine);
-		tasklet |= USES_GUC_SUBMISSION(engine->i915);
+		if (!test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
+			tasklet = USES_GUC_SUBMISSION(engine->i915);
 	}
 
 	if (tasklet)
-		tasklet_hi_schedule(&execlists->tasklet);
+		execlists_tasklet(execlists);
 }
 
 static void gen8_gt_irq_ack(struct drm_i915_private *i915,
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 6bfe30af7826..7d4542b46f5e 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -782,6 +782,8 @@ static void guc_submission_tasklet(unsigned long data)
 	struct execlist_port *port = execlists->port;
 	struct i915_request *rq;
 
+	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+
 	rq = port_request(port);
 	while (rq && i915_request_completed(rq)) {
 		trace_i915_request_out(rq);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index f5545391d76a..da7e00ff2c6b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -717,6 +717,22 @@ execlists_port_complete(struct intel_engine_execlists * const execlists,
 	return port;
 }
 
+static inline void
+execlists_tasklet(struct intel_engine_execlists * const execlists)
+{
+	struct tasklet_struct * const t = &execlists->tasklet;
+
+	if (unlikely(atomic_read(&t->count))) /* GPU reset active */
+		return;
+
+	if (tasklet_trylock(t)) {
+		t->func(t->data);
+		tasklet_unlock(t);
+	} else {
+		tasklet_hi_schedule(t);
+	}
+}
+
 static inline unsigned int
 intel_engine_flag(const struct intel_engine_cs *engine)
 {
-- 
2.17.0


* [PATCH v2 7/7] drm/i915: Speed up idle detection by kicking the tasklets
  2018-05-07 13:57 [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority Chris Wilson
                   ` (4 preceding siblings ...)
  2018-05-07 13:57 ` [PATCH v2 6/7] drm/i915/execlists: Direct submission from irq handler Chris Wilson
@ 2018-05-07 13:57 ` Chris Wilson
  2018-05-07 15:31 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority Patchwork
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-05-07 13:57 UTC (permalink / raw)
  To: intel-gfx

We rely on ksoftirqd to run in a timely fashion in order to drain the
execlists queue. Quite frequently, it does not. In some cases we may see
latencies of over 200ms triggering our idle timeouts and forcing us to
declare the driver wedged!

Thus we can speed up idle detection by bypassing ksoftirqd in these
cases and flush our tasklet to confirm if we are indeed still waiting
for the ELSP to drain.

v2: Put the execlists.first check back; it is required for handling
reset!
v3: Follow Mika's suggestion to try and limit kicking the tasklet to
only when we expect it to make a difference, i.e. in catch-up after a CS
interrupt, and not just execute it every time, as that is likely just to
cover over our own bugs.
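
The resulting idle check is: if a context-switch interrupt is still
pending, run the tasklet synchronously before deciding whether the
engine is busy, rather than waiting for ksoftirqd. A small model of
that flow, under the stated assumption that running the tasklet drains
the completed context switch; names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of intel_engine_is_idle() after this patch: kick the pending
 * tasklet ourselves so the ELSP-drained state is visible immediately. */

static bool irq_posted, elsp_active;

static void model_execlists_tasklet(void)
{
	irq_posted = false;
	elsp_active = false;   /* context switch processed, ELSP drained */
}

static bool model_engine_is_idle(void)
{
	if (irq_posted)
		model_execlists_tasklet();  /* don't wait for ksoftirqd */

	return !elsp_active;   /* waiting to drain ELSP? */
}
```

Without the kick, the same call would report busy until ksoftirqd ran,
which is the source of the spurious wedging described above.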

References: https://bugs.freedesktop.org/show_bug.cgi?id=106373
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 70325e0824e3..27f6b30e032f 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -944,11 +944,20 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
 	if (I915_SELFTEST_ONLY(engine->breadcrumbs.mock))
 		return true;
 
+	/*
+	 * ksoftirqd has notorious latency that may cause us to
+	 * timeout while waiting for the engine to idle as we wait for
+	 * ksoftirqd to run the execlists tasklet to drain the ELSP.
+	 * If we are expecting a context switch from the GPU, check now.
+	 */
+	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
+		execlists_tasklet(&engine->execlists);
+
 	/* Waiting to drain ELSP? */
 	if (READ_ONCE(engine->execlists.active))
 		return false;
 
-	/* ELSP is empty, but there are ready requests? */
+	/* ELSP is empty, but there are ready requests? E.g. after reset */
 	if (READ_ONCE(engine->execlists.first))
 		return false;
 
-- 
2.17.0


* ✗ Fi.CI.CHECKPATCH: warning for series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority
  2018-05-07 13:57 [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority Chris Wilson
                   ` (5 preceding siblings ...)
  2018-05-07 13:57 ` [PATCH v2 7/7] drm/i915: Speed up idle detection by kicking the tasklets Chris Wilson
@ 2018-05-07 15:31 ` Patchwork
  2018-05-07 15:32 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 38+ messages in thread
From: Patchwork @ 2018-05-07 15:31 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority
URL   : https://patchwork.freedesktop.org/series/42815/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
8cef3a53edbb drm/i915: Flush submission tasklet after bumping priority
a41633339738 drm/i915: Disable tasklet scheduling across initial scheduling
73b5184f0846 drm/i915/execlists: Make submission tasklet hardirq safe
9c06e59f519a drm/i915/guc: Make submission tasklet hardirq safe
4fa5bf4dcf57 drm/i915/execlists: Direct submit onto idle engines
-:28: WARNING:FUNCTION_ARGUMENTS: function definition argument 'flags' should also have an identifier name
#28: FILE: drivers/gpu/drm/i915/intel_guc_submission.c:757:
+	unsigned long uninitialized_var(flags);

-:56: WARNING:FUNCTION_ARGUMENTS: function definition argument 'flags' should also have an identifier name
#56: FILE: drivers/gpu/drm/i915/intel_lrc.c:360:
+	unsigned long uninitialized_var(flags);

-:85: WARNING:FUNCTION_ARGUMENTS: function definition argument 'flags' should also have an identifier name
#85: FILE: drivers/gpu/drm/i915/intel_lrc.c:766:
+	unsigned long uninitialized_var(flags);

total: 0 errors, 3 warnings, 0 checks, 160 lines checked
cc4737fd06e9 drm/i915/execlists: Direct submission from irq handler
f11f14adf4e6 drm/i915: Speed up idle detection by kicking the tasklets


* ✗ Fi.CI.SPARSE: warning for series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority
  2018-05-07 13:57 [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority Chris Wilson
                   ` (6 preceding siblings ...)
  2018-05-07 15:31 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority Patchwork
@ 2018-05-07 15:32 ` Patchwork
  2018-05-07 15:46 ` ✓ Fi.CI.BAT: success " Patchwork
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 38+ messages in thread
From: Patchwork @ 2018-05-07 15:32 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority
URL   : https://patchwork.freedesktop.org/series/42815/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Commit: drm/i915: Flush submission tasklet after bumping priority
Okay!

Commit: drm/i915: Disable tasklet scheduling across initial scheduling
Okay!

Commit: drm/i915/execlists: Make submission tasklet hardirq safe
Okay!

Commit: drm/i915/guc: Make submission tasklet hardirq safe
Okay!

Commit: drm/i915/execlists: Direct submit onto idle engines
+drivers/gpu/drm/i915/intel_guc_submission.c:770:28: warning: context imbalance in 'guc_dequeue' - unexpected unlock
+drivers/gpu/drm/i915/intel_lrc.c:369:39: warning: context imbalance in 'execlists_unwind_incomplete_requests' - unexpected unlock
+drivers/gpu/drm/i915/intel_lrc.c:776:39: warning: context imbalance in 'execlists_dequeue' - unexpected unlock
-O:drivers/gpu/drm/i915/intel_ringbuffer.h:651:23: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/intel_ringbuffer.h:651:23: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/intel_ringbuffer.h:658:23: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/intel_ringbuffer.h:658:23: warning: expression using sizeof(void)

Commit: drm/i915/execlists: Direct submission from irq handler
Okay!

Commit: drm/i915: Speed up idle detection by kicking the tasklets
Okay!


* ✓ Fi.CI.BAT: success for series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority
  2018-05-07 13:57 [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority Chris Wilson
                   ` (7 preceding siblings ...)
  2018-05-07 15:32 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2018-05-07 15:46 ` Patchwork
  2018-05-07 17:56 ` ✓ Fi.CI.IGT: " Patchwork
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 38+ messages in thread
From: Patchwork @ 2018-05-07 15:46 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority
URL   : https://patchwork.freedesktop.org/series/42815/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4151 -> Patchwork_8927 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/42815/revisions/1/mbox/

== Known issues ==

  Here are the changes found in Patchwork_8927 that come from known issues:

  === IGT changes ===

    ==== Possible fixes ====

    igt@drv_module_reload@basic-no-display:
      fi-bsw-n3050:       DMESG-FAIL (fdo#106373) -> PASS

    igt@gem_exec_suspend@basic-s4-devices:
      fi-kbl-7500u:       DMESG-WARN (fdo#105128) -> PASS

    igt@kms_chamelium@hdmi-hpd-fast:
      fi-kbl-7500u:       FAIL (fdo#102672, fdo#103841) -> SKIP

    igt@kms_flip@basic-flip-vs-wf_vblank:
      fi-elk-e7500:       INCOMPLETE (fdo#103989) -> PASS

    igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence:
      fi-cnl-psr:         FAIL (fdo#103481) -> PASS

    igt@prime_vgem@basic-fence-flip:
      fi-ilk-650:         FAIL (fdo#104008) -> PASS

    
  fdo#102672 https://bugs.freedesktop.org/show_bug.cgi?id=102672
  fdo#103481 https://bugs.freedesktop.org/show_bug.cgi?id=103481
  fdo#103841 https://bugs.freedesktop.org/show_bug.cgi?id=103841
  fdo#103989 https://bugs.freedesktop.org/show_bug.cgi?id=103989
  fdo#104008 https://bugs.freedesktop.org/show_bug.cgi?id=104008
  fdo#105128 https://bugs.freedesktop.org/show_bug.cgi?id=105128
  fdo#106373 https://bugs.freedesktop.org/show_bug.cgi?id=106373


== Participating hosts (41 -> 37) ==

  Missing    (4): fi-ctg-p8600 fi-ilk-m540 fi-byt-squawks fi-skl-6700hq 


== Build changes ==

    * Linux: CI_DRM_4151 -> Patchwork_8927

  CI_DRM_4151: 176975baca37c5cdb1632b9aa2ba0170c9ab53df @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4464: 1bb318b32db003a377da14715c7b80675a712b6b @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_8927: f11f14adf4e6010fc93ab44c0848ef846a522590 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4464: 33e58d5583eb7ed3966a1b905f875a1dfa959f6b @ git://anongit.freedesktop.org/piglit


== Linux commits ==

f11f14adf4e6 drm/i915: Speed up idle detection by kicking the tasklets
cc4737fd06e9 drm/i915/execlists: Direct submission from irq handler
4fa5bf4dcf57 drm/i915/execlists: Direct submit onto idle engines
9c06e59f519a drm/i915/guc: Make submission tasklet hardirq safe
73b5184f0846 drm/i915/execlists: Make submission tasklet hardirq safe
a41633339738 drm/i915: Disable tasklet scheduling across initial scheduling
8cef3a53edbb drm/i915: Flush submission tasklet after bumping priority

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8927/issues.html

* ✓ Fi.CI.IGT: success for series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority
  2018-05-07 13:57 [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority Chris Wilson
                   ` (8 preceding siblings ...)
  2018-05-07 15:46 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-05-07 17:56 ` Patchwork
  2018-05-08  9:40 ` [PATCH v2 1/7] " Tvrtko Ursulin
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 38+ messages in thread
From: Patchwork @ 2018-05-07 17:56 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority
URL   : https://patchwork.freedesktop.org/series/42815/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4151_full -> Patchwork_8927_full =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/42815/revisions/1/mbox/

== Known issues ==

  Here are the changes found in Patchwork_8927_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@gem_softpin@noreloc-s3:
      shard-snb:          PASS -> DMESG-WARN (fdo#102365)

    igt@kms_chv_cursor_fail@pipe-b-64x64-bottom-edge:
      shard-apl:          PASS -> FAIL (fdo#104724, fdo#104671)

    igt@kms_flip@plain-flip-ts-check:
      shard-glk:          PASS -> FAIL (fdo#100368)

    igt@kms_rotation_crc@primary-rotation-270:
      shard-apl:          PASS -> FAIL (fdo#104724, fdo#103925)

    igt@kms_setmode@basic:
      shard-glk:          PASS -> FAIL (fdo#99912)

    igt@perf@blocking:
      shard-hsw:          PASS -> FAIL (fdo#102252)

    igt@perf_pmu@enable-race-bcs0:
      shard-glk:          PASS -> INCOMPLETE (k.org#198133, fdo#103359)

    
    ==== Possible fixes ====

    igt@drv_selftest@live_gtt:
      shard-apl:          INCOMPLETE (fdo#103927) -> PASS

    igt@kms_flip@flip-vs-expired-vblank-interruptible:
      shard-glk:          FAIL (fdo#102887) -> PASS

    igt@kms_flip@modeset-vs-vblank-race-interruptible:
      shard-glk:          FAIL (fdo#103060) -> PASS

    igt@kms_flip@plain-flip-ts-check:
      shard-hsw:          FAIL (fdo#100368) -> PASS +1

    igt@kms_flip@wf_vblank-ts-check-interruptible:
      shard-hsw:          FAIL (fdo#103928) -> PASS

    igt@kms_rotation_crc@sprite-rotation-90-pos-100-0:
      shard-apl:          FAIL (fdo#104724, fdo#103925) -> PASS +1

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252
  fdo#102365 https://bugs.freedesktop.org/show_bug.cgi?id=102365
  fdo#102887 https://bugs.freedesktop.org/show_bug.cgi?id=102887
  fdo#103060 https://bugs.freedesktop.org/show_bug.cgi?id=103060
  fdo#103359 https://bugs.freedesktop.org/show_bug.cgi?id=103359
  fdo#103925 https://bugs.freedesktop.org/show_bug.cgi?id=103925
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
  fdo#103928 https://bugs.freedesktop.org/show_bug.cgi?id=103928
  fdo#104671 https://bugs.freedesktop.org/show_bug.cgi?id=104671
  fdo#104724 https://bugs.freedesktop.org/show_bug.cgi?id=104724
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
  k.org#198133 https://bugzilla.kernel.org/show_bug.cgi?id=198133


== Participating hosts (6 -> 5) ==

  Missing    (1): shard-kbl 


== Build changes ==

    * Linux: CI_DRM_4151 -> Patchwork_8927

  CI_DRM_4151: 176975baca37c5cdb1632b9aa2ba0170c9ab53df @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4464: 1bb318b32db003a377da14715c7b80675a712b6b @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_8927: f11f14adf4e6010fc93ab44c0848ef846a522590 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4464: 33e58d5583eb7ed3966a1b905f875a1dfa959f6b @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8927/shards.html

* Re: [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority
  2018-05-07 13:57 [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority Chris Wilson
                   ` (9 preceding siblings ...)
  2018-05-07 17:56 ` ✓ Fi.CI.IGT: " Patchwork
@ 2018-05-08  9:40 ` Tvrtko Ursulin
  2018-05-08  9:45   ` Chris Wilson
  2018-05-08 14:11 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority (rev2) Patchwork
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-08  9:40 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 07/05/2018 14:57, Chris Wilson wrote:
> When called from process context tasklet_schedule() defers itself to
> ksoftirqd. From experience this may cause unacceptable latencies of over
> 200ms in executing the submission tasklet, our goal is to reprioritise
> the HW execution queue and trigger HW preemption immediately, so disable
> bh over the call to schedule and force the tasklet to run afterwards if
> scheduled.
> 
> v2: Keep rcu_read_lock() around for PREEMPT_RCU
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5ece6ae4bdff..89bf5d67cb74 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -578,10 +578,12 @@ static void __fence_set_priority(struct dma_fence *fence,
>   	rq = to_request(fence);
>   	engine = rq->engine;
>   
> -	rcu_read_lock();
> +	local_bh_disable();
> +	rcu_read_lock(); /* RCU serialisation for set-wedged protection */
>   	if (engine->schedule)
>   		engine->schedule(rq, attr);
>   	rcu_read_unlock();
> +	local_bh_enable(); /* kick the tasklets if queues were reprioritised */
>   }
>   
>   static void fence_set_priority(struct dma_fence *fence,
> 

Is the sequence not the wrong way around? For the disable side I would 
expect disable preemption, then disable softirq, and of course the 
opposite order for enable.

Regards,

Tvrtko

* Re: [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority
  2018-05-08  9:40 ` [PATCH v2 1/7] " Tvrtko Ursulin
@ 2018-05-08  9:45   ` Chris Wilson
  2018-05-08  9:57     ` Tvrtko Ursulin
  0 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2018-05-08  9:45 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-08 10:40:43)
> 
> On 07/05/2018 14:57, Chris Wilson wrote:
> > When called from process context tasklet_schedule() defers itself to
> > ksoftirqd. From experience this may cause unacceptable latencies of over
> > 200ms in executing the submission tasklet, our goal is to reprioritise
> > the HW execution queue and trigger HW preemption immediately, so disable
> > bh over the call to schedule and force the tasklet to run afterwards if
> > scheduled.
> > 
> > v2: Keep rcu_read_lock() around for PREEMPT_RCU
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c | 4 +++-
> >   1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 5ece6ae4bdff..89bf5d67cb74 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -578,10 +578,12 @@ static void __fence_set_priority(struct dma_fence *fence,
> >       rq = to_request(fence);
> >       engine = rq->engine;
> >   
> > -     rcu_read_lock();
> > +     local_bh_disable();
> > +     rcu_read_lock(); /* RCU serialisation for set-wedged protection */
> >       if (engine->schedule)
> >               engine->schedule(rq, attr);
> >       rcu_read_unlock();
> > +     local_bh_enable(); /* kick the tasklets if queues were reprioritised */
> >   }
> >   
> >   static void fence_set_priority(struct dma_fence *fence,
> > 
> 
> Is the sequence not the wrong way around? For the disable side I would
> expect disable preemption, then disable softirq, and of course the
> opposite order for enable.

We disable preemption (i.e. softirq) then mark ->schedule as being RCU
protected; unwrap and re-enable preemption (kicking softirq tasklets).

I felt it better to keep the RCU tight to ->schedule than let it wrap
local_bh_enable() suggesting that the tasklets might need additional
protection.
-Chris

* Re: [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority
  2018-05-08  9:45   ` Chris Wilson
@ 2018-05-08  9:57     ` Tvrtko Ursulin
  0 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-08  9:57 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/05/2018 10:45, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-08 10:40:43)
>>
>> On 07/05/2018 14:57, Chris Wilson wrote:
>>> When called from process context tasklet_schedule() defers itself to
>>> ksoftirqd. From experience this may cause unacceptable latencies of over
>>> 200ms in executing the submission tasklet, our goal is to reprioritise
>>> the HW execution queue and trigger HW preemption immediately, so disable
>>> bh over the call to schedule and force the tasklet to run afterwards if
>>> scheduled.
>>>
>>> v2: Keep rcu_read_lock() around for PREEMPT_RCU
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem.c | 4 +++-
>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 5ece6ae4bdff..89bf5d67cb74 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -578,10 +578,12 @@ static void __fence_set_priority(struct dma_fence *fence,
>>>        rq = to_request(fence);
>>>        engine = rq->engine;
>>>    
>>> -     rcu_read_lock();
>>> +     local_bh_disable();
>>> +     rcu_read_lock(); /* RCU serialisation for set-wedged protection */
>>>        if (engine->schedule)
>>>                engine->schedule(rq, attr);
>>>        rcu_read_unlock();
>>> +     local_bh_enable(); /* kick the tasklets if queues were reprioritised */
>>>    }
>>>    
>>>    static void fence_set_priority(struct dma_fence *fence,
>>>
>>
>> Is the sequence not the wrong way around? For the disable side I would
>> expect disable preemption, then disable softirq, and of course the
>> opposite order for enable.
> 
> We disable preemption (i.e. softirq) then mark ->schedule as being RCU
> protected; unwrap and re-enable preemption (kicking softirq tasklets).
> 
> I felt it better to keep the RCU tight to ->schedule than let it wrap
> local_bh_enable() suggesting that the tasklets might need additional
> protection.

Later I noticed that in a different thread Mika pointed out there is 
preemptible RCU as well, so my argument about ordering falls apart a bit.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko


* Re: [PATCH v2 2/7] drm/i915: Disable tasklet scheduling across initial scheduling
  2018-05-07 13:57 ` [PATCH v2 2/7] drm/i915: Disable tasklet scheduling across initial scheduling Chris Wilson
@ 2018-05-08 10:02   ` Tvrtko Ursulin
  2018-05-08 10:31     ` Chris Wilson
  0 siblings, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-08 10:02 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 07/05/2018 14:57, Chris Wilson wrote:
> During request submission, we call the engine->schedule() function so
> that we may reorder the active requests as required for inheriting the
> new request's priority. This may schedule several tasklets to run on the
> local CPU, but we will need to schedule the tasklets again for the new
> request. Delay all the local tasklets until the end, so that we only
> have to process the queue just once.
> 
> v2: Beware PREEMPT_RCU, as then local_bh_disable() is then not a
> superset of rcu_read_lock().
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_request.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index e4cf76ec14a6..f336942229cf 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1110,12 +1110,11 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
>   	 * decide whether to preempt the entire chain so that it is ready to
>   	 * run at the earliest possible convenience.
>   	 */
> -	rcu_read_lock();
> +	local_bh_disable();
> +	rcu_read_lock(); /* RCU serialisation for set-wedged protection */
>   	if (engine->schedule)
>   		engine->schedule(request, &request->ctx->sched);
>   	rcu_read_unlock();
> -
> -	local_bh_disable();
>   	i915_sw_fence_commit(&request->submit);
>   	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
>   
> 

I wasn't sure about this one before, since it lengthens the softirq-off 
section and still doesn't make the re-schedule atomic. But today I am 
thinking that in normal use dependency chains should not be deep enough 
for this to become an issue. In light of that, go for it:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko





* Re: [PATCH v2 3/7] drm/i915/execlists: Make submission tasklet hardirq safe
  2018-05-07 13:57 ` [PATCH v2 3/7] drm/i915/execlists: Make submission tasklet hardirq safe Chris Wilson
@ 2018-05-08 10:10   ` Tvrtko Ursulin
  2018-05-08 10:24     ` Chris Wilson
  2018-05-08 17:38   ` Tvrtko Ursulin
  2018-05-08 17:45   ` Tvrtko Ursulin
  2 siblings, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-08 10:10 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 07/05/2018 14:57, Chris Wilson wrote:
> Prepare to allow the execlists submission to be run from underneath a
> hardirq timer context (and not just the current softirq context) as is
> required for fast preemption resets and context switches.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 42 ++++++++++++++++++++++----------
>   1 file changed, 29 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index f9f4064dec0e..15c373ea5b7e 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -357,10 +357,13 @@ execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists)
>   {
>   	struct intel_engine_cs *engine =
>   		container_of(execlists, typeof(*engine), execlists);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&engine->timeline.lock, flags);
>   
> -	spin_lock_irq(&engine->timeline.lock);
>   	__unwind_incomplete_requests(engine);
> -	spin_unlock_irq(&engine->timeline.lock);
> +
> +	spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   }
>   
>   static inline void
> @@ -554,7 +557,7 @@ static void inject_preempt_context(struct intel_engine_cs *engine)
>   	execlists_set_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT);
>   }
>   
> -static void execlists_dequeue(struct intel_engine_cs *engine)
> +static bool __execlists_dequeue(struct intel_engine_cs *engine)
>   {
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
>   	struct execlist_port *port = execlists->port;
> @@ -564,6 +567,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	struct rb_node *rb;
>   	bool submit = false;
>   
> +	lockdep_assert_held(&engine->timeline.lock);
> +
>   	/* Hardware submission is through 2 ports. Conceptually each port
>   	 * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is
>   	 * static for a context, and unique to each, so we only execute
> @@ -585,7 +590,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	 * and context switches) submission.
>   	 */
>   
> -	spin_lock_irq(&engine->timeline.lock);
>   	rb = execlists->first;
>   	GEM_BUG_ON(rb_first(&execlists->queue) != rb);
>   
> @@ -600,7 +604,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   						EXECLISTS_ACTIVE_USER));
>   		GEM_BUG_ON(!port_count(&port[0]));
>   		if (port_count(&port[0]) > 1)
> -			goto unlock;
> +			return false;
>   
>   		/*
>   		 * If we write to ELSP a second time before the HW has had
> @@ -610,11 +614,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		 * the HW to indicate that it has had a chance to respond.
>   		 */
>   		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
> -			goto unlock;
> +			return false;
>   
>   		if (need_preempt(engine, last, execlists->queue_priority)) {
>   			inject_preempt_context(engine);
> -			goto unlock;
> +			return false;
>   		}
>   
>   		/*
> @@ -639,7 +643,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		 * priorities of the ports haven't been switch.
>   		 */
>   		if (port_count(&port[1]))
> -			goto unlock;
> +			return false;
>   
>   		/*
>   		 * WaIdleLiteRestore:bdw,skl
> @@ -744,13 +748,25 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	/* We must always keep the beast fed if we have work piled up */
>   	GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
>   
> -unlock:
> -	spin_unlock_irq(&engine->timeline.lock);
> -
> -	if (submit) {
> +	/* Re-evaluate the executing context setup after each preemptive kick */
> +	if (last)
>   		execlists_user_begin(execlists, execlists->port);

Last can be non-null and submit false, so this is not equivalent.

By the looks of it this makes no difference, since it is OK to set the 
execlists user-active bit multiple times; even though the helper is 
called execlists_set_active_once, its return value is not looked at.

Still, why not keep doing this when submit is true?

Regards,

Tvrtko

> +
> +	return submit;
> +}
> +
> +static void execlists_dequeue(struct intel_engine_cs *engine)
> +{
> +	struct intel_engine_execlists * const execlists = &engine->execlists;
> +	unsigned long flags;
> +	bool submit;
> +
> +	spin_lock_irqsave(&engine->timeline.lock, flags);
> +	submit = __execlists_dequeue(engine);
> +	spin_unlock_irqrestore(&engine->timeline.lock, flags);
> +
> +	if (submit)
>   		execlists_submit_ports(engine);
> -	}
>   
>   	GEM_BUG_ON(port_isset(execlists->port) &&
>   		   !execlists_is_active(execlists, EXECLISTS_ACTIVE_USER));
> 

* Re: [PATCH v2 5/7] drm/i915/execlists: Direct submit onto idle engines
  2018-05-07 13:57 ` [PATCH v2 5/7] drm/i915/execlists: Direct submit onto idle engines Chris Wilson
@ 2018-05-08 10:23   ` Tvrtko Ursulin
  2018-05-08 10:40     ` Chris Wilson
  0 siblings, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-08 10:23 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 07/05/2018 14:57, Chris Wilson wrote:
> Bypass using the tasklet to submit the first request to HW, as the
> tasklet may be deferred unto ksoftirqd and at a minimum will add in
> excess of 10us (and maybe tens of milliseconds) to our execution
> latency. This latency reduction is most notable when execution flows
> between engines.
> 
> v2: Beware handling preemption completion from the direct submit path as
> well.
> 
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/intel_guc_submission.c | 12 +++-
>   drivers/gpu/drm/i915/intel_lrc.c            | 66 +++++++++++++++++----
>   drivers/gpu/drm/i915/intel_ringbuffer.h     |  7 +++
>   3 files changed, 69 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index 2feb65096966..6bfe30af7826 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -754,14 +754,20 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
>   
>   static void guc_dequeue(struct intel_engine_cs *engine)
>   {
> -	unsigned long flags;
> +	unsigned long uninitialized_var(flags);
>   	bool submit;
>   
>   	local_irq_save(flags);
>   
> -	spin_lock(&engine->timeline.lock);
> +	GEM_BUG_ON(!test_bit(TASKLET_STATE_RUN,
> +			     &engine->execlists.tasklet.state));

Soon it will be time for i915_tasklet. :)

> +	if (!intel_engine_direct_submit(engine))
> +		spin_lock(&engine->timeline.lock);

A bit ugly, both the conditional locking and the use of engine->flags 
for transient purposes.

Since you lock the tasklet and own it completely (and open-code the 
call) when calling directly, you could just as well cheat and call a 
different function?

> +
>   	submit = __guc_dequeue(engine);
> -	spin_unlock(&engine->timeline.lock);
> +
> +	if (!intel_engine_direct_submit(engine))
> +		spin_unlock(&engine->timeline.lock);
>   
>   	if (submit)
>   		guc_submit(engine);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 15c373ea5b7e..ac7c5edee4ee 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -357,13 +357,16 @@ execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists)
>   {
>   	struct intel_engine_cs *engine =
>   		container_of(execlists, typeof(*engine), execlists);
> -	unsigned long flags;
> +	unsigned long uninitialized_var(flags);
>   
> -	spin_lock_irqsave(&engine->timeline.lock, flags);
> +	GEM_BUG_ON(!test_bit(TASKLET_STATE_RUN, &execlists->tasklet.state));
> +	if (!intel_engine_direct_submit(engine))
> +		spin_lock_irqsave(&engine->timeline.lock, flags);
>   
>   	__unwind_incomplete_requests(engine);
>   
> -	spin_unlock_irqrestore(&engine->timeline.lock, flags);
> +	if (!intel_engine_direct_submit(engine))
> +		spin_unlock_irqrestore(&engine->timeline.lock, flags);

Hm, ok, yes, this one would be a problem...

Maybe at least use some bit under the execlists state instead of 
engine->flags?

Regards,

Tvrtko

>   }
>   
>   static inline void
> @@ -602,6 +605,8 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
>   		 */
>   		GEM_BUG_ON(!execlists_is_active(execlists,
>   						EXECLISTS_ACTIVE_USER));
> +		GEM_BUG_ON(execlists_is_active(execlists,
> +					       EXECLISTS_ACTIVE_PREEMPT));
>   		GEM_BUG_ON(!port_count(&port[0]));
>   		if (port_count(&port[0]) > 1)
>   			return false;
> @@ -758,12 +763,17 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
>   static void execlists_dequeue(struct intel_engine_cs *engine)
>   {
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
> -	unsigned long flags;
> +	unsigned long uninitialized_var(flags);
>   	bool submit;
>   
> -	spin_lock_irqsave(&engine->timeline.lock, flags);
> +	GEM_BUG_ON(!test_bit(TASKLET_STATE_RUN, &execlists->tasklet.state));
> +	if (!intel_engine_direct_submit(engine))
> +		spin_lock_irqsave(&engine->timeline.lock, flags);
> +
>   	submit = __execlists_dequeue(engine);
> -	spin_unlock_irqrestore(&engine->timeline.lock, flags);
> +
> +	if (!intel_engine_direct_submit(engine))
> +		spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   
>   	if (submit)
>   		execlists_submit_ports(engine);
> @@ -1163,16 +1173,45 @@ static void queue_request(struct intel_engine_cs *engine,
>   		      &lookup_priolist(engine, node, prio)->requests);
>   }
>   
> -static void __submit_queue(struct intel_engine_cs *engine, int prio)
> +static void __wakeup_queue(struct intel_engine_cs *engine, int prio)
>   {
>   	engine->execlists.queue_priority = prio;
> +}
> +
> +static void __schedule_queue(struct intel_engine_cs *engine)
> +{
>   	tasklet_hi_schedule(&engine->execlists.tasklet);
>   }
>   
> +static void __submit_queue(struct intel_engine_cs *engine)
> +{
> +	struct intel_engine_execlists * const execlists = &engine->execlists;
> +	struct tasklet_struct * const t = &execlists->tasklet;
> +
> +	GEM_BUG_ON(!engine->i915->gt.awake);
> +
> +	/* If inside GPU reset, the tasklet will be queued later. */
> +	if (unlikely(atomic_read(&t->count)))
> +		return;
> +
> +	/* Directly submit the first request to reduce the initial latency */
> +	if (!port_isset(execlists->port) && tasklet_trylock(t)) {
> +		engine->flags |= I915_ENGINE_DIRECT_SUBMIT;
> +		t->func(t->data);
> +		engine->flags &= ~I915_ENGINE_DIRECT_SUBMIT;
> +		tasklet_unlock(t);
> +		return;
> +	}
> +
> +	__schedule_queue(engine);
> +}
> +
>   static void submit_queue(struct intel_engine_cs *engine, int prio)
>   {
> -	if (prio > engine->execlists.queue_priority)
> -		__submit_queue(engine, prio);
> +	if (prio > engine->execlists.queue_priority) {
> +		__wakeup_queue(engine, prio);
> +		__submit_queue(engine);
> +	}
>   }
>   
>   static void execlists_submit_request(struct i915_request *request)
> @@ -1184,10 +1223,9 @@ static void execlists_submit_request(struct i915_request *request)
>   	spin_lock_irqsave(&engine->timeline.lock, flags);
>   
>   	queue_request(engine, &request->sched, rq_prio(request));
> -	submit_queue(engine, rq_prio(request));
> -
>   	GEM_BUG_ON(!engine->execlists.first);
>   	GEM_BUG_ON(list_empty(&request->sched.link));
> +	submit_queue(engine, rq_prio(request));
>   
>   	spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   }
> @@ -1309,8 +1347,10 @@ static void execlists_schedule(struct i915_request *request,
>   		}
>   
>   		if (prio > engine->execlists.queue_priority &&
> -		    i915_sw_fence_done(&sched_to_request(node)->submit))
> -			__submit_queue(engine, prio);
> +		    i915_sw_fence_done(&sched_to_request(node)->submit)) {
> +			__wakeup_queue(engine, prio);
> +			__schedule_queue(engine);
> +		}
>   	}
>   
>   	spin_unlock_irq(&engine->timeline.lock);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 010750e8ee44..f5545391d76a 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -569,6 +569,7 @@ struct intel_engine_cs {
>   #define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
>   #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
>   #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
> +#define I915_ENGINE_DIRECT_SUBMIT    BIT(3)
>   	unsigned int flags;
>   
>   	/*
> @@ -646,6 +647,12 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine)
>   	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
>   }
>   
> +static inline bool
> +intel_engine_direct_submit(const struct intel_engine_cs *engine)
> +{
> +	return engine->flags & I915_ENGINE_DIRECT_SUBMIT;
> +}
> +
>   static inline bool __execlists_need_preempt(int prio, int last)
>   {
>   	return prio > max(0, last);
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 3/7] drm/i915/execlists: Make submission tasklet hardirq safe
  2018-05-08 10:10   ` Tvrtko Ursulin
@ 2018-05-08 10:24     ` Chris Wilson
  2018-05-08 10:56       ` Tvrtko Ursulin
  0 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2018-05-08 10:24 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-08 11:10:41)
> On 07/05/2018 14:57, Chris Wilson wrote:
> > @@ -744,13 +748,25 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >       /* We must always keep the beast fed if we have work piled up */
> >       GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
> >   
> > -unlock:
> > -     spin_unlock_irq(&engine->timeline.lock);
> > -
> > -     if (submit) {
> > +     /* Re-evaluate the executing context setup after each preemptive kick */
> > +     if (last)
> >               execlists_user_begin(execlists, execlists->port);
> 
> Last can be non-null and submit false, so this is not equivalent.
> 
> By the looks of it makes no difference since it is OK to set the 
> execlists user active bit multiple times. Even though the helper is 
> called execlists_set_active_once. But the return value is not looked at.
> 
> Still, why not keep doing this when submit is true?

It's a subtle difference, in that we want the context reevaluated every
time we kick the queue as we may have changed state that we want to
reload, and not just ELSP. Sometimes we need inheritance of more than
just priority...
-Chris

* Re: [PATCH v2 2/7] drm/i915: Disable tasklet scheduling across initial scheduling
  2018-05-08 10:02   ` Tvrtko Ursulin
@ 2018-05-08 10:31     ` Chris Wilson
  0 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-05-08 10:31 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-08 11:02:38)
> 
> On 07/05/2018 14:57, Chris Wilson wrote:
> > During request submission, we call the engine->schedule() function so
> > that we may reorder the active requests as required for inheriting the
> > new request's priority. This may schedule several tasklets to run on the
> > local CPU, but we will need to schedule the tasklets again for the new
> > request. Delay all the local tasklets until the end, so that we only
> > have to process the queue just once.
> > 
> > v2: Beware PREEMPT_RCU, as then local_bh_disable() is then not a
> > superset of rcu_read_lock().
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_request.c | 5 ++---
> >   1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index e4cf76ec14a6..f336942229cf 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -1110,12 +1110,11 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
> >        * decide whether to preempt the entire chain so that it is ready to
> >        * run at the earliest possible convenience.
> >        */
> > -     rcu_read_lock();
> > +     local_bh_disable();
> > +     rcu_read_lock(); /* RCU serialisation for set-wedged protection */
> >       if (engine->schedule)
> >               engine->schedule(request, &request->ctx->sched);
> >       rcu_read_unlock();
> > -
> > -     local_bh_disable();
> >       i915_sw_fence_commit(&request->submit);
> >       local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
> >   
> > 
> 
> Before I wasn't sure about this one, since it lengthens the softirq off 
> section and still doesn't make the re-schedule atomic. But today I am 
> thinking that in normal use dependency chains should not be that deep 
> for it to become an issue. In the light of that, go for it:

We have big problems when the dependency chain grows, simply because we
recursively build a linear list and then walk it again. It's not the
worst nightmare in the code, and I don't think anyone has complained
about it yet, but we do have a few pathological igt (gem_exec_schedule
deep/wide) to show that it can be the ratelimiting step.

As far as blocking tasklets across this; only the local tasklet is
blocked and for good reason as we reorder the queues. A tasklet on
another CPU is also (mostly) blocked by the spinlock (and local irq
off). Overall I don't think the problem is exacerbated at all by the
local_bh_disable().
-Chris

* Re: [PATCH v2 5/7] drm/i915/execlists: Direct submit onto idle engines
  2018-05-08 10:23   ` Tvrtko Ursulin
@ 2018-05-08 10:40     ` Chris Wilson
  2018-05-08 11:00       ` Tvrtko Ursulin
  0 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2018-05-08 10:40 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-08 11:23:09)
> 
> On 07/05/2018 14:57, Chris Wilson wrote:
> > Bypass using the tasklet to submit the first request to HW, as the
> > tasklet may be deferred unto ksoftirqd and at a minimum will add in
> > excess of 10us (and maybe tens of milliseconds) to our execution
> > latency. This latency reduction is most notable when execution flows
> > between engines.
> > 
> > v2: Beware handling preemption completion from the direct submit path as
> > well.
> > 
> > Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/intel_guc_submission.c | 12 +++-
> >   drivers/gpu/drm/i915/intel_lrc.c            | 66 +++++++++++++++++----
> >   drivers/gpu/drm/i915/intel_ringbuffer.h     |  7 +++
> >   3 files changed, 69 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> > index 2feb65096966..6bfe30af7826 100644
> > --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> > @@ -754,14 +754,20 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
> >   
> >   static void guc_dequeue(struct intel_engine_cs *engine)
> >   {
> > -     unsigned long flags;
> > +     unsigned long uninitialized_var(flags);
> >       bool submit;
> >   
> >       local_irq_save(flags);
> >   
> > -     spin_lock(&engine->timeline.lock);
> > +     GEM_BUG_ON(!test_bit(TASKLET_STATE_RUN,
> > +                          &engine->execlists.tasklet.state));
> 
> Soon it will be time for i915_tasklet. :)
> 
> > +     if (!intel_engine_direct_submit(engine))
> > +             spin_lock(&engine->timeline.lock);
> 
> A bit ugly both on the conditional locking and using engine->flags for 
> transient purposes.
> 
> Since you are locking the tasklet and own it (and open coding the call) 
> completely when calling directly, you could just the same cheat and call 
> a different function?

My first attempt was to call __execlists_dequeue() directly and not
tasklet->func(). But that then has this nasty
  if (tasklet->func == execlists_submission_tasklet)
or some such in the middle of otherwise generic code.
https://patchwork.freedesktop.org/patch/221105/

I was less happy about that. At least this does have the making of
something more generic like i915_tasklet ;)

> >       submit = __guc_dequeue(engine);
> > -     spin_unlock(&engine->timeline.lock);
> > +
> > +     if (!intel_engine_direct_submit(engine))
> > +             spin_unlock(&engine->timeline.lock);
> >   
> >       if (submit)
> >               guc_submit(engine);
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 15c373ea5b7e..ac7c5edee4ee 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -357,13 +357,16 @@ execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists)
> >   {
> >       struct intel_engine_cs *engine =
> >               container_of(execlists, typeof(*engine), execlists);
> > -     unsigned long flags;
> > +     unsigned long uninitialized_var(flags);
> >   
> > -     spin_lock_irqsave(&engine->timeline.lock, flags);
> > +     GEM_BUG_ON(!test_bit(TASKLET_STATE_RUN, &execlists->tasklet.state));
> > +     if (!intel_engine_direct_submit(engine))
> > +             spin_lock_irqsave(&engine->timeline.lock, flags);
> >   
> >       __unwind_incomplete_requests(engine);
> >   
> > -     spin_unlock_irqrestore(&engine->timeline.lock, flags);
> > +     if (!intel_engine_direct_submit(engine))
> > +             spin_unlock_irqrestore(&engine->timeline.lock, flags);
> 
> Hm ok yes, this one would be a problem..
> 
> Maybe at least use some bit under execlists state instead of engine flags?

But I have engine->flags :-p Could I steal a bit from tasklet.state? I
tend to get funny looks every time I ask for TASKLET_STATE_USER ;)
-Chris

* Re: [PATCH v2 6/7] drm/i915/execlists: Direct submission from irq handler
  2018-05-07 13:57 ` [PATCH v2 6/7] drm/i915/execlists: Direct submission from irq handler Chris Wilson
@ 2018-05-08 10:54   ` Tvrtko Ursulin
  2018-05-08 11:10     ` Chris Wilson
  2018-05-08 12:17   ` [PATCH] " Chris Wilson
  1 sibling, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-08 10:54 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 07/05/2018 14:57, Chris Wilson wrote:
> Continuing the theme of bypassing ksoftirqd latency, also first try to
> directly submit from the CS interrupt handler to clear the ELSP and
> queue the next.
> 
> In the past, we have been hesitant to do this as the context switch
> processing has been quite heavy, requiring forcewaked mmio. However, as
> we now can read the GPU state from the cacheable HWSP, it is relatively
> cheap!
> 
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_irq.c             | 13 ++++++-------
>   drivers/gpu/drm/i915/intel_guc_submission.c |  2 ++
>   drivers/gpu/drm/i915/intel_ringbuffer.h     | 16 ++++++++++++++++
>   3 files changed, 24 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index f9bc3aaa90d0..775cf167d938 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1465,19 +1465,18 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
>   	bool tasklet = false;
>   
> -	if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
> -		if (READ_ONCE(engine->execlists.active))
> -			tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
> -						    &engine->irq_posted);
> -	}
> +	if (iir & GT_CONTEXT_SWITCH_INTERRUPT && READ_ONCE(execlists->active))
> +		tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
> +					    &engine->irq_posted);
>   
>   	if (iir & GT_RENDER_USER_INTERRUPT) {
>   		notify_ring(engine);
> -		tasklet |= USES_GUC_SUBMISSION(engine->i915);
> +		if (!test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
> +			tasklet = USES_GUC_SUBMISSION(engine->i915);

I don't understand this change. In the GuC case IRQ_EXECLISTS is never 
set so the conditional is pointless. In execlist mode user interrupt 
has nothing to do with scheduling the tasklet.

>   	}
>   
>   	if (tasklet)
> -		tasklet_hi_schedule(&execlists->tasklet);
> +		execlists_tasklet(execlists);
>   }
>   
>   static void gen8_gt_irq_ack(struct drm_i915_private *i915,
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index 6bfe30af7826..7d4542b46f5e 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -782,6 +782,8 @@ static void guc_submission_tasklet(unsigned long data)
>   	struct execlist_port *port = execlists->port;
>   	struct i915_request *rq;
>   
> +	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> +

I don't understand this either - there is no changed code path which 
sets this in GuC mode.

Regards,

Tvrtko

>   	rq = port_request(port);
>   	while (rq && i915_request_completed(rq)) {
>   		trace_i915_request_out(rq);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index f5545391d76a..da7e00ff2c6b 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -717,6 +717,22 @@ execlists_port_complete(struct intel_engine_execlists * const execlists,
>   	return port;
>   }
>   
> +static inline void
> +execlists_tasklet(struct intel_engine_execlists * const execlists)
> +{
> +	struct tasklet_struct * const t = &execlists->tasklet;
> +
> +	if (unlikely(atomic_read(&t->count))) /* GPU reset active */
> +		return;
> +
> +	if (tasklet_trylock(t)) {
> +		t->func(t->data);
> +		tasklet_unlock(t);
> +	} else {
> +		tasklet_hi_schedule(t);
> +	}
> +}
> +
>   static inline unsigned int
>   intel_engine_flag(const struct intel_engine_cs *engine)
>   {
> 

* Re: [PATCH v2 3/7] drm/i915/execlists: Make submission tasklet hardirq safe
  2018-05-08 10:24     ` Chris Wilson
@ 2018-05-08 10:56       ` Tvrtko Ursulin
  2018-05-08 11:05         ` Chris Wilson
  0 siblings, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-08 10:56 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/05/2018 11:24, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-08 11:10:41)
>> On 07/05/2018 14:57, Chris Wilson wrote:
>>> @@ -744,13 +748,25 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>>        /* We must always keep the beast fed if we have work piled up */
>>>        GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
>>>    
>>> -unlock:
>>> -     spin_unlock_irq(&engine->timeline.lock);
>>> -
>>> -     if (submit) {
>>> +     /* Re-evaluate the executing context setup after each preemptive kick */
>>> +     if (last)
>>>                execlists_user_begin(execlists, execlists->port);
>>
>> Last can be non-null and submit false, so this is not equivalent.
>>
>> By the looks of it makes no difference since it is OK to set the
>> execlists user active bit multiple times. Even though the helper is
>> called execlists_set_active_once. But the return value is not looked at.
>>
>> Still, why not keep doing this when submit is true?
> 
> It's a subtle difference, in that we want the context reevaluated every
> time we kick the queue as we may have changed state that we want to
> reload, and not just ELSP. Sometimes we need inheritance of more than
> just priority...

What do you mean by context re-evaluated?

ACTIVE_USER is set from first to last request, no? I don't understand 
what would change if you would set it multiple times while it is already 
set.

Regards,

Tvrtko

* Re: [PATCH v2 5/7] drm/i915/execlists: Direct submit onto idle engines
  2018-05-08 10:40     ` Chris Wilson
@ 2018-05-08 11:00       ` Tvrtko Ursulin
  0 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-08 11:00 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/05/2018 11:40, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-08 11:23:09)
>>
>> On 07/05/2018 14:57, Chris Wilson wrote:
>>> Bypass using the tasklet to submit the first request to HW, as the
>>> tasklet may be deferred unto ksoftirqd and at a minimum will add in
>>> excess of 10us (and maybe tens of milliseconds) to our execution
>>> latency. This latency reduction is most notable when execution flows
>>> between engines.
>>>
>>> v2: Beware handling preemption completion from the direct submit path as
>>> well.
>>>
>>> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/intel_guc_submission.c | 12 +++-
>>>    drivers/gpu/drm/i915/intel_lrc.c            | 66 +++++++++++++++++----
>>>    drivers/gpu/drm/i915/intel_ringbuffer.h     |  7 +++
>>>    3 files changed, 69 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
>>> index 2feb65096966..6bfe30af7826 100644
>>> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
>>> @@ -754,14 +754,20 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
>>>    
>>>    static void guc_dequeue(struct intel_engine_cs *engine)
>>>    {
>>> -     unsigned long flags;
>>> +     unsigned long uninitialized_var(flags);
>>>        bool submit;
>>>    
>>>        local_irq_save(flags);
>>>    
>>> -     spin_lock(&engine->timeline.lock);
>>> +     GEM_BUG_ON(!test_bit(TASKLET_STATE_RUN,
>>> +                          &engine->execlists.tasklet.state));
>>
>> Soon it will be time for i915_tasklet. :)
>>
>>> +     if (!intel_engine_direct_submit(engine))
>>> +             spin_lock(&engine->timeline.lock);
>>
>> A bit ugly both on the conditional locking and using engine->flags for
>> transient purposes.
>>
>> Since you are locking the tasklet and own it (and open coding the call)
>> completely when calling directly, you could just the same cheat and call
>> a different function?
> 
> My first attempt was to call __execlists_dequeue() directly and not
> tasklet->func(). But that then has this nasty
>    if (tasklet->func == execlists_submission_tasklet)

I thought of not calling t->func but the func directly, well, a special 
flavour of the func. But the unwind, as noticed a bit later, is the only 
one which throws a spanner in those works.

Unfortunately I have no ideas at the moment on how to elegantly solve that.

> or some such in the middle of otherwise generic code.
> https://patchwork.freedesktop.org/patch/221105/
> 
> I was less happy about that. At least this does have the making of
> something more generic like i915_tasklet ;)
> 
>>>        submit = __guc_dequeue(engine);
>>> -     spin_unlock(&engine->timeline.lock);
>>> +
>>> +     if (!intel_engine_direct_submit(engine))
>>> +             spin_unlock(&engine->timeline.lock);
>>>    
>>>        if (submit)
>>>                guc_submit(engine);
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index 15c373ea5b7e..ac7c5edee4ee 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -357,13 +357,16 @@ execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists)
>>>    {
>>>        struct intel_engine_cs *engine =
>>>                container_of(execlists, typeof(*engine), execlists);
>>> -     unsigned long flags;
>>> +     unsigned long uninitialized_var(flags);
>>>    
>>> -     spin_lock_irqsave(&engine->timeline.lock, flags);
>>> +     GEM_BUG_ON(!test_bit(TASKLET_STATE_RUN, &execlists->tasklet.state));
>>> +     if (!intel_engine_direct_submit(engine))
>>> +             spin_lock_irqsave(&engine->timeline.lock, flags);
>>>    
>>>        __unwind_incomplete_requests(engine);
>>>    
>>> -     spin_unlock_irqrestore(&engine->timeline.lock, flags);
>>> +     if (!intel_engine_direct_submit(engine))
>>> +             spin_unlock_irqrestore(&engine->timeline.lock, flags);
>>
>> Hm ok yes, this one would be a problem..
>>
>> Maybe at least use some bit under execlists state instead of engine flags?
> 
> But I have engine->flags :-p Could I steal a bit from tasklet.state? I
> tend to get funny looks every time I ask for TASKLET_STATE_USER ;)

We intended engine->flags to be stable for engine lifetime 
(effectively). So I don't like using it for this. Put a new flag/boolean 
in intel_execlists_state?

Regards,

Tvrtko



* Re: [PATCH v2 3/7] drm/i915/execlists: Make submission tasklet hardirq safe
  2018-05-08 10:56       ` Tvrtko Ursulin
@ 2018-05-08 11:05         ` Chris Wilson
  2018-05-08 11:38           ` Tvrtko Ursulin
  0 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2018-05-08 11:05 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-08 11:56:37)
> 
> On 08/05/2018 11:24, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-05-08 11:10:41)
> >> On 07/05/2018 14:57, Chris Wilson wrote:
> >>> @@ -744,13 +748,25 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >>>        /* We must always keep the beast fed if we have work piled up */
> >>>        GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
> >>>    
> >>> -unlock:
> >>> -     spin_unlock_irq(&engine->timeline.lock);
> >>> -
> >>> -     if (submit) {
> >>> +     /* Re-evaluate the executing context setup after each preemptive kick */
> >>> +     if (last)
> >>>                execlists_user_begin(execlists, execlists->port);
> >>
> >> Last can be non-null and submit false, so this is not equivalent.
> >>
> >> By the looks of it makes no difference since it is OK to set the
> >> execlists user active bit multiple times. Even though the helper is
> >> called execlists_set_active_once. But the return value is not looked at.
> >>
> >> Still, why not keep doing this when submit is true?
> > 
> > It's a subtle difference, in that we want the context reevaluated every
> > time we kick the queue as we may have changed state that we want to
> > reload, and not just ELSP. Sometimes we need inheritance of more than
> > just priority...
> 
> What do you mean by context re-evaluated?
> 
> ACTIVE_USER is set from first to last request, no? I don't understand 
> what would change if you would set it multiple times while it is already 
> set.

It's not about ACTIVE_USER. This is a hook to indicate a change in
context state being executed on the GPU. It's to be hooked into by the
cpufreq/pstate code, gpufreq code, etc. Later realisation is that they
need to be notified for all context changes here.
-Chris

* Re: [PATCH v2 6/7] drm/i915/execlists: Direct submission from irq handler
  2018-05-08 10:54   ` Tvrtko Ursulin
@ 2018-05-08 11:10     ` Chris Wilson
  2018-05-08 11:53       ` Tvrtko Ursulin
  0 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2018-05-08 11:10 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-08 11:54:27)
> 
> On 07/05/2018 14:57, Chris Wilson wrote:
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index f9bc3aaa90d0..775cf167d938 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -1465,19 +1465,18 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
> >       struct intel_engine_execlists * const execlists = &engine->execlists;
> >       bool tasklet = false;
> >   
> > -     if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
> > -             if (READ_ONCE(engine->execlists.active))
> > -                     tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
> > -                                                 &engine->irq_posted);
> > -     }
> > +     if (iir & GT_CONTEXT_SWITCH_INTERRUPT && READ_ONCE(execlists->active))
> > +             tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
> > +                                         &engine->irq_posted);
> >   
> >       if (iir & GT_RENDER_USER_INTERRUPT) {
> >               notify_ring(engine);
> > -             tasklet |= USES_GUC_SUBMISSION(engine->i915);
> > +             if (!test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
> > +                     tasklet = USES_GUC_SUBMISSION(engine->i915);
> 
> I don't understand this change. In the GuC case IRQ_EXECLISTS is never 
> set so the conditional is pointless. In execlist mode user interrupt 
> has nothing to do with scheduling the tasklet.

Because notify_ring() may have just executed the tasklet and cleared the
bit from irq_posted. I didn't want to then do a second dequeue.

> >   static void gen8_gt_irq_ack(struct drm_i915_private *i915,
> > diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> > index 6bfe30af7826..7d4542b46f5e 100644
> > --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> > @@ -782,6 +782,8 @@ static void guc_submission_tasklet(unsigned long data)
> >       struct execlist_port *port = execlists->port;
> >       struct i915_request *rq;
> >   
> > +     clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> > +
> 
> I don't understand this either - there is no changed code path which 
> sets this in GuC mode.

The GuC may take over with the bit set. And since we aren't particularly
careful with parking before takeover, it was prudent to always clear it
here as a direct analogue to the execlists context switch handler.
-Chris

* Re: [PATCH v2 3/7] drm/i915/execlists: Make submission tasklet hardirq safe
  2018-05-08 11:05         ` Chris Wilson
@ 2018-05-08 11:38           ` Tvrtko Ursulin
  2018-05-08 11:43             ` Chris Wilson
  0 siblings, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-08 11:38 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/05/2018 12:05, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-08 11:56:37)
>>
>> On 08/05/2018 11:24, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-05-08 11:10:41)
>>>> On 07/05/2018 14:57, Chris Wilson wrote:
>>>>> @@ -744,13 +748,25 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>>>>         /* We must always keep the beast fed if we have work piled up */
>>>>>         GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
>>>>>     
>>>>> -unlock:
>>>>> -     spin_unlock_irq(&engine->timeline.lock);
>>>>> -
>>>>> -     if (submit) {
>>>>> +     /* Re-evaluate the executing context setup after each preemptive kick */
>>>>> +     if (last)
>>>>>                 execlists_user_begin(execlists, execlists->port);
>>>>
>>>> Last can be non-null and submit false, so this is not equivalent.
>>>>
>>>> By the looks of it makes no difference since it is OK to set the
>>>> execlists user active bit multiple times. Even though the helper is
>>>> called execlists_set_active_once. But the return value is not looked at.
>>>>
>>>> Still, why not keep doing this when submit is true?
>>>
>>> It's a subtle difference, in that we want the context reevaluated every
>>> time we kick the queue as we may have changed state that we want to
>>> reload, and not just ELSP. Sometimes we need inheritance of more than
>>> just priority...
>>
>> What do you mean by context re-evaluated?
>>
>> ACTIVE_USER is set from first to last request, no? I don't understand
>> what would change if you would set it multiple times while it is already
>> set.
> 
> It's not about ACTIVE_USER. This is a hook to indicate a change in
> context state being executed on the GPU. It's to be hooked into by the
> cpufreq/pstate code, gpufreq code, etc. Later realisation is that they
> need to be notified for all context changes here.

Ok so nothing as such relating to making tasklets hardirq safe.

Is the execlists_user_begin still the right name then?

Any future use for execlists_set_active_once or we could drop it 
(replacing with execlists_set_active)?

Regards,

Tvrtko


* Re: [PATCH v2 3/7] drm/i915/execlists: Make submission tasklet hardirq safe
  2018-05-08 11:38           ` Tvrtko Ursulin
@ 2018-05-08 11:43             ` Chris Wilson
  0 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-05-08 11:43 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-08 12:38:06)
> 
> On 08/05/2018 12:05, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-05-08 11:56:37)
> >>
> >> On 08/05/2018 11:24, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2018-05-08 11:10:41)
> >>>> On 07/05/2018 14:57, Chris Wilson wrote:
> >>>>> @@ -744,13 +748,25 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >>>>>         /* We must always keep the beast fed if we have work piled up */
> >>>>>         GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
> >>>>>     
> >>>>> -unlock:
> >>>>> -     spin_unlock_irq(&engine->timeline.lock);
> >>>>> -
> >>>>> -     if (submit) {
> >>>>> +     /* Re-evaluate the executing context setup after each preemptive kick */
> >>>>> +     if (last)
> >>>>>                 execlists_user_begin(execlists, execlists->port);
> >>>>
> >>>> Last can be non-null and submit false, so this is not equivalent.
> >>>>
> >>>> By the looks of it makes no difference since it is OK to set the
> >>>> execlists user active bit multiple times. Even though the helper is
> >>>> called execlists_set_active_once. But the return value is not looked at.
> >>>>
> >>>> Still, why not keep doing this when submit is true?
> >>>
> >>> It's a subtle difference, in that we want the context reevaluated every
> >>> time we kick the queue as we may have changed state that we want to
> >>> reload, and not just ELSP. Sometimes we need inheritance of more than
> >>> just priority...
> >>
> >> What do you mean by context re-evaluated?
> >>
> >> ACTIVE_USER is set from first to last request, no? I don't understand
> >> what would change if you would set it multiple times while it is already
> >> set.
> > 
> > It's not about ACTIVE_USER. This is a hook to indicate a change in
> > context state being executed on the GPU. It's to be hooked into by the
> > cpufreq/pstate code, gpufreq code, etc. Later realisation is that they
> > need to be notified for all context changes here.
> 
> Ok so nothing as such relating to making tasklets hardirq safe.

Ssh.
 
> Is the execlists_user_begin still the right name then?

It's never been quite the right name. Just the best I could come up
with. execlists_user_busy()/execlists_user_idle() might be more
sensible.
 
> Any future use for execlists_set_active_once or we could drop it 
> (replacing with execlists_set_active)?

set_active_once is intended to be used by the cpufreq/pstate code, or
anything else that only cares about the off->on/on->off transition and
not every reload. I'm waiting for that code to be merged...
-Chris
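
[Editorial aside: the off->on transition filter described above can be illustrated with a userspace sketch. Names and layout here are hypothetical stand-ins, not the i915 implementation — the real helper uses atomic bitops on the execlists state.]

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of an execlists_set_active_once()-style helper: callers that
 * only care about the off->on transition (e.g. a gpufreq notifier)
 * act only when the return value is true, while plain "set active"
 * callers can ignore it. Hypothetical, single-threaded model only;
 * the kernel version would use test_and_set_bit() for atomicity.
 */
struct fake_execlists {
	unsigned int active;	/* bitmask of EXECLISTS_ACTIVE_* bits */
};

static bool set_active_once(struct fake_execlists *el, unsigned int bit)
{
	bool was_clear = !(el->active & bit);

	el->active |= bit;	/* idempotent: safe to call on every reload */
	return was_clear;	/* true only on the off->on transition */
}
```

Setting the bit repeatedly is harmless, which matches the observation above that the return value currently goes unlooked-at.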

* Re: [PATCH v2 6/7] drm/i915/execlists: Direct submission from irq handler
  2018-05-08 11:10     ` Chris Wilson
@ 2018-05-08 11:53       ` Tvrtko Ursulin
  0 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-08 11:53 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/05/2018 12:10, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-08 11:54:27)
>>
>> On 07/05/2018 14:57, Chris Wilson wrote:
>>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>>> index f9bc3aaa90d0..775cf167d938 100644
>>> --- a/drivers/gpu/drm/i915/i915_irq.c
>>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>>> @@ -1465,19 +1465,18 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
>>>        struct intel_engine_execlists * const execlists = &engine->execlists;
>>>        bool tasklet = false;
>>>    
>>> -     if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
>>> -             if (READ_ONCE(engine->execlists.active))
>>> -                     tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
>>> -                                                 &engine->irq_posted);
>>> -     }
>>> +     if (iir & GT_CONTEXT_SWITCH_INTERRUPT && READ_ONCE(execlists->active))
>>> +             tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
>>> +                                         &engine->irq_posted);
>>>    
>>>        if (iir & GT_RENDER_USER_INTERRUPT) {
>>>                notify_ring(engine);
>>> -             tasklet |= USES_GUC_SUBMISSION(engine->i915);
>>> +             if (!test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
>>> +                     tasklet = USES_GUC_SUBMISSION(engine->i915);
>>
>> I don't understand this change. In the GuC case IRQ_EXECLISTS is never
> >> set so the conditional is pointless. In execlist mode user interrupt
>> has nothing to do with scheduling the tasklet.
> 
> Because notify_ring() may have just executed the tasklet and cleared the
> bit from irq_posted. I didn't want to then do a second dequeue.

But IRQ_EXECLISTS is never set in GuC mode. So set-if-clear in this case 
is equivalent to unconditional-or.

If you want to clear the tasklet bool in execlist mode then this is 
extremely non-obvious. More readable would be something like

if (iir & USER_IRQ)
	tasklet = notify_ring(...) ? 0 : USES_GUC(...);

Where notify_ring would return true if it signalled anything.

We wouldn't know though if that means the tasklet actually ran. :(

Oh well.. put a comment please, because it really is non-obvious.
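
[Editorial aside: the equivalence being argued here — that in GuC mode, where ENGINE_IRQ_EXECLIST is never set, the guarded assignment degenerates to the old unconditional OR — can be checked with a tiny standalone model. Names are stand-ins for the i915 ones; this is illustrative only.]

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Standalone model of the tasklet decision in gen8_cs_irq_handler()
 * after the change under discussion. In GuC mode execlist_bit_set is
 * always false, so the guard never fires and the result reduces to
 * "user_irq && uses_guc" — the pre-change behaviour.
 */
static bool decide_tasklet(bool user_irq, bool execlist_bit_set,
			   bool uses_guc)
{
	bool tasklet = false;

	if (user_irq && !execlist_bit_set)
		tasklet = uses_guc;

	return tasklet;
}
```

The guard only matters in execlist mode, where a set bit indicates the context-switch work is already queued and a second dequeue is unwanted.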

>>>    static void gen8_gt_irq_ack(struct drm_i915_private *i915,
>>> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
>>> index 6bfe30af7826..7d4542b46f5e 100644
>>> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
>>> @@ -782,6 +782,8 @@ static void guc_submission_tasklet(unsigned long data)
>>>        struct execlist_port *port = execlists->port;
>>>        struct i915_request *rq;
>>>    
>>> +     clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>>> +
>>
>> I don't understand this either - there is no changed code path which
>> sets this in GuC mode.
> 
> The guc may take over with the bit set. And since we aren't particularly
> careful with parking before takeover, it was prudent to always clear it
> here as a direct analogue to the execlists context switch handler.

Okay.

Regards,

Tvrtko


* [PATCH] drm/i915/execlists: Direct submission from irq handler
  2018-05-07 13:57 ` [PATCH v2 6/7] drm/i915/execlists: Direct submission from irq handler Chris Wilson
  2018-05-08 10:54   ` Tvrtko Ursulin
@ 2018-05-08 12:17   ` Chris Wilson
  1 sibling, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-05-08 12:17 UTC (permalink / raw)
  To: intel-gfx

Continuing the theme of bypassing ksoftirqd latency, also first try to
directly submit from the CS interrupt handler to clear the ELSP and
queue the next.

In the past, we have been hesitant to do this as the context switch
processing has been quite heavy, requiring forcewaked mmio. However, as
we now can read the GPU state from the cacheable HWSP, it is relatively
cheap!

v2: Explain why we test_bit(IRQ_EXECLIST) after doing notify_ring (it's
because notify_ring() may itself trigger direct submission, clearing
the bit)

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c             | 21 ++++++++++++++-------
 drivers/gpu/drm/i915/intel_guc_submission.c |  2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h     | 16 ++++++++++++++++
 3 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f9bc3aaa90d0..eecfeabbb006 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1465,19 +1465,26 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	bool tasklet = false;
 
-	if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
-		if (READ_ONCE(engine->execlists.active))
-			tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
-						    &engine->irq_posted);
-	}
+	if (iir & GT_CONTEXT_SWITCH_INTERRUPT && READ_ONCE(execlists->active))
+		tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
+					    &engine->irq_posted);
 
 	if (iir & GT_RENDER_USER_INTERRUPT) {
 		notify_ring(engine);
-		tasklet |= USES_GUC_SUBMISSION(engine->i915);
+		/*
+		 * notify_ring() may trigger direct submission onto this
+		 * engine, clearing the ENGINE_IRQ_EXECLIST bit. In that
+		 * case, we don't want to resubmit and so clear the tasklet
+		 * boolean. GuC never sets the ENGINE_IRQ_EXECLIST bit and
+		 * so when using the GuC this equates to an unconditional
+		 * setting of tasklet to true.
+		 */
+		if (!test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
+			tasklet = USES_GUC_SUBMISSION(engine->i915);
 	}
 
 	if (tasklet)
-		tasklet_hi_schedule(&execlists->tasklet);
+		execlists_tasklet(execlists);
 }
 
 static void gen8_gt_irq_ack(struct drm_i915_private *i915,
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 6bfe30af7826..7d4542b46f5e 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -782,6 +782,8 @@ static void guc_submission_tasklet(unsigned long data)
 	struct execlist_port *port = execlists->port;
 	struct i915_request *rq;
 
+	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+
 	rq = port_request(port);
 	while (rq && i915_request_completed(rq)) {
 		trace_i915_request_out(rq);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index f5545391d76a..da7e00ff2c6b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -717,6 +717,22 @@ execlists_port_complete(struct intel_engine_execlists * const execlists,
 	return port;
 }
 
+static inline void
+execlists_tasklet(struct intel_engine_execlists * const execlists)
+{
+	struct tasklet_struct * const t = &execlists->tasklet;
+
+	if (unlikely(atomic_read(&t->count))) /* GPU reset active */
+		return;
+
+	if (tasklet_trylock(t)) {
+		t->func(t->data);
+		tasklet_unlock(t);
+	} else {
+		tasklet_hi_schedule(t);
+	}
+}
+
 static inline unsigned int
 intel_engine_flag(const struct intel_engine_cs *engine)
 {
-- 
2.17.0
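
[Editorial aside: the run-inline-or-defer shape of execlists_tasklet() above can be modelled in userspace. Stand-in types throughout; the kernel version relies on tasklet_trylock()/tasklet_hi_schedule() and the tasklet's disable count.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace model of the execlists_tasklet() pattern: run the handler
 * directly if nothing else holds it, otherwise defer to the scheduled
 * path. Illustrative only — not the i915 code.
 */
struct fake_tasklet {
	atomic_flag running;	/* stands in for the tasklet RUN state */
	int disable_count;	/* stands in for t->count (reset active) */
	int ran_inline;
	int deferred;
};

static void fake_handler(struct fake_tasklet *t)
{
	t->ran_inline++;	/* the dequeue/submit work would happen here */
}

static void kick_tasklet(struct fake_tasklet *t)
{
	if (t->disable_count)	/* e.g. GPU reset active: do nothing */
		return;

	if (!atomic_flag_test_and_set(&t->running)) {
		fake_handler(t);	/* direct submission path */
		atomic_flag_clear(&t->running);
	} else {
		t->deferred++;		/* would tasklet_hi_schedule() */
	}
}
```

The trylock guarantees mutual exclusion with a concurrently running tasklet, while the deferred branch preserves forward progress when the lock is contended.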


* ✗ Fi.CI.CHECKPATCH: warning for series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority (rev2)
  2018-05-07 13:57 [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority Chris Wilson
                   ` (10 preceding siblings ...)
  2018-05-08  9:40 ` [PATCH v2 1/7] " Tvrtko Ursulin
@ 2018-05-08 14:11 ` Patchwork
  2018-05-08 14:13 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 38+ messages in thread
From: Patchwork @ 2018-05-08 14:11 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority (rev2)
URL   : https://patchwork.freedesktop.org/series/42815/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
699def5526fd drm/i915: Flush submission tasklet after bumping priority
bda8b3054416 drm/i915: Disable tasklet scheduling across initial scheduling
ebbf8fabee33 drm/i915/execlists: Make submission tasklet hardirq safe
24e925250527 drm/i915/guc: Make submission tasklet hardirq safe
d560ae29a13f drm/i915/execlists: Direct submit onto idle engines
-:28: WARNING:FUNCTION_ARGUMENTS: function definition argument 'flags' should also have an identifier name
#28: FILE: drivers/gpu/drm/i915/intel_guc_submission.c:757:
+	unsigned long uninitialized_var(flags);

-:56: WARNING:FUNCTION_ARGUMENTS: function definition argument 'flags' should also have an identifier name
#56: FILE: drivers/gpu/drm/i915/intel_lrc.c:359:
+	unsigned long uninitialized_var(flags);

-:85: WARNING:FUNCTION_ARGUMENTS: function definition argument 'flags' should also have an identifier name
#85: FILE: drivers/gpu/drm/i915/intel_lrc.c:765:
+	unsigned long uninitialized_var(flags);

total: 0 errors, 3 warnings, 0 checks, 160 lines checked
74c81231cbb7 drm/i915/execlists: Direct submission from irq handler
3882e58fc20a drm/i915: Speed up idle detection by kicking the tasklets


* ✗ Fi.CI.SPARSE: warning for series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority (rev2)
  2018-05-07 13:57 [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority Chris Wilson
                   ` (11 preceding siblings ...)
  2018-05-08 14:11 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority (rev2) Patchwork
@ 2018-05-08 14:13 ` Patchwork
  2018-05-08 14:28 ` ✓ Fi.CI.BAT: success " Patchwork
  2018-05-08 16:27 ` ✓ Fi.CI.IGT: " Patchwork
  14 siblings, 0 replies; 38+ messages in thread
From: Patchwork @ 2018-05-08 14:13 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority (rev2)
URL   : https://patchwork.freedesktop.org/series/42815/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Commit: drm/i915: Flush submission tasklet after bumping priority
Okay!

Commit: drm/i915: Disable tasklet scheduling across initial scheduling
Okay!

Commit: drm/i915/execlists: Make submission tasklet hardirq safe
Okay!

Commit: drm/i915/guc: Make submission tasklet hardirq safe
Okay!

Commit: drm/i915/execlists: Direct submit onto idle engines
+drivers/gpu/drm/i915/intel_guc_submission.c:770:28: warning: context imbalance in 'guc_dequeue' - unexpected unlock
+drivers/gpu/drm/i915/intel_lrc.c:368:39: warning: context imbalance in 'execlists_unwind_incomplete_requests' - unexpected unlock
+drivers/gpu/drm/i915/intel_lrc.c:775:39: warning: context imbalance in 'execlists_dequeue' - unexpected unlock
-O:drivers/gpu/drm/i915/intel_ringbuffer.h:651:23: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/intel_ringbuffer.h:651:23: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/intel_ringbuffer.h:658:23: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/intel_ringbuffer.h:658:23: warning: expression using sizeof(void)

Commit: drm/i915/execlists: Direct submission from irq handler
Okay!

Commit: drm/i915: Speed up idle detection by kicking the tasklets
Okay!


* ✓ Fi.CI.BAT: success for series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority (rev2)
  2018-05-07 13:57 [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority Chris Wilson
                   ` (12 preceding siblings ...)
  2018-05-08 14:13 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2018-05-08 14:28 ` Patchwork
  2018-05-08 16:27 ` ✓ Fi.CI.IGT: " Patchwork
  14 siblings, 0 replies; 38+ messages in thread
From: Patchwork @ 2018-05-08 14:28 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority (rev2)
URL   : https://patchwork.freedesktop.org/series/42815/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4157 -> Patchwork_8941 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/42815/revisions/2/mbox/

== Known issues ==

  Here are the changes found in Patchwork_8941 that come from known issues:

  === IGT changes ===

    ==== Possible fixes ====

    igt@gem_exec_suspend@basic-s3:
      fi-skl-guc:         FAIL (fdo#105900, fdo#104699) -> PASS

    igt@gem_mmap_gtt@basic-small-bo-tiledx:
      fi-gdg-551:         FAIL (fdo#102575) -> PASS

    igt@kms_flip@basic-flip-vs-dpms:
      fi-bxt-dsi:         INCOMPLETE (fdo#103927) -> PASS

    
  fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
  fdo#104699 https://bugs.freedesktop.org/show_bug.cgi?id=104699
  fdo#105900 https://bugs.freedesktop.org/show_bug.cgi?id=105900


== Participating hosts (41 -> 37) ==

  Missing    (4): fi-ctg-p8600 fi-ilk-m540 fi-byt-squawks fi-skl-6700hq 


== Build changes ==

    * Linux: CI_DRM_4157 -> Patchwork_8941

  CI_DRM_4157: 580a7e5eafe15dd21f0dfc92d860a57a404e622d @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4464: 1bb318b32db003a377da14715c7b80675a712b6b @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_8941: 3882e58fc20a6394b6479857b7521ecdafba2864 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4464: 33e58d5583eb7ed3966a1b905f875a1dfa959f6b @ git://anongit.freedesktop.org/piglit


== Linux commits ==

3882e58fc20a drm/i915: Speed up idle detection by kicking the tasklets
74c81231cbb7 drm/i915/execlists: Direct submission from irq handler
d560ae29a13f drm/i915/execlists: Direct submit onto idle engines
24e925250527 drm/i915/guc: Make submission tasklet hardirq safe
ebbf8fabee33 drm/i915/execlists: Make submission tasklet hardirq safe
bda8b3054416 drm/i915: Disable tasklet scheduling across initial scheduling
699def5526fd drm/i915: Flush submission tasklet after bumping priority

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8941/issues.html

* ✓ Fi.CI.IGT: success for series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority (rev2)
  2018-05-07 13:57 [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority Chris Wilson
                   ` (13 preceding siblings ...)
  2018-05-08 14:28 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-05-08 16:27 ` Patchwork
  14 siblings, 0 replies; 38+ messages in thread
From: Patchwork @ 2018-05-08 16:27 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority (rev2)
URL   : https://patchwork.freedesktop.org/series/42815/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4157_full -> Patchwork_8941_full =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/42815/revisions/2/mbox/

== Known issues ==

  Here are the changes found in Patchwork_8941_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_flip@modeset-vs-vblank-race:
      shard-hsw:          PASS -> FAIL (fdo#103060)

    igt@kms_flip@plain-flip-ts-check-interruptible:
      shard-hsw:          PASS -> FAIL (fdo#100368)

    igt@perf@blocking:
      shard-hsw:          PASS -> FAIL (fdo#102252)

    igt@perf_pmu@enable-race-vcs0:
      shard-glk:          PASS -> INCOMPLETE (k.org#198133, fdo#103359)

    
    ==== Possible fixes ====

    igt@kms_atomic_transition@2x-modeset-transitions-nonblocking:
      shard-hsw:          DMESG-FAIL (fdo#104724) -> PASS

    igt@kms_flip@2x-plain-flip-fb-recreate-interruptible:
      shard-hsw:          FAIL (fdo#103928) -> PASS

    igt@kms_flip@2x-plain-flip-ts-check:
      shard-hsw:          FAIL (fdo#100368) -> PASS

    igt@kms_flip@busy-flip-interruptible:
      shard-apl:          FAIL -> PASS

    igt@kms_flip@dpms-vs-vblank-race:
      shard-hsw:          FAIL (fdo#103060) -> PASS

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252
  fdo#103060 https://bugs.freedesktop.org/show_bug.cgi?id=103060
  fdo#103359 https://bugs.freedesktop.org/show_bug.cgi?id=103359
  fdo#103928 https://bugs.freedesktop.org/show_bug.cgi?id=103928
  fdo#104724 https://bugs.freedesktop.org/show_bug.cgi?id=104724
  k.org#198133 https://bugzilla.kernel.org/show_bug.cgi?id=198133


== Participating hosts (5 -> 4) ==

  Missing    (1): shard-kbl 


== Build changes ==

    * Linux: CI_DRM_4157 -> Patchwork_8941

  CI_DRM_4157: 580a7e5eafe15dd21f0dfc92d860a57a404e622d @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4464: 1bb318b32db003a377da14715c7b80675a712b6b @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_8941: 3882e58fc20a6394b6479857b7521ecdafba2864 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4464: 33e58d5583eb7ed3966a1b905f875a1dfa959f6b @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8941/shards.html

* Re: [PATCH v2 3/7] drm/i915/execlists: Make submission tasklet hardirq safe
  2018-05-07 13:57 ` [PATCH v2 3/7] drm/i915/execlists: Make submission tasklet hardirq safe Chris Wilson
  2018-05-08 10:10   ` Tvrtko Ursulin
@ 2018-05-08 17:38   ` Tvrtko Ursulin
  2018-05-08 17:45   ` Tvrtko Ursulin
  2 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-08 17:38 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 07/05/2018 14:57, Chris Wilson wrote:
> Prepare to allow the execlists submission to be run from underneath a
> hardirq timer context (and not just the current softirq context) as is
> required for fast preemption resets and context switches.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 42 ++++++++++++++++++++++----------
>   1 file changed, 29 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index f9f4064dec0e..15c373ea5b7e 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -357,10 +357,13 @@ execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists)
>   {
>   	struct intel_engine_cs *engine =
>   		container_of(execlists, typeof(*engine), execlists);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&engine->timeline.lock, flags);
>   
> -	spin_lock_irq(&engine->timeline.lock);
>   	__unwind_incomplete_requests(engine);
> -	spin_unlock_irq(&engine->timeline.lock);
> +
> +	spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   }
>   
>   static inline void
> @@ -554,7 +557,7 @@ static void inject_preempt_context(struct intel_engine_cs *engine)
>   	execlists_set_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT);
>   }
>   
> -static void execlists_dequeue(struct intel_engine_cs *engine)
> +static bool __execlists_dequeue(struct intel_engine_cs *engine)
>   {
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
>   	struct execlist_port *port = execlists->port;
> @@ -564,6 +567,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	struct rb_node *rb;
>   	bool submit = false;
>   
> +	lockdep_assert_held(&engine->timeline.lock);
> +
>   	/* Hardware submission is through 2 ports. Conceptually each port
>   	 * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is
>   	 * static for a context, and unique to each, so we only execute
> @@ -585,7 +590,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	 * and context switches) submission.
>   	 */
>   
> -	spin_lock_irq(&engine->timeline.lock);
>   	rb = execlists->first;
>   	GEM_BUG_ON(rb_first(&execlists->queue) != rb);
>   
> @@ -600,7 +604,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   						EXECLISTS_ACTIVE_USER));
>   		GEM_BUG_ON(!port_count(&port[0]));
>   		if (port_count(&port[0]) > 1)
> -			goto unlock;
> +			return false;
>   
>   		/*
>   		 * If we write to ELSP a second time before the HW has had
> @@ -610,11 +614,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		 * the HW to indicate that it has had a chance to respond.
>   		 */
>   		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
> -			goto unlock;
> +			return false;
>   
>   		if (need_preempt(engine, last, execlists->queue_priority)) {
>   			inject_preempt_context(engine);
> -			goto unlock;
> +			return false;
>   		}
>   
>   		/*
> @@ -639,7 +643,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		 * priorities of the ports haven't been switch.
>   		 */
>   		if (port_count(&port[1]))
> -			goto unlock;
> +			return false;
>   
>   		/*
>   		 * WaIdleLiteRestore:bdw,skl
> @@ -744,13 +748,25 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	/* We must always keep the beast fed if we have work piled up */
>   	GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
>   
> -unlock:
> -	spin_unlock_irq(&engine->timeline.lock);
> -
> -	if (submit) {
> +	/* Re-evaluate the executing context setup after each preemptive kick */
> +	if (last)
>   		execlists_user_begin(execlists, execlists->port);
> +
> +	return submit;
> +}
> +
> +static void execlists_dequeue(struct intel_engine_cs *engine)
> +{
> +	struct intel_engine_execlists * const execlists = &engine->execlists;
> +	unsigned long flags;
> +	bool submit;
> +
> +	spin_lock_irqsave(&engine->timeline.lock, flags);
> +	submit = __execlists_dequeue(engine);
> +	spin_unlock_irqrestore(&engine->timeline.lock, flags);
> +
> +	if (submit)
>   		execlists_submit_ports(engine);
> -	}
>   
>   	GEM_BUG_ON(port_isset(execlists->port) &&
>   		   !execlists_is_active(execlists, EXECLISTS_ACTIVE_USER));
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH v2 4/7] drm/i915/guc: Make submission tasklet hardirq safe
  2018-05-07 13:57 ` [PATCH v2 4/7] drm/i915/guc: " Chris Wilson
@ 2018-05-08 17:43   ` Tvrtko Ursulin
  0 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-08 17:43 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 07/05/2018 14:57, Chris Wilson wrote:
> Prepare to allow the GuC submission to be run from underneath a
> hardirq timer context (and not just the current softirq context) as is
> required for fast preemption resets and context switches.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_guc_submission.c | 34 +++++++++++++++------
>   1 file changed, 25 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index 62828e39ee26..2feb65096966 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -669,7 +669,7 @@ static inline int port_prio(const struct execlist_port *port)
>   	return rq_prio(port_request(port));
>   }
>   
> -static void guc_dequeue(struct intel_engine_cs *engine)
> +static bool __guc_dequeue(struct intel_engine_cs *engine)
>   {
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
>   	struct execlist_port *port = execlists->port;
> @@ -679,7 +679,8 @@ static void guc_dequeue(struct intel_engine_cs *engine)
>   	bool submit = false;
>   	struct rb_node *rb;
>   
> -	spin_lock_irq(&engine->timeline.lock);
> +	lockdep_assert_held(&engine->timeline.lock);
> +
>   	rb = execlists->first;
>   	GEM_BUG_ON(rb_first(&execlists->queue) != rb);
>   
> @@ -694,13 +695,13 @@ static void guc_dequeue(struct intel_engine_cs *engine)
>   						     EXECLISTS_ACTIVE_PREEMPT);
>   				queue_work(engine->i915->guc.preempt_wq,
>   					   &preempt_work->work);
> -				goto unlock;
> +				return false;
>   			}
>   		}
>   
>   		port++;
>   		if (port_isset(port))
> -			goto unlock;
> +			return false;
>   	}
>   	GEM_BUG_ON(port_isset(port));
>   
> @@ -738,19 +739,34 @@ static void guc_dequeue(struct intel_engine_cs *engine)
>   done:
>   	execlists->queue_priority = rb ? to_priolist(rb)->priority : INT_MIN;
>   	execlists->first = rb;
> -	if (submit) {
> +	if (submit)
>   		port_assign(port, last);
> +	if (last)
>   		execlists_user_begin(execlists, execlists->port);
> -		guc_submit(engine);
> -	}
>   
>   	/* We must always keep the beast fed if we have work piled up */
>   	GEM_BUG_ON(port_isset(execlists->port) &&
>   		   !execlists_is_active(execlists, EXECLISTS_ACTIVE_USER));
>   	GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
>   
> -unlock:
> -	spin_unlock_irq(&engine->timeline.lock);
> +	return submit;
> +}
> +
> +static void guc_dequeue(struct intel_engine_cs *engine)
> +{
> +	unsigned long flags;
> +	bool submit;
> +
> +	local_irq_save(flags);
> +
> +	spin_lock(&engine->timeline.lock);
> +	submit = __guc_dequeue(engine);
> +	spin_unlock(&engine->timeline.lock);
> +
> +	if (submit)
> +		guc_submit(engine);
> +
> +	local_irq_restore(flags);
>   }
>   
>   static void guc_submission_tasklet(unsigned long data)
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko


* Re: [PATCH v2 3/7] drm/i915/execlists: Make submission tasklet hardirq safe
  2018-05-07 13:57 ` [PATCH v2 3/7] drm/i915/execlists: Make submission tasklet hardirq safe Chris Wilson
  2018-05-08 10:10   ` Tvrtko Ursulin
  2018-05-08 17:38   ` Tvrtko Ursulin
@ 2018-05-08 17:45   ` Tvrtko Ursulin
  2018-05-08 20:59     ` Chris Wilson
  2 siblings, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-08 17:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 07/05/2018 14:57, Chris Wilson wrote:
> Prepare to allow the execlists submission to be run from underneath a
> hardirq timer context (and not just the current softirq context) as is
> required for fast preemption resets and context switches.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 42 ++++++++++++++++++++++----------
>   1 file changed, 29 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index f9f4064dec0e..15c373ea5b7e 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -357,10 +357,13 @@ execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists)
>   {
>   	struct intel_engine_cs *engine =
>   		container_of(execlists, typeof(*engine), execlists);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&engine->timeline.lock, flags);
>   
> -	spin_lock_irq(&engine->timeline.lock);
>   	__unwind_incomplete_requests(engine);
> -	spin_unlock_irq(&engine->timeline.lock);
> +
> +	spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   }
>   
>   static inline void
> @@ -554,7 +557,7 @@ static void inject_preempt_context(struct intel_engine_cs *engine)
>   	execlists_set_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT);
>   }
>   
> -static void execlists_dequeue(struct intel_engine_cs *engine)
> +static bool __execlists_dequeue(struct intel_engine_cs *engine)
>   {
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
>   	struct execlist_port *port = execlists->port;
> @@ -564,6 +567,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	struct rb_node *rb;
>   	bool submit = false;
>   
> +	lockdep_assert_held(&engine->timeline.lock);
> +
>   	/* Hardware submission is through 2 ports. Conceptually each port
>   	 * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is
>   	 * static for a context, and unique to each, so we only execute
> @@ -585,7 +590,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	 * and context switches) submission.
>   	 */
>   
> -	spin_lock_irq(&engine->timeline.lock);
>   	rb = execlists->first;
>   	GEM_BUG_ON(rb_first(&execlists->queue) != rb);
>   
> @@ -600,7 +604,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   						EXECLISTS_ACTIVE_USER));
>   		GEM_BUG_ON(!port_count(&port[0]));
>   		if (port_count(&port[0]) > 1)
> -			goto unlock;
> +			return false;
>   
>   		/*
>   		 * If we write to ELSP a second time before the HW has had
> @@ -610,11 +614,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		 * the HW to indicate that it has had a chance to respond.
>   		 */
>   		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
> -			goto unlock;
> +			return false;
>   
>   		if (need_preempt(engine, last, execlists->queue_priority)) {
>   			inject_preempt_context(engine);
> -			goto unlock;
> +			return false;
>   		}
>   
>   		/*
> @@ -639,7 +643,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		 * priorities of the ports haven't been switch.
>   		 */
>   		if (port_count(&port[1]))
> -			goto unlock;
> +			return false;
>   
>   		/*
>   		 * WaIdleLiteRestore:bdw,skl
> @@ -744,13 +748,25 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	/* We must always keep the beast fed if we have work piled up */
>   	GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
>   
> -unlock:
> -	spin_unlock_irq(&engine->timeline.lock);
> -
> -	if (submit) {
> +	/* Re-evaluate the executing context setup after each preemptive kick */
> +	if (last)
>   		execlists_user_begin(execlists, execlists->port);
> +
> +	return submit;
> +}
> +
> +static void execlists_dequeue(struct intel_engine_cs *engine)
> +{
> +	struct intel_engine_execlists * const execlists = &engine->execlists;
> +	unsigned long flags;
> +	bool submit;
> +
> +	spin_lock_irqsave(&engine->timeline.lock, flags);
> +	submit = __execlists_dequeue(engine);
> +	spin_unlock_irqrestore(&engine->timeline.lock, flags);
> +
> +	if (submit)
>   		execlists_submit_ports(engine);
> -	}

Actually, having read the guc version, why doesn't 
execlists_submit_ports need to be hardirq safe?

Regards,

Tvrtko

>   
>   	GEM_BUG_ON(port_isset(execlists->port) &&
>   		   !execlists_is_active(execlists, EXECLISTS_ACTIVE_USER));
> 
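The refactor in the quoted hunks — a `__execlists_dequeue()` helper that runs entirely under the lock and merely *reports* whether a submission is needed, with the wrapper acting on that result after dropping the lock — can be modelled as a small userspace toy (all names below are illustrative, not the i915 ones):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace sketch of the split-helper pattern: the "__" helper does
 * all queue manipulation with the lock held and only decides whether
 * a submission is needed; the wrapper performs the submission after
 * releasing the lock.
 */
struct toy_engine {
	int locked;	/* stands in for engine->timeline.lock */
	int queued;	/* pending requests */
	int submitted;	/* requests handed to the "hardware" */
};

static void toy_lock(struct toy_engine *e)   { assert(!e->locked); e->locked = 1; }
static void toy_unlock(struct toy_engine *e) { assert(e->locked);  e->locked = 0; }

/* Must be called with the lock held; mirrors the early "return false" exits. */
static bool __toy_dequeue(struct toy_engine *e)
{
	assert(e->locked);	/* stands in for lockdep_assert_held() */
	if (e->queued == 0)
		return false;	/* nothing to do */
	e->queued--;
	return true;		/* caller should submit, outside the lock */
}

static void toy_dequeue(struct toy_engine *e)
{
	bool submit;

	toy_lock(e);
	submit = __toy_dequeue(e);
	toy_unlock(e);

	/* The submission step runs with the lock dropped, as in the patch. */
	if (submit)
		e->submitted++;
}
```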
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH v2 3/7] drm/i915/execlists: Make submission tasklet hardirq safe
  2018-05-08 17:45   ` Tvrtko Ursulin
@ 2018-05-08 20:59     ` Chris Wilson
  2018-05-09  9:23       ` Chris Wilson
  0 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2018-05-08 20:59 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-08 18:45:44)
> 
> On 07/05/2018 14:57, Chris Wilson wrote:
> > Prepare to allow the execlists submission to be run from underneath a
> > hardirq timer context (and not just the current softirq context) as is
> > required for fast preemption resets and context switches.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/intel_lrc.c | 42 ++++++++++++++++++++++----------
> >   1 file changed, 29 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index f9f4064dec0e..15c373ea5b7e 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -357,10 +357,13 @@ execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists)
> >   {
> >       struct intel_engine_cs *engine =
> >               container_of(execlists, typeof(*engine), execlists);
> > +     unsigned long flags;
> > +
> > +     spin_lock_irqsave(&engine->timeline.lock, flags);
> >   
> > -     spin_lock_irq(&engine->timeline.lock);
> >       __unwind_incomplete_requests(engine);
> > -     spin_unlock_irq(&engine->timeline.lock);
> > +
> > +     spin_unlock_irqrestore(&engine->timeline.lock, flags);
> >   }
> >   
> >   static inline void
> > @@ -554,7 +557,7 @@ static void inject_preempt_context(struct intel_engine_cs *engine)
> >       execlists_set_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT);
> >   }
> >   
> > -static void execlists_dequeue(struct intel_engine_cs *engine)
> > +static bool __execlists_dequeue(struct intel_engine_cs *engine)
> >   {
> >       struct intel_engine_execlists * const execlists = &engine->execlists;
> >       struct execlist_port *port = execlists->port;
> > @@ -564,6 +567,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >       struct rb_node *rb;
> >       bool submit = false;
> >   
> > +     lockdep_assert_held(&engine->timeline.lock);
> > +
> >       /* Hardware submission is through 2 ports. Conceptually each port
> >        * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is
> >        * static for a context, and unique to each, so we only execute
> > @@ -585,7 +590,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >        * and context switches) submission.
> >        */
> >   
> > -     spin_lock_irq(&engine->timeline.lock);
> >       rb = execlists->first;
> >       GEM_BUG_ON(rb_first(&execlists->queue) != rb);
> >   
> > @@ -600,7 +604,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >                                               EXECLISTS_ACTIVE_USER));
> >               GEM_BUG_ON(!port_count(&port[0]));
> >               if (port_count(&port[0]) > 1)
> > -                     goto unlock;
> > +                     return false;
> >   
> >               /*
> >                * If we write to ELSP a second time before the HW has had
> > @@ -610,11 +614,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >                * the HW to indicate that it has had a chance to respond.
> >                */
> >               if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
> > -                     goto unlock;
> > +                     return false;
> >   
> >               if (need_preempt(engine, last, execlists->queue_priority)) {
> >                       inject_preempt_context(engine);
> > -                     goto unlock;
> > +                     return false;
> >               }
> >   
> >               /*
> > @@ -639,7 +643,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >                * priorities of the ports haven't been switch.
> >                */
> >               if (port_count(&port[1]))
> > -                     goto unlock;
> > +                     return false;
> >   
> >               /*
> >                * WaIdleLiteRestore:bdw,skl
> > @@ -744,13 +748,25 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >       /* We must always keep the beast fed if we have work piled up */
> >       GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
> >   
> > -unlock:
> > -     spin_unlock_irq(&engine->timeline.lock);
> > -
> > -     if (submit) {
> > +     /* Re-evaluate the executing context setup after each preemptive kick */
> > +     if (last)
> >               execlists_user_begin(execlists, execlists->port);
> > +
> > +     return submit;
> > +}
> > +
> > +static void execlists_dequeue(struct intel_engine_cs *engine)
> > +{
> > +     struct intel_engine_execlists * const execlists = &engine->execlists;
> > +     unsigned long flags;
> > +     bool submit;
> > +
> > +     spin_lock_irqsave(&engine->timeline.lock, flags);
> > +     submit = __execlists_dequeue(engine);
> > +     spin_unlock_irqrestore(&engine->timeline.lock, flags);
> > +
> > +     if (submit)
> >               execlists_submit_ports(engine);
> > -     }
> 
> Actually, having read the guc version, why doesn't 
> execlists_submit_ports need to be hardirq safe?

execlists->port[] and the ELSP register are guarded by the tasklet (they
are only accessed from inside the tasklet). guc caught me off guard
because it uses a shared wq (and spinlock) for all tasklets. So guc
requires extending the irq-off section across the shared wq.
-Chris
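The spin_lock_irq() to spin_lock_irqsave() conversion in the quoted hunks matters precisely because the caller may now already be in hardirq context with interrupts disabled: the _irq variant unconditionally re-enables interrupts on unlock, while _irqsave restores whatever state the caller had. A userspace toy (illustrative names only, modelling the irq-flag semantics rather than real kernel primitives) makes the difference concrete:

```c
#include <assert.h>

/* Toy model of local interrupt state: 1 = enabled, 0 = disabled. */
static int irqs_enabled = 1;

static unsigned long toy_irq_save(void)
{
	unsigned long flags = irqs_enabled;	/* remember caller's state */
	irqs_enabled = 0;
	return flags;
}

static void toy_irq_restore(unsigned long flags) { irqs_enabled = (int)flags; }
static void toy_irq_disable(void) { irqs_enabled = 0; }
static void toy_irq_enable(void)  { irqs_enabled = 1; }

/* spin_lock_irq()-style section: blindly re-enables interrupts on exit. */
static void critical_irq(void)
{
	toy_irq_disable();
	/* ... work under the lock ... */
	toy_irq_enable();
}

/* spin_lock_irqsave()-style section: restores the caller's state. */
static void critical_irqsave(void)
{
	unsigned long flags = toy_irq_save();
	/* ... work under the lock ... */
	toy_irq_restore(flags);
}
```

Calling critical_irq() from a context that already has interrupts off silently turns them back on; critical_irqsave() does not, which is what makes the tasklet callable from a hardirq timer.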


* Re: [PATCH v2 3/7] drm/i915/execlists: Make submission tasklet hardirq safe
  2018-05-08 20:59     ` Chris Wilson
@ 2018-05-09  9:23       ` Chris Wilson
  0 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-05-09  9:23 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Chris Wilson (2018-05-08 21:59:20)
> Quoting Tvrtko Ursulin (2018-05-08 18:45:44)
> > 
> > On 07/05/2018 14:57, Chris Wilson wrote:
> > > Prepare to allow the execlists submission to be run from underneath a
> > > hardirq timer context (and not just the current softirq context) as is
> > > required for fast preemption resets and context switches.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > ---
> > >   drivers/gpu/drm/i915/intel_lrc.c | 42 ++++++++++++++++++++++----------
> > >   1 file changed, 29 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > > index f9f4064dec0e..15c373ea5b7e 100644
> > > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > > @@ -357,10 +357,13 @@ execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists)
> > >   {
> > >       struct intel_engine_cs *engine =
> > >               container_of(execlists, typeof(*engine), execlists);
> > > +     unsigned long flags;
> > > +
> > > +     spin_lock_irqsave(&engine->timeline.lock, flags);
> > >   
> > > -     spin_lock_irq(&engine->timeline.lock);
> > >       __unwind_incomplete_requests(engine);
> > > -     spin_unlock_irq(&engine->timeline.lock);
> > > +
> > > +     spin_unlock_irqrestore(&engine->timeline.lock, flags);
> > >   }
> > >   
> > >   static inline void
> > > @@ -554,7 +557,7 @@ static void inject_preempt_context(struct intel_engine_cs *engine)
> > >       execlists_set_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT);
> > >   }
> > >   
> > > -static void execlists_dequeue(struct intel_engine_cs *engine)
> > > +static bool __execlists_dequeue(struct intel_engine_cs *engine)
> > >   {
> > >       struct intel_engine_execlists * const execlists = &engine->execlists;
> > >       struct execlist_port *port = execlists->port;
> > > @@ -564,6 +567,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> > >       struct rb_node *rb;
> > >       bool submit = false;
> > >   
> > > +     lockdep_assert_held(&engine->timeline.lock);
> > > +
> > >       /* Hardware submission is through 2 ports. Conceptually each port
> > >        * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is
> > >        * static for a context, and unique to each, so we only execute
> > > @@ -585,7 +590,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> > >        * and context switches) submission.
> > >        */
> > >   
> > > -     spin_lock_irq(&engine->timeline.lock);
> > >       rb = execlists->first;
> > >       GEM_BUG_ON(rb_first(&execlists->queue) != rb);
> > >   
> > > @@ -600,7 +604,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> > >                                               EXECLISTS_ACTIVE_USER));
> > >               GEM_BUG_ON(!port_count(&port[0]));
> > >               if (port_count(&port[0]) > 1)
> > > -                     goto unlock;
> > > +                     return false;
> > >   
> > >               /*
> > >                * If we write to ELSP a second time before the HW has had
> > > @@ -610,11 +614,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> > >                * the HW to indicate that it has had a chance to respond.
> > >                */
> > >               if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
> > > -                     goto unlock;
> > > +                     return false;
> > >   
> > >               if (need_preempt(engine, last, execlists->queue_priority)) {
> > >                       inject_preempt_context(engine);
> > > -                     goto unlock;
> > > +                     return false;
> > >               }
> > >   
> > >               /*
> > > @@ -639,7 +643,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> > >                * priorities of the ports haven't been switch.
> > >                */
> > >               if (port_count(&port[1]))
> > > -                     goto unlock;
> > > +                     return false;
> > >   
> > >               /*
> > >                * WaIdleLiteRestore:bdw,skl
> > > @@ -744,13 +748,25 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> > >       /* We must always keep the beast fed if we have work piled up */
> > >       GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
> > >   
> > > -unlock:
> > > -     spin_unlock_irq(&engine->timeline.lock);
> > > -
> > > -     if (submit) {
> > > +     /* Re-evaluate the executing context setup after each preemptive kick */
> > > +     if (last)
> > >               execlists_user_begin(execlists, execlists->port);
> > > +
> > > +     return submit;
> > > +}
> > > +
> > > +static void execlists_dequeue(struct intel_engine_cs *engine)
> > > +{
> > > +     struct intel_engine_execlists * const execlists = &engine->execlists;
> > > +     unsigned long flags;
> > > +     bool submit;
> > > +
> > > +     spin_lock_irqsave(&engine->timeline.lock, flags);
> > > +     submit = __execlists_dequeue(engine);
> > > +     spin_unlock_irqrestore(&engine->timeline.lock, flags);
> > > +
> > > +     if (submit)
> > >               execlists_submit_ports(engine);
> > > -     }
> > 
> > Actually, having read the guc version, why doesn't 
> > execlists_submit_ports need to be hardirq safe?
> 
> execlists->port[] and the ELSP register are guarded by the tasklet (they
> are only accessed from inside the tasklet). guc caught me off guard
> because it uses a shared wq (and spinlock) for all tasklets. So guc
> requires extending the irq-off section across the shared wq.

I took your r-b and ran with it. The question you asked was exactly the
puzzle I fell into with the guc, so I believe we're on the same page :)

Pushed the irqsafe patches, as they are useful for a couple of series.
-Chris


end of thread [~2018-05-09  9:23 UTC | newest]

Thread overview: 38+ messages
-- links below jump to the message on this page --
2018-05-07 13:57 [PATCH v2 1/7] drm/i915: Flush submission tasklet after bumping priority Chris Wilson
2018-05-07 13:57 ` [PATCH v2 2/7] drm/i915: Disable tasklet scheduling across initial scheduling Chris Wilson
2018-05-08 10:02   ` Tvrtko Ursulin
2018-05-08 10:31     ` Chris Wilson
2018-05-07 13:57 ` [PATCH v2 3/7] drm/i915/execlists: Make submission tasklet hardirq safe Chris Wilson
2018-05-08 10:10   ` Tvrtko Ursulin
2018-05-08 10:24     ` Chris Wilson
2018-05-08 10:56       ` Tvrtko Ursulin
2018-05-08 11:05         ` Chris Wilson
2018-05-08 11:38           ` Tvrtko Ursulin
2018-05-08 11:43             ` Chris Wilson
2018-05-08 17:38   ` Tvrtko Ursulin
2018-05-08 17:45   ` Tvrtko Ursulin
2018-05-08 20:59     ` Chris Wilson
2018-05-09  9:23       ` Chris Wilson
2018-05-07 13:57 ` [PATCH v2 4/7] drm/i915/guc: " Chris Wilson
2018-05-08 17:43   ` Tvrtko Ursulin
2018-05-07 13:57 ` [PATCH v2 5/7] drm/i915/execlists: Direct submit onto idle engines Chris Wilson
2018-05-08 10:23   ` Tvrtko Ursulin
2018-05-08 10:40     ` Chris Wilson
2018-05-08 11:00       ` Tvrtko Ursulin
2018-05-07 13:57 ` [PATCH v2 6/7] drm/i915/execlists: Direct submission from irq handler Chris Wilson
2018-05-08 10:54   ` Tvrtko Ursulin
2018-05-08 11:10     ` Chris Wilson
2018-05-08 11:53       ` Tvrtko Ursulin
2018-05-08 12:17   ` [PATCH] " Chris Wilson
2018-05-07 13:57 ` [PATCH v2 7/7] drm/i915: Speed up idle detection by kicking the tasklets Chris Wilson
2018-05-07 15:31 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority Patchwork
2018-05-07 15:32 ` ✗ Fi.CI.SPARSE: " Patchwork
2018-05-07 15:46 ` ✓ Fi.CI.BAT: success " Patchwork
2018-05-07 17:56 ` ✓ Fi.CI.IGT: " Patchwork
2018-05-08  9:40 ` [PATCH v2 1/7] " Tvrtko Ursulin
2018-05-08  9:45   ` Chris Wilson
2018-05-08  9:57     ` Tvrtko Ursulin
2018-05-08 14:11 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [v2,1/7] drm/i915: Flush submission tasklet after bumping priority (rev2) Patchwork
2018-05-08 14:13 ` ✗ Fi.CI.SPARSE: " Patchwork
2018-05-08 14:28 ` ✓ Fi.CI.BAT: success " Patchwork
2018-05-08 16:27 ` ✓ Fi.CI.IGT: " Patchwork
