All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/7] drm/i915: Remove tasklet flush before disable
@ 2018-05-16  6:47 Chris Wilson
  2018-05-16  6:47 ` [PATCH 2/7] drm/i915: Only sync tasklets once for recursive reset preparation Chris Wilson
                   ` (12 more replies)
  0 siblings, 13 replies; 28+ messages in thread
From: Chris Wilson @ 2018-05-16  6:47 UTC (permalink / raw)
  To: intel-gfx

The idea was to try and let the existing tasklet run to completion
before we began the reset, but it involves a racy check against anything
else that tries to run the tasklet. Rather than acknowledge and ignore
the race, let it be and don't try and be too clever.

The tasklet will resume execution after reset (after spinning a bit
during reset), but before we allow it to resume we will have cleared all
the pending state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0a2070112b66..0dc369a9ec4d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3035,16 +3035,7 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
 	 * calling engine->init_hw() and also writing the ELSP.
 	 * Turning off the execlists->tasklet until the reset is over
 	 * prevents the race.
-	 *
-	 * Note that this needs to be a single atomic operation on the
-	 * tasklet (flush existing tasks, prevent new tasks) to prevent
-	 * a race between reset and set-wedged. It is not, so we do the best
-	 * we can atm and make sure we don't lock the machine up in the more
-	 * common case of recursively being called from set-wedged from inside
-	 * i915_reset.
 	 */
-	if (!atomic_read(&engine->execlists.tasklet.count))
-		tasklet_kill(&engine->execlists.tasklet);
 	tasklet_disable(&engine->execlists.tasklet);
 
 	/*
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/7] drm/i915: Only sync tasklets once for recursive reset preparation
  2018-05-16  6:47 [PATCH 1/7] drm/i915: Remove tasklet flush before disable Chris Wilson
@ 2018-05-16  6:47 ` Chris Wilson
  2018-05-16 14:24   ` Mika Kuoppala
  2018-05-16  6:47 ` [PATCH 3/7] drm/i915/execlists: Refactor out complete_preempt_context() Chris Wilson
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2018-05-16  6:47 UTC (permalink / raw)
  To: intel-gfx

When setting up reset, we may need to recursively prepare an engine. In
which case we should only synchronously flush the tasklets on the outer
most call, the inner calls will then be inside an atomic section where
the tasklet will never be run (and so the sync will never complete).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 2 +-
 drivers/gpu/drm/i915/i915_gem.h | 7 +++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0dc369a9ec4d..aab1f5e77ae9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3036,7 +3036,7 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
 	 * Turning off the execlists->tasklet until the reset is over
 	 * prevents the race.
 	 */
-	tasklet_disable(&engine->execlists.tasklet);
+	__tasklet_disable_once(&engine->execlists.tasklet);
 
 	/*
 	 * We're using worker to queue preemption requests from the tasklet in
diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 525920404ede..af033e7c1c56 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -26,6 +26,7 @@
 #define __I915_GEM_H__
 
 #include <linux/bug.h>
+#include <linux/interrupt.h>
 
 struct drm_i915_private;
 
@@ -72,4 +73,10 @@ struct drm_i915_private;
 void i915_gem_park(struct drm_i915_private *i915);
 void i915_gem_unpark(struct drm_i915_private *i915);
 
+static inline void __tasklet_disable_once(struct tasklet_struct *t)
+{
+	if (atomic_inc_return(&t->count) == 1)
+		tasklet_unlock_wait(t);
+}
+
 #endif /* __I915_GEM_H__ */
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/7] drm/i915/execlists: Refactor out complete_preempt_context()
  2018-05-16  6:47 [PATCH 1/7] drm/i915: Remove tasklet flush before disable Chris Wilson
  2018-05-16  6:47 ` [PATCH 2/7] drm/i915: Only sync tasklets once for recursive reset preparation Chris Wilson
@ 2018-05-16  6:47 ` Chris Wilson
  2018-05-16  6:47 ` [PATCH 4/7] drm/i915: Move engine reset prepare/finish to backends Chris Wilson
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2018-05-16  6:47 UTC (permalink / raw)
  To: intel-gfx

As a complement to inject_preempt_context(), follow up with the function
to handle its completion. This will be useful should we wish to extend
the duties of the preempt-context for execlists.

v2: And do the same for the guc.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Jeff McGee <jeff.mcgee@intel.com> #v1
---
 drivers/gpu/drm/i915/intel_guc_submission.c | 26 ++++++++++++++-------
 drivers/gpu/drm/i915/intel_lrc.c            | 23 ++++++++++--------
 2 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 2feb65096966..ca38ac9ff4fa 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -623,6 +623,21 @@ static void wait_for_guc_preempt_report(struct intel_engine_cs *engine)
 	report->report_return_status = INTEL_GUC_REPORT_STATUS_UNKNOWN;
 }
 
+static void complete_preempt_context(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists *execlists = &engine->execlists;
+
+	GEM_BUG_ON(!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT));
+
+	execlists_cancel_port_requests(execlists);
+	execlists_unwind_incomplete_requests(execlists);
+
+	wait_for_guc_preempt_report(engine);
+	intel_write_status_page(engine, I915_GEM_HWS_PREEMPT_INDEX, 0);
+
+	execlists_clear_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
+}
+
 /**
  * guc_submit() - Submit commands through GuC
  * @engine: engine associated with the commands
@@ -793,15 +808,8 @@ static void guc_submission_tasklet(unsigned long data)
 
 	if (execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT) &&
 	    intel_read_status_page(engine, I915_GEM_HWS_PREEMPT_INDEX) ==
-	    GUC_PREEMPT_FINISHED) {
-		execlists_cancel_port_requests(&engine->execlists);
-		execlists_unwind_incomplete_requests(execlists);
-
-		wait_for_guc_preempt_report(engine);
-
-		execlists_clear_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
-		intel_write_status_page(engine, I915_GEM_HWS_PREEMPT_INDEX, 0);
-	}
+	    GUC_PREEMPT_FINISHED)
+		complete_preempt_context(engine);
 
 	if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT))
 		guc_dequeue(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 15434cad5430..b2781f1bb91c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -552,8 +552,18 @@ static void inject_preempt_context(struct intel_engine_cs *engine)
 	if (execlists->ctrl_reg)
 		writel(EL_CTRL_LOAD, execlists->ctrl_reg);
 
-	execlists_clear_active(&engine->execlists, EXECLISTS_ACTIVE_HWACK);
-	execlists_set_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT);
+	execlists_clear_active(execlists, EXECLISTS_ACTIVE_HWACK);
+	execlists_set_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
+}
+
+static void complete_preempt_context(struct intel_engine_execlists *execlists)
+{
+	GEM_BUG_ON(!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT));
+
+	execlists_cancel_port_requests(execlists);
+	execlists_unwind_incomplete_requests(execlists);
+
+	execlists_clear_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
 }
 
 static bool __execlists_dequeue(struct intel_engine_cs *engine)
@@ -1063,14 +1073,7 @@ static void execlists_submission_tasklet(unsigned long data)
 			if (status & GEN8_CTX_STATUS_COMPLETE &&
 			    buf[2*head + 1] == execlists->preempt_complete_status) {
 				GEM_TRACE("%s preempt-idle\n", engine->name);
-
-				execlists_cancel_port_requests(execlists);
-				execlists_unwind_incomplete_requests(execlists);
-
-				GEM_BUG_ON(!execlists_is_active(execlists,
-								EXECLISTS_ACTIVE_PREEMPT));
-				execlists_clear_active(execlists,
-						       EXECLISTS_ACTIVE_PREEMPT);
+				complete_preempt_context(execlists);
 				continue;
 			}
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 4/7] drm/i915: Move engine reset prepare/finish to backends
  2018-05-16  6:47 [PATCH 1/7] drm/i915: Remove tasklet flush before disable Chris Wilson
  2018-05-16  6:47 ` [PATCH 2/7] drm/i915: Only sync tasklets once for recursive reset preparation Chris Wilson
  2018-05-16  6:47 ` [PATCH 3/7] drm/i915/execlists: Refactor out complete_preempt_context() Chris Wilson
@ 2018-05-16  6:47 ` Chris Wilson
  2018-05-16  6:47 ` [PATCH 5/7] drm/i915: Split execlists/guc reset preparations Chris Wilson
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2018-05-16  6:47 UTC (permalink / raw)
  To: intel-gfx

In preparation to more carefully handling incomplete preemption during
reset by execlists, we move the existing code wholesale to the backends
under a couple of new reset vfuncs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         | 38 +++----------------
 drivers/gpu/drm/i915/i915_gem.h         |  5 +++
 drivers/gpu/drm/i915/intel_lrc.c        | 50 +++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.c | 23 ++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  9 ++++-
 5 files changed, 84 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index aab1f5e77ae9..abf661d40641 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3004,7 +3004,7 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
 struct i915_request *
 i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
 {
-	struct i915_request *request = NULL;
+	struct i915_request *request;
 
 	/*
 	 * During the reset sequence, we must prevent the engine from
@@ -3027,31 +3027,7 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
 	 */
 	kthread_park(engine->breadcrumbs.signaler);
 
-	/*
-	 * Prevent request submission to the hardware until we have
-	 * completed the reset in i915_gem_reset_finish(). If a request
-	 * is completed by one engine, it may then queue a request
-	 * to a second via its execlists->tasklet *just* as we are
-	 * calling engine->init_hw() and also writing the ELSP.
-	 * Turning off the execlists->tasklet until the reset is over
-	 * prevents the race.
-	 */
-	__tasklet_disable_once(&engine->execlists.tasklet);
-
-	/*
-	 * We're using worker to queue preemption requests from the tasklet in
-	 * GuC submission mode.
-	 * Even though tasklet was disabled, we may still have a worker queued.
-	 * Let's make sure that all workers scheduled before disabling the
-	 * tasklet are completed before continuing with the reset.
-	 */
-	if (engine->i915->guc.preempt_wq)
-		flush_workqueue(engine->i915->guc.preempt_wq);
-
-	if (engine->irq_seqno_barrier)
-		engine->irq_seqno_barrier(engine);
-
-	request = i915_gem_find_active_request(engine);
+	request = engine->reset.prepare(engine);
 	if (request && request->fence.error == -EIO)
 		request = ERR_PTR(-EIO); /* Previous reset failed! */
 
@@ -3202,13 +3178,8 @@ void i915_gem_reset_engine(struct intel_engine_cs *engine,
 	if (request)
 		request = i915_gem_reset_request(engine, request, stalled);
 
-	if (request) {
-		DRM_DEBUG_DRIVER("resetting %s to restart from tail of request 0x%x\n",
-				 engine->name, request->global_seqno);
-	}
-
 	/* Setup the CS to resume from the breadcrumb of the hung request */
-	engine->reset_hw(engine, request);
+	engine->reset.reset(engine, request);
 }
 
 void i915_gem_reset(struct drm_i915_private *dev_priv,
@@ -3256,7 +3227,8 @@ void i915_gem_reset(struct drm_i915_private *dev_priv,
 
 void i915_gem_reset_finish_engine(struct intel_engine_cs *engine)
 {
-	tasklet_enable(&engine->execlists.tasklet);
+	engine->reset.finish(engine);
+
 	kthread_unpark(engine->breadcrumbs.signaler);
 
 	intel_uncore_forcewake_put(engine->i915, FORCEWAKE_ALL);
diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index af033e7c1c56..3bdebee41791 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -79,4 +79,9 @@ static inline void __tasklet_disable_once(struct tasklet_struct *t)
 		tasklet_unlock_wait(t);
 }
 
+static inline bool __tasklet_is_enabled(struct tasklet_struct *t)
+{
+	return !atomic_read(&t->count);
+}
+
 #endif /* __I915_GEM_H__ */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b2781f1bb91c..1548563e2b43 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1826,8 +1826,39 @@ static int gen9_init_render_ring(struct intel_engine_cs *engine)
 	return 0;
 }
 
-static void reset_common_ring(struct intel_engine_cs *engine,
-			      struct i915_request *request)
+static struct i915_request *
+execlists_reset_prepare(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const execlists = &engine->execlists;
+
+	GEM_TRACE("%s\n", engine->name);
+
+	/*
+	 * Prevent request submission to the hardware until we have
+	 * completed the reset in i915_gem_reset_finish(). If a request
+	 * is completed by one engine, it may then queue a request
+	 * to a second via its execlists->tasklet *just* as we are
+	 * calling engine->init_hw() and also writing the ELSP.
+	 * Turning off the execlists->tasklet until the reset is over
+	 * prevents the race.
+	 */
+	__tasklet_disable_once(&execlists->tasklet);
+
+	/*
+	 * We're using worker to queue preemption requests from the tasklet in
+	 * GuC submission mode.
+	 * Even though tasklet was disabled, we may still have a worker queued.
+	 * Let's make sure that all workers scheduled before disabling the
+	 * tasklet are completed before continuing with the reset.
+	 */
+	if (engine->i915->guc.preempt_wq)
+		flush_workqueue(engine->i915->guc.preempt_wq);
+
+	return i915_gem_find_active_request(engine);
+}
+
+static void execlists_reset(struct intel_engine_cs *engine,
+			    struct i915_request *request)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	unsigned long flags;
@@ -1837,6 +1868,9 @@ static void reset_common_ring(struct intel_engine_cs *engine,
 		  engine->name, request ? request->global_seqno : 0,
 		  intel_engine_get_seqno(engine));
 
+	/* The submission tasklet must be disabled, engine->reset.prepare(). */
+	GEM_BUG_ON(__tasklet_is_enabled(&execlists->tasklet));
+
 	/* See execlists_cancel_requests() for the irq/spinlock split. */
 	local_irq_save(flags);
 
@@ -1907,6 +1941,13 @@ static void reset_common_ring(struct intel_engine_cs *engine,
 	unwind_wa_tail(request);
 }
 
+static void execlists_reset_finish(struct intel_engine_cs *engine)
+{
+	tasklet_enable(&engine->execlists.tasklet);
+
+	GEM_TRACE("%s\n", engine->name);
+}
+
 static int intel_logical_ring_emit_pdps(struct i915_request *rq)
 {
 	struct i915_hw_ppgtt *ppgtt = rq->ctx->ppgtt;
@@ -2236,7 +2277,10 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 {
 	/* Default vfuncs which can be overriden by each engine. */
 	engine->init_hw = gen8_init_common_ring;
-	engine->reset_hw = reset_common_ring;
+
+	engine->reset.prepare = execlists_reset_prepare;
+	engine->reset.reset = execlists_reset;
+	engine->reset.finish = execlists_reset_finish;
 
 	engine->context_pin = execlists_context_pin;
 	engine->context_unpin = execlists_context_unpin;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 8f19349a6055..af5a178366ed 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -531,9 +531,20 @@ static int init_ring_common(struct intel_engine_cs *engine)
 	return ret;
 }
 
-static void reset_ring_common(struct intel_engine_cs *engine,
-			      struct i915_request *request)
+static struct i915_request *reset_prepare(struct intel_engine_cs *engine)
 {
+	if (engine->irq_seqno_barrier)
+		engine->irq_seqno_barrier(engine);
+
+	return i915_gem_find_active_request(engine);
+}
+
+static void reset_ring(struct intel_engine_cs *engine,
+		       struct i915_request *request)
+{
+	GEM_TRACE("%s seqno=%x\n",
+		  engine->name, request ? request->global_seqno : 0);
+
 	/*
 	 * RC6 must be prevented until the reset is complete and the engine
 	 * reinitialised. If it occurs in the middle of this sequence, the
@@ -597,6 +608,10 @@ static void reset_ring_common(struct intel_engine_cs *engine,
 	}
 }
 
+static void reset_finish(struct intel_engine_cs *engine)
+{
+}
+
 static int intel_rcs_ctx_init(struct i915_request *rq)
 {
 	int ret;
@@ -2006,7 +2021,9 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
 	intel_ring_init_semaphores(dev_priv, engine);
 
 	engine->init_hw = init_ring_common;
-	engine->reset_hw = reset_ring_common;
+	engine->reset.prepare = reset_prepare;
+	engine->reset.reset = reset_ring;
+	engine->reset.finish = reset_finish;
 
 	engine->context_pin = intel_ring_context_pin;
 	engine->context_unpin = intel_ring_context_unpin;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 010750e8ee44..1e8bacba7754 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -423,8 +423,13 @@ struct intel_engine_cs {
 	void		(*irq_disable)(struct intel_engine_cs *engine);
 
 	int		(*init_hw)(struct intel_engine_cs *engine);
-	void		(*reset_hw)(struct intel_engine_cs *engine,
-				    struct i915_request *rq);
+
+	struct {
+		struct i915_request *(*prepare)(struct intel_engine_cs *engine);
+		void (*reset)(struct intel_engine_cs *engine,
+			      struct i915_request *rq);
+		void (*finish)(struct intel_engine_cs *engine);
+	} reset;
 
 	void		(*park)(struct intel_engine_cs *engine);
 	void		(*unpark)(struct intel_engine_cs *engine);
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 5/7] drm/i915: Split execlists/guc reset preparations
  2018-05-16  6:47 [PATCH 1/7] drm/i915: Remove tasklet flush before disable Chris Wilson
                   ` (2 preceding siblings ...)
  2018-05-16  6:47 ` [PATCH 4/7] drm/i915: Move engine reset prepare/finish to backends Chris Wilson
@ 2018-05-16  6:47 ` Chris Wilson
  2018-05-16  6:47 ` [PATCH 6/7] drm/i915/execlists: Flush pending preemption events during reset Chris Wilson
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2018-05-16  6:47 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will make the execlists reset prepare callback
take into account preemption by flushing the context-switch handler.
This is not applicable to the GuC submission backend, so split the two
into their own backend callbacks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>
---
 drivers/gpu/drm/i915/intel_guc_submission.c | 34 +++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c            | 12 ++------
 2 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index ca38ac9ff4fa..7c55f110bbea 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -815,6 +815,37 @@ static void guc_submission_tasklet(unsigned long data)
 		guc_dequeue(engine);
 }
 
+static struct i915_request *
+guc_reset_prepare(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const execlists = &engine->execlists;
+
+	GEM_TRACE("%s\n", engine->name);
+
+	/*
+	 * Prevent request submission to the hardware until we have
+	 * completed the reset in i915_gem_reset_finish(). If a request
+	 * is completed by one engine, it may then queue a request
+	 * to a second via its execlists->tasklet *just* as we are
+	 * calling engine->init_hw() and also writing the ELSP.
+	 * Turning off the execlists->tasklet until the reset is over
+	 * prevents the race.
+	 */
+	__tasklet_disable_once(&execlists->tasklet);
+
+	/*
+	 * We're using worker to queue preemption requests from the tasklet in
+	 * GuC submission mode.
+	 * Even though tasklet was disabled, we may still have a worker queued.
+	 * Let's make sure that all workers scheduled before disabling the
+	 * tasklet are completed before continuing with the reset.
+	 */
+	if (engine->i915->guc.preempt_wq)
+		flush_workqueue(engine->i915->guc.preempt_wq);
+
+	return i915_gem_find_active_request(engine);
+}
+
 /*
  * Everything below here is concerned with setup & teardown, and is
  * therefore not part of the somewhat time-critical batch-submission
@@ -1275,6 +1306,9 @@ int intel_guc_submission_enable(struct intel_guc *guc)
 			&engine->execlists;
 
 		execlists->tasklet.func = guc_submission_tasklet;
+
+		engine->reset.prepare = guc_reset_prepare;
+
 		engine->park = guc_submission_park;
 		engine->unpark = guc_submission_unpark;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 1548563e2b43..7afe52fa615d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1844,16 +1844,6 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 	 */
 	__tasklet_disable_once(&execlists->tasklet);
 
-	/*
-	 * We're using worker to queue preemption requests from the tasklet in
-	 * GuC submission mode.
-	 * Even though tasklet was disabled, we may still have a worker queued.
-	 * Let's make sure that all workers scheduled before disabling the
-	 * tasklet are completed before continuing with the reset.
-	 */
-	if (engine->i915->guc.preempt_wq)
-		flush_workqueue(engine->i915->guc.preempt_wq);
-
 	return i915_gem_find_active_request(engine);
 }
 
@@ -2258,6 +2248,8 @@ static void execlists_set_default_submission(struct intel_engine_cs *engine)
 	engine->schedule = execlists_schedule;
 	engine->execlists.tasklet.func = execlists_submission_tasklet;
 
+	engine->reset.prepare = execlists_reset_prepare;
+
 	engine->park = NULL;
 	engine->unpark = NULL;
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 6/7] drm/i915/execlists: Flush pending preemption events during reset
  2018-05-16  6:47 [PATCH 1/7] drm/i915: Remove tasklet flush before disable Chris Wilson
                   ` (3 preceding siblings ...)
  2018-05-16  6:47 ` [PATCH 5/7] drm/i915: Split execlists/guc reset preparations Chris Wilson
@ 2018-05-16  6:47 ` Chris Wilson
  2018-05-16 14:24   ` Tvrtko Ursulin
  2018-05-16  6:47 ` [PATCH 7/7] drm/i915: Stop parking the signaler around reset Chris Wilson
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2018-05-16  6:47 UTC (permalink / raw)
  To: intel-gfx

Catch up with the inflight CSB events, after disabling the tasklet
before deciding which request was truly guilty of hanging the GPU.

v2: Restore checking of use_csb_mmio on every loop, don't forget old
vgpu.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 127 +++++++++++++++++++++----------
 1 file changed, 87 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7afe52fa615d..cae10ebf9432 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -957,34 +957,14 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 	local_irq_restore(flags);
 }
 
-/*
- * Check the unread Context Status Buffers and manage the submission of new
- * contexts to the ELSP accordingly.
- */
-static void execlists_submission_tasklet(unsigned long data)
+static void process_csb(struct intel_engine_cs *engine)
 {
-	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct execlist_port *port = execlists->port;
-	struct drm_i915_private *dev_priv = engine->i915;
+	struct drm_i915_private *i915 = engine->i915;
 	bool fw = false;
 
-	/*
-	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
-	 * on our behalf by the request (see i915_gem_mark_busy()) and it will
-	 * not be relinquished until the device is idle (see
-	 * i915_gem_idle_work_handler()). As a precaution, we make sure
-	 * that all ELSP are drained i.e. we have processed the CSB,
-	 * before allowing ourselves to idle and calling intel_runtime_pm_put().
-	 */
-	GEM_BUG_ON(!dev_priv->gt.awake);
-
-	/*
-	 * Prefer doing test_and_clear_bit() as a two stage operation to avoid
-	 * imposing the cost of a locked atomic transaction when submitting a
-	 * new request (outside of the context-switch interrupt).
-	 */
-	while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
+	do {
 		/* The HWSP contains a (cacheable) mirror of the CSB */
 		const u32 *buf =
 			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
@@ -992,28 +972,27 @@ static void execlists_submission_tasklet(unsigned long data)
 
 		if (unlikely(execlists->csb_use_mmio)) {
 			buf = (u32 * __force)
-				(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
-			execlists->csb_head = -1; /* force mmio read of CSB ptrs */
+				(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
+			execlists->csb_head = -1; /* force mmio read of CSB */
 		}
 
 		/* Clear before reading to catch new interrupts */
 		clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
 		smp_mb__after_atomic();
 
-		if (unlikely(execlists->csb_head == -1)) { /* following a reset */
+		if (unlikely(execlists->csb_head == -1)) { /* after a reset */
 			if (!fw) {
-				intel_uncore_forcewake_get(dev_priv,
-							   execlists->fw_domains);
+				intel_uncore_forcewake_get(i915, execlists->fw_domains);
 				fw = true;
 			}
 
-			head = readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
+			head = readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
 			tail = GEN8_CSB_WRITE_PTR(head);
 			head = GEN8_CSB_READ_PTR(head);
 			execlists->csb_head = head;
 		} else {
 			const int write_idx =
-				intel_hws_csb_write_index(dev_priv) -
+				intel_hws_csb_write_index(i915) -
 				I915_HWS_CSB_BUF0_INDEX;
 
 			head = execlists->csb_head;
@@ -1022,8 +1001,8 @@ static void execlists_submission_tasklet(unsigned long data)
 		}
 		GEM_TRACE("%s cs-irq head=%d [%d%s], tail=%d [%d%s]\n",
 			  engine->name,
-			  head, GEN8_CSB_READ_PTR(readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",
-			  tail, GEN8_CSB_WRITE_PTR(readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");
+			  head, GEN8_CSB_READ_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",
+			  tail, GEN8_CSB_WRITE_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");
 
 		while (head != tail) {
 			struct i915_request *rq;
@@ -1033,7 +1012,8 @@ static void execlists_submission_tasklet(unsigned long data)
 			if (++head == GEN8_CSB_ENTRIES)
 				head = 0;
 
-			/* We are flying near dragons again.
+			/*
+			 * We are flying near dragons again.
 			 *
 			 * We hold a reference to the request in execlist_port[]
 			 * but no more than that. We are operating in softirq
@@ -1142,15 +1122,48 @@ static void execlists_submission_tasklet(unsigned long data)
 		if (head != execlists->csb_head) {
 			execlists->csb_head = head;
 			writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
-			       dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
+			       i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
 		}
-	}
+	} while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted));
 
-	if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT))
-		execlists_dequeue(engine);
+	if (unlikely(fw))
+		intel_uncore_forcewake_put(i915, execlists->fw_domains);
+}
+
+/*
+ * Check the unread Context Status Buffers and manage the submission of new
+ * contexts to the ELSP accordingly.
+ */
+static void execlists_submission_tasklet(unsigned long data)
+{
+	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
 
-	if (fw)
-		intel_uncore_forcewake_put(dev_priv, execlists->fw_domains);
+	GEM_TRACE("%s awake?=%d, active=%x, irq-posted?=%d\n",
+		  engine->name,
+		  engine->i915->gt.awake,
+		  engine->execlists.active,
+		  test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted));
+
+	/*
+	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
+	 * on our behalf by the request (see i915_gem_mark_busy()) and it will
+	 * not be relinquished until the device is idle (see
+	 * i915_gem_idle_work_handler()). As a precaution, we make sure
+	 * that all ELSP are drained i.e. we have processed the CSB,
+	 * before allowing ourselves to idle and calling intel_runtime_pm_put().
+	 */
+	GEM_BUG_ON(!engine->i915->gt.awake);
+
+	/*
+	 * Prefer doing test_and_clear_bit() as a two stage operation to avoid
+	 * imposing the cost of a locked atomic transaction when submitting a
+	 * new request (outside of the context-switch interrupt).
+	 */
+	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
+		process_csb(engine);
+
+	if (!execlists_is_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT))
+		execlists_dequeue(engine);
 
 	/* If the engine is now idle, so should be the flag; and vice versa. */
 	GEM_BUG_ON(execlists_is_active(&engine->execlists,
@@ -1830,6 +1843,7 @@ static struct i915_request *
 execlists_reset_prepare(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
+	struct i915_request *request, *active;
 
 	GEM_TRACE("%s\n", engine->name);
 
@@ -1844,7 +1858,40 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 	 */
 	__tasklet_disable_once(&execlists->tasklet);
 
-	return i915_gem_find_active_request(engine);
+	/*
+	 * We want to flush the pending context switches, having disabled
+	 * the tasklet above, we can assume exclusive access to the execlists.
+	 * For this allows us to catch up with an inflight preemption event,
+	 * and avoid blaming an innocent request if the stall was due to the
+	 * preemption itself.
+	 */
+	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
+		process_csb(engine);
+
+	/*
+	 * The last active request can then be no later than the last request
+	 * now in ELSP[0]. So search backwards from there, so that if the GPU
+	 * has advanced beyond the last CSB update, it will be pardoned.
+	 */
+	active = NULL;
+	request = port_request(execlists->port);
+	if (request) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&engine->timeline.lock, flags);
+		list_for_each_entry_from_reverse(request,
+						 &engine->timeline.requests,
+						 link) {
+			if (__i915_request_completed(request,
+						     request->global_seqno))
+				break;
+
+			active = request;
+		}
+		spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	}
+
+	return active;
 }
 
 static void execlists_reset(struct intel_engine_cs *engine,
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 7/7] drm/i915: Stop parking the signaler around reset
  2018-05-16  6:47 [PATCH 1/7] drm/i915: Remove tasklet flush before disable Chris Wilson
                   ` (4 preceding siblings ...)
  2018-05-16  6:47 ` [PATCH 6/7] drm/i915/execlists: Flush pending preemption events during reset Chris Wilson
@ 2018-05-16  6:47 ` Chris Wilson
  2018-05-16  9:15   ` Tvrtko Ursulin
  2018-05-16 10:13   ` [PATCH v2] " Chris Wilson
  2018-05-16  7:09 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/7] drm/i915: Remove tasklet flush before disable Patchwork
                   ` (6 subsequent siblings)
  12 siblings, 2 replies; 28+ messages in thread
From: Chris Wilson @ 2018-05-16  6:47 UTC (permalink / raw)
  To: intel-gfx

We cannot call kthread_park() from softirq context, so let's avoid it
entirely during the reset. We wanted to suspend the signaler so that it
would not mark a request as complete at the same time as we marked it as
being in error. Instead of parking the signaling, stop the engine from
advancing so that the GPU doesn't emit the breadcrumb for our chosen
"guilty" request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         | 14 --------------
 drivers/gpu/drm/i915/intel_lrc.c        | 21 +++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 18 ++++++++++++++++++
 3 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index abf661d40641..b0fe452ce17c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3015,18 +3015,6 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
 	 */
 	intel_uncore_forcewake_get(engine->i915, FORCEWAKE_ALL);
 
-	/*
-	 * Prevent the signaler thread from updating the request
-	 * state (by calling dma_fence_signal) as we are processing
-	 * the reset. The write from the GPU of the seqno is
-	 * asynchronous and the signaler thread may see a different
-	 * value to us and declare the request complete, even though
-	 * the reset routine have picked that request as the active
-	 * (incomplete) request. This conflict is not handled
-	 * gracefully!
-	 */
-	kthread_park(engine->breadcrumbs.signaler);
-
 	request = engine->reset.prepare(engine);
 	if (request && request->fence.error == -EIO)
 		request = ERR_PTR(-EIO); /* Previous reset failed! */
@@ -3229,8 +3217,6 @@ void i915_gem_reset_finish_engine(struct intel_engine_cs *engine)
 {
 	engine->reset.finish(engine);
 
-	kthread_unpark(engine->breadcrumbs.signaler);
-
 	intel_uncore_forcewake_put(engine->i915, FORCEWAKE_ALL);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index cae10ebf9432..16e4f3f6bbbf 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1839,6 +1839,21 @@ static int gen9_init_render_ring(struct intel_engine_cs *engine)
 	return 0;
 }
 
+static void set_stop_engine(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+	const u32 base = engine->mmio_base;
+	const i915_reg_t mode = RING_MI_MODE(base);
+
+	GEM_TRACE("%s\n", engine->name);
+	I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
+	if (__intel_wait_for_register_fw(dev_priv,
+					 mode, MODE_IDLE, MODE_IDLE,
+					 1000, 0,
+					 NULL))
+		GEM_TRACE("%s: timed out on STOP_RING -> IDLE\n", engine->name);
+}
+
 static struct i915_request *
 execlists_reset_prepare(struct intel_engine_cs *engine)
 {
@@ -1878,6 +1893,12 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 	if (request) {
 		unsigned long flags;
 
+		/*
+		 * Prevent the breadcrumb from advancing before we decide
+		 * which request is currently active.
+		 */
+		set_stop_engine(engine);
+
 		spin_lock_irqsave(&engine->timeline.lock, flags);
 		list_for_each_entry_from_reverse(request,
 						 &engine->timeline.requests,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index af5a178366ed..bb88a92fcc1e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -531,8 +531,26 @@ static int init_ring_common(struct intel_engine_cs *engine)
 	return ret;
 }
 
+static void set_stop_engine(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+	const u32 base = engine->mmio_base;
+	const i915_reg_t mode = RING_MI_MODE(base);
+
+	I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
+	if (__intel_wait_for_register_fw(dev_priv,
+					 mode, MODE_IDLE, MODE_IDLE,
+					 1000, 0,
+					 NULL))
+		DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
+				 engine->name);
+}
+
 static struct i915_request *reset_prepare(struct intel_engine_cs *engine)
 {
+	if (INTEL_GEN(engine->i915) >= 3)
+		set_stop_engine(engine);
+
 	if (engine->irq_seqno_barrier)
 		engine->irq_seqno_barrier(engine);
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/7] drm/i915: Remove tasklet flush before disable
  2018-05-16  6:47 [PATCH 1/7] drm/i915: Remove tasklet flush before disable Chris Wilson
                   ` (5 preceding siblings ...)
  2018-05-16  6:47 ` [PATCH 7/7] drm/i915: Stop parking the signaler around reset Chris Wilson
@ 2018-05-16  7:09 ` Patchwork
  2018-05-16  7:27 ` ✓ Fi.CI.BAT: success " Patchwork
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2018-05-16  7:09 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/7] drm/i915: Remove tasklet flush before disable
URL   : https://patchwork.freedesktop.org/series/43235/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
823d4421c732 drm/i915: Remove tasklet flush before disable
97032fea8040 drm/i915: Only sync tasklets once for recursive reset preparation
7f52a34d6167 drm/i915/execlists: Refactor out complete_preempt_context()
4477eba97fad drm/i915: Move engine reset prepare/finish to backends
5ecbe1da729c drm/i915: Split execlists/guc reset preparations
0df011d60e46 drm/i915/execlists: Flush pending preemption events during reset
-:69: WARNING:LONG_LINE: line over 100 characters
#69: FILE: drivers/gpu/drm/i915/intel_lrc.c:975:
+				(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));

-:87: WARNING:LONG_LINE: line over 100 characters
#87: FILE: drivers/gpu/drm/i915/intel_lrc.c:989:
+			head = readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));

-:104: WARNING:LONG_LINE: line over 100 characters
#104: FILE: drivers/gpu/drm/i915/intel_lrc.c:1004:
+			  head, GEN8_CSB_READ_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",

-:105: WARNING:LONG_LINE: line over 100 characters
#105: FILE: drivers/gpu/drm/i915/intel_lrc.c:1005:
+			  tail, GEN8_CSB_WRITE_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");

total: 0 errors, 4 warnings, 0 checks, 192 lines checked
664c90a51f2e drm/i915: Stop parking the signaler around reset

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [1/7] drm/i915: Remove tasklet flush before disable
  2018-05-16  6:47 [PATCH 1/7] drm/i915: Remove tasklet flush before disable Chris Wilson
                   ` (6 preceding siblings ...)
  2018-05-16  7:09 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/7] drm/i915: Remove tasklet flush before disable Patchwork
@ 2018-05-16  7:27 ` Patchwork
  2018-05-16  8:39 ` [PATCH 1/7] " Mika Kuoppala
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2018-05-16  7:27 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/7] drm/i915: Remove tasklet flush before disable
URL   : https://patchwork.freedesktop.org/series/43235/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4190 -> Patchwork_9010 =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_9010 need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9010, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/43235/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9010:

  === IGT changes ===

    ==== Warnings ====

    igt@gem_exec_gttfill@basic:
      fi-pnv-d510:        SKIP -> PASS

    
== Known issues ==

  Here are the changes found in Patchwork_9010 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@gem_mmap_gtt@basic-small-bo-tiledx:
      fi-gdg-551:         PASS -> FAIL (fdo#102575)

    igt@kms_frontbuffer_tracking@basic:
      fi-hsw-4200u:       PASS -> DMESG-FAIL (fdo#106103, fdo#102614)

    igt@prime_vgem@basic-fence-flip:
      fi-ilk-650:         PASS -> FAIL (fdo#104008)

    
  fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575
  fdo#102614 https://bugs.freedesktop.org/show_bug.cgi?id=102614
  fdo#104008 https://bugs.freedesktop.org/show_bug.cgi?id=104008
  fdo#106103 https://bugs.freedesktop.org/show_bug.cgi?id=106103


== Participating hosts (42 -> 37) ==

  Missing    (5): fi-ctg-p8600 fi-ilk-m540 fi-byt-squawks fi-bsw-cyan fi-skl-6700hq 


== Build changes ==

    * Linux: CI_DRM_4190 -> Patchwork_9010

  CI_DRM_4190: 38305250b2e939269187ca7a9bd258687a16df3e @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4484: b7432bf309d5d5a6e07e8a0d8b522302fb0b4502 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9010: 664c90a51f2ead03ea36ec2f667b7e98178d44ef @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4484: 62ef6b0db8967e7021fd3e0b57d03ff164b984fe @ git://anongit.freedesktop.org/piglit


== Linux commits ==

664c90a51f2e drm/i915: Stop parking the signaler around reset
0df011d60e46 drm/i915/execlists: Flush pending preemption events during reset
5ecbe1da729c drm/i915: Split execlists/guc reset preparations
4477eba97fad drm/i915: Move engine reset prepare/finish to backends
7f52a34d6167 drm/i915/execlists: Refactor out complete_preempt_context()
97032fea8040 drm/i915: Only sync tasklets once for recursive reset preparation
823d4421c732 drm/i915: Remove tasklet flush before disable

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9010/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] drm/i915: Remove tasklet flush before disable
  2018-05-16  6:47 [PATCH 1/7] drm/i915: Remove tasklet flush before disable Chris Wilson
                   ` (7 preceding siblings ...)
  2018-05-16  7:27 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-05-16  8:39 ` Mika Kuoppala
  2018-05-16  8:41   ` Chris Wilson
  2018-05-16 10:33 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/7] drm/i915: Remove tasklet flush before disable (rev2) Patchwork
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Mika Kuoppala @ 2018-05-16  8:39 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> The idea was to try and let the existing tasklet run to completion
> before we began the reset, but it involves a racy check against anything
> else that tries to run the tasklet. Rather than acknowledge and ignore
> the race, let it be and don't try and be too clever.
>
> The tasklet will resume execution after reset (after spinning a bit
> during reset), but before we allow it to resume we will have cleared all
> the pending state.

The disable works only on all future reschedules and
the dequeue is behind timeline lock. But what guards against
the tasklet being currently reading the ports?

-Mika

>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 9 ---------
>  1 file changed, 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0a2070112b66..0dc369a9ec4d 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3035,16 +3035,7 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
>  	 * calling engine->init_hw() and also writing the ELSP.
>  	 * Turning off the execlists->tasklet until the reset is over
>  	 * prevents the race.
> -	 *
> -	 * Note that this needs to be a single atomic operation on the
> -	 * tasklet (flush existing tasks, prevent new tasks) to prevent
> -	 * a race between reset and set-wedged. It is not, so we do the best
> -	 * we can atm and make sure we don't lock the machine up in the more
> -	 * common case of recursively being called from set-wedged from inside
> -	 * i915_reset.
>  	 */
> -	if (!atomic_read(&engine->execlists.tasklet.count))
> -		tasklet_kill(&engine->execlists.tasklet);
>  	tasklet_disable(&engine->execlists.tasklet);
>  
>  	/*
> -- 
> 2.17.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] drm/i915: Remove tasklet flush before disable
  2018-05-16  8:39 ` [PATCH 1/7] " Mika Kuoppala
@ 2018-05-16  8:41   ` Chris Wilson
  2018-05-16  9:57     ` Mika Kuoppala
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2018-05-16  8:41 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2018-05-16 09:39:14)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > The idea was to try and let the existing tasklet run to completion
> > before we began the reset, but it involves a racy check against anything
> > else that tries to run the tasklet. Rather than acknowledge and ignore
> > the race, let it be and don't try and be too clever.
> >
> > The tasklet will resume execution after reset (after spinning a bit
> > during reset), but before we allow it to resume we will have cleared all
> > the pending state.
> 
> The disable works only on all future reschedules and
> the dequeue is behind timeline lock. But what guards against
> the tasklet being currently reading the ports?

tasklet_disable() itself is synchronous and waits for completion of the
current execution before returning. See tasklet_disable_nosync() for the
complimentary function just to prevent future schedules from executing.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 7/7] drm/i915: Stop parking the signaler around reset
  2018-05-16  6:47 ` [PATCH 7/7] drm/i915: Stop parking the signaler around reset Chris Wilson
@ 2018-05-16  9:15   ` Tvrtko Ursulin
  2018-05-16  9:25     ` Chris Wilson
  2018-05-16 10:13   ` [PATCH v2] " Chris Wilson
  1 sibling, 1 reply; 28+ messages in thread
From: Tvrtko Ursulin @ 2018-05-16  9:15 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 16/05/2018 07:47, Chris Wilson wrote:
> We cannot call kthread_park() from softirq context, so let's avoid it
> entirely during the reset. We wanted to suspend the signaler so that it
> would not mark a request as complete at the same time as we marked it as
> being in error. Instead of parking the signaling, stop the engine from
> advancing so that the GPU doesn't emit the breadcrumb for our chosen
> "guilty" request.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> CC: Michel Thierry <michel.thierry@intel.com>
> Cc: Jeff McGee <jeff.mcgee@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c         | 14 --------------
>   drivers/gpu/drm/i915/intel_lrc.c        | 21 +++++++++++++++++++++
>   drivers/gpu/drm/i915/intel_ringbuffer.c | 18 ++++++++++++++++++
>   3 files changed, 39 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index abf661d40641..b0fe452ce17c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3015,18 +3015,6 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
>   	 */
>   	intel_uncore_forcewake_get(engine->i915, FORCEWAKE_ALL);
>   
> -	/*
> -	 * Prevent the signaler thread from updating the request
> -	 * state (by calling dma_fence_signal) as we are processing
> -	 * the reset. The write from the GPU of the seqno is
> -	 * asynchronous and the signaler thread may see a different
> -	 * value to us and declare the request complete, even though
> -	 * the reset routine have picked that request as the active
> -	 * (incomplete) request. This conflict is not handled
> -	 * gracefully!
> -	 */
> -	kthread_park(engine->breadcrumbs.signaler);
> -
>   	request = engine->reset.prepare(engine);
>   	if (request && request->fence.error == -EIO)
>   		request = ERR_PTR(-EIO); /* Previous reset failed! */
> @@ -3229,8 +3217,6 @@ void i915_gem_reset_finish_engine(struct intel_engine_cs *engine)
>   {
>   	engine->reset.finish(engine);
>   
> -	kthread_unpark(engine->breadcrumbs.signaler);
> -
>   	intel_uncore_forcewake_put(engine->i915, FORCEWAKE_ALL);
>   }
>   
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index cae10ebf9432..16e4f3f6bbbf 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1839,6 +1839,21 @@ static int gen9_init_render_ring(struct intel_engine_cs *engine)
>   	return 0;
>   }
>   
> +static void set_stop_engine(struct intel_engine_cs *engine)
> +{
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	const u32 base = engine->mmio_base;
> +	const i915_reg_t mode = RING_MI_MODE(base);
> +
> +	GEM_TRACE("%s\n", engine->name);
> +	I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
> +	if (__intel_wait_for_register_fw(dev_priv,
> +					 mode, MODE_IDLE, MODE_IDLE,
> +					 1000, 0,
> +					 NULL))
> +		GEM_TRACE("%s: timed out on STOP_RING -> IDLE\n", engine->name);
> +}
> +
>   static struct i915_request *
>   execlists_reset_prepare(struct intel_engine_cs *engine)
>   {
> @@ -1878,6 +1893,12 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
>   	if (request) {
>   		unsigned long flags;
>   
> +		/*
> +		 * Prevent the breadcrumb from advancing before we decide
> +		 * which request is currently active.
> +		 */
> +		set_stop_engine(engine);
> +
>   		spin_lock_irqsave(&engine->timeline.lock, flags);
>   		list_for_each_entry_from_reverse(request,
>   						 &engine->timeline.requests,
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index af5a178366ed..bb88a92fcc1e 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -531,8 +531,26 @@ static int init_ring_common(struct intel_engine_cs *engine)
>   	return ret;
>   }
>   
> +static void set_stop_engine(struct intel_engine_cs *engine)
> +{
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	const u32 base = engine->mmio_base;
> +	const i915_reg_t mode = RING_MI_MODE(base);
> +
> +	I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
> +	if (__intel_wait_for_register_fw(dev_priv,
> +					 mode, MODE_IDLE, MODE_IDLE,
> +					 1000, 0,
> +					 NULL))
> +		DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
> +				 engine->name);
> +}

Looks to be exactly the same implementation as in intel_lrc.c apart from 
debug vs trace. Move to intel_engine_cs.c?

> +
>   static struct i915_request *reset_prepare(struct intel_engine_cs *engine)
>   {
> +	if (INTEL_GEN(engine->i915) >= 3)
> +		set_stop_engine(engine);
> +
>   	if (engine->irq_seqno_barrier)
>   		engine->irq_seqno_barrier(engine);
>   
> 

Most important question - is stopping the engine expect to mostly work 
with a palette of different hang types?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 7/7] drm/i915: Stop parking the signaler around reset
  2018-05-16  9:15   ` Tvrtko Ursulin
@ 2018-05-16  9:25     ` Chris Wilson
  2018-05-16  9:49       ` Tvrtko Ursulin
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2018-05-16  9:25 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-16 10:15:34)
> 
> On 16/05/2018 07:47, Chris Wilson wrote:
> > We cannot call kthread_park() from softirq context, so let's avoid it
> > entirely during the reset. We wanted to suspend the signaler so that it
> > would not mark a request as complete at the same time as we marked it as
> > being in error. Instead of parking the signaling, stop the engine from
> > advancing so that the GPU doesn't emit the breadcrumb for our chosen
> > "guilty" request.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Michał Winiarski <michal.winiarski@intel.com>
> > CC: Michel Thierry <michel.thierry@intel.com>
> > Cc: Jeff McGee <jeff.mcgee@intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---

> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index af5a178366ed..bb88a92fcc1e 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -531,8 +531,26 @@ static int init_ring_common(struct intel_engine_cs *engine)
> >       return ret;
> >   }
> >   
> > +static void set_stop_engine(struct intel_engine_cs *engine)
> > +{
> > +     struct drm_i915_private *dev_priv = engine->i915;
> > +     const u32 base = engine->mmio_base;
> > +     const i915_reg_t mode = RING_MI_MODE(base);
> > +
> > +     I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
> > +     if (__intel_wait_for_register_fw(dev_priv,
> > +                                      mode, MODE_IDLE, MODE_IDLE,
> > +                                      1000, 0,
> > +                                      NULL))
> > +             DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
> > +                              engine->name);
> > +}
> 
> Looks to be exactly the same implementation as in intel_lrc.c apart from 
> debug vs trace. Move to intel_engine_cs.c?

Not unless you also suggest a name I like ;)

Mika once had plans for engine->stop() so I didn't think too hard about
this bit, expecting it to be superseded.

> >   static struct i915_request *reset_prepare(struct intel_engine_cs *engine)
> >   {
> > +     if (INTEL_GEN(engine->i915) >= 3)
> > +             set_stop_engine(engine);
> > +
> >       if (engine->irq_seqno_barrier)
> >               engine->irq_seqno_barrier(engine);
> >   
> > 
> 
> Most important question - is stopping the engine expect to mostly work 
> with a palette of different hang types?

The good news is that if the engine is hung, it doesn't matter! So what
it reliably stops is Command Streamer execution along the ring, and we
only need to rely on it stopping before the MI_STORE_DWORD of the
breadcrumb to be sure that we won't see the breadcrumb advance as we
process reset.

Now what's stopping the breadcrumb still being in flight (from the GPU)
as we sample it (from the CPU)?
Not much.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 7/7] drm/i915: Stop parking the signaler around reset
  2018-05-16  9:25     ` Chris Wilson
@ 2018-05-16  9:49       ` Tvrtko Ursulin
  2018-05-16  9:58         ` Chris Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Tvrtko Ursulin @ 2018-05-16  9:49 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 16/05/2018 10:25, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-16 10:15:34)
>>
>> On 16/05/2018 07:47, Chris Wilson wrote:
>>> We cannot call kthread_park() from softirq context, so let's avoid it
>>> entirely during the reset. We wanted to suspend the signaler so that it
>>> would not mark a request as complete at the same time as we marked it as
>>> being in error. Instead of parking the signaling, stop the engine from
>>> advancing so that the GPU doesn't emit the breadcrumb for our chosen
>>> "guilty" request.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>>> Cc: Michał Winiarski <michal.winiarski@intel.com>
>>> CC: Michel Thierry <michel.thierry@intel.com>
>>> Cc: Jeff McGee <jeff.mcgee@intel.com>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
> 
>>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> index af5a178366ed..bb88a92fcc1e 100644
>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> @@ -531,8 +531,26 @@ static int init_ring_common(struct intel_engine_cs *engine)
>>>        return ret;
>>>    }
>>>    
>>> +static void set_stop_engine(struct intel_engine_cs *engine)
>>> +{
>>> +     struct drm_i915_private *dev_priv = engine->i915;
>>> +     const u32 base = engine->mmio_base;
>>> +     const i915_reg_t mode = RING_MI_MODE(base);
>>> +
>>> +     I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
>>> +     if (__intel_wait_for_register_fw(dev_priv,
>>> +                                      mode, MODE_IDLE, MODE_IDLE,
>>> +                                      1000, 0,
>>> +                                      NULL))
>>> +             DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
>>> +                              engine->name);
>>> +}
>>
>> Looks to be exactly the same implementation as in intel_lrc.c apart from
>> debug vs trace. Move to intel_engine_cs.c?
> 
> Not unless you also suggest a name I like ;)
> 
> Mika once had plans for engine->stop() so I didn't think too hard about
> this bit, expecting it to be superseded.

Vfunc, or intel_engine_stop, anything but two of the same.

Timed out message should also possibly even be upgraded to notice level.

>>>    static struct i915_request *reset_prepare(struct intel_engine_cs *engine)
>>>    {
>>> +     if (INTEL_GEN(engine->i915) >= 3)
>>> +             set_stop_engine(engine);
>>> +
>>>        if (engine->irq_seqno_barrier)
>>>                engine->irq_seqno_barrier(engine);
>>>    
>>>
>>
>> Most important question - is stopping the engine expect to mostly work
>> with a palette of different hang types?
> 
> The good news is that if the engine is hung, it doesn't matter! So what
> it reliably stops is Command Streamer execution along the ring, and we
> only need to rely on it stopping before the MI_STORE_DWORD of the
> breadcrumb to be sure that we won't see the breadcrumb advance as we
> process reset.
> 
> Now what's stopping the breadcrumb still being in flight (from the GPU)
> as we sample it (from the CPU)?
> Not much.

If it reliably stops the command streamer then that sounds good enough 
to me.

The in-flight write should be unlikely, since we detected a hang that 
means any activity happened long time ago.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] drm/i915: Remove tasklet flush before disable
  2018-05-16  8:41   ` Chris Wilson
@ 2018-05-16  9:57     ` Mika Kuoppala
  0 siblings, 0 replies; 28+ messages in thread
From: Mika Kuoppala @ 2018-05-16  9:57 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-05-16 09:39:14)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > The idea was to try and let the existing tasklet run to completion
>> > before we began the reset, but it involves a racy check against anything
>> > else that tries to run the tasklet. Rather than acknowledge and ignore
>> > the race, let it be and don't try and be too clever.
>> >
>> > The tasklet will resume execution after reset (after spinning a bit
>> > during reset), but before we allow it to resume we will have cleared all
>> > the pending state.
>> 
>> The disable works only on all future reschedules and
>> the dequeue is behind timeline lock. But what guards against
>> the tasklet being currently reading the ports?
>
> tasklet_disable() itself is synchronous and waits for completion of the
> current execution before returning. See tasklet_disable_nosync() for the
> complimentary function just to prevent future schedules from executing.

Ok, so the spinning is on the tasklet infra when it spins on
scheduling, not being able to call the tasklet->func().

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 7/7] drm/i915: Stop parking the signaler around reset
  2018-05-16  9:49       ` Tvrtko Ursulin
@ 2018-05-16  9:58         ` Chris Wilson
  0 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2018-05-16  9:58 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-16 10:49:34)
> 
> On 16/05/2018 10:25, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-05-16 10:15:34)
> >>
> >> On 16/05/2018 07:47, Chris Wilson wrote:
> >>> We cannot call kthread_park() from softirq context, so let's avoid it
> >>> entirely during the reset. We wanted to suspend the signaler so that it
> >>> would not mark a request as complete at the same time as we marked it as
> >>> being in error. Instead of parking the signaling, stop the engine from
> >>> advancing so that the GPU doesn't emit the breadcrumb for our chosen
> >>> "guilty" request.
> >>>
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> >>> Cc: Michał Winiarski <michal.winiarski@intel.com>
> >>> CC: Michel Thierry <michel.thierry@intel.com>
> >>> Cc: Jeff McGee <jeff.mcgee@intel.com>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>> ---
> > 
> >>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >>> index af5a178366ed..bb88a92fcc1e 100644
> >>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> >>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >>> @@ -531,8 +531,26 @@ static int init_ring_common(struct intel_engine_cs *engine)
> >>>        return ret;
> >>>    }
> >>>    
> >>> +static void set_stop_engine(struct intel_engine_cs *engine)
> >>> +{
> >>> +     struct drm_i915_private *dev_priv = engine->i915;
> >>> +     const u32 base = engine->mmio_base;
> >>> +     const i915_reg_t mode = RING_MI_MODE(base);
> >>> +
> >>> +     I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
> >>> +     if (__intel_wait_for_register_fw(dev_priv,
> >>> +                                      mode, MODE_IDLE, MODE_IDLE,
> >>> +                                      1000, 0,
> >>> +                                      NULL))
> >>> +             DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
> >>> +                              engine->name);
> >>> +}
> >>
> >> Looks to be exactly the same implementation as in intel_lrc.c apart from
> >> debug vs trace. Move to intel_engine_cs.c?
> > 
> > Not unless you also suggest a name I like ;)
> > 
> > Mika once had plans for engine->stop() so I didn't think too hard about
> > this bit, expecting it to be superseded.
> 
> Vfunc, or intel_engine_stop, anything but two of the same.
> 
> Timed out message should also possibly even be upgraded to notice level.

It doesn't make any difference unless an error occurs later. The
STOP_RING will time out, we know it does. MODE_IDLE will only be set on
the next arbitration point (aiui) :(

Such cases are usually only resolved by wedging as the reset itself also
fail.

> >>>    static struct i915_request *reset_prepare(struct intel_engine_cs *engine)
> >>>    {
> >>> +     if (INTEL_GEN(engine->i915) >= 3)
> >>> +             set_stop_engine(engine);
> >>> +
> >>>        if (engine->irq_seqno_barrier)
> >>>                engine->irq_seqno_barrier(engine);
> >>>    
> >>>
> >>
> >> Most important question - is stopping the engine expect to mostly work
> >> with a palette of different hang types?
> > 
> > The good news is that if the engine is hung, it doesn't matter! So what
> > it reliably stops is Command Streamer execution along the ring, and we
> > only need to rely on it stopping before the MI_STORE_DWORD of the
> > breadcrumb to be sure that we won't see the breadcrumb advance as we
> > process reset.
> > 
> > Now what's stopping the breadcrumb still being in flight (from the GPU)
> > as we sample it (from the CPU)?
> > Not much.
> 
> If it reliably stops the command streamer then that sounds good enough 
> to me.
> 
> The in-flight write should be unlikely, since we detected a hang that 
> means any activity happened long time ago.

Rule of thumb is to always use "hang" ;)

- we may have falsely declared a hang and the gpu was just slow

- another engine is hung, this one is chugging along oblivious

However, the uncached read is just what I would use to allow the write
from the GPU to be visible to the CPU, so maybe just add another?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2] drm/i915: Stop parking the signaler around reset
  2018-05-16  6:47 ` [PATCH 7/7] drm/i915: Stop parking the signaler around reset Chris Wilson
  2018-05-16  9:15   ` Tvrtko Ursulin
@ 2018-05-16 10:13   ` Chris Wilson
  2018-05-16 10:37     ` Tvrtko Ursulin
  1 sibling, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2018-05-16 10:13 UTC (permalink / raw)
  To: intel-gfx

We cannot call kthread_park() from softirq context, so let's avoid it
entirely during the reset. We wanted to suspend the signaler so that it
would not mark a request as complete at the same time as we marked it as
being in error. Instead of parking the signaling, stop the engine from
advancing so that the GPU doesn't emit the breadcrumb for our chosen
"guilty" request.

v2: Refactor setting STOP_RING so that we don't have the same code thrice

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         | 14 ------------
 drivers/gpu/drm/i915/intel_engine_cs.c  | 30 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        |  6 +++++
 drivers/gpu/drm/i915/intel_ringbuffer.c |  2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
 drivers/gpu/drm/i915/intel_uncore.c     | 12 +++-------
 6 files changed, 43 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index abf661d40641..b0fe452ce17c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3015,18 +3015,6 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
 	 */
 	intel_uncore_forcewake_get(engine->i915, FORCEWAKE_ALL);
 
-	/*
-	 * Prevent the signaler thread from updating the request
-	 * state (by calling dma_fence_signal) as we are processing
-	 * the reset. The write from the GPU of the seqno is
-	 * asynchronous and the signaler thread may see a different
-	 * value to us and declare the request complete, even though
-	 * the reset routine have picked that request as the active
-	 * (incomplete) request. This conflict is not handled
-	 * gracefully!
-	 */
-	kthread_park(engine->breadcrumbs.signaler);
-
 	request = engine->reset.prepare(engine);
 	if (request && request->fence.error == -EIO)
 		request = ERR_PTR(-EIO); /* Previous reset failed! */
@@ -3229,8 +3217,6 @@ void i915_gem_reset_finish_engine(struct intel_engine_cs *engine)
 {
 	engine->reset.finish(engine);
 
-	kthread_unpark(engine->breadcrumbs.signaler);
-
 	intel_uncore_forcewake_put(engine->i915, FORCEWAKE_ALL);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 6bfd7e3ed152..4d8105fcaca0 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -769,6 +769,36 @@ u64 intel_engine_get_last_batch_head(const struct intel_engine_cs *engine)
 	return bbaddr;
 }
 
+
+int intel_engine_stop_cs(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+	const u32 base = engine->mmio_base;
+	const i915_reg_t mode = RING_MI_MODE(base);
+	int err;
+
+	if (INTEL_GEN(dev_priv) < 3)
+		return -ENODEV;
+
+	GEM_TRACE("%s\n", engine->name);
+
+	I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
+
+	err = 0;
+	if (__intel_wait_for_register_fw(dev_priv,
+					 mode, MODE_IDLE, MODE_IDLE,
+					 1000, 0,
+					 NULL)) {
+		GEM_TRACE("%s: timed out on STOP_RING -> IDLE\n", engine->name);
+		err = -ETIMEDOUT;
+	}
+
+	/* A final mmio read to let GPU writes be hopefully flushed to memory */
+	POSTING_READ_FW(mode);
+
+	return err;
+}
+
 const char *i915_cache_level_str(struct drm_i915_private *i915, int type)
 {
 	switch (type) {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index cae10ebf9432..a973950e96b7 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1878,6 +1878,12 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 	if (request) {
 		unsigned long flags;
 
+		/*
+		 * Prevent the breadcrumb from advancing before we decide
+		 * which request is currently active.
+		 */
+		intel_engine_stop_cs(engine);
+
 		spin_lock_irqsave(&engine->timeline.lock, flags);
 		list_for_each_entry_from_reverse(request,
 						 &engine->timeline.requests,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index af5a178366ed..6f200a747176 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -533,6 +533,8 @@ static int init_ring_common(struct intel_engine_cs *engine)
 
 static struct i915_request *reset_prepare(struct intel_engine_cs *engine)
 {
+	intel_engine_stop_cs(engine);
+
 	if (engine->irq_seqno_barrier)
 		engine->irq_seqno_barrier(engine);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 1e8bacba7754..61f385a92484 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -878,6 +878,8 @@ int intel_init_bsd_ring_buffer(struct intel_engine_cs *engine);
 int intel_init_blt_ring_buffer(struct intel_engine_cs *engine);
 int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine);
 
+int intel_engine_stop_cs(struct intel_engine_cs *engine);
+
 u64 intel_engine_get_active_head(const struct intel_engine_cs *engine);
 u64 intel_engine_get_last_batch_head(const struct intel_engine_cs *engine);
 
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 448293eb638d..b36a3b5736a0 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1702,15 +1702,9 @@ static void gen3_stop_engine(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
 	const u32 base = engine->mmio_base;
-	const i915_reg_t mode = RING_MI_MODE(base);
-
-	I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
-	if (__intel_wait_for_register_fw(dev_priv,
-					 mode, MODE_IDLE, MODE_IDLE,
-					 500, 0,
-					 NULL))
-		DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
-				 engine->name);
+
+	if (intel_engine_stop_cs(engine))
+		DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n", engine->name);
 
 	I915_WRITE_FW(RING_HEAD(base), I915_READ_FW(RING_TAIL(base)));
 	POSTING_READ_FW(RING_HEAD(base)); /* paranoia */
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/7] drm/i915: Remove tasklet flush before disable (rev2)
  2018-05-16  6:47 [PATCH 1/7] drm/i915: Remove tasklet flush before disable Chris Wilson
                   ` (8 preceding siblings ...)
  2018-05-16  8:39 ` [PATCH 1/7] " Mika Kuoppala
@ 2018-05-16 10:33 ` Patchwork
  2018-05-16 10:48 ` ✓ Fi.CI.BAT: success " Patchwork
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2018-05-16 10:33 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/7] drm/i915: Remove tasklet flush before disable (rev2)
URL   : https://patchwork.freedesktop.org/series/43235/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
fca038eedae4 drm/i915: Remove tasklet flush before disable
3fb0dca9825c drm/i915: Only sync tasklets once for recursive reset preparation
e85ea5b7fe3a drm/i915/execlists: Refactor out complete_preempt_context()
63941f891891 drm/i915: Move engine reset prepare/finish to backends
8048a04288ce drm/i915: Split execlists/guc reset preparations
d290200d038f drm/i915/execlists: Flush pending preemption events during reset
-:69: WARNING:LONG_LINE: line over 100 characters
#69: FILE: drivers/gpu/drm/i915/intel_lrc.c:975:
+				(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));

-:87: WARNING:LONG_LINE: line over 100 characters
#87: FILE: drivers/gpu/drm/i915/intel_lrc.c:989:
+			head = readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));

-:104: WARNING:LONG_LINE: line over 100 characters
#104: FILE: drivers/gpu/drm/i915/intel_lrc.c:1004:
+			  head, GEN8_CSB_READ_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",

-:105: WARNING:LONG_LINE: line over 100 characters
#105: FILE: drivers/gpu/drm/i915/intel_lrc.c:1005:
+			  tail, GEN8_CSB_WRITE_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");

total: 0 errors, 4 warnings, 0 checks, 192 lines checked
20872ea9f3ce drm/i915: Stop parking the signaler around reset
-:65: CHECK:LINE_SPACING: Please don't use multiple blank lines
#65: FILE: drivers/gpu/drm/i915/intel_engine_cs.c:772:
 
+

total: 0 errors, 0 warnings, 1 checks, 108 lines checked

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] drm/i915: Stop parking the signaler around reset
  2018-05-16 10:13   ` [PATCH v2] " Chris Wilson
@ 2018-05-16 10:37     ` Tvrtko Ursulin
  0 siblings, 0 replies; 28+ messages in thread
From: Tvrtko Ursulin @ 2018-05-16 10:37 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 16/05/2018 11:13, Chris Wilson wrote:
> We cannot call kthread_park() from softirq context, so let's avoid it
> entirely during the reset. We wanted to suspend the signaler so that it
> would not mark a request as complete at the same time as we marked it as
> being in error. Instead of parking the signaling, stop the engine from
> advancing so that the GPU doesn't emit the breadcrumb for our chosen
> "guilty" request.
> 
> v2: Refactor setting STOP_RING so that we don't have the same code thrice
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> CC: Michel Thierry <michel.thierry@intel.com>
> Cc: Jeff McGee <jeff.mcgee@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c         | 14 ------------
>   drivers/gpu/drm/i915/intel_engine_cs.c  | 30 +++++++++++++++++++++++++
>   drivers/gpu/drm/i915/intel_lrc.c        |  6 +++++
>   drivers/gpu/drm/i915/intel_ringbuffer.c |  2 ++
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
>   drivers/gpu/drm/i915/intel_uncore.c     | 12 +++-------
>   6 files changed, 43 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index abf661d40641..b0fe452ce17c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3015,18 +3015,6 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
>   	 */
>   	intel_uncore_forcewake_get(engine->i915, FORCEWAKE_ALL);
>   
> -	/*
> -	 * Prevent the signaler thread from updating the request
> -	 * state (by calling dma_fence_signal) as we are processing
> -	 * the reset. The write from the GPU of the seqno is
> -	 * asynchronous and the signaler thread may see a different
> -	 * value to us and declare the request complete, even though
> -	 * the reset routine have picked that request as the active
> -	 * (incomplete) request. This conflict is not handled
> -	 * gracefully!
> -	 */
> -	kthread_park(engine->breadcrumbs.signaler);
> -
>   	request = engine->reset.prepare(engine);
>   	if (request && request->fence.error == -EIO)
>   		request = ERR_PTR(-EIO); /* Previous reset failed! */
> @@ -3229,8 +3217,6 @@ void i915_gem_reset_finish_engine(struct intel_engine_cs *engine)
>   {
>   	engine->reset.finish(engine);
>   
> -	kthread_unpark(engine->breadcrumbs.signaler);
> -
>   	intel_uncore_forcewake_put(engine->i915, FORCEWAKE_ALL);
>   }
>   
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 6bfd7e3ed152..4d8105fcaca0 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -769,6 +769,36 @@ u64 intel_engine_get_last_batch_head(const struct intel_engine_cs *engine)
>   	return bbaddr;
>   }
>   
> +
> +int intel_engine_stop_cs(struct intel_engine_cs *engine)
> +{
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	const u32 base = engine->mmio_base;
> +	const i915_reg_t mode = RING_MI_MODE(base);
> +	int err;
> +
> +	if (INTEL_GEN(dev_priv) < 3)
> +		return -ENODEV;
> +
> +	GEM_TRACE("%s\n", engine->name);
> +
> +	I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
> +
> +	err = 0;
> +	if (__intel_wait_for_register_fw(dev_priv,
> +					 mode, MODE_IDLE, MODE_IDLE,
> +					 1000, 0,
> +					 NULL)) {
> +		GEM_TRACE("%s: timed out on STOP_RING -> IDLE\n", engine->name);
> +		err = -ETIMEDOUT;
> +	}
> +
> +	/* A final mmio read to let GPU writes be hopefully flushed to memory */
> +	POSTING_READ_FW(mode);
> +
> +	return err;
> +}
> +
>   const char *i915_cache_level_str(struct drm_i915_private *i915, int type)
>   {
>   	switch (type) {
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index cae10ebf9432..a973950e96b7 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1878,6 +1878,12 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
>   	if (request) {
>   		unsigned long flags;
>   
> +		/*
> +		 * Prevent the breadcrumb from advancing before we decide
> +		 * which request is currently active.
> +		 */
> +		intel_engine_stop_cs(engine);
> +
>   		spin_lock_irqsave(&engine->timeline.lock, flags);
>   		list_for_each_entry_from_reverse(request,
>   						 &engine->timeline.requests,
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index af5a178366ed..6f200a747176 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -533,6 +533,8 @@ static int init_ring_common(struct intel_engine_cs *engine)
>   
>   static struct i915_request *reset_prepare(struct intel_engine_cs *engine)
>   {
> +	intel_engine_stop_cs(engine);
> +
>   	if (engine->irq_seqno_barrier)
>   		engine->irq_seqno_barrier(engine);
>   
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 1e8bacba7754..61f385a92484 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -878,6 +878,8 @@ int intel_init_bsd_ring_buffer(struct intel_engine_cs *engine);
>   int intel_init_blt_ring_buffer(struct intel_engine_cs *engine);
>   int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine);
>   
> +int intel_engine_stop_cs(struct intel_engine_cs *engine);
> +
>   u64 intel_engine_get_active_head(const struct intel_engine_cs *engine);
>   u64 intel_engine_get_last_batch_head(const struct intel_engine_cs *engine);
>   
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 448293eb638d..b36a3b5736a0 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1702,15 +1702,9 @@ static void gen3_stop_engine(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *dev_priv = engine->i915;
>   	const u32 base = engine->mmio_base;
> -	const i915_reg_t mode = RING_MI_MODE(base);
> -
> -	I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
> -	if (__intel_wait_for_register_fw(dev_priv,
> -					 mode, MODE_IDLE, MODE_IDLE,
> -					 500, 0,
> -					 NULL))
> -		DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
> -				 engine->name);
> +
> +	if (intel_engine_stop_cs(engine))
> +		DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n", engine->name);
>   
>   	I915_WRITE_FW(RING_HEAD(base), I915_READ_FW(RING_TAIL(base)));
>   	POSTING_READ_FW(RING_HEAD(base)); /* paranoia */
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [1/7] drm/i915: Remove tasklet flush before disable (rev2)
  2018-05-16  6:47 [PATCH 1/7] drm/i915: Remove tasklet flush before disable Chris Wilson
                   ` (9 preceding siblings ...)
  2018-05-16 10:33 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/7] drm/i915: Remove tasklet flush before disable (rev2) Patchwork
@ 2018-05-16 10:48 ` Patchwork
  2018-05-16 14:21 ` ✗ Fi.CI.IGT: failure for series starting with [1/7] drm/i915: Remove tasklet flush before disable Patchwork
  2018-05-16 18:26 ` ✗ Fi.CI.IGT: failure for series starting with [1/7] drm/i915: Remove tasklet flush before disable (rev2) Patchwork
  12 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2018-05-16 10:48 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/7] drm/i915: Remove tasklet flush before disable (rev2)
URL   : https://patchwork.freedesktop.org/series/43235/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4191 -> Patchwork_9013 =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_9013 need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9013, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/43235/revisions/2/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9013:

  === IGT changes ===

    ==== Warnings ====

    igt@kms_force_connector_basic@force-edid:
      fi-snb-2520m:       SKIP -> PASS +3

    
== Known issues ==

  Here are the changes found in Patchwork_9013 that come from known issues:

  === IGT changes ===

    ==== Possible fixes ====

    igt@drv_module_reload@basic-reload:
      fi-glk-j4005:       DMESG-WARN (fdo#106248) -> PASS

    igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a:
      fi-glk-j4005:       DMESG-WARN (fdo#106235) -> PASS

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c:
      fi-bxt-dsi:         INCOMPLETE (fdo#103927) -> PASS

    
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
  fdo#106235 https://bugs.freedesktop.org/show_bug.cgi?id=106235
  fdo#106248 https://bugs.freedesktop.org/show_bug.cgi?id=106248


== Participating hosts (42 -> 35) ==

  Missing    (7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-hsw-4770 fi-skl-6700hq 


== Build changes ==

    * Linux: CI_DRM_4191 -> Patchwork_9013

  CI_DRM_4191: 70daebf1a83c2ed6eff118d2a2806086c0c89027 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4484: b7432bf309d5d5a6e07e8a0d8b522302fb0b4502 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9013: 20872ea9f3ce987d4221d00c243170056a0ba62d @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4484: 62ef6b0db8967e7021fd3e0b57d03ff164b984fe @ git://anongit.freedesktop.org/piglit


== Linux commits ==

20872ea9f3ce drm/i915: Stop parking the signaler around reset
d290200d038f drm/i915/execlists: Flush pending preemption events during reset
8048a04288ce drm/i915: Split execlists/guc reset preparations
63941f891891 drm/i915: Move engine reset prepare/finish to backends
e85ea5b7fe3a drm/i915/execlists: Refactor out complete_preempt_context()
3fb0dca9825c drm/i915: Only sync tasklets once for recursive reset preparation
fca038eedae4 drm/i915: Remove tasklet flush before disable

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9013/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* ✗ Fi.CI.IGT: failure for series starting with [1/7] drm/i915: Remove tasklet flush before disable
  2018-05-16  6:47 [PATCH 1/7] drm/i915: Remove tasklet flush before disable Chris Wilson
                   ` (10 preceding siblings ...)
  2018-05-16 10:48 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-05-16 14:21 ` Patchwork
  2018-05-16 18:26 ` ✗ Fi.CI.IGT: failure for series starting with [1/7] drm/i915: Remove tasklet flush before disable (rev2) Patchwork
  12 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2018-05-16 14:21 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/7] drm/i915: Remove tasklet flush before disable
URL   : https://patchwork.freedesktop.org/series/43235/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4190_full -> Patchwork_9010_full =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_9010_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9010_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/43235/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9010_full:

  === IGT changes ===

    ==== Possible regressions ====

    igt@drv_selftest@live_hangcheck:
      shard-apl:          PASS -> DMESG-FAIL

    igt@kms_cursor_legacy@2x-long-flip-vs-cursor-legacy:
      shard-glk:          PASS -> FAIL

    
    ==== Warnings ====

    igt@gem_exec_schedule@deep-vebox:
      shard-kbl:          PASS -> SKIP

    
== Known issues ==

  Here are the changes found in Patchwork_9010_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_flip@2x-plain-flip-fb-recreate-interruptible:
      shard-glk:          PASS -> FAIL (fdo#100368)

    igt@kms_flip_tiling@flip-to-y-tiled:
      shard-glk:          PASS -> FAIL (fdo#104724)

    
    ==== Possible fixes ====

    igt@gem_eio@in-flight-internal-immediate:
      shard-glk:          DMESG-WARN (fdo#106523) -> PASS +7

    igt@gem_eio@in-flight-suspend:
      shard-kbl:          DMESG-WARN (fdo#106523) -> PASS +13

    igt@gem_eio@wait-wedge-10ms:
      shard-apl:          DMESG-WARN (fdo#106523) -> PASS +6
      shard-snb:          DMESG-WARN (fdo#106523) -> PASS +5

    igt@gem_eio@wait-wedge-1us:
      shard-hsw:          DMESG-WARN (fdo#106523) -> PASS +4

    igt@kms_cursor_legacy@2x-long-nonblocking-modeset-vs-cursor-atomic:
      shard-glk:          FAIL (fdo#106509) -> PASS

    igt@kms_cursor_legacy@2x-nonblocking-modeset-vs-cursor-atomic:
      shard-glk:          FAIL (fdo#105454) -> PASS

    igt@kms_flip@2x-wf_vblank-ts-check-interruptible:
      shard-hsw:          FAIL (fdo#103928) -> PASS

    igt@kms_flip@blocking-wf_vblank:
      shard-glk:          FAIL (fdo#100368) -> PASS

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#103928 https://bugs.freedesktop.org/show_bug.cgi?id=103928
  fdo#104724 https://bugs.freedesktop.org/show_bug.cgi?id=104724
  fdo#105454 https://bugs.freedesktop.org/show_bug.cgi?id=105454
  fdo#106509 https://bugs.freedesktop.org/show_bug.cgi?id=106509
  fdo#106523 https://bugs.freedesktop.org/show_bug.cgi?id=106523


== Participating hosts (5 -> 5) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4190 -> Patchwork_9010

  CI_DRM_4190: 38305250b2e939269187ca7a9bd258687a16df3e @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4484: b7432bf309d5d5a6e07e8a0d8b522302fb0b4502 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9010: 664c90a51f2ead03ea36ec2f667b7e98178d44ef @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4484: 62ef6b0db8967e7021fd3e0b57d03ff164b984fe @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9010/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/7] drm/i915/execlists: Flush pending preemption events during reset
  2018-05-16  6:47 ` [PATCH 6/7] drm/i915/execlists: Flush pending preemption events during reset Chris Wilson
@ 2018-05-16 14:24   ` Tvrtko Ursulin
  2018-05-16 15:01     ` Chris Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Tvrtko Ursulin @ 2018-05-16 14:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 16/05/2018 07:47, Chris Wilson wrote:
> Catch up with the inflight CSB events, after disabling the tasklet
> before deciding which request was truly guilty of hanging the GPU.
> 
> v2: Restore checking of use_csb_mmio on every loop, don't forget old
> vgpu.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> CC: Michel Thierry <michel.thierry@intel.com>
> Cc: Jeff McGee <jeff.mcgee@intel.com>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 127 +++++++++++++++++++++----------
>   1 file changed, 87 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 7afe52fa615d..cae10ebf9432 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -957,34 +957,14 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
>   	local_irq_restore(flags);
>   }
>   
> -/*
> - * Check the unread Context Status Buffers and manage the submission of new
> - * contexts to the ELSP accordingly.
> - */
> -static void execlists_submission_tasklet(unsigned long data)
> +static void process_csb(struct intel_engine_cs *engine)
>   {
> -	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
>   	struct execlist_port *port = execlists->port;
> -	struct drm_i915_private *dev_priv = engine->i915;
> +	struct drm_i915_private *i915 = engine->i915;
>   	bool fw = false;
>   
> -	/*
> -	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
> -	 * on our behalf by the request (see i915_gem_mark_busy()) and it will
> -	 * not be relinquished until the device is idle (see
> -	 * i915_gem_idle_work_handler()). As a precaution, we make sure
> -	 * that all ELSP are drained i.e. we have processed the CSB,
> -	 * before allowing ourselves to idle and calling intel_runtime_pm_put().
> -	 */
> -	GEM_BUG_ON(!dev_priv->gt.awake);
> -
> -	/*
> -	 * Prefer doing test_and_clear_bit() as a two stage operation to avoid
> -	 * imposing the cost of a locked atomic transaction when submitting a
> -	 * new request (outside of the context-switch interrupt).
> -	 */
> -	while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
> +	do {
>   		/* The HWSP contains a (cacheable) mirror of the CSB */
>   		const u32 *buf =
>   			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
> @@ -992,28 +972,27 @@ static void execlists_submission_tasklet(unsigned long data)
>   
>   		if (unlikely(execlists->csb_use_mmio)) {
>   			buf = (u32 * __force)
> -				(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
> -			execlists->csb_head = -1; /* force mmio read of CSB ptrs */
> +				(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
> +			execlists->csb_head = -1; /* force mmio read of CSB */
>   		}
>   
>   		/* Clear before reading to catch new interrupts */
>   		clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>   		smp_mb__after_atomic();
>   
> -		if (unlikely(execlists->csb_head == -1)) { /* following a reset */
> +		if (unlikely(execlists->csb_head == -1)) { /* after a reset */
>   			if (!fw) {
> -				intel_uncore_forcewake_get(dev_priv,
> -							   execlists->fw_domains);
> +				intel_uncore_forcewake_get(i915, execlists->fw_domains);
>   				fw = true;
>   			}
>   
> -			head = readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
> +			head = readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
>   			tail = GEN8_CSB_WRITE_PTR(head);
>   			head = GEN8_CSB_READ_PTR(head);
>   			execlists->csb_head = head;
>   		} else {
>   			const int write_idx =
> -				intel_hws_csb_write_index(dev_priv) -
> +				intel_hws_csb_write_index(i915) -
>   				I915_HWS_CSB_BUF0_INDEX;
>   
>   			head = execlists->csb_head;
> @@ -1022,8 +1001,8 @@ static void execlists_submission_tasklet(unsigned long data)
>   		}
>   		GEM_TRACE("%s cs-irq head=%d [%d%s], tail=%d [%d%s]\n",
>   			  engine->name,
> -			  head, GEN8_CSB_READ_PTR(readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",
> -			  tail, GEN8_CSB_WRITE_PTR(readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");
> +			  head, GEN8_CSB_READ_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",
> +			  tail, GEN8_CSB_WRITE_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");
>   
>   		while (head != tail) {
>   			struct i915_request *rq;
> @@ -1033,7 +1012,8 @@ static void execlists_submission_tasklet(unsigned long data)
>   			if (++head == GEN8_CSB_ENTRIES)
>   				head = 0;
>   
> -			/* We are flying near dragons again.
> +			/*
> +			 * We are flying near dragons again.
>   			 *
>   			 * We hold a reference to the request in execlist_port[]
>   			 * but no more than that. We are operating in softirq
> @@ -1142,15 +1122,48 @@ static void execlists_submission_tasklet(unsigned long data)
>   		if (head != execlists->csb_head) {
>   			execlists->csb_head = head;
>   			writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
> -			       dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
> +			       i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
>   		}
> -	}
> +	} while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted));

Ideally it should three patches, or at least two:

1. Extract out CSB processing.
2. Move ENGINE_IRQ_EXECLISTS check to the caller / end of do-while loop.
3. Reset changes.

>   
> -	if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT))
> -		execlists_dequeue(engine);
> +	if (unlikely(fw))
> +		intel_uncore_forcewake_put(i915, execlists->fw_domains);
> +}
> +
> +/*
> + * Check the unread Context Status Buffers and manage the submission of new
> + * contexts to the ELSP accordingly.
> + */
> +static void execlists_submission_tasklet(unsigned long data)
> +{
> +	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
>   
> -	if (fw)
> -		intel_uncore_forcewake_put(dev_priv, execlists->fw_domains);
> +	GEM_TRACE("%s awake?=%d, active=%x, irq-posted?=%d\n",
> +		  engine->name,
> +		  engine->i915->gt.awake,
> +		  engine->execlists.active,
> +		  test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted));
> +
> +	/*
> +	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
> +	 * on our behalf by the request (see i915_gem_mark_busy()) and it will
> +	 * not be relinquished until the device is idle (see
> +	 * i915_gem_idle_work_handler()). As a precaution, we make sure
> +	 * that all ELSP are drained i.e. we have processed the CSB,
> +	 * before allowing ourselves to idle and calling intel_runtime_pm_put().
> +	 */
> +	GEM_BUG_ON(!engine->i915->gt.awake);
> +
> +	/*
> +	 * Prefer doing test_and_clear_bit() as a two stage operation to avoid
> +	 * imposing the cost of a locked atomic transaction when submitting a
> +	 * new request (outside of the context-switch interrupt).
> +	 */
> +	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
> +		process_csb(engine);
> +
> +	if (!execlists_is_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT))
> +		execlists_dequeue(engine);
>   
>   	/* If the engine is now idle, so should be the flag; and vice versa. */
>   	GEM_BUG_ON(execlists_is_active(&engine->execlists,
> @@ -1830,6 +1843,7 @@ static struct i915_request *
>   execlists_reset_prepare(struct intel_engine_cs *engine)
>   {
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
> +	struct i915_request *request, *active;
>   
>   	GEM_TRACE("%s\n", engine->name);
>   
> @@ -1844,7 +1858,40 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
>   	 */
>   	__tasklet_disable_once(&execlists->tasklet);
>   
> -	return i915_gem_find_active_request(engine);
> +	/*
> +	 * We want to flush the pending context switches, having disabled
> +	 * the tasklet above, we can assume exclusive access to the execlists.
> +	 * For this allows us to catch up with an inflight preemption event,
> +	 * and avoid blaming an innocent request if the stall was due to the
> +	 * preemption itself.
> +	 */
> +	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
> +		process_csb(engine);
> +
> +	/*
> +	 * The last active request can then be no later than the last request
> +	 * now in ELSP[0]. So search backwards from there, so that if the GPU
> +	 * has advanced beyond the last CSB update, it will be pardoned.
> +	 */
> +	active = NULL;
> +	request = port_request(execlists->port);
> +	if (request) {

Assignment to request is for nothing it seems.

> +		unsigned long flags;
> +
> +		spin_lock_irqsave(&engine->timeline.lock, flags);
> +		list_for_each_entry_from_reverse(request,
> +						 &engine->timeline.requests,
> +						 link) {
> +			if (__i915_request_completed(request,
> +						     request->global_seqno))
> +				break;
> +
> +			active = request;
> +		}
> +		spin_unlock_irqrestore(&engine->timeline.lock, flags);
> +	}
> +
> +	return active;
>   }
>   
>   static void execlists_reset(struct intel_engine_cs *engine,
> 

No other complaints and I could be bribed to look past the request to 
split it. :)

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] drm/i915: Only sync tasklets once for recursive reset preparation
  2018-05-16  6:47 ` [PATCH 2/7] drm/i915: Only sync tasklets once for recursive reset preparation Chris Wilson
@ 2018-05-16 14:24   ` Mika Kuoppala
  2018-05-16 14:29     ` Chris Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Mika Kuoppala @ 2018-05-16 14:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> When setting up reset, we may need to recursively prepare an engine. In
> which case we should only synchronously flush the tasklets on the outer
> most call, the inner calls will then be inside an atomic section where
> the tasklet will never be run (and so the sync will never complete).
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 2 +-
>  drivers/gpu/drm/i915/i915_gem.h | 7 +++++++
>  2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0dc369a9ec4d..aab1f5e77ae9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3036,7 +3036,7 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
>  	 * Turning off the execlists->tasklet until the reset is over
>  	 * prevents the race.
>  	 */
> -	tasklet_disable(&engine->execlists.tasklet);
> +	__tasklet_disable_once(&engine->execlists.tasklet);

It is debatable that could the naming be improved as
the 'once' is tied to the disable now.

__tasklet_disable_wait_[once|first].

I was going to insist on adding a comment tho,
but there it is already above, explaining
the recursion from set wedged.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

>  
>  	/*
>  	 * We're using worker to queue preemption requests from the tasklet in
> diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
> index 525920404ede..af033e7c1c56 100644
> --- a/drivers/gpu/drm/i915/i915_gem.h
> +++ b/drivers/gpu/drm/i915/i915_gem.h
> @@ -26,6 +26,7 @@
>  #define __I915_GEM_H__
>  
>  #include <linux/bug.h>
> +#include <linux/interrupt.h>
>  
>  struct drm_i915_private;
>  
> @@ -72,4 +73,10 @@ struct drm_i915_private;
>  void i915_gem_park(struct drm_i915_private *i915);
>  void i915_gem_unpark(struct drm_i915_private *i915);
>  
> +static inline void __tasklet_disable_once(struct tasklet_struct *t)
> +{
> +	if (atomic_inc_return(&t->count) == 1)
> +		tasklet_unlock_wait(t);
> +}
> +
>  #endif /* __I915_GEM_H__ */
> -- 
> 2.17.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] drm/i915: Only sync tasklets once for recursive reset preparation
  2018-05-16 14:24   ` Mika Kuoppala
@ 2018-05-16 14:29     ` Chris Wilson
  0 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2018-05-16 14:29 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2018-05-16 15:24:56)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > When setting up reset, we may need to recursively prepare an engine. In
> > which case we should only synchronously flush the tasklets on the outer
> > most call, the inner calls will then be inside an atomic section where
> > the tasklet will never be run (and so the sync will never complete).
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 2 +-
> >  drivers/gpu/drm/i915/i915_gem.h | 7 +++++++
> >  2 files changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 0dc369a9ec4d..aab1f5e77ae9 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3036,7 +3036,7 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
> >        * Turning off the execlists->tasklet until the reset is over
> >        * prevents the race.
> >        */
> > -     tasklet_disable(&engine->execlists.tasklet);
> > +     __tasklet_disable_once(&engine->execlists.tasklet);
> 
> It is debatable that could the naming be improved as
> the 'once' is tied to the disable now.
> 
> __tasklet_disable_wait_[once|first].

wait_once is a good suggestion

tasklet_disable
tasklet_disable_nosync
tasklet_disable_sync_once ?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/7] drm/i915/execlists: Flush pending preemption events during reset
  2018-05-16 14:24   ` Tvrtko Ursulin
@ 2018-05-16 15:01     ` Chris Wilson
  0 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2018-05-16 15:01 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-16 15:24:33)
> > +     /*
> > +      * The last active request can then be no later than the last request
> > +      * now in ELSP[0]. So search backwards from there, so that if the GPU
> > +      * has advanced beyond the last CSB update, it will be pardoned.
> > +      */
> > +     active = NULL;
> > +     request = port_request(execlists->port);
> > +     if (request) {
> 
> Assignment to request is for nothing it seems.
> 
> > +             unsigned long flags;
> > +
> > +             spin_lock_irqsave(&engine->timeline.lock, flags);
> > +             list_for_each_entry_from_reverse(request,

It's a 'from' list_for_each variant.

> > +                                              &engine->timeline.requests,
> > +                                              link) {
> > +                     if (__i915_request_completed(request,
> > +                                                  request->global_seqno))
> > +                             break;
> > +
> > +                     active = request;
> > +             }
> > +             spin_unlock_irqrestore(&engine->timeline.lock, flags);
> > +     }
> > +
> > +     return active;
> >   }
> >   
> >   static void execlists_reset(struct intel_engine_cs *engine,
> > 
> 
> No other complaints and I could be bribed to look past the request to 
> split it. :)

Is not clearing the backlog so we can get onto features not enough
incentive? Chocolate? Beer? Chocolate-flavoured beer?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* ✗ Fi.CI.IGT: failure for series starting with [1/7] drm/i915: Remove tasklet flush before disable (rev2)
  2018-05-16  6:47 [PATCH 1/7] drm/i915: Remove tasklet flush before disable Chris Wilson
                   ` (11 preceding siblings ...)
  2018-05-16 14:21 ` ✗ Fi.CI.IGT: failure for series starting with [1/7] drm/i915: Remove tasklet flush before disable Patchwork
@ 2018-05-16 18:26 ` Patchwork
  12 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2018-05-16 18:26 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/7] drm/i915: Remove tasklet flush before disable (rev2)
URL   : https://patchwork.freedesktop.org/series/43235/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4191_full -> Patchwork_9013_full =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_9013_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9013_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/43235/revisions/2/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9013_full:

  === IGT changes ===

    ==== Possible regressions ====

    igt@drv_selftest@live_hangcheck:
      shard-kbl:          PASS -> DMESG-FAIL

    igt@kms_cursor_legacy@2x-long-flip-vs-cursor-legacy:
      shard-glk:          PASS -> FAIL

    
    ==== Warnings ====

    igt@pm_rc6_residency@rc6-accuracy:
      shard-kbl:          SKIP -> PASS

    
== Known issues ==

  Here are the changes found in Patchwork_9013_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@gem_eio@in-flight-contexts-immediate:
      shard-glk:          PASS -> FAIL (fdo#105954)

    igt@kms_flip@2x-wf_vblank-ts-check:
      shard-hsw:          PASS -> FAIL (fdo#103928)

    igt@kms_flip_tiling@flip-to-y-tiled:
      shard-glk:          PASS -> FAIL (fdo#104724, fdo#103822)

    igt@kms_flip_tiling@flip-y-tiled:
      shard-glk:          PASS -> FAIL (fdo#104724)

    
    ==== Possible fixes ====

    igt@gem_eio@in-flight-contexts-immediate:
      shard-hsw:          DMESG-WARN (fdo#106523) -> PASS +5

    igt@gem_eio@in-flight-internal-immediate:
      shard-glk:          DMESG-WARN (fdo#106523) -> PASS +7

    igt@gem_eio@in-flight-suspend:
      shard-kbl:          DMESG-WARN (fdo#106523) -> PASS +13

    igt@gem_eio@wait-wedge-10ms:
      shard-apl:          DMESG-WARN (fdo#106523) -> PASS +6
      shard-snb:          DMESG-WARN (fdo#106523) -> PASS +5

    igt@kms_atomic_transition@1x-modeset-transitions-nonblocking:
      shard-glk:          FAIL (fdo#105703) -> PASS

    igt@kms_cursor_legacy@flip-vs-cursor-toggle:
      shard-hsw:          FAIL (fdo#102670) -> PASS

    igt@kms_flip@plain-flip-ts-check-interruptible:
      shard-glk:          FAIL (fdo#100368) -> PASS

    igt@kms_flip_tiling@flip-to-x-tiled:
      shard-glk:          FAIL (fdo#104724, fdo#103822) -> PASS

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-wc:
      shard-kbl:          DMESG-WARN (fdo#105602, fdo#103558) -> PASS +1

    igt@kms_setmode@basic:
      shard-kbl:          FAIL (fdo#99912) -> PASS

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#102670 https://bugs.freedesktop.org/show_bug.cgi?id=102670
  fdo#103558 https://bugs.freedesktop.org/show_bug.cgi?id=103558
  fdo#103822 https://bugs.freedesktop.org/show_bug.cgi?id=103822
  fdo#103928 https://bugs.freedesktop.org/show_bug.cgi?id=103928
  fdo#104724 https://bugs.freedesktop.org/show_bug.cgi?id=104724
  fdo#105602 https://bugs.freedesktop.org/show_bug.cgi?id=105602
  fdo#105703 https://bugs.freedesktop.org/show_bug.cgi?id=105703
  fdo#105954 https://bugs.freedesktop.org/show_bug.cgi?id=105954
  fdo#106523 https://bugs.freedesktop.org/show_bug.cgi?id=106523
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912


== Participating hosts (5 -> 5) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4191 -> Patchwork_9013

  CI_DRM_4191: 70daebf1a83c2ed6eff118d2a2806086c0c89027 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4484: b7432bf309d5d5a6e07e8a0d8b522302fb0b4502 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9013: 20872ea9f3ce987d4221d00c243170056a0ba62d @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4484: 62ef6b0db8967e7021fd3e0b57d03ff164b984fe @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9013/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/7] drm/i915/execlists: Flush pending preemption events during reset
  2018-05-16 15:12 ` [PATCH 6/7] drm/i915/execlists: Flush pending preemption events during reset Chris Wilson
@ 2018-05-16 15:31   ` Tvrtko Ursulin
  0 siblings, 0 replies; 28+ messages in thread
From: Tvrtko Ursulin @ 2018-05-16 15:31 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 16/05/2018 16:12, Chris Wilson wrote:
> Catch up with the inflight CSB events, after disabling the tasklet
> before deciding which request was truly guilty of hanging the GPU.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> CC: Michel Thierry <michel.thierry@intel.com>
> Cc: Jeff McGee <jeff.mcgee@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 36 +++++++++++++++++++++++++++++++-
>   1 file changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index e70d8d624899..3b889bb4352a 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1843,6 +1843,7 @@ static struct i915_request *
>   execlists_reset_prepare(struct intel_engine_cs *engine)
>   {
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
> +	struct i915_request *request, *active;
>   
>   	GEM_TRACE("%s\n", engine->name);
>   
> @@ -1857,7 +1858,40 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
>   	 */
>   	__tasklet_disable_sync_once(&execlists->tasklet);
>   
> -	return i915_gem_find_active_request(engine);
> +	/*
> +	 * We want to flush the pending context switches, having disabled
> +	 * the tasklet above, we can assume exclusive access to the execlists.
> +	 * For this allows us to catch up with an inflight preemption event,
> +	 * and avoid blaming an innocent request if the stall was due to the
> +	 * preemption itself.
> +	 */
> +	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
> +		process_csb(engine);
> +
> +	/*
> +	 * The last active request can then be no later than the last request
> +	 * now in ELSP[0]. So search backwards from there, so that if the GPU
> +	 * has advanced beyond the last CSB update, it will be pardoned.
> +	 */
> +	active = NULL;
> +	request = port_request(execlists->port);
> +	if (request) {
> +		unsigned long flags;
> +
> +		spin_lock_irqsave(&engine->timeline.lock, flags);
> +		list_for_each_entry_from_reverse(request,
> +						 &engine->timeline.requests,
> +						 link) {
> +			if (__i915_request_completed(request,
> +						     request->global_seqno))
> +				break;
> +
> +			active = request;
> +		}
> +		spin_unlock_irqrestore(&engine->timeline.lock, flags);
> +	}
> +
> +	return active;
>   }
>   
>   static void execlists_reset(struct intel_engine_cs *engine,
> 

Sounds sensible to me, just don't ask me to describe preemption and 
reset flow where things go bad without it. :)

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 6/7] drm/i915/execlists: Flush pending preemption events during reset
  2018-05-16 15:12 [PATCH 1/7] drm/i915: Remove tasklet flush before disable Chris Wilson
@ 2018-05-16 15:12 ` Chris Wilson
  2018-05-16 15:31   ` Tvrtko Ursulin
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2018-05-16 15:12 UTC (permalink / raw)
  To: intel-gfx

Catch up with the inflight CSB events, after disabling the tasklet
before deciding which request was truly guilty of hanging the GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 36 +++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e70d8d624899..3b889bb4352a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1843,6 +1843,7 @@ static struct i915_request *
 execlists_reset_prepare(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
+	struct i915_request *request, *active;
 
 	GEM_TRACE("%s\n", engine->name);
 
@@ -1857,7 +1858,40 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 	 */
 	__tasklet_disable_sync_once(&execlists->tasklet);
 
-	return i915_gem_find_active_request(engine);
+	/*
+	 * We want to flush the pending context switches, having disabled
+	 * the tasklet above, we can assume exclusive access to the execlists.
+	 * For this allows us to catch up with an inflight preemption event,
+	 * and avoid blaming an innocent request if the stall was due to the
+	 * preemption itself.
+	 */
+	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
+		process_csb(engine);
+
+	/*
+	 * The last active request can then be no later than the last request
+	 * now in ELSP[0]. So search backwards from there, so that if the GPU
+	 * has advanced beyond the last CSB update, it will be pardoned.
+	 */
+	active = NULL;
+	request = port_request(execlists->port);
+	if (request) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&engine->timeline.lock, flags);
+		list_for_each_entry_from_reverse(request,
+						 &engine->timeline.requests,
+						 link) {
+			if (__i915_request_completed(request,
+						     request->global_seqno))
+				break;
+
+			active = request;
+		}
+		spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	}
+
+	return active;
 }
 
 static void execlists_reset(struct intel_engine_cs *engine,
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2018-05-16 18:26 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-16  6:47 [PATCH 1/7] drm/i915: Remove tasklet flush before disable Chris Wilson
2018-05-16  6:47 ` [PATCH 2/7] drm/i915: Only sync tasklets once for recursive reset preparation Chris Wilson
2018-05-16 14:24   ` Mika Kuoppala
2018-05-16 14:29     ` Chris Wilson
2018-05-16  6:47 ` [PATCH 3/7] drm/i915/execlists: Refactor out complete_preempt_context() Chris Wilson
2018-05-16  6:47 ` [PATCH 4/7] drm/i915: Move engine reset prepare/finish to backends Chris Wilson
2018-05-16  6:47 ` [PATCH 5/7] drm/i915: Split execlists/guc reset preparations Chris Wilson
2018-05-16  6:47 ` [PATCH 6/7] drm/i915/execlists: Flush pending preemption events during reset Chris Wilson
2018-05-16 14:24   ` Tvrtko Ursulin
2018-05-16 15:01     ` Chris Wilson
2018-05-16  6:47 ` [PATCH 7/7] drm/i915: Stop parking the signaler around reset Chris Wilson
2018-05-16  9:15   ` Tvrtko Ursulin
2018-05-16  9:25     ` Chris Wilson
2018-05-16  9:49       ` Tvrtko Ursulin
2018-05-16  9:58         ` Chris Wilson
2018-05-16 10:13   ` [PATCH v2] " Chris Wilson
2018-05-16 10:37     ` Tvrtko Ursulin
2018-05-16  7:09 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/7] drm/i915: Remove tasklet flush before disable Patchwork
2018-05-16  7:27 ` ✓ Fi.CI.BAT: success " Patchwork
2018-05-16  8:39 ` [PATCH 1/7] " Mika Kuoppala
2018-05-16  8:41   ` Chris Wilson
2018-05-16  9:57     ` Mika Kuoppala
2018-05-16 10:33 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/7] drm/i915: Remove tasklet flush before disable (rev2) Patchwork
2018-05-16 10:48 ` ✓ Fi.CI.BAT: success " Patchwork
2018-05-16 14:21 ` ✗ Fi.CI.IGT: failure for series starting with [1/7] drm/i915: Remove tasklet flush before disable Patchwork
2018-05-16 18:26 ` ✗ Fi.CI.IGT: failure for series starting with [1/7] drm/i915: Remove tasklet flush before disable (rev2) Patchwork
2018-05-16 15:12 [PATCH 1/7] drm/i915: Remove tasklet flush before disable Chris Wilson
2018-05-16 15:12 ` [PATCH 6/7] drm/i915/execlists: Flush pending preemption events during reset Chris Wilson
2018-05-16 15:31   ` Tvrtko Ursulin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.