All of lore.kernel.org
 help / color / mirror / Atom feed
From: Arun Siluvery <arun.siluvery@linux.intel.com>
To: intel-gfx@lists.freedesktop.org
Cc: Tomas Elf <tomas.elf@intel.com>
Subject: [PATCH v2 07/15] drm/i915/tdr: Restore engine state and start after reset
Date: Fri, 17 Jun 2016 08:09:07 +0100	[thread overview]
Message-ID: <1466147355-4635-8-git-send-email-arun.siluvery@linux.intel.com> (raw)
In-Reply-To: <1466147355-4635-1-git-send-email-arun.siluvery@linux.intel.com>

We capture the state of an engine before resetting it, once the reset is
successful engine is restored with the same state and restarted.

The state includes head register and active request. We also nudge the head
forward if it hasn't advanced, otherwise when the engine is restarted HW
executes the same instruction and may hang again. Generally head
automatically advances to the next instruction as soon as HW reads current
instruction, without waiting for it to complete, however a MBOX wait
inserted directly to VCS/BCS engines doesn't behave in the same way,
instead head will still be pointing at the same instruction until it
completes.

If the head is modified, this is also updated in the context image so that
HW sees up to date value.

A valid request is expected in the state at this point otherwise we
wouldn't have reached this point, the context that submitted this request
is resubmitted to HW. The request that caused the hang would be at the
start of execlist queue, unless we resubmit and complete this request, it
cannot be removed from the queue.

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 94 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  9 ++++
 2 files changed, 103 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b83552a..c9aa2ca 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -487,6 +487,30 @@ static void execlists_context_unqueue(struct intel_engine_cs *engine,
 	execlists_submit_requests(req0, req1, tdr_resubmission);
 }
 
+/**
+ * intel_execlists_resubmit()
+ * @engine: engine to do resubmission for
+ *
+ * In execlists mode, engine reset postprocess mainly includes resubmission of
+ * context after reset, for this we bypass the execlist queue. This is
+ * necessary since at the point of TDR hang recovery the hardware will be hung
+ * and resubmitting a fixed context (the context that the TDR has identified
+ * as hung and fixed up in order to move past the blocking batch buffer) to a
+ * hung execlist queue will lock up the TDR.  Instead, opt for direct ELSP
+ * submission without depending on the rest of the driver.
+ */
+static void intel_execlists_resubmit(struct intel_engine_cs *engine)
+{
+	unsigned long flags;
+
+	if (WARN_ON(list_empty(&engine->execlist_queue)))
+		return;
+
+	spin_lock_irqsave(&engine->execlist_lock, flags);
+	execlists_context_unqueue(engine, true);
+	spin_unlock_irqrestore(&engine->execlist_lock, flags);
+}
+
 static unsigned int
 execlists_check_remove_request(struct intel_engine_cs *engine, u32 ctx_id)
 {
@@ -1098,6 +1122,75 @@ static int gen8_engine_state_save(struct intel_engine_cs *engine,
 	return 0;
 }
 
+/**
+ * gen8_engine_start() - restore saved state and start engine
+ * @engine: engine to be started
+ * @state: state to be restored
+ *
+ * Returns:
+ *	0 if ok, otherwise propagates error codes.
+ */
+static int gen8_engine_start(struct intel_engine_cs *engine,
+			     struct intel_engine_cs_state *state)
+{
+	u32 head;
+	u32 head_addr, tail_addr;
+	u32 *reg_state;
+	struct intel_ringbuffer *ringbuf;
+	struct i915_gem_context *ctx;
+	struct drm_i915_private *dev_priv = engine->i915;
+
+	ctx = state->req->ctx;
+	ringbuf = ctx->engine[engine->id].ringbuf;
+	reg_state = ctx->engine[engine->id].lrc_reg_state;
+
+	head = state->head;
+	head_addr = head & HEAD_ADDR;
+
+	if (head == engine->hangcheck.last_head) {
+		/*
+		 * The engine has not advanced since the last time it hung,
+		 * force it to advance to the next QWORD. In most cases the
+		 * engine head pointer will automatically advance to the
+		 * next instruction as soon as it has read the current
+		 * instruction, without waiting for it to complete. This
+		 * seems to be the default behaviour, however an MBOX wait
+		 * inserted directly to the VCS/BCS engines does not behave
+		 * in the same way, instead the head pointer will still be
+		 * pointing at the MBOX instruction until it completes.
+		 */
+		head_addr = roundup(head_addr, 8);
+		engine->hangcheck.last_head = head;
+	} else if (head_addr & 0x7) {
+		/* Ensure head pointer is pointing to a QWORD boundary */
+		head_addr = ALIGN(head_addr, 8);
+	}
+
+	tail_addr = reg_state[CTX_RING_TAIL+1] & TAIL_ADDR;
+
+	if (head_addr > tail_addr)
+		head_addr = tail_addr;
+	else if (head_addr >= ringbuf->size)
+		head_addr = 0;
+
+	head &= ~HEAD_ADDR;
+	head |= (head_addr & HEAD_ADDR);
+
+	/* Restore head */
+	reg_state[CTX_RING_HEAD+1] = head;
+	I915_WRITE_HEAD(engine, head);
+
+	/* set head */
+	ringbuf->head = head;
+	ringbuf->last_retired_head = -1;
+	intel_ring_update_space(ringbuf);
+
+	if (state->req)
+		intel_execlists_resubmit(engine);
+
+	return 0;
+}
+
 static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
 {
 	int ret, i;
@@ -2056,6 +2149,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 
 	/* engine reset supporting functions */
 	engine->save = gen8_engine_state_save;
+	engine->start = gen8_engine_start;
 
 	if (IS_BXT_REVID(engine->i915, 0, BXT_REVID_A1)) {
 		engine->irq_seqno_barrier = bxt_a_seqno_barrier;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index daf2727..55cb0b5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -92,6 +92,13 @@ struct intel_ring_hangcheck {
 	enum intel_ring_hangcheck_action action;
 	int deadlock;
 	u32 instdone[I915_NUM_INSTDONE_REG];
+
+	/*
+	 * Last recorded ring head index.
+	 * This is only ever a ring index where as active
+	 * head may be a graphics address in a ring buffer
+	 */
+	u32 last_head;
 };
 
 struct intel_ringbuffer {
@@ -213,6 +220,8 @@ struct intel_engine_cs {
 	/* engine reset supporting functions */
 	int (*save)(struct intel_engine_cs *engine,
 		    struct intel_engine_cs_state *state);
+	int (*start)(struct intel_engine_cs *engine,
+		     struct intel_engine_cs_state *state);
 
 	/* GEN8 signal/wait table - never trust comments!
 	 *	  signal to	signal to    signal to   signal to      signal to
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  parent reply	other threads:[~2016-06-17  7:09 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-17  7:09 [PATCH v2 00/15] Execlist based Engine reset and recovery Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 01/15] drm/i915: Update i915.reset to handle engine resets Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 02/15] drm/i915/tdr: Extend the idea of reset_counter to engine reset Arun Siluvery
2016-06-17  7:35   ` Chris Wilson
2016-06-17  7:09 ` [PATCH v2 03/15] drm/i915: Reinstate hang recovery work queue Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 04/15] drm/i915/tdr: Modify error handler for per engine hang recovery Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 05/15] drm/i915/tdr: Prepare execlist submission to handle tdr resubmission after reset Arun Siluvery
2016-06-17  7:39   ` Chris Wilson
2016-06-17  7:09 ` [PATCH v2 06/15] drm/i915/tdr: Capture engine state before reset Arun Siluvery
2016-06-17  7:09 ` Arun Siluvery [this message]
2016-06-17  7:32   ` [PATCH v2 07/15] drm/i915/tdr: Restore engine state and start after reset Chris Wilson
2016-06-17  7:09 ` [PATCH v2 08/15] drm/i915/tdr: Add support for per engine reset recovery Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 09/15] drm/i915: Skip reset request if there is one already Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 10/15] drm/i915: Extending i915_gem_check_wedge to check engine reset in progress Arun Siluvery
2016-06-17  7:41   ` Chris Wilson
2016-06-17  7:09 ` [PATCH v2 11/15] drm/i915: Port of Added scheduler support to __wait_request() calls Arun Siluvery
2016-06-17  7:42   ` Chris Wilson
2016-06-17  7:09 ` [PATCH v2 12/15] drm/i915/tdr: Add engine reset count to error state Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 13/15] drm/i915/tdr: Export reset count info to debugfs Arun Siluvery
2016-06-17  7:21   ` Chris Wilson
2016-06-17  7:09 ` [PATCH v2 14/15] drm/i915/tdr: Enable Engine reset and recovery support Arun Siluvery
2016-06-17  7:09 ` [ONLY FOR BAT v2 15/15] drm/i915: Disable GuC submission for testing Engine reset patches Arun Siluvery
2016-06-17  7:33 ` ✗ Ro.CI.BAT: failure for Execlist based Engine reset and recovery Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1466147355-4635-8-git-send-email-arun.siluvery@linux.intel.com \
    --to=arun.siluvery@linux.intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=tomas.elf@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.