From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Subject: [PATCH 01/17] drm/i915/execlists: Always clear pending&inflight requests on reset
Date: Tue, 30 Jul 2019 14:30:19 +0100
Message-ID: <20190730133035.1977-2-chris@chris-wilson.co.uk>
In-Reply-To: <20190730133035.1977-1-chris@chris-wilson.co.uk>

If we skip the reset because we found the engine inactive at the time of
the reset, we still need to clear the residual inflight & pending request
bookkeeping so that it reflects the current state of the HW.

Otherwise, we may end up stuck in a loop like:

<7> [416.490346] hangcheck rcs0
<7> [416.490371] hangcheck 	Awake? 1
<7> [416.490376] hangcheck 	Hangcheck: 8003 ms ago
<7> [416.490380] hangcheck 	Reset count: 0 (global 0)
<7> [416.490383] hangcheck 	Requests:
<7> [416.491210] hangcheck 	RING_START: 0x0017b000
<7> [416.491983] hangcheck 	RING_HEAD:  0x00000048
<7> [416.491992] hangcheck 	RING_TAIL:  0x00000048
<7> [416.492006] hangcheck 	RING_CTL:   0x00000000
<7> [416.492037] hangcheck 	RING_MODE:  0x00000200 [idle]
<7> [416.492044] hangcheck 	RING_IMR: 00000000
<7> [416.492809] hangcheck 	ACTHD:  0x00000000_9ca00048
<7> [416.492824] hangcheck 	BBADDR: 0x00000000_00001004
<7> [416.492838] hangcheck 	DMA_FADDR: 0x00000000_00000000
<7> [416.492845] hangcheck 	IPEIR: 0x00000000
<7> [416.492852] hangcheck 	IPEHR: 0x00000000
<7> [416.492863] hangcheck 	Execlist status: 0x00018001 00000000, entries 12
<7> [416.492869] hangcheck 	Execlist CSB read 1, write 1, tasklet queued? no (enabled)
<7> [416.492938] hangcheck 		Pending[0] ring:{start:0017b000, hwsp:fedf9000, seqno:00016fd6}, rq:  20ffa:16fd6!+  prio=-4094 @ 8307ms: signaled
<7> [416.492972] hangcheck 		Queue priority hint: -4093
<7> [416.492979] hangcheck 		Q  20ffa:16fd8-  prio=-4093 @ 8307ms: [i915]
<7> [416.492985] hangcheck 		Q  20ffa:16fda  prio=-4094 @ 8307ms: [i915]
<7> [416.492990] hangcheck 		Q  20ffa:16fdc  prio=-4094 @ 8307ms: [i915]
<7> [416.492996] hangcheck 		Q  20ffa:16fde  prio=-4094 @ 8307ms: [i915]
<7> [416.493001] hangcheck 		Q  20ffa:16fe0  prio=-4094 @ 8307ms: [i915]
<7> [416.493007] hangcheck 		Q  20ffa:16fe2  prio=-4094 @ 8307ms: [i915]
<7> [416.493013] hangcheck 		Q  20ffa:16fe4  prio=-4094 @ 8307ms: [i915]
<7> [416.493021] hangcheck 		...skipping 21 queued requests...
<7> [416.493027] hangcheck 		Q  20ffa:17010  prio=-4094 @ 8307ms: [i915]
<7> [416.493081] hangcheck HWSP:
<7> [416.493089] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [416.493094] hangcheck *
<7> [416.493100] hangcheck [0040] 10008002 00000000 10000018 00000000 10000018 00000000 10000001 00000000
<7> [416.493106] hangcheck [0060] 10000018 00000000 10000001 00000000 10000018 00000000 10000001 00000000
<7> [416.493111] hangcheck *
<7> [416.493117] hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001
<7> [416.493123] hangcheck [00c0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [416.493127] hangcheck *
<7> [416.493132] hangcheck Idle? no
<6> [416.512124] i915 0000:00:02.0: GPU HANG: ecode 11:0:0x00000000, hang on rcs0
<6> [416.512205] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<6> [416.512207] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<6> [416.512208] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<6> [416.512210] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<6> [416.512212] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<5> [416.513602] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
<7> [424.489258] hangcheck rcs0
<7> [424.489263] hangcheck 	Awake? 1
<7> [424.489267] hangcheck 	Hangcheck: 5954 ms ago
<7> [424.489271] hangcheck 	Reset count: 1 (global 0)
<7> [424.489274] hangcheck 	Requests:
<7> [424.490128] hangcheck 	RING_START: 0x00000000
<7> [424.490870] hangcheck 	RING_HEAD:  0x00000000
<7> [424.490877] hangcheck 	RING_TAIL:  0x00000000
<7> [424.490887] hangcheck 	RING_CTL:   0x00000000
<7> [424.490897] hangcheck 	RING_MODE:  0x00000200 [idle]
<7> [424.490904] hangcheck 	RING_IMR: 00000000
<7> [424.490917] hangcheck 	ACTHD:  0x00000000_00000000
<7> [424.490930] hangcheck 	BBADDR: 0x00000000_00000000
<7> [424.490943] hangcheck 	DMA_FADDR: 0x00000000_00000000
<7> [424.490950] hangcheck 	IPEIR: 0x00000000
<7> [424.490956] hangcheck 	IPEHR: 0x00000000
<7> [424.490968] hangcheck 	Execlist status: 0x00000001 00000000, entries 12
<7> [424.490972] hangcheck 	Execlist CSB read 11, write 11, tasklet queued? no (enabled)
<7> [424.490983] hangcheck 		Pending[0] ring:{start:0017b000, hwsp:fedf9000, seqno:00016fd6}, rq:  20ffa:16fd6!+  prio=-4094 @ 16305ms: signaled
<7> [424.490989] hangcheck 		Queue priority hint: -4093
<7> [424.490996] hangcheck 		Q  20ffa:16fd8-  prio=-4093 @ 16305ms: [i915]
<7> [424.491001] hangcheck 		Q  20ffa:16fda  prio=-4094 @ 16305ms: [i915]
<7> [424.491006] hangcheck 		Q  20ffa:16fdc  prio=-4094 @ 16305ms: [i915]
<7> [424.491011] hangcheck 		Q  20ffa:16fde  prio=-4094 @ 16305ms: [i915]
<7> [424.491016] hangcheck 		Q  20ffa:16fe0  prio=-4094 @ 16305ms: [i915]
<7> [424.491022] hangcheck 		Q  20ffa:16fe2  prio=-4094 @ 16305ms: [i915]
<7> [424.491048] hangcheck 		Q  20ffa:16fe4  prio=-4094 @ 16305ms: [i915]
<7> [424.491057] hangcheck 		...skipping 21 queued requests...
<7> [424.491063] hangcheck 		Q  20ffa:17010  prio=-4094 @ 16305ms: [i915]
<7> [424.491095] hangcheck HWSP:
<7> [424.491102] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [424.491106] hangcheck *
<7> [424.491113] hangcheck [0040] 10008002 00000000 10000018 00000000 10000018 00000000 10000001 00000000
<7> [424.491118] hangcheck [0060] 10000018 00000000 10000001 00000000 10000018 00000000 10000001 00000000
<7> [424.491122] hangcheck *
<7> [424.491127] hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000b
<7> [424.491133] hangcheck [00c0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [424.491136] hangcheck *
<7> [424.491141] hangcheck Idle? no
<5> [424.491834] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0

Having failed to clear the pending array on reset, the stale pending
request persists indefinitely and each hangcheck simply resets the engine
again.
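
For illustration, a minimal sketch of what that clear amounts to. This is
not the verbatim kernel code: execlists_schedule_out() and the pending[],
inflight[] and active fields are the driver's own from this era, but the
body below is a simplified approximation:

	/*
	 * Sketch only: drop the bookkeeping for every request we believed
	 * was on the HW, so that stale entries cannot survive the reset.
	 */
	static void sketch_cancel_port_requests(struct intel_engine_execlists *el)
	{
		struct i915_request * const *port;

		/* Requests written to ELSP but never acked by the HW */
		for (port = el->pending; *port; port++)
			execlists_schedule_out(*port);
		memset(el->pending, 0, sizeof(el->pending));

		/* Requests the HW last reported as being in flight */
		for (port = el->active; *port; port++)
			execlists_schedule_out(*port);
		memset(el->inflight, 0, sizeof(el->inflight));
		el->active = el->inflight;
	}

Moving the existing execlists_cancel_port_requests() call to the unwind
path, as the diff below does, makes this clear run on every reset path,
including the one where the engine is found inactive and there is no
request to replay.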

Fixes: fff8102aaed5 ("drm/i915/execlists: Process interrupted context on reset")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 4d7c4d0dbf75..86dd1eddceac 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2283,18 +2283,6 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	GEM_BUG_ON(i915_active_is_idle(&ce->active));
 	GEM_BUG_ON(!i915_vma_is_pinned(ce->state));
 	rq = active_request(rq);
-
-	/*
-	 * Catch up with any missed context-switch interrupts.
-	 *
-	 * Ideally we would just read the remaining CSB entries now that we
-	 * know the gpu is idle. However, the CSB registers are sometimes^W
-	 * often trashed across a GPU reset! Instead we have to rely on
-	 * guessing the missed context-switch events by looking at what
-	 * requests were completed.
-	 */
-	execlists_cancel_port_requests(execlists);
-
 	if (!rq) {
 		ce->ring->head = ce->ring->tail;
 		goto out_replay;
@@ -2356,6 +2344,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
 
 unwind:
 	/* Push back any incomplete requests for replay after the reset. */
+	execlists_cancel_port_requests(execlists);
 	__unwind_incomplete_requests(engine);
 }
 
-- 
2.22.0
