* [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request
@ 2018-04-25 10:59 Chris Wilson
2018-04-25 11:19 ` Mika Kuoppala
2018-04-25 12:29 ` ✗ Fi.CI.BAT: failure for " Patchwork
0 siblings, 2 replies; 7+ messages in thread
From: Chris Wilson @ 2018-04-25 10:59 UTC (permalink / raw)
To: intel-gfx
When WaIdleLiteRestore isn't enough.
Fixes an odd hang on gen8 (both bsw and bdw) during gem_ctx_switch,
where by all intents and purposes if we trigger a lite-restore as it is
processing the pipecontrol flushes, the RING is restored to the oword
following the command and tries to execute the destination address for
the pipecontrol rather than a valid command. With the theory being that
it doesn't like RING_HEAD being within a cacheline of the restored
RING_TAIL, we can evade that issue by not triggering a lite-restore if
we know we are inside the last request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/intel_lrc.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 029901a8fa38..5c50263e45d3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -639,6 +639,19 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
if (port_count(&port[1]))
goto unlock;
+ /*
+ * Skip invoking a lite-restore if we know we have already
+ * started processing the last request queued to HW. This
+ * prevents a mystery *unrecoverable* hang on gen8, maybe
+ * related to updating TAIL within a cacheline of HEAD? (As
+ * there is still a delay between submitting the ESLP update
+ * and HW responding, we may still encounter whatever condition
+ * trips up, just less often.)
+ */
+ if (i915_seqno_passed(intel_engine_get_seqno(engine),
+ last->global_seqno - 1))
+ goto unlock;
+
/*
* WaIdleLiteRestore:bdw,skl
* Apply the wa NOOPs to prevent
--
2.17.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request
2018-04-25 10:59 [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request Chris Wilson
@ 2018-04-25 11:19 ` Mika Kuoppala
2018-04-25 11:23 ` Chris Wilson
2018-04-25 12:29 ` ✗ Fi.CI.BAT: failure for " Patchwork
1 sibling, 1 reply; 7+ messages in thread
From: Mika Kuoppala @ 2018-04-25 11:19 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> When WaIdleLiteRestore isn't enough.
>
> Fixes an odd hang on gen8 (both bsw and bdw) during gem_ctx_switch,
Do you have a testcase name? (testcase tag would be nice too)
> where by all intents and purposes if we trigger a lite-restore as it is
> processing the pipecontrol flushes, the RING is restored to the oword
s/RING/RING_HEAD ?
> following the command and tries to execute the destination address for
> the pipecontrol rather than a valid command. With the theory being that
> it doesn't like RING_HEAD being within a cacheline of the restored
> RING_TAIL, we can evade that issue by not triggering a lite-restore if
> we know we are inside the last request.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
> drivers/gpu/drm/i915/intel_lrc.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 029901a8fa38..5c50263e45d3 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -639,6 +639,19 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> if (port_count(&port[1]))
> goto unlock;
>
> + /*
> + * Skip invoking a lite-restore if we know we have already
> + * started processing the last request queued to HW. This
> + * prevents a mystery *unrecoverable* hang on gen8, maybe
> + * related to updating TAIL within a cacheline of HEAD? (As
Did you try with WA_TAIL_DWORDS 16?
-Mika
> + * there is still a delay between submitting the ESLP update
> + * and HW responding, we may still encounter whatever condition
> + * trips up, just less often.)
> + */
> + if (i915_seqno_passed(intel_engine_get_seqno(engine),
> + last->global_seqno - 1))
> + goto unlock;
> +
> /*
> * WaIdleLiteRestore:bdw,skl
> * Apply the wa NOOPs to prevent
> --
> 2.17.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request
2018-04-25 11:19 ` Mika Kuoppala
@ 2018-04-25 11:23 ` Chris Wilson
2018-04-25 11:31 ` Chris Wilson
0 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2018-04-25 11:23 UTC (permalink / raw)
To: Mika Kuoppala, intel-gfx
Quoting Mika Kuoppala (2018-04-25 12:19:08)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > When WaIdleLiteRestore isn't enough.
> >
> > Fixes an odd hang on gen8 (both bsw and bdw) during gem_ctx_switch,
>
> Do you have a testcase name? (testcase tag would be nice too)
Just keep running gem_ctx_switch. Switching between the static set of
contexts, no rebind pressure whatsoever.
> > where by all intents and purposes if we trigger a lite-restore as it is
> > processing the pipecontrol flushes, the RING is restored to the oword
>
> s/RING/RING_HEAD ?
>
> > following the command and tries to execute the destination address for
> > the pipecontrol rather than a valid command. With the theory being that
> > it doesn't like RING_HEAD being within a cacheline of the restored
> > RING_TAIL, we can evade that issue by not triggering a lite-restore if
> > we know we are inside the last request.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> > drivers/gpu/drm/i915/intel_lrc.c | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 029901a8fa38..5c50263e45d3 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -639,6 +639,19 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> > if (port_count(&port[1]))
> > goto unlock;
> >
> > + /*
> > + * Skip invoking a lite-restore if we know we have already
> > + * started processing the last request queued to HW. This
> > + * prevents a mystery *unrecoverable* hang on gen8, maybe
> > + * related to updating TAIL within a cacheline of HEAD? (As
>
> Did you try with WA_TAIL_DWORDS 16?
Sure can try, but the error state doesn't indicate TAIL==HEAD as would
be the issue with WaIdleLiteRestore (restoring to an idle ring wouldn't
generate the arbitration event, so no more context switches).
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request
2018-04-25 11:23 ` Chris Wilson
@ 2018-04-25 11:31 ` Chris Wilson
2018-04-25 12:45 ` Chris Wilson
0 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2018-04-25 11:31 UTC (permalink / raw)
To: Mika Kuoppala, intel-gfx
Quoting Chris Wilson (2018-04-25 12:23:30)
> Quoting Mika Kuoppala (2018-04-25 12:19:08)
> > Did you try with WA_TAIL_DWORDS 16?
>
> Sure can try, but the error state doesn't indicate TAIL==HEAD as would
> be the issue with WaIdleLiteRestore (restoring to an idle ring wouldn't
> generate the arbitration event, so no more context switches).
Watch https://patchwork.freedesktop.org/series/42284/
Actually surviving on my bdw
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
* ✗ Fi.CI.BAT: failure for drm/i915/execlists: Skip lite restore on the currently executing request
2018-04-25 10:59 [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request Chris Wilson
2018-04-25 11:19 ` Mika Kuoppala
@ 2018-04-25 12:29 ` Patchwork
1 sibling, 0 replies; 7+ messages in thread
From: Patchwork @ 2018-04-25 12:29 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915/execlists: Skip lite restore on the currently executing request
URL : https://patchwork.freedesktop.org/series/42281/
State : failure
== Summary ==
= CI Bug Log - changes from CI_DRM_4093 -> Patchwork_8796 =
== Summary - FAILURE ==
Serious unknown changes coming with Patchwork_8796 absolutely need to be
verified manually.
If you think the reported changes have nothing to do with the changes
introduced in Patchwork_8796, please notify your bug team to allow them
to document this new failure mode, which will reduce false positives in CI.
External URL: https://patchwork.freedesktop.org/api/1.0/series/42281/revisions/1/mbox/
== Possible new issues ==
Here are the unknown changes that may have been introduced in Patchwork_8796:
=== IGT changes ===
==== Possible regressions ====
igt@prime_vgem@basic-fence-flip:
fi-elk-e7500: PASS -> FAIL
== Known issues ==
Here are the changes found in Patchwork_8796 that come from known issues:
=== IGT changes ===
==== Issues hit ====
igt@gem_exec_suspend@basic-s3:
fi-ivb-3520m: PASS -> DMESG-WARN (fdo#106084)
igt@kms_pipe_crc_basic@hang-read-crc-pipe-c:
fi-glk-j4005: PASS -> DMESG-WARN (fdo#106097) +2
==== Possible fixes ====
igt@kms_chamelium@dp-edid-read:
fi-kbl-7500u: FAIL (fdo#103841) -> PASS
igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
fi-bxt-dsi: INCOMPLETE (fdo#103927) -> PASS
igt@prime_vgem@basic-fence-flip:
fi-glk-j4005: DMESG-WARN (fdo#106235) -> PASS
fdo#103841 https://bugs.freedesktop.org/show_bug.cgi?id=103841
fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
fdo#106084 https://bugs.freedesktop.org/show_bug.cgi?id=106084
fdo#106097 https://bugs.freedesktop.org/show_bug.cgi?id=106097
fdo#106235 https://bugs.freedesktop.org/show_bug.cgi?id=106235
== Participating hosts (38 -> 34) ==
Missing (4): fi-ctg-p8600 fi-ilk-m540 fi-skl-6700hq fi-hsw-4200u
== Build changes ==
* Linux: CI_DRM_4093 -> Patchwork_8796
CI_DRM_4093: a47694a9464b96e17b12f0bdb1085d10a61c57e9 @ git://anongit.freedesktop.org/gfx-ci/linux
IGT_4446: e5e8dafc991ee922ec159491c680caff0cfe9235 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
Patchwork_8796: 68be8df41b39d101b29d20347ce109eda0273029 @ git://anongit.freedesktop.org/gfx-ci/linux
piglit_4446: a2f486679f467cd6e82578384f56d4aabaa8cf2e @ git://anongit.freedesktop.org/piglit
== Linux commits ==
68be8df41b39 drm/i915/execlists: Skip lite restore on the currently executing request
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8796/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request
2018-04-25 11:31 ` Chris Wilson
@ 2018-04-25 12:45 ` Chris Wilson
2018-04-25 13:14 ` Chris Wilson
0 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2018-04-25 12:45 UTC (permalink / raw)
To: Mika Kuoppala, intel-gfx
Quoting Chris Wilson (2018-04-25 12:31:29)
> Quoting Chris Wilson (2018-04-25 12:23:30)
> > Quoting Mika Kuoppala (2018-04-25 12:19:08)
> > > Did you try with WA_TAIL_DWORDS 16?
> >
> > Sure can try, but the error state doesn't indicate TAIL==HEAD as would
> > be the issue with WaIdleLiteRestore (restoring to an idle ring wouldn't
> > generate the arbitration event, so no more context switches).
>
> Watch https://patchwork.freedesktop.org/series/42284/
>
> Actually surviving on my bdw
And what appears to work just as well is
@@ -1387,7 +1387,7 @@ static int execlists_request_alloc(struct i915_request *request)
*/
request->reserved_space += EXECLISTS_REQUEST_SIZE;
- ret = intel_ring_wait_for_space(request->ring, request->reserved_space);
+ ret = intel_ring_cacheline_align(request);
if (ret)
return ret;
If that doesn't hint towards some weirdness in HW, I don't know what
would ;)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request
2018-04-25 12:45 ` Chris Wilson
@ 2018-04-25 13:14 ` Chris Wilson
0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2018-04-25 13:14 UTC (permalink / raw)
To: Mika Kuoppala, intel-gfx
Quoting Chris Wilson (2018-04-25 13:45:45)
> Quoting Chris Wilson (2018-04-25 12:31:29)
> > Quoting Chris Wilson (2018-04-25 12:23:30)
> > > Quoting Mika Kuoppala (2018-04-25 12:19:08)
> > > > Did you try with WA_TAIL_DWORDS 16?
> > >
> > > Sure can try, but the error state doesn't indicate TAIL==HEAD as would
> > > be the issue with WaIdleLiteRestore (restoring to an idle ring wouldn't
> > > generate the arbitration event, so no more context switches).
> >
> > Watch https://patchwork.freedesktop.org/series/42284/
> >
> > Actually surviving on my bdw
>
> And what appears to work just as well is
>
> @@ -1387,7 +1387,7 @@ static int execlists_request_alloc(struct i915_request *request)
> */
> request->reserved_space += EXECLISTS_REQUEST_SIZE;
>
> - ret = intel_ring_wait_for_space(request->ring, request->reserved_space);
> + ret = intel_ring_cacheline_align(request);
> if (ret)
> return ret;
>
> If that doesn't hint towards some weirdness in HW, I don't know what
> would ;)
Fwiw, I've now hit GPU hangs with WA_TAIL_DWORDS set to 16 and with the
cacheline_align; scratch that theory.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-04-25 13:14 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-25 10:59 [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request Chris Wilson
2018-04-25 11:19 ` Mika Kuoppala
2018-04-25 11:23 ` Chris Wilson
2018-04-25 11:31 ` Chris Wilson
2018-04-25 12:45 ` Chris Wilson
2018-04-25 13:14 ` Chris Wilson
2018-04-25 12:29 ` ✗ Fi.CI.BAT: failure for " Patchwork
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.