All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request
@ 2018-04-25 10:59 Chris Wilson
  2018-04-25 11:19 ` Mika Kuoppala
  2018-04-25 12:29 ` ✗ Fi.CI.BAT: failure for " Patchwork
  0 siblings, 2 replies; 7+ messages in thread
From: Chris Wilson @ 2018-04-25 10:59 UTC (permalink / raw)
  To: intel-gfx

When WaIdleLiteRestore isn't enough.

Fixes an odd hang on gen8 (both bsw and bdw) during gem_ctx_switch,
where by all intents and purposes if we trigger a lite-restore as it is
processing the pipecontrol flushes, the RING is restored to the oword
following the command and tries to execute the destination address for
the pipecontrol rather than a valid command. With the theory being that
it doesn't like RING_HEAD being within a cacheline of the restored
RING_TAIL, we can evade that issue by not triggering a lite-restore if
we know we are inside the last request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 029901a8fa38..5c50263e45d3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -639,6 +639,19 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		if (port_count(&port[1]))
 			goto unlock;
 
+		/*
+		 * Skip invoking a lite-restore if we know we have already
+		 * started processing the last request queued to HW. This
+		 * prevents a mystery *unrecoverable* hang on gen8, maybe
+		 * related to updating TAIL within a cacheline of HEAD? (As
+		 * there is still a delay between submitting the ESLP update
+		 * and HW responding, we may still encounter whatever condition
+		 * trips up, just less often.)
+		 */
+		if (i915_seqno_passed(intel_engine_get_seqno(engine),
+				      last->global_seqno - 1))
+			goto unlock;
+
 		/*
 		 * WaIdleLiteRestore:bdw,skl
 		 * Apply the wa NOOPs to prevent
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request
  2018-04-25 10:59 [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request Chris Wilson
@ 2018-04-25 11:19 ` Mika Kuoppala
  2018-04-25 11:23   ` Chris Wilson
  2018-04-25 12:29 ` ✗ Fi.CI.BAT: failure for " Patchwork
  1 sibling, 1 reply; 7+ messages in thread
From: Mika Kuoppala @ 2018-04-25 11:19 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> When WaIdleLiteRestore isn't enough.
>
> Fixes an odd hang on gen8 (both bsw and bdw) during gem_ctx_switch,

Do you have a testcase name? (testcase tag would be nice too)

> where by all intents and purposes if we trigger a lite-restore as it is
> processing the pipecontrol flushes, the RING is restored to the oword

s/RING/RING_HEAD ?

> following the command and tries to execute the destination address for
> the pipecontrol rather than a valid command. With the theory being that
> it doesn't like RING_HEAD being within a cacheline of the restored
> RING_TAIL, we can evade that issue by not triggering a lite-restore if
> we know we are inside the last request.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 029901a8fa38..5c50263e45d3 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -639,6 +639,19 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  		if (port_count(&port[1]))
>  			goto unlock;
>  
> +		/*
> +		 * Skip invoking a lite-restore if we know we have already
> +		 * started processing the last request queued to HW. This
> +		 * prevents a mystery *unrecoverable* hang on gen8, maybe
> +		 * related to updating TAIL within a cacheline of HEAD? (As

Did you try with WA_TAIL_DWORDS 16?

-Mika

> +		 * there is still a delay between submitting the ESLP update
> +		 * and HW responding, we may still encounter whatever condition
> +		 * trips up, just less often.)
> +		 */
> +		if (i915_seqno_passed(intel_engine_get_seqno(engine),
> +				      last->global_seqno - 1))
> +			goto unlock;
> +
>  		/*
>  		 * WaIdleLiteRestore:bdw,skl
>  		 * Apply the wa NOOPs to prevent
> -- 
> 2.17.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request
  2018-04-25 11:19 ` Mika Kuoppala
@ 2018-04-25 11:23   ` Chris Wilson
  2018-04-25 11:31     ` Chris Wilson
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2018-04-25 11:23 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2018-04-25 12:19:08)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > When WaIdleLiteRestore isn't enough.
> >
> > Fixes an odd hang on gen8 (both bsw and bdw) during gem_ctx_switch,
> 
> Do you have a testcase name? (testcase tag would be nice too)

Just keep running gem_ctx_switch. Switching between the static set of
contexts, no rebind pressure whatsoever.

> > where by all intents and purposes if we trigger a lite-restore as it is
> > processing the pipecontrol flushes, the RING is restored to the oword
> 
> s/RING/RING_HEAD ?
> 
> > following the command and tries to execute the destination address for
> > the pipecontrol rather than a valid command. With the theory being that
> > it doesn't like RING_HEAD being within a cacheline of the restored
> > RING_TAIL, we can evade that issue by not triggering a lite-restore if
> > we know we are inside the last request.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/intel_lrc.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 029901a8fa38..5c50263e45d3 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -639,6 +639,19 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >               if (port_count(&port[1]))
> >                       goto unlock;
> >  
> > +             /*
> > +              * Skip invoking a lite-restore if we know we have already
> > +              * started processing the last request queued to HW. This
> > +              * prevents a mystery *unrecoverable* hang on gen8, maybe
> > +              * related to updating TAIL within a cacheline of HEAD? (As
> 
> Did you try with WA_TAIL_DWORDS 16?

Sure can try, but the error state doesn't indicate TAIL==HEAD as would
be the issue with WaIdleLiteRestore (restoring to an idle ring wouldn't
generate the arbitration event, so no more context switches).
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request
  2018-04-25 11:23   ` Chris Wilson
@ 2018-04-25 11:31     ` Chris Wilson
  2018-04-25 12:45       ` Chris Wilson
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2018-04-25 11:31 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Chris Wilson (2018-04-25 12:23:30)
> Quoting Mika Kuoppala (2018-04-25 12:19:08)
> > Did you try with WA_TAIL_DWORDS 16?
> 
> Sure can try, but the error state doesn't indicate TAIL==HEAD as would
> be the issue with WaIdleLiteRestore (restoring to an idle ring wouldn't
> generate the arbitration event, so no more context switches).

Watch https://patchwork.freedesktop.org/series/42284/

Actually surviving on my bdw
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* ✗ Fi.CI.BAT: failure for drm/i915/execlists: Skip lite restore on the currently executing request
  2018-04-25 10:59 [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request Chris Wilson
  2018-04-25 11:19 ` Mika Kuoppala
@ 2018-04-25 12:29 ` Patchwork
  1 sibling, 0 replies; 7+ messages in thread
From: Patchwork @ 2018-04-25 12:29 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/execlists: Skip lite restore on the currently executing request
URL   : https://patchwork.freedesktop.org/series/42281/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4093 -> Patchwork_8796 =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_8796 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_8796, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/42281/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_8796:

  === IGT changes ===

    ==== Possible regressions ====

    igt@prime_vgem@basic-fence-flip:
      fi-elk-e7500:       PASS -> FAIL

    
== Known issues ==

  Here are the changes found in Patchwork_8796 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@gem_exec_suspend@basic-s3:
      fi-ivb-3520m:       PASS -> DMESG-WARN (fdo#106084)

    igt@kms_pipe_crc_basic@hang-read-crc-pipe-c:
      fi-glk-j4005:       PASS -> DMESG-WARN (fdo#106097) +2

    
    ==== Possible fixes ====

    igt@kms_chamelium@dp-edid-read:
      fi-kbl-7500u:       FAIL (fdo#103841) -> PASS

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
      fi-bxt-dsi:         INCOMPLETE (fdo#103927) -> PASS

    igt@prime_vgem@basic-fence-flip:
      fi-glk-j4005:       DMESG-WARN (fdo#106235) -> PASS

    
  fdo#103841 https://bugs.freedesktop.org/show_bug.cgi?id=103841
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
  fdo#106084 https://bugs.freedesktop.org/show_bug.cgi?id=106084
  fdo#106097 https://bugs.freedesktop.org/show_bug.cgi?id=106097
  fdo#106235 https://bugs.freedesktop.org/show_bug.cgi?id=106235


== Participating hosts (38 -> 34) ==

  Missing    (4): fi-ctg-p8600 fi-ilk-m540 fi-skl-6700hq fi-hsw-4200u 


== Build changes ==

    * Linux: CI_DRM_4093 -> Patchwork_8796

  CI_DRM_4093: a47694a9464b96e17b12f0bdb1085d10a61c57e9 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4446: e5e8dafc991ee922ec159491c680caff0cfe9235 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_8796: 68be8df41b39d101b29d20347ce109eda0273029 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4446: a2f486679f467cd6e82578384f56d4aabaa8cf2e @ git://anongit.freedesktop.org/piglit


== Linux commits ==

68be8df41b39 drm/i915/execlists: Skip lite restore on the currently executing request

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8796/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request
  2018-04-25 11:31     ` Chris Wilson
@ 2018-04-25 12:45       ` Chris Wilson
  2018-04-25 13:14         ` Chris Wilson
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2018-04-25 12:45 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Chris Wilson (2018-04-25 12:31:29)
> Quoting Chris Wilson (2018-04-25 12:23:30)
> > Quoting Mika Kuoppala (2018-04-25 12:19:08)
> > > Did you try with WA_TAIL_DWORDS 16?
> > 
> > Sure can try, but the error state doesn't indicate TAIL==HEAD as would
> > be the issue with WaIdleLiteRestore (restoring to an idle ring wouldn't
> > generate the arbitration event, so no more context switches).
> 
> Watch https://patchwork.freedesktop.org/series/42284/
> 
> Actually surviving on my bdw

And what appears to work just as well is

@@ -1387,7 +1387,7 @@  static int execlists_request_alloc(struct i915_request *request)
 	 */
 	request->reserved_space += EXECLISTS_REQUEST_SIZE;
 
-	ret = intel_ring_wait_for_space(request->ring, request->reserved_space);
+	ret = intel_ring_cacheline_align(request);
 	if (ret)
 		return ret;

If that doesn't hint towards some weirdness in HW, I don't know what
would ;)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request
  2018-04-25 12:45       ` Chris Wilson
@ 2018-04-25 13:14         ` Chris Wilson
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2018-04-25 13:14 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Chris Wilson (2018-04-25 13:45:45)
> Quoting Chris Wilson (2018-04-25 12:31:29)
> > Quoting Chris Wilson (2018-04-25 12:23:30)
> > > Quoting Mika Kuoppala (2018-04-25 12:19:08)
> > > > Did you try with WA_TAIL_DWORDS 16?
> > > 
> > > Sure can try, but the error state doesn't indicate TAIL==HEAD as would
> > > be the issue with WaIdleLiteRestore (restoring to an idle ring wouldn't
> > > generate the arbitration event, so no more context switches).
> > 
> > Watch https://patchwork.freedesktop.org/series/42284/
> > 
> > Actually surviving on my bdw
> 
> And what appears to work just as well is
> 
> @@ -1387,7 +1387,7 @@  static int execlists_request_alloc(struct i915_request *request)
>          */
>         request->reserved_space += EXECLISTS_REQUEST_SIZE;
>  
> -       ret = intel_ring_wait_for_space(request->ring, request->reserved_space);
> +       ret = intel_ring_cacheline_align(request);
>         if (ret)
>                 return ret;
> 
> If that doesn't hint towards some weirdness in HW, I don't know what
> would ;)

Fwiw, I've now hit GPU hangs with WA_TAIL_DWORDS set to 16 and with the
cacheline_align; scratch that theory.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-04-25 13:14 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-25 10:59 [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request Chris Wilson
2018-04-25 11:19 ` Mika Kuoppala
2018-04-25 11:23   ` Chris Wilson
2018-04-25 11:31     ` Chris Wilson
2018-04-25 12:45       ` Chris Wilson
2018-04-25 13:14         ` Chris Wilson
2018-04-25 12:29 ` ✗ Fi.CI.BAT: failure for " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.