[PATCH 1/3] drm/i915: Always sanity check engine state upon idling


* [PATCH 1/3] drm/i915: Always sanity check engine state upon idling
@ 2017-08-26 11:09 Chris Wilson
  2017-08-26 11:09 ` [PATCH 2/3] drm/i915: Clear wedged status upon resume Chris Wilson
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Chris Wilson @ 2017-08-26 11:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matthias Kaehlcke

When we do a locked idle we know that afterwards all requests have been
completed and the engines have been cleared of tasks. For whatever
reason, this doesn't always happen and we may go into a suspend with
ELSP still full, and this causes an issue upon resume as we get very,
very confused.

If the engines refuse to idle, mark the device as wedged. In the process
we get rid of the maybe unused open-coded version of wait_for_engines
reported by Nick Desaulniers and Matthias Kaehlcke.

References: https://bugs.freedesktop.org/show_bug.cgi?id=101891
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
---
 drivers/gpu/drm/i915/i915_gem.c | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ac02785fdaff..c1520c0d2084 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3371,24 +3371,12 @@ static int wait_for_timeline(struct i915_gem_timeline *tl, unsigned int flags)
 	return 0;
 }
 
-static int wait_for_engine(struct intel_engine_cs *engine, int timeout_ms)
-{
-	return wait_for(intel_engine_is_idle(engine), timeout_ms);
-}
-
 static int wait_for_engines(struct drm_i915_private *i915)
 {
-	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
-
-	for_each_engine(engine, i915, id) {
-		if (GEM_WARN_ON(wait_for_engine(engine, 50))) {
-			i915_gem_set_wedged(i915);
-			return -EIO;
-		}
-
-		GEM_BUG_ON(intel_engine_get_seqno(engine) !=
-			   intel_engine_last_submit(engine));
+	if (wait_for(intel_engines_are_idle(i915), 50)) {
+		DRM_ERROR("Failed to idle engines, declaring wedged!\n");
+		i915_gem_set_wedged(i915);
+		return -EIO;
 	}
 
 	return 0;
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 2/3] drm/i915: Clear wedged status upon resume
  2017-08-26 11:09 [PATCH 1/3] drm/i915: Always sanity check engine state upon idling Chris Wilson
@ 2017-08-26 11:09 ` Chris Wilson
  2017-08-29 13:49   ` Mika Kuoppala
  2017-08-26 11:09 ` [PATCH 3/3] drm/i915: Discard the request queue if we fail to sleep before suspend Chris Wilson
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2017-08-26 11:09 UTC (permalink / raw)
  To: intel-gfx

When we wake up from suspend, the device has been powered down and
should come back afresh. We should be able to safely remove the wedged
status from the previous session and start afresh.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c1520c0d2084..9dc24b915aa7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4518,6 +4518,12 @@ static void assert_kernel_context_is_current(struct drm_i915_private *dev_priv)
 
 void i915_gem_sanitize(struct drm_i915_private *i915)
 {
+	if (i915_terminally_wedged(&i915->gpu_error)) {
+		mutex_lock(&i915->drm.struct_mutex);
+		i915_gem_unset_wedged(i915);
+		mutex_unlock(&i915->drm.struct_mutex);
+	}
+
 	/*
 	 * If we inherit context state from the BIOS or earlier occupants
 	 * of the GPU, the GPU may be in an inconsistent state when we
-- 
2.14.1


* [PATCH 3/3] drm/i915: Discard the request queue if we fail to sleep before suspend
  2017-08-26 11:09 [PATCH 1/3] drm/i915: Always sanity check engine state upon idling Chris Wilson
  2017-08-26 11:09 ` [PATCH 2/3] drm/i915: Clear wedged status upon resume Chris Wilson
@ 2017-08-26 11:09 ` Chris Wilson
  2017-08-29 13:54   ` Mika Kuoppala
  2017-08-26 11:26 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Always sanity check engine state upon idling Patchwork
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2017-08-26 11:09 UTC (permalink / raw)
  To: intel-gfx

If we fail to clear the outstanding request queue before suspending,
mark those requests as lost.
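
Taken together with the previous patch in this series, the intended
suspend/resume behaviour can be sketched in plain userspace C. This is only a
model under stated assumptions, not the driver code: `struct pm_model`,
`model_set_wedged()`, `model_suspend()` and `model_sanitize()` are hypothetical
names, and "wedging" is reduced to setting a flag and counting the pending
requests as lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the device state; not the i915 structures. */
struct pm_model {
	int pending;	/* requests still queued on the engines */
	int lost;	/* requests discarded because the device was wedged */
	bool wedged;	/* set once we give up on the hardware */
};

/* Assumed behaviour: wedging marks every outstanding request as lost. */
static void model_set_wedged(struct pm_model *m)
{
	m->wedged = true;
	m->lost += m->pending;
	m->pending = 0;
}

/* Suspend path: warn and wedge if the engines did not drain. */
static void model_suspend(struct pm_model *m)
{
	assert(m != NULL);
	if (m->pending > 0) {
		fprintf(stderr, "engines not idle at suspend, wedging\n");
		model_set_wedged(m);
	}
}

/* Resume path: the device was powered down, so drop the stale wedged flag. */
static void model_sanitize(struct pm_model *m)
{
	assert(m != NULL);
	if (m->wedged)
		m->wedged = false;
}
```

In this model a suspend with requests still queued leaves the device wedged
with an empty queue, and the following resume starts again with the flag
cleared.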

References: https://bugs.freedesktop.org/show_bug.cgi?id=102037
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9dc24b915aa7..37fbc64d9ffe 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4585,7 +4585,8 @@ int i915_gem_suspend(struct drm_i915_private *dev_priv)
 	 * reset the GPU back to its idle, low power state.
 	 */
 	WARN_ON(dev_priv->gt.awake);
-	WARN_ON(!intel_engines_are_idle(dev_priv));
+	if (WARN_ON(!intel_engines_are_idle(dev_priv)))
+		i915_gem_set_wedged(dev_priv); /* no hope, so reset everthing */
 
 	/*
 	 * Neither the BIOS, ourselves or any other kernel
-- 
2.14.1


* ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Always sanity check engine state upon idling
  2017-08-26 11:09 [PATCH 1/3] drm/i915: Always sanity check engine state upon idling Chris Wilson
  2017-08-26 11:09 ` [PATCH 2/3] drm/i915: Clear wedged status upon resume Chris Wilson
  2017-08-26 11:09 ` [PATCH 3/3] drm/i915: Discard the request queue if we fail to sleep before suspend Chris Wilson
@ 2017-08-26 11:26 ` Patchwork
  2017-08-26 12:21 ` ✗ Fi.CI.IGT: warning " Patchwork
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Patchwork @ 2017-08-26 11:26 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] drm/i915: Always sanity check engine state upon idling
URL   : https://patchwork.freedesktop.org/series/29387/
State : success

== Summary ==

Series 29387v1 series starting with [1/3] drm/i915: Always sanity check engine state upon idling
https://patchwork.freedesktop.org/api/1.0/series/29387/revisions/1/mbox/

Test kms_cursor_legacy:
        Subgroup basic-busy-flip-before-cursor-atomic:
                pass       -> FAIL       (fi-snb-2600) fdo#100215 +1
Test kms_frontbuffer_tracking:
        Subgroup basic:
                dmesg-warn -> PASS       (fi-bdw-5557u) fdo#102410

fdo#100215 https://bugs.freedesktop.org/show_bug.cgi?id=100215
fdo#102410 https://bugs.freedesktop.org/show_bug.cgi?id=102410

fi-bdw-5557u     total:279  pass:268  dwarn:0   dfail:0   fail:0   skip:11  time:452s
fi-bdw-gvtdvm    total:279  pass:265  dwarn:0   dfail:0   fail:0   skip:14  time:438s
fi-blb-e6850     total:279  pass:224  dwarn:1   dfail:0   fail:0   skip:54  time:366s
fi-bsw-n3050     total:279  pass:243  dwarn:0   dfail:0   fail:0   skip:36  time:550s
fi-bwr-2160      total:279  pass:184  dwarn:0   dfail:0   fail:0   skip:95  time:252s
fi-bxt-j4205     total:279  pass:260  dwarn:0   dfail:0   fail:0   skip:19  time:524s
fi-byt-j1900     total:279  pass:254  dwarn:1   dfail:0   fail:0   skip:24  time:526s
fi-byt-n2820     total:279  pass:250  dwarn:1   dfail:0   fail:0   skip:28  time:516s
fi-elk-e7500     total:279  pass:230  dwarn:0   dfail:0   fail:0   skip:49  time:439s
fi-glk-2a        total:279  pass:260  dwarn:0   dfail:0   fail:0   skip:19  time:609s
fi-hsw-4770      total:279  pass:263  dwarn:0   dfail:0   fail:0   skip:16  time:450s
fi-hsw-4770r     total:279  pass:263  dwarn:0   dfail:0   fail:0   skip:16  time:425s
fi-ilk-650       total:279  pass:229  dwarn:0   dfail:0   fail:0   skip:50  time:425s
fi-ivb-3520m     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:495s
fi-ivb-3770      total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:471s
fi-kbl-7500u     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:478s
fi-kbl-7560u     total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:595s
fi-kbl-r         total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:601s
fi-pnv-d510      total:279  pass:223  dwarn:1   dfail:0   fail:0   skip:55  time:516s
fi-skl-6260u     total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:470s
fi-skl-6700k     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:475s
fi-skl-6770hq    total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:489s
fi-skl-gvtdvm    total:279  pass:266  dwarn:0   dfail:0   fail:0   skip:13  time:443s
fi-skl-x1585l    total:279  pass:268  dwarn:0   dfail:0   fail:0   skip:11  time:482s
fi-snb-2520m     total:279  pass:251  dwarn:0   dfail:0   fail:0   skip:28  time:548s
fi-snb-2600      total:279  pass:249  dwarn:0   dfail:0   fail:1   skip:29  time:406s

8afd435a127714eca60bcd86e1856aa14d1eea9e drm-tip: 2017y-08m-26d-10h-07m-05s UTC integration manifest
bc9cf04a1e60 drm/i915: Discard the request queue if we fail to sleep before suspend
336e612acb85 drm/i915: Clear wedged status upon resume
628b21e06ed9 drm/i915: Always sanity check engine state upon idling

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5499/

* ✗ Fi.CI.IGT: warning for series starting with [1/3] drm/i915: Always sanity check engine state upon idling
  2017-08-26 11:09 [PATCH 1/3] drm/i915: Always sanity check engine state upon idling Chris Wilson
                   ` (2 preceding siblings ...)
  2017-08-26 11:26 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Always sanity check engine state upon idling Patchwork
@ 2017-08-26 12:21 ` Patchwork
  2017-08-29 12:25 ` [PATCH 1/3] " Chris Wilson
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Patchwork @ 2017-08-26 12:21 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] drm/i915: Always sanity check engine state upon idling
URL   : https://patchwork.freedesktop.org/series/29387/
State : warning

== Summary ==

Test perf:
        Subgroup blocking:
                fail       -> PASS       (shard-hsw) fdo#102252
Test kms_plane_multiple:
        Subgroup atomic-pipe-A-tiling-x:
                pass       -> SKIP       (shard-hsw)

fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252

shard-hsw        total:2185 pass:1210 dwarn:0   dfail:0   fail:16  skip:959 time:9348s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5499/shards.html

* Re: [PATCH 1/3] drm/i915: Always sanity check engine state upon idling
  2017-08-26 11:09 [PATCH 1/3] drm/i915: Always sanity check engine state upon idling Chris Wilson
                   ` (3 preceding siblings ...)
  2017-08-26 12:21 ` ✗ Fi.CI.IGT: warning " Patchwork
@ 2017-08-29 12:25 ` Chris Wilson
  2017-08-29 13:07 ` Joonas Lahtinen
  2017-08-29 13:36 ` Mika Kuoppala
  6 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-08-29 12:25 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matthias Kaehlcke

Quoting Chris Wilson (2017-08-26 12:09:33)
> When we do a locked idle we know that afterwards all requests have been
> completed and the engines have been cleared of tasks. For whatever
> reason, this doesn't always happen and we may go into a suspend with
> ELSP still full, and this causes an issue upon resume as we get very,
> very confused.
> 
> If the engines refuse to idle, mark the device as wedged. In the process
> we get rid of the maybe unused open-coded version of wait_for_engines
> reported by Nick Desaulniers and Matthias Kaehlcke.
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=101891
References: https://bugs.freedesktop.org/show_bug.cgi?id=102456

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Matthias Kaehlcke <mka@chromium.org>

* Re: [PATCH 1/3] drm/i915: Always sanity check engine state upon idling
  2017-08-26 11:09 [PATCH 1/3] drm/i915: Always sanity check engine state upon idling Chris Wilson
                   ` (4 preceding siblings ...)
  2017-08-29 12:25 ` [PATCH 1/3] " Chris Wilson
@ 2017-08-29 13:07 ` Joonas Lahtinen
  2017-08-29 13:19   ` Chris Wilson
  2017-08-29 13:36 ` Mika Kuoppala
  6 siblings, 1 reply; 12+ messages in thread
From: Joonas Lahtinen @ 2017-08-29 13:07 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Matthias Kaehlcke

On Sat, 2017-08-26 at 12:09 +0100, Chris Wilson wrote:
> When we do a locked idle we know that afterwards all requests have been
> completed and the engines have been cleared of tasks. For whatever
> reason, this doesn't always happen and we may go into a suspend with
> ELSP still full, and this causes an issue upon resume as we get very,
> very confused.
> 
> If the engines refuse to idle, mark the device as wedged. In the process
> we get rid of the maybe unused open-coded version of wait_for_engines
> reported by Nick Desaulniers and Matthias Kaehlcke.
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=101891
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Matthias Kaehlcke <mka@chromium.org>

I assume GEM_WARN_ON -> DRM_ERROR was intentional.

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 1/3] drm/i915: Always sanity check engine state upon idling
  2017-08-29 13:07 ` Joonas Lahtinen
@ 2017-08-29 13:19   ` Chris Wilson
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-08-29 13:19 UTC (permalink / raw)
  To: Joonas Lahtinen, intel-gfx; +Cc: Matthias Kaehlcke

Quoting Joonas Lahtinen (2017-08-29 14:07:40)
> On Sat, 2017-08-26 at 12:09 +0100, Chris Wilson wrote:
> > When we do a locked idle we know that afterwards all requests have been
> > completed and the engines have been cleared of tasks. For whatever
> > reason, this doesn't always happen and we may go into a suspend with
> > ELSP still full, and this causes an issue upon resume as we get very,
> > very confused.
> > 
> > If the engines refuse to idle, mark the device as wedged. In the process
> > we get rid of the maybe unused open-coded version of wait_for_engines
> > reported by Nick Desaulniers and Matthias Kaehlcke.
> > 
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=101891
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Matthias Kaehlcke <mka@chromium.org>
> 
> I assume GEM_WARN_ON -> DRM_ERROR was intentional.

Yes. The first time the unused function was reported, the thread drifted
off in the direction of "that we probably want to always do the test"
rather than only in CI. Now that we have seen glk actually fail in this
way, we need to code defensively here (as it is no longer a theoretical
programming error). As it still is a hw issue we want the warning
(especially as this will cause the suspend to fail, we want a reason
why) and as a general rule all wedging should indicate an error (because
it is a last resort around driver bugs).
-Chris
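
The reasoning above (always run the check, report an error, then wedge) can be
sketched outside the kernel. This is a minimal userspace model, not the driver
code: `struct idle_model`, `model_engines_are_idle()` and
`model_wait_for_engines()` are hypothetical names, and a simulated clock that
advances by one per poll stands in for the sleeping done by the kernel's
`wait_for()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the device; not the i915 structures. */
struct idle_model {
	int now_ms;	/* simulated clock, advanced by one per poll */
	int idle_at_ms;	/* simulated time at which all engines go idle */
	bool wedged;	/* set when we give up on the hardware */
};

static bool model_engines_are_idle(const struct idle_model *m)
{
	return m->now_ms >= m->idle_at_ms;
}

/*
 * Poll until every engine is idle or the timeout expires. On timeout,
 * log an error, mark the device wedged and return -EIO.
 */
static int model_wait_for_engines(struct idle_model *m, int timeout_ms)
{
	int deadline;

	assert(m != NULL && timeout_ms >= 0);
	deadline = m->now_ms + timeout_ms;

	for (;;) {
		/* Sample expiry first so the condition gets one last check. */
		bool expired = m->now_ms > deadline;

		if (model_engines_are_idle(m))
			return 0;
		if (expired)
			break;
		m->now_ms++;	/* stands in for sleeping between polls */
	}

	fprintf(stderr, "Failed to idle engines, declaring wedged!\n");
	m->wedged = true;
	return -EIO;
}
```

The check runs unconditionally rather than only in debug builds, so a
misbehaving device always produces both a logged reason and a wedged state.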

* Re: [PATCH 1/3] drm/i915: Always sanity check engine state upon idling
  2017-08-26 11:09 [PATCH 1/3] drm/i915: Always sanity check engine state upon idling Chris Wilson
                   ` (5 preceding siblings ...)
  2017-08-29 13:07 ` Joonas Lahtinen
@ 2017-08-29 13:36 ` Mika Kuoppala
  2017-08-29 13:55   ` Chris Wilson
  6 siblings, 1 reply; 12+ messages in thread
From: Mika Kuoppala @ 2017-08-29 13:36 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Matthias Kaehlcke

Chris Wilson <chris@chris-wilson.co.uk> writes:

> When we do a locked idle we know that afterwards all requests have been
> completed and the engines have been cleared of tasks. For whatever
> reason, this doesn't always happen and we may go into a suspend with
> ELSP still full, and this causes an issue upon resume as we get very,
> very confused.
>
> If the engines refuse to idle, mark the device as wedged. In the process
> we get rid of the maybe unused open-coded version of wait_for_engines
> reported by Nick Desaulniers and Matthias Kaehlcke.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=101891
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Matthias Kaehlcke <mka@chromium.org>

I noticed that when we actually do switch to the kernel context, it's
async. And then we always do a wait for idle.

So as all our usage is sync, why don't we just wait on the request in
i915_gem_switch_to_kernel_context(i915) to pinpoint the request that
failed to complete? And in addition have this as further hardening.

But for the unconditional wedge and warn,

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

-Mika


> ---
>  drivers/gpu/drm/i915/i915_gem.c | 20 ++++----------------
>  1 file changed, 4 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ac02785fdaff..c1520c0d2084 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3371,24 +3371,12 @@ static int wait_for_timeline(struct i915_gem_timeline *tl, unsigned int flags)
>  	return 0;
>  }
>  
> -static int wait_for_engine(struct intel_engine_cs *engine, int timeout_ms)
> -{
> -	return wait_for(intel_engine_is_idle(engine), timeout_ms);
> -}
> -
>  static int wait_for_engines(struct drm_i915_private *i915)
>  {
> -	struct intel_engine_cs *engine;
> -	enum intel_engine_id id;
> -
> -	for_each_engine(engine, i915, id) {
> -		if (GEM_WARN_ON(wait_for_engine(engine, 50))) {
> -			i915_gem_set_wedged(i915);
> -			return -EIO;
> -		}
> -
> -		GEM_BUG_ON(intel_engine_get_seqno(engine) !=
> -			   intel_engine_last_submit(engine));
> +	if (wait_for(intel_engines_are_idle(i915), 50)) {
> +		DRM_ERROR("Failed to idle engines, declaring wedged!\n");
> +		i915_gem_set_wedged(i915);
> +		return -EIO;
>  	}
>  
>  	return 0;
> -- 
> 2.14.1

* Re: [PATCH 2/3] drm/i915: Clear wedged status upon resume
  2017-08-26 11:09 ` [PATCH 2/3] drm/i915: Clear wedged status upon resume Chris Wilson
@ 2017-08-29 13:49   ` Mika Kuoppala
  0 siblings, 0 replies; 12+ messages in thread
From: Mika Kuoppala @ 2017-08-29 13:49 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> When we wake up from suspend, the device has been powered down and
> should come back afresh. We should be able to safely remove the wedged
> status from the previous session and start afresh.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c1520c0d2084..9dc24b915aa7 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4518,6 +4518,12 @@ static void assert_kernel_context_is_current(struct drm_i915_private *dev_priv)
>  
>  void i915_gem_sanitize(struct drm_i915_private *i915)
>  {
> +	if (i915_terminally_wedged(&i915->gpu_error)) {
> +		mutex_lock(&i915->drm.struct_mutex);
> +		i915_gem_unset_wedged(i915);
> +		mutex_unlock(&i915->drm.struct_mutex);
> +	}
> +
>  	/*
>  	 * If we inherit context state from the BIOS or earlier occupants
>  	 * of the GPU, the GPU may be in an inconsistent state when we
> -- 
> 2.14.1

* Re: [PATCH 3/3] drm/i915: Discard the request queue if we fail to sleep before suspend
  2017-08-26 11:09 ` [PATCH 3/3] drm/i915: Discard the request queue if we fail to sleep before suspend Chris Wilson
@ 2017-08-29 13:54   ` Mika Kuoppala
  0 siblings, 0 replies; 12+ messages in thread
From: Mika Kuoppala @ 2017-08-29 13:54 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> If we fail to clear the outstanding request queue before suspending,
> mark those requests as lost.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=102037
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9dc24b915aa7..37fbc64d9ffe 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4585,7 +4585,8 @@ int i915_gem_suspend(struct drm_i915_private *dev_priv)
>  	 * reset the GPU back to its idle, low power state.
>  	 */
>  	WARN_ON(dev_priv->gt.awake);
> -	WARN_ON(!intel_engines_are_idle(dev_priv));
> +	if (WARN_ON(!intel_engines_are_idle(dev_priv)))
> +		i915_gem_set_wedged(dev_priv); /* no hope, so reset everthing */

s/ever/every

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

>  
>  	/*
>  	 * Neither the BIOS, ourselves or any other kernel
> -- 
> 2.14.1

* Re: [PATCH 1/3] drm/i915: Always sanity check engine state upon idling
  2017-08-29 13:36 ` Mika Kuoppala
@ 2017-08-29 13:55   ` Chris Wilson
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-08-29 13:55 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx; +Cc: Matthias Kaehlcke

Quoting Mika Kuoppala (2017-08-29 14:36:57)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > When we do a locked idle we know that afterwards all requests have been
> > completed and the engines have been cleared of tasks. For whatever
> > reason, this doesn't always happen and we may go into a suspend with
> > ELSP still full, and this causes an issue upon resume as we get very,
> > very confused.
> >
> > If the engines refuse to idle, mark the device as wedged. In the process
> > we get rid of the maybe unused open-coded version of wait_for_engines
> > reported by Nick Desaulniers and Matthias Kaehlcke.
> >
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=101891
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Matthias Kaehlcke <mka@chromium.org>
> 
> I noticed that when we actually do switch to the kernel context, it's
> async. And then we always do a wait for idle.
> 
> So as all our usage is sync, why don't we just wait on the request in
> i915_gem_switch_to_kernel_context(i915) to pinpoint the request that
> failed to complete? And in addition have this as further hardening.

They are separate for historical reasons, i.e. they have been used
independently. Note that the switch to kernel context may leave between
zero and one request per engine to wait upon, and yet we still want to wait.

However, we can move the wait-for-idle into switch-to-kernel-context as
that is common across all callers at present.

* spots an open coded switch to kernel context.
-Chris
