* [PATCH 1/2] drm/i915: Stop touching hangcheck.seqno from intel_engine_init_global_seqno()
@ 2017-04-13 13:07 Chris Wilson
2017-04-13 13:07 ` [PATCH 2/2] drm/i915: Reset hangcheck timeouts upon idling Chris Wilson
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Chris Wilson @ 2017-04-13 13:07 UTC (permalink / raw)
To: intel-gfx
The hangcheck runs independently to the main flow of seqno through the
driver. However, we have an odd coupling of the seqno reset that is
unwelcome, and if poked at just the right rate can cause spurious hangs
(e.g. gem_exec_whisper) on an apparently idle engine.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
drivers/gpu/drm/i915/intel_engine_cs.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 7681d17ce454..f3cb7e931317 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -265,6 +265,7 @@ void intel_engine_init_global_seqno(struct intel_engine_cs *engine, u32 seqno)
struct drm_i915_private *dev_priv = engine->i915;
GEM_BUG_ON(!intel_engine_is_idle(engine));
+ GEM_BUG_ON(i915_gem_active_isset(&engine->timeline->last_request));
/* Our semaphore implementation is strictly monotonic (i.e. we proceed
* so long as the semaphore value in the register/page is greater
@@ -284,9 +285,6 @@ void intel_engine_init_global_seqno(struct intel_engine_cs *engine, u32 seqno)
intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted);
- GEM_BUG_ON(i915_gem_active_isset(&engine->timeline->last_request));
- engine->hangcheck.seqno = seqno;
-
/* After manually advancing the seqno, fake the interrupt in case
* there are any waiters for that seqno.
*/
--
2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] drm/i915: Reset hangcheck timeouts upon idling
2017-04-13 13:07 [PATCH 1/2] drm/i915: Stop touching hangcheck.seqno from intel_engine_init_global_seqno() Chris Wilson
@ 2017-04-13 13:07 ` Chris Wilson
2017-04-19 14:09 ` Mika Kuoppala
2017-04-13 13:37 ` ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915: Stop touching hangcheck.seqno from intel_engine_init_global_seqno() Patchwork
2017-04-21 8:12 ` [PATCH 1/2] " Mika Kuoppala
2 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2017-04-13 13:07 UTC (permalink / raw)
To: intel-gfx
If we have a long period of idleness, we turn off the hangcheck timer
and stop polling the hardware. Before we restart the hangcheck, we
should clear the previous timestamps to prevent us thinking that the
engine was stalled for a long time, if the seqno were manipulated
carefully (such as the repeating patterns in gem_exec_whisper).
It should have no impact upon normal use.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
drivers/gpu/drm/i915/intel_hangcheck.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index b0ca0c4c70d9..a74decca5109 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -409,13 +409,13 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
int busy_count = 0;
if (!i915.enable_hangcheck)
- return;
+ goto disarm_hangcheck;
if (!READ_ONCE(dev_priv->gt.awake))
- return;
+ goto disarm_hangcheck;
if (i915_terminally_wedged(&dev_priv->gpu_error))
- return;
+ goto disarm_hangcheck;
/* As enabling the GPU requires fairly extensive mmio access,
* periodically arm the mmio checker to see if we are triggering
@@ -446,8 +446,14 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
hangcheck_declare_hang(dev_priv, hung, stuck);
/* Reset timer in case GPU hangs without another request being added */
- if (busy_count)
+ if (busy_count) {
i915_queue_hangcheck(dev_priv);
+ return;
+ }
+
+disarm_hangcheck:
+ for_each_engine(engine, dev_priv, id)
+ intel_engine_init_hangcheck(engine);
}
void intel_engine_init_hangcheck(struct intel_engine_cs *engine)
--
2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 7+ messages in thread
* ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915: Stop touching hangcheck.seqno from intel_engine_init_global_seqno()
2017-04-13 13:07 [PATCH 1/2] drm/i915: Stop touching hangcheck.seqno from intel_engine_init_global_seqno() Chris Wilson
2017-04-13 13:07 ` [PATCH 2/2] drm/i915: Reset hangcheck timeouts upon idling Chris Wilson
@ 2017-04-13 13:37 ` Patchwork
2017-04-21 8:12 ` [PATCH 1/2] " Mika Kuoppala
2 siblings, 0 replies; 7+ messages in thread
From: Patchwork @ 2017-04-13 13:37 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [1/2] drm/i915: Stop touching hangcheck.seqno from intel_engine_init_global_seqno()
URL : https://patchwork.freedesktop.org/series/23003/
State : failure
== Summary ==
Series 23003v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/23003/revisions/1/mbox/
Test gem_busy:
Subgroup basic-hang-default:
pass -> FAIL (fi-hsw-4770r) fdo#100561
pass -> FAIL (fi-bsw-n3050)
pass -> FAIL (fi-byt-n2820)
pass -> FAIL (fi-bxt-j4205)
pass -> FAIL (fi-kbl-7500u)
pass -> FAIL (fi-bdw-5557u)
pass -> FAIL (fi-skl-6260u)
pass -> FAIL (fi-skl-6700hq)
pass -> FAIL (fi-kbl-7560u)
pass -> FAIL (fi-snb-2600)
pass -> FAIL (fi-byt-j1900)
pass -> FAIL (fi-skl-6700k)
pass -> FAIL (fi-snb-2520m)
pass -> FAIL (fi-bdw-gvtdvm)
pass -> FAIL (fi-skl-gvtdvm)
pass -> FAIL (fi-ilk-650)
pass -> FAIL (fi-ivb-3520m)
pass -> FAIL (fi-bxt-t5700)
pass -> FAIL (fi-skl-6770hq)
pass -> FAIL (fi-hsw-4770)
pass -> FAIL (fi-ivb-3770)
fdo#100561 https://bugs.freedesktop.org/show_bug.cgi?id=100561
fi-bdw-5557u total:278 pass:266 dwarn:0 dfail:0 fail:1 skip:11 time:546s
fi-bdw-gvtdvm total:278 pass:255 dwarn:8 dfail:0 fail:1 skip:14 time:545s
fi-bsw-n3050 total:278 pass:241 dwarn:0 dfail:0 fail:1 skip:36 time:695s
fi-bxt-j4205 total:278 pass:258 dwarn:0 dfail:0 fail:1 skip:19 time:624s
fi-bxt-t5700 total:278 pass:257 dwarn:0 dfail:0 fail:1 skip:20 time:679s
fi-byt-j1900 total:278 pass:249 dwarn:0 dfail:0 fail:5 skip:24 time:602s
fi-byt-n2820 total:278 pass:245 dwarn:0 dfail:0 fail:5 skip:28 time:594s
fi-hsw-4770 total:278 pass:257 dwarn:0 dfail:0 fail:5 skip:16 time:528s
fi-hsw-4770r total:278 pass:257 dwarn:0 dfail:0 fail:5 skip:16 time:524s
fi-ilk-650 total:278 pass:223 dwarn:0 dfail:0 fail:5 skip:50 time:539s
fi-ivb-3520m total:278 pass:253 dwarn:0 dfail:0 fail:7 skip:18 time:605s
fi-ivb-3770 total:278 pass:255 dwarn:0 dfail:0 fail:5 skip:18 time:583s
fi-kbl-7500u total:278 pass:259 dwarn:0 dfail:0 fail:1 skip:18 time:573s
fi-kbl-7560u total:278 pass:266 dwarn:1 dfail:0 fail:1 skip:10 time:682s
fi-skl-6260u total:278 pass:267 dwarn:0 dfail:0 fail:1 skip:10 time:574s
fi-skl-6700hq total:278 pass:260 dwarn:0 dfail:0 fail:1 skip:17 time:688s
fi-skl-6700k total:278 pass:255 dwarn:4 dfail:0 fail:1 skip:18 time:577s
fi-skl-6770hq total:278 pass:267 dwarn:0 dfail:0 fail:1 skip:10 time:618s
fi-skl-gvtdvm total:278 pass:264 dwarn:0 dfail:0 fail:1 skip:13 time:549s
fi-snb-2520m total:278 pass:245 dwarn:0 dfail:0 fail:5 skip:28 time:651s
fi-snb-2600 total:278 pass:244 dwarn:0 dfail:0 fail:5 skip:29 time:524s
f1faf571d9530365a34fe33a3efa3fb224661692 drm-tip: 2017y-04m-13d-12h-34m-48s UTC integration manifest
23aee1e drm/i915: Reset hangcheck timeouts upon idling
5fc653d drm/i915: Stop touching hangcheck.seqno from intel_engine_init_global_seqno()
== Logs ==
For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4502/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] drm/i915: Reset hangcheck timeouts upon idling
2017-04-13 13:07 ` [PATCH 2/2] drm/i915: Reset hangcheck timeouts upon idling Chris Wilson
@ 2017-04-19 14:09 ` Mika Kuoppala
2017-04-19 16:25 ` Chris Wilson
0 siblings, 1 reply; 7+ messages in thread
From: Mika Kuoppala @ 2017-04-19 14:09 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> If we have a long period of idleness, we turn off the hangcheck timer
> and stop polling the hardware. Before we restart the hangcheck, we
> should clear the previous timestamps to prevent us thinking that the
> engine was stalled for a long time, if the seqno were manipulated
> carefully (such as the repeating patterns in gem_exec_whisper).
>
> It should have no impact upon normal use.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
> drivers/gpu/drm/i915/intel_hangcheck.c | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> index b0ca0c4c70d9..a74decca5109 100644
> --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> @@ -409,13 +409,13 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> int busy_count = 0;
>
> if (!i915.enable_hangcheck)
> - return;
> + goto disarm_hangcheck;
>
> if (!READ_ONCE(dev_priv->gt.awake))
> - return;
> + goto disarm_hangcheck;
>
> if (i915_terminally_wedged(&dev_priv->gpu_error))
> - return;
> + goto disarm_hangcheck;
>
> /* As enabling the GPU requires fairly extensive mmio access,
> * periodically arm the mmio checker to see if we are triggering
> @@ -446,8 +446,14 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> hangcheck_declare_hang(dev_priv, hung, stuck);
>
> /* Reset timer in case GPU hangs without another request being added */
> - if (busy_count)
> + if (busy_count) {
> i915_queue_hangcheck(dev_priv);
Now if we don't have a waiter, we always init hangcheck. And thus
we never detect a hang if so. As demonstrated by the
gem_busy@basic-default-hang.
I suggest we decouple the waiters completely from hangcheck:
- const bool busy = intel_engine_has_waiter(engine);
+ const bool busy = engine->timeline->inflight_seqnos;
-Mika
> + return;
> + }
> +
> +disarm_hangcheck:
> + for_each_engine(engine, dev_priv, id)
> + intel_engine_init_hangcheck(engine);
> }
>
> void intel_engine_init_hangcheck(struct intel_engine_cs *engine)
> --
> 2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] drm/i915: Reset hangcheck timeouts upon idling
2017-04-19 14:09 ` Mika Kuoppala
@ 2017-04-19 16:25 ` Chris Wilson
0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2017-04-19 16:25 UTC (permalink / raw)
To: Mika Kuoppala; +Cc: intel-gfx
On Wed, Apr 19, 2017 at 05:09:46PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > If we have a long period of idleness, we turn off the hangcheck timer
> > and stop polling the hardware. Before we restart the hangcheck, we
> > should clear the previous timestamps to prevent us thinking that the
> > engine was stalled for a long time, if the seqno were manipulated
> > carefully (such as the repeating patterns in gem_exec_whisper).
> >
> > It should have no impact upon normal use.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> > drivers/gpu/drm/i915/intel_hangcheck.c | 14 ++++++++++----
> > 1 file changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> > index b0ca0c4c70d9..a74decca5109 100644
> > --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> > @@ -409,13 +409,13 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> > int busy_count = 0;
> >
> > if (!i915.enable_hangcheck)
> > - return;
> > + goto disarm_hangcheck;
> >
> > if (!READ_ONCE(dev_priv->gt.awake))
> > - return;
> > + goto disarm_hangcheck;
> >
> > if (i915_terminally_wedged(&dev_priv->gpu_error))
> > - return;
> > + goto disarm_hangcheck;
> >
> > /* As enabling the GPU requires fairly extensive mmio access,
> > * periodically arm the mmio checker to see if we are triggering
> > @@ -446,8 +446,14 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> > hangcheck_declare_hang(dev_priv, hung, stuck);
> >
> > /* Reset timer in case GPU hangs without another request being added */
> > - if (busy_count)
> > + if (busy_count) {
> > i915_queue_hangcheck(dev_priv);
>
> Now if we don't have a waiter, we always init hangcheck. And thus
> we never detect a hang if so. As demonstrated by the
> gem_busy@basic-default-hang.
>
> I suggest we decouple the waiters completely from hangcheck:
>
> - const bool busy = intel_engine_has_waiter(engine);
> + const bool busy = engine->timeline->inflight_seqnos;
inflight seqnos isn't a good choice either, as that doesn't mean the
engine is active yet. The only issue with this patch was resetting the
hangcheck.seqno nerfed the waiterless hangcheck. Turned out to be a very
bad idea.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] drm/i915: Stop touching hangcheck.seqno from intel_engine_init_global_seqno()
2017-04-13 13:07 [PATCH 1/2] drm/i915: Stop touching hangcheck.seqno from intel_engine_init_global_seqno() Chris Wilson
2017-04-13 13:07 ` [PATCH 2/2] drm/i915: Reset hangcheck timeouts upon idling Chris Wilson
2017-04-13 13:37 ` ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915: Stop touching hangcheck.seqno from intel_engine_init_global_seqno() Patchwork
@ 2017-04-21 8:12 ` Mika Kuoppala
2017-04-21 8:36 ` Chris Wilson
2 siblings, 1 reply; 7+ messages in thread
From: Mika Kuoppala @ 2017-04-21 8:12 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> The hangcheck runs independently to the main flow of seqno through the
> driver. However, we have an odd coupling of the seqno reset that is
> unwelcome, and if poked at just the right rate can cause spurious hangs
> (e.g. gem_exec_whisper) on an apparently idle engine.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
> drivers/gpu/drm/i915/intel_engine_cs.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 7681d17ce454..f3cb7e931317 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -265,6 +265,7 @@ void intel_engine_init_global_seqno(struct intel_engine_cs *engine, u32 seqno)
> struct drm_i915_private *dev_priv = engine->i915;
>
> GEM_BUG_ON(!intel_engine_is_idle(engine));
> + GEM_BUG_ON(i915_gem_active_isset(&engine->timeline->last_request));
>
> /* Our semaphore implementation is strictly monotonic (i.e. we proceed
> * so long as the semaphore value in the register/page is greater
> @@ -284,9 +285,6 @@ void intel_engine_init_global_seqno(struct intel_engine_cs *engine, u32 seqno)
> intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
> clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted);
>
> - GEM_BUG_ON(i915_gem_active_isset(&engine->timeline->last_request));
> - engine->hangcheck.seqno = seqno;
> -
> /* After manually advancing the seqno, fake the interrupt in case
> * there are any waiters for that seqno.
> */
> --
> 2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] drm/i915: Stop touching hangcheck.seqno from intel_engine_init_global_seqno()
2017-04-21 8:12 ` [PATCH 1/2] " Mika Kuoppala
@ 2017-04-21 8:36 ` Chris Wilson
0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2017-04-21 8:36 UTC (permalink / raw)
To: Mika Kuoppala; +Cc: intel-gfx
On Fri, Apr 21, 2017 at 11:12:21AM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > The hangcheck runs independently to the main flow of seqno through the
> > driver. However, we have an odd coupling of the seqno reset that is
> > unwelcome, and if poked at just the right rate can cause spurious hangs
> > (e.g. gem_exec_whisper) on an apparently idle engine.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>
> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Ta, I'll send this by itself to CI to confirm that it is safe :)
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-04-21 8:36 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-13 13:07 [PATCH 1/2] drm/i915: Stop touching hangcheck.seqno from intel_engine_init_global_seqno() Chris Wilson
2017-04-13 13:07 ` [PATCH 2/2] drm/i915: Reset hangcheck timeouts upon idling Chris Wilson
2017-04-19 14:09 ` Mika Kuoppala
2017-04-19 16:25 ` Chris Wilson
2017-04-13 13:37 ` ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915: Stop touching hangcheck.seqno from intel_engine_init_global_seqno() Patchwork
2017-04-21 8:12 ` [PATCH 1/2] " Mika Kuoppala
2017-04-21 8:36 ` Chris Wilson
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.