* [PATCH] drm/i915: Flush pending interrupt following a GPU reset
@ 2018-03-21 15:00 Chris Wilson
2018-03-21 15:55 ` Jeff McGee
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Chris Wilson @ 2018-03-21 15:00 UTC (permalink / raw)
To: intel-gfx
After resetting the GPU (or subset of engines), call synchronize_irq()
to flush any pending irq before proceeding with the cleanup. For a
device level reset, we disable the interrupts around the reset, but when
resetting just one engine, we have to avoid such global disabling. This
leaves us open to an interrupt arriving for the engine as we try to
reset it. We already do try to flush the IIR following the reset, but we
have to ensure that the in-flight interrupt does not land after we start
cleaning up after the reset; enter synchronize_irq().
As it currently stands, we very rarely, but fatally, see sequences such as:
2.... 57964564us : execlists_reset_prepare: rcs0
2.... 57964613us : execlists_reset: rcs0 seqno=424
0d.h1 57964615us : gen8_cs_irq_handler: rcs0 CS active=1
2d..1 57964617us : __i915_request_unsubmit: rcs0 fence 29:1056 <- global_seqno 1060
2.... 57964703us : execlists_reset_finish: rcs0
0..s. 57964705us : execlists_submission_tasklet: rcs0 awake?=1, active=0, irq-posted?=1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
---
drivers/gpu/drm/i915/intel_uncore.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 4c616d074a97..04830d6125d6 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -2116,11 +2116,14 @@ int intel_gpu_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
 		i915_stop_engines(dev_priv, engine_mask);
 
 		ret = -ENODEV;
-		if (reset)
+		if (reset) {
+			GEM_TRACE("engine_mask=%x\n", engine_mask);
 			ret = reset(dev_priv, engine_mask);
+		}
 		if (ret != -ETIMEDOUT)
 			break;
 
+		synchronize_irq(dev_priv->drm.irq);
 		cond_resched();
 	}
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
--
2.16.2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: [PATCH] drm/i915: Flush pending interrupt following a GPU reset
2018-03-21 15:00 [PATCH] drm/i915: Flush pending interrupt following a GPU reset Chris Wilson
@ 2018-03-21 15:55 ` Jeff McGee
2018-03-21 16:41 ` Chris Wilson
2018-03-21 16:42 ` Chris Wilson
2018-03-21 16:59 ` ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
` (2 subsequent siblings)
3 siblings, 2 replies; 8+ messages in thread
From: Jeff McGee @ 2018-03-21 15:55 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
On Wed, Mar 21, 2018 at 03:00:23PM +0000, Chris Wilson wrote:
> After resetting the GPU (or subset of engines), call synchronize_irq()
> to flush any pending irq before proceeding with the cleanup. For a
> device level reset, we disable the interrupts around the reset, but when
> resetting just one engine, we have to avoid such global disabling. This
> leaves us open to an interrupt arriving for the engine as we try to
> reset it. We already do try to flush the IIR following the reset, but we
> have to ensure that the in-flight interrupt does not land after we start
> cleaning up after the reset; enter synchronize_irq().
>
> As it currently stands, we very rarely, but fatally, see sequences such as:
>
> 2.... 57964564us : execlists_reset_prepare: rcs0
> 2.... 57964613us : execlists_reset: rcs0 seqno=424
> 0d.h1 57964615us : gen8_cs_irq_handler: rcs0 CS active=1
> 2d..1 57964617us : __i915_request_unsubmit: rcs0 fence 29:1056 <- global_seqno 1060
> 2.... 57964703us : execlists_reset_finish: rcs0
> 0..s. 57964705us : execlists_submission_tasklet: rcs0 awake?=1, active=0, irq-posted?=1
>
I can repro this sequence easily with force preemption IGT. Just tried this
patch and the issue is still there. For the moment I can just mitigate with
https://patchwork.freedesktop.org/patch/211086/
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> ---
> drivers/gpu/drm/i915/intel_uncore.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 4c616d074a97..04830d6125d6 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -2116,11 +2116,14 @@ int intel_gpu_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
> i915_stop_engines(dev_priv, engine_mask);
>
> ret = -ENODEV;
> - if (reset)
> + if (reset) {
> + GEM_TRACE("engine_mask=%x\n", engine_mask);
> ret = reset(dev_priv, engine_mask);
> + }
> if (ret != -ETIMEDOUT)
> break;
>
> + synchronize_irq(dev_priv->drm.irq);
> cond_resched();
> }
> intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> --
> 2.16.2
>
* Re: [PATCH] drm/i915: Flush pending interrupt following a GPU reset
2018-03-21 15:55 ` Jeff McGee
@ 2018-03-21 16:41 ` Chris Wilson
2018-03-21 16:42 ` Chris Wilson
1 sibling, 0 replies; 8+ messages in thread
From: Chris Wilson @ 2018-03-21 16:41 UTC (permalink / raw)
To: Jeff McGee; +Cc: intel-gfx
Quoting Jeff McGee (2018-03-21 15:55:16)
> On Wed, Mar 21, 2018 at 03:00:23PM +0000, Chris Wilson wrote:
> > After resetting the GPU (or subset of engines), call synchronize_irq()
> > to flush any pending irq before proceeding with the cleanup. For a
> > device level reset, we disable the interrupts around the reset, but when
> > resetting just one engine, we have to avoid such global disabling. This
> > leaves us open to an interrupt arriving for the engine as we try to
> > reset it. We already do try to flush the IIR following the reset, but we
> > have to ensure that the in-flight interrupt does not land after we start
> > cleaning up after the reset; enter synchronize_irq().
> >
> > As it currently stands, we very rarely, but fatally, see sequences such as:
> >
> > 2.... 57964564us : execlists_reset_prepare: rcs0
> > 2.... 57964613us : execlists_reset: rcs0 seqno=424
> > 0d.h1 57964615us : gen8_cs_irq_handler: rcs0 CS active=1
> > 2d..1 57964617us : __i915_request_unsubmit: rcs0 fence 29:1056 <- global_seqno 1060
> > 2.... 57964703us : execlists_reset_finish: rcs0
> > 0..s. 57964705us : execlists_submission_tasklet: rcs0 awake?=1, active=0, irq-posted?=1
> >
> I can repro this sequence easily with force preemption IGT. Just tried this
> patch and the issue is still there. For the moment I can just mitigate with
> https://patchwork.freedesktop.org/patch/211086/
This patch is purely to make sure that the irq_handler is not called
after execlists_reset, as that is what we rely on.
We *should* never be in a situation where the CSB tail is invalid; that
implies either we screwed up or the hw is incoherent.
-Chris
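The ordering requirement stated in this reply (no irq_handler call after execlists_reset) can be checked mechanically against a trace. What follows is a minimal userspace sketch, not driver code. The enum, its event names and both functions are hypothetical, chosen only to mirror the trace points quoted in the commit message, and the "reset window" is assumed to run from execlists_reset to execlists_reset_finish.

```c
#include <stddef.h>

/* Hypothetical event kinds, named after the trace points in the commit message. */
enum trace_event {
	EV_RESET_PREPARE,       /* execlists_reset_prepare */
	EV_RESET,               /* execlists_reset */
	EV_CS_IRQ,              /* gen8_cs_irq_handler */
	EV_REQUEST_UNSUBMIT,    /* __i915_request_unsubmit */
	EV_RESET_FINISH,        /* execlists_reset_finish */
	EV_SUBMISSION_TASKLET,  /* execlists_submission_tasklet */
};

/*
 * Return the index of the first CS interrupt that lands after EV_RESET and
 * before the matching EV_RESET_FINISH, or -1 if the window stays quiet.
 * Assumption: interrupts outside that window are legitimate.
 */
int first_irq_in_reset_window(const enum trace_event *ev, size_t n)
{
	int in_window = 0;

	for (size_t i = 0; i < n; i++) {
		switch (ev[i]) {
		case EV_RESET:
			in_window = 1;
			break;
		case EV_RESET_FINISH:
			in_window = 0;
			break;
		case EV_CS_IRQ:
			if (in_window)
				return (int)i;
			break;
		default:
			break;
		}
	}
	return -1;
}

/* The six-event sequence from the commit message, in timestamp order. */
int reported_trace_violation(void)
{
	static const enum trace_event reported[] = {
		EV_RESET_PREPARE,
		EV_RESET,
		EV_CS_IRQ,
		EV_REQUEST_UNSUBMIT,
		EV_RESET_FINISH,
		EV_SUBMISSION_TASKLET,
	};

	return first_irq_in_reset_window(reported,
					 sizeof(reported) / sizeof(reported[0]));
}
```

For the reported sequence the helper flags the gen8_cs_irq_handler entry, which is the event the patch is trying to keep out of the window.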
* Re: [PATCH] drm/i915: Flush pending interrupt following a GPU reset
2018-03-21 15:55 ` Jeff McGee
2018-03-21 16:41 ` Chris Wilson
@ 2018-03-21 16:42 ` Chris Wilson
2018-03-21 17:10 ` Jeff McGee
1 sibling, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2018-03-21 16:42 UTC (permalink / raw)
To: Jeff McGee; +Cc: intel-gfx
Quoting Jeff McGee (2018-03-21 15:55:16)
> On Wed, Mar 21, 2018 at 03:00:23PM +0000, Chris Wilson wrote:
> > After resetting the GPU (or subset of engines), call synchronize_irq()
> > to flush any pending irq before proceeding with the cleanup. For a
> > device level reset, we disable the interrupts around the reset, but when
> > resetting just one engine, we have to avoid such global disabling. This
> > leaves us open to an interrupt arriving for the engine as we try to
> > reset it. We already do try to flush the IIR following the reset, but we
> > have to ensure that the in-flight interrupt does not land after we start
> > cleaning up after the reset; enter synchronize_irq().
> >
> > As it currently stands, we very rarely, but fatally, see sequences such as:
> >
> > 2.... 57964564us : execlists_reset_prepare: rcs0
> > 2.... 57964613us : execlists_reset: rcs0 seqno=424
> > 0d.h1 57964615us : gen8_cs_irq_handler: rcs0 CS active=1
> > 2d..1 57964617us : __i915_request_unsubmit: rcs0 fence 29:1056 <- global_seqno 1060
> > 2.... 57964703us : execlists_reset_finish: rcs0
> > 0..s. 57964705us : execlists_submission_tasklet: rcs0 awake?=1, active=0, irq-posted?=1
> >
> I can repro this sequence easily with force preemption IGT.
With the sequence I suggested?
-Chris
* ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Flush pending interrupt following a GPU reset
2018-03-21 15:00 [PATCH] drm/i915: Flush pending interrupt following a GPU reset Chris Wilson
2018-03-21 15:55 ` Jeff McGee
@ 2018-03-21 16:59 ` Patchwork
2018-03-21 17:14 ` ✓ Fi.CI.BAT: success " Patchwork
2018-03-21 21:13 ` ✓ Fi.CI.IGT: " Patchwork
3 siblings, 0 replies; 8+ messages in thread
From: Patchwork @ 2018-03-21 16:59 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915: Flush pending interrupt following a GPU reset
URL : https://patchwork.freedesktop.org/series/40383/
State : warning
== Summary ==
$ dim checkpatch origin/drm-tip
c3bbd56f3b68 drm/i915: Flush pending interrupt following a GPU reset
-:23: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#23:
2d..1 57964617us : __i915_request_unsubmit: rcs0 fence 29:1056 <- global_seqno 1060
total: 0 errors, 1 warnings, 0 checks, 15 lines checked
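The warning above comes down to a column limit on commit-log lines. As a rough sketch of that one rule only (checkpatch.pl itself applies further conditions and exceptions that are not modelled here), the function below counts lines longer than a given limit. The function name and the macro are hypothetical, and length is measured in bytes rather than display columns.

```c
#include <stddef.h>

/* Limit quoted in the warning text: "prefer a maximum 75 chars per line". */
#define COMMIT_LOG_MAX_COLS 75

/*
 * Count the lines in a NUL-terminated buffer that are longer than max_cols
 * bytes. Lines are separated by '\n'; a final line without a trailing
 * newline is still checked.
 */
size_t count_long_lines(const char *text, size_t max_cols)
{
	size_t long_lines = 0;
	size_t col = 0;

	for (const char *p = text; ; p++) {
		if (*p == '\n' || *p == '\0') {
			if (col > max_cols)
				long_lines++;
			col = 0;
			if (*p == '\0')
				break;
		} else {
			col++;
		}
	}
	return long_lines;
}
```

Passing the flagged __i915_request_unsubmit trace line with COMMIT_LOG_MAX_COLS reports one long line, while a wrapped paragraph of the commit message reports none.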
* Re: [PATCH] drm/i915: Flush pending interrupt following a GPU reset
2018-03-21 16:42 ` Chris Wilson
@ 2018-03-21 17:10 ` Jeff McGee
0 siblings, 0 replies; 8+ messages in thread
From: Jeff McGee @ 2018-03-21 17:10 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
On Wed, Mar 21, 2018 at 04:42:32PM +0000, Chris Wilson wrote:
> Quoting Jeff McGee (2018-03-21 15:55:16)
> > On Wed, Mar 21, 2018 at 03:00:23PM +0000, Chris Wilson wrote:
> > > After resetting the GPU (or subset of engines), call synchronize_irq()
> > > to flush any pending irq before proceeding with the cleanup. For a
> > > device level reset, we disable the interrupts around the reset, but when
> > > resetting just one engine, we have to avoid such global disabling. This
> > > leaves us open to an interrupt arriving for the engine as we try to
> > > reset it. We already do try to flush the IIR following the reset, but we
> > > have to ensure that the in-flight interrupt does not land after we start
> > > cleaning up after the reset; enter synchronize_irq().
> > >
> > > As it currently stands, we very rarely, but fatally, see sequences such as:
> > >
> > > 2.... 57964564us : execlists_reset_prepare: rcs0
> > > 2.... 57964613us : execlists_reset: rcs0 seqno=424
> > > 0d.h1 57964615us : gen8_cs_irq_handler: rcs0 CS active=1
> > > 2d..1 57964617us : __i915_request_unsubmit: rcs0 fence 29:1056 <- global_seqno 1060
> > > 2.... 57964703us : execlists_reset_finish: rcs0
> > > 0..s. 57964705us : execlists_submission_tasklet: rcs0 awake?=1, active=0, irq-posted?=1
> > >
> > I can repro this sequence easily with force preemption IGT.
>
> With the sequence I suggested?
> -Chris
Yes. Your approach to protecting port[1] context is working well. This is
the only issue I'm still hitting. I'll post my updated RFC set in a sec.
-Jeff
* ✓ Fi.CI.BAT: success for drm/i915: Flush pending interrupt following a GPU reset
2018-03-21 15:00 [PATCH] drm/i915: Flush pending interrupt following a GPU reset Chris Wilson
2018-03-21 15:55 ` Jeff McGee
2018-03-21 16:59 ` ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
@ 2018-03-21 17:14 ` Patchwork
2018-03-21 21:13 ` ✓ Fi.CI.IGT: " Patchwork
3 siblings, 0 replies; 8+ messages in thread
From: Patchwork @ 2018-03-21 17:14 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915: Flush pending interrupt following a GPU reset
URL : https://patchwork.freedesktop.org/series/40383/
State : success
== Summary ==
Series 40383v1 drm/i915: Flush pending interrupt following a GPU reset
https://patchwork.freedesktop.org/api/1.0/series/40383/revisions/1/mbox/
---- Known issues:
Test gem_mmap_gtt:
Subgroup basic-small-bo-tiledx:
pass -> FAIL (fi-gdg-551) fdo#102575
Test kms_flip:
Subgroup basic-flip-vs-wf_vblank:
fail -> PASS (fi-cfl-s2) fdo#100368
Test kms_frontbuffer_tracking:
Subgroup basic:
pass -> FAIL (fi-cnl-y3) fdo#103167
fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575
fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
fi-bdw-5557u total:285 pass:264 dwarn:0 dfail:0 fail:0 skip:21 time:432s
fi-bdw-gvtdvm total:285 pass:261 dwarn:0 dfail:0 fail:0 skip:24 time:447s
fi-blb-e6850 total:285 pass:220 dwarn:1 dfail:0 fail:0 skip:64 time:380s
fi-bsw-n3050 total:285 pass:239 dwarn:0 dfail:0 fail:0 skip:46 time:543s
fi-bwr-2160 total:285 pass:180 dwarn:0 dfail:0 fail:0 skip:105 time:295s
fi-bxt-dsi total:285 pass:255 dwarn:0 dfail:0 fail:0 skip:30 time:512s
fi-bxt-j4205 total:285 pass:256 dwarn:0 dfail:0 fail:0 skip:29 time:508s
fi-byt-j1900 total:285 pass:250 dwarn:0 dfail:0 fail:0 skip:35 time:516s
fi-byt-n2820 total:285 pass:246 dwarn:0 dfail:0 fail:0 skip:39 time:500s
fi-cfl-8700k total:285 pass:257 dwarn:0 dfail:0 fail:0 skip:28 time:410s
fi-cfl-s2 total:285 pass:259 dwarn:0 dfail:0 fail:0 skip:26 time:567s
fi-cfl-u total:285 pass:259 dwarn:0 dfail:0 fail:0 skip:26 time:512s
fi-cnl-drrs total:285 pass:254 dwarn:3 dfail:0 fail:0 skip:28 time:525s
fi-cnl-y3 total:285 pass:258 dwarn:0 dfail:0 fail:1 skip:26 time:585s
fi-elk-e7500 total:285 pass:225 dwarn:1 dfail:0 fail:0 skip:59 time:425s
fi-gdg-551 total:285 pass:176 dwarn:0 dfail:0 fail:1 skip:108 time:321s
fi-glk-1 total:285 pass:257 dwarn:0 dfail:0 fail:0 skip:28 time:535s
fi-hsw-4770 total:285 pass:258 dwarn:0 dfail:0 fail:0 skip:27 time:401s
fi-ilk-650 total:285 pass:225 dwarn:0 dfail:0 fail:0 skip:60 time:418s
fi-ivb-3520m total:285 pass:256 dwarn:0 dfail:0 fail:0 skip:29 time:474s
fi-ivb-3770 total:285 pass:252 dwarn:0 dfail:0 fail:0 skip:33 time:432s
fi-kbl-7500u total:285 pass:260 dwarn:1 dfail:0 fail:0 skip:24 time:478s
fi-kbl-7567u total:285 pass:265 dwarn:0 dfail:0 fail:0 skip:20 time:468s
fi-kbl-r total:285 pass:258 dwarn:0 dfail:0 fail:0 skip:27 time:512s
fi-pnv-d510 total:285 pass:219 dwarn:1 dfail:0 fail:0 skip:65 time:656s
fi-skl-6260u total:285 pass:265 dwarn:0 dfail:0 fail:0 skip:20 time:440s
fi-skl-6600u total:285 pass:258 dwarn:0 dfail:0 fail:0 skip:27 time:529s
fi-skl-6700k2 total:285 pass:261 dwarn:0 dfail:0 fail:0 skip:24 time:501s
fi-skl-6770hq total:285 pass:265 dwarn:0 dfail:0 fail:0 skip:20 time:491s
fi-skl-guc total:285 pass:257 dwarn:0 dfail:0 fail:0 skip:28 time:427s
fi-skl-gvtdvm total:285 pass:262 dwarn:0 dfail:0 fail:0 skip:23 time:447s
fi-snb-2520m total:285 pass:245 dwarn:0 dfail:0 fail:0 skip:40 time:583s
fi-snb-2600 total:285 pass:245 dwarn:0 dfail:0 fail:0 skip:40 time:398s
69b094355a7bc8d1752a43508ded3266d4e5a223 drm-tip: 2018y-03m-21d-15h-43m-50s UTC integration manifest
c3bbd56f3b68 drm/i915: Flush pending interrupt following a GPU reset
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8434/issues.html
* ✓ Fi.CI.IGT: success for drm/i915: Flush pending interrupt following a GPU reset
2018-03-21 15:00 [PATCH] drm/i915: Flush pending interrupt following a GPU reset Chris Wilson
` (2 preceding siblings ...)
2018-03-21 17:14 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-03-21 21:13 ` Patchwork
3 siblings, 0 replies; 8+ messages in thread
From: Patchwork @ 2018-03-21 21:13 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915: Flush pending interrupt following a GPU reset
URL : https://patchwork.freedesktop.org/series/40383/
State : success
== Summary ==
---- Known issues:
Test kms_cursor_crc:
Subgroup cursor-64x64-suspend:
incomplete -> PASS (shard-hsw) fdo#103540
Test kms_flip:
Subgroup 2x-flip-vs-expired-vblank:
fail -> PASS (shard-hsw) fdo#102887
Subgroup 2x-plain-flip-fb-recreate:
fail -> PASS (shard-hsw) fdo#100368 +1
Subgroup modeset-vs-vblank-race-interruptible:
pass -> FAIL (shard-hsw) fdo#103060
Test kms_frontbuffer_tracking:
Subgroup fbc-1p-primscrn-indfb-plflip-blt:
fail -> PASS (shard-apl) fdo#101623
Test kms_setmode:
Subgroup basic:
pass -> FAIL (shard-apl) fdo#99912
fdo#103540 https://bugs.freedesktop.org/show_bug.cgi?id=103540
fdo#102887 https://bugs.freedesktop.org/show_bug.cgi?id=102887
fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
fdo#103060 https://bugs.freedesktop.org/show_bug.cgi?id=103060
fdo#101623 https://bugs.freedesktop.org/show_bug.cgi?id=101623
fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
shard-apl total:3478 pass:1814 dwarn:1 dfail:0 fail:7 skip:1655 time:13063s
shard-hsw total:3478 pass:1767 dwarn:1 dfail:0 fail:2 skip:1707 time:11883s
shard-snb total:3478 pass:1357 dwarn:1 dfail:0 fail:3 skip:2117 time:7236s
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8434/shards.html