* [Intel-gfx] [PATCH] drm/i915/gt: Clear the execlists timers before restarting
@ 2020-12-03  0:57 Chris Wilson
  2020-12-03  1:03 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for " Patchwork
  2020-12-03  7:46 ` [Intel-gfx] [PATCH] " Chris Wilson
  0 siblings, 2 replies; 3+ messages in thread
From: Chris Wilson @ 2020-12-03  0:57 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Across a reset, we stop the engine but not the timers. This leaves a
window where the timers have inconsistent state with the engine, causing
false timeslicing/preemption decisions to be made immediately upon
resume.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
This fits the trace of a failure across reset, and has a certain ring of
truth to it, but the preempt timer should have been cleared with the
first submission after the reset (and before the first submission should
not be an issue). I fear there's something else lurking here with the
timer vs reset.

For reference, the issue is the immediate reset following the first,
both due to preempt timeout, but there was not a submission during the
reset to prime the preempt timer:

[   27.184920] kworker/-121       3.... 27095209us : execlists_reset: 0000:00:02.0 bcs0: reset for preemption time out
[   27.184962] kworker/-121       3d..1 27095309us : active_context: 0000:00:02.0 bcs0: ccid found at active:0
[   27.185005] kworker/-121       3d..1 27095312us : execlists_hold: 0000:00:02.0 bcs0: fence 1c:45, current 44 on hold
[   27.185048] kworker/-121       3d..1 27095313us : execlists_hold: 0000:00:02.0 bcs0: fence 1c:46, current 44 on hold
[   27.185091] kworker/-121       3d..1 27095314us : execlists_hold: 0000:00:02.0 bcs0: fence 1c:47, current 44 on hold
[   27.185135] kworker/-121       3.... 27095316us : intel_engine_reset: 0000:00:02.0 bcs0: flags=8
[   27.185178] kworker/-121       3.... 27095345us : execlists_reset_prepare: 0000:00:02.0 bcs0: depth<-1
[   27.185218] kworker/-121       3.... 27095346us : intel_engine_stop_cs: 0000:00:02.0 bcs0: 
[   27.185259] kworker/-121       3.... 27096347us : intel_engine_stop_cs: 0000:00:02.0 bcs0: timed out on STOP_RING -> IDLE
[   27.185304] kworker/-121       3.... 27096367us : __intel_gt_reset: 0000:00:02.0 engine_mask=2
[   27.185345] kworker/-121       3.... 27097297us : intel_engine_cancel_stop_cs: 0000:00:02.0 bcs0: 
[   27.185388] kworker/-121       3.... 27097299us : execlists_reset_finish: 0000:00:02.0 bcs0: depth->1
[   27.185440] kworker/-121       3d..2 27097348us : __i915_schedule: 0000:00:02.0 bcs0: bumping queue-priority-hint:1025 for rq:13:20, inflight:1c:47 prio 0
[   27.185485] kworker/-121       3..s1 27097350us : execlists_reset: 0000:00:02.0 bcs0: reset for preemption time out
[   27.185528] kworker/-121       3d.s2 27097454us : active_context: 0000:00:02.0 bcs0: ccid found at active:0

---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 43703efb36d1..f5685ec9e0cd 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -4200,6 +4200,9 @@ static int execlists_resume(struct intel_engine_cs *engine)
 
 	intel_breadcrumbs_reset(engine->breadcrumbs);
 
+	cancel_timer(&execlists->timer);
+	cancel_timer(&execlists->preempt);
+
 	if (GEM_SHOW_DEBUG() && unexpected_starting_state(engine)) {
 		struct drm_printer p = drm_debug_printer(__func__);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [Intel-gfx] ✗ Fi.CI.BUILD: failure for drm/i915/gt: Clear the execlists timers before restarting
  2020-12-03  0:57 [Intel-gfx] [PATCH] drm/i915/gt: Clear the execlists timers before restarting Chris Wilson
@ 2020-12-03  1:03 ` Patchwork
  2020-12-03  7:46 ` [Intel-gfx] [PATCH] " Chris Wilson
  1 sibling, 0 replies; 3+ messages in thread
From: Patchwork @ 2020-12-03  1:03 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/gt: Clear the execlists timers before restarting
URL   : https://patchwork.freedesktop.org/series/84515/
State : failure

== Summary ==

CALL    scripts/checksyscalls.sh
  CALL    scripts/atomic/check-atomics.sh
  DESCEND  objtool
  CHK     include/generated/compile.h
  CC [M]  drivers/gpu/drm/i915/gt/intel_lrc.o
drivers/gpu/drm/i915/gt/intel_lrc.c: In function ‘execlists_resume’:
drivers/gpu/drm/i915/gt/intel_lrc.c:4203:16: error: ‘execlists’ undeclared (first use in this function); did you mean ‘execlists_hold’?
  cancel_timer(&execlists->timer);
                ^~~~~~~~~
                execlists_hold
drivers/gpu/drm/i915/gt/intel_lrc.c:4203:16: note: each undeclared identifier is reported only once for each function it appears in
scripts/Makefile.build:283: recipe for target 'drivers/gpu/drm/i915/gt/intel_lrc.o' failed
make[4]: *** [drivers/gpu/drm/i915/gt/intel_lrc.o] Error 1
scripts/Makefile.build:500: recipe for target 'drivers/gpu/drm/i915' failed
make[3]: *** [drivers/gpu/drm/i915] Error 2
scripts/Makefile.build:500: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:500: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
Makefile:1797: recipe for target 'drivers' failed
make: *** [drivers] Error 2
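
The error above says that `execlists` is not declared inside
execlists_resume(). A minimal stand-alone sketch of one way the two timers
could be reached follows. It uses mock types rather than the real i915
structures: every `mock_` name is hypothetical, and it is an assumption
here that the engine embeds its execlists state as `engine->execlists`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins for the kernel types; only the field names follow the patch. */
struct mock_timer {
	bool active;           /* timer is armed */
	unsigned long expires; /* expiry time, 0 when cleared */
};

struct mock_execlists {
	struct mock_timer timer;   /* timeslice timer */
	struct mock_timer preempt; /* preempt-timeout timer */
};

struct mock_engine {
	struct mock_execlists execlists; /* assumed to be embedded in the engine */
};

/* Disarm a timer and clear its expiry; a no-op if it is not armed. */
static void mock_cancel_timer(struct mock_timer *t)
{
	if (!t->active)
		return;

	t->active = false;
	t->expires = 0;
}

/* Clear both timers on resume, going through the engine pointer. */
static void mock_execlists_resume(struct mock_engine *engine)
{
	assert(engine != NULL);

	/* The local pointer that the posted hunk used without declaring. */
	struct mock_execlists *execlists = &engine->execlists;

	mock_cancel_timer(&execlists->timer);
	mock_cancel_timer(&execlists->preempt);
}
```

Writing `&engine->execlists.timer` directly would avoid the extra local;
either form only addresses the compile error, not the underlying question
of timer vs reset raised in the patch notes.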



* Re: [Intel-gfx] [PATCH] drm/i915/gt: Clear the execlists timers before restarting
  2020-12-03  0:57 [Intel-gfx] [PATCH] drm/i915/gt: Clear the execlists timers before restarting Chris Wilson
  2020-12-03  1:03 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for " Patchwork
@ 2020-12-03  7:46 ` Chris Wilson
  1 sibling, 0 replies; 3+ messages in thread
From: Chris Wilson @ 2020-12-03  7:46 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2020-12-03 00:57:31)
> Across a reset, we stop the engine but not the timers. This leaves a
> window where the timers have inconsistent state with the engine, causing
> false timeslicing/preemption decisions to be made immediately upon
> resume.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
> This fits the trace of a failure across reset, and has a certain ring of
> truth to it, but the preempt timer should have been cleared with the
> first submission after the reset (and before the first submission should
> not be an issue). I fear there's something else lurking here with the
> timer vs reset.
> 
> For reference, the issue is the immediate reset following the first,
> both due to preempt timeout, but there was not a submission during the
> reset to prime the preempt timer:
> 
> [   27.184920] kworker/-121       3.... 27095209us : execlists_reset: 0000:00:02.0 bcs0: reset for preemption time out
> [   27.184962] kworker/-121       3d..1 27095309us : active_context: 0000:00:02.0 bcs0: ccid found at active:0
> [   27.185005] kworker/-121       3d..1 27095312us : execlists_hold: 0000:00:02.0 bcs0: fence 1c:45, current 44 on hold
> [   27.185048] kworker/-121       3d..1 27095313us : execlists_hold: 0000:00:02.0 bcs0: fence 1c:46, current 44 on hold
> [   27.185091] kworker/-121       3d..1 27095314us : execlists_hold: 0000:00:02.0 bcs0: fence 1c:47, current 44 on hold
> [   27.185135] kworker/-121       3.... 27095316us : intel_engine_reset: 0000:00:02.0 bcs0: flags=8
> [   27.185178] kworker/-121       3.... 27095345us : execlists_reset_prepare: 0000:00:02.0 bcs0: depth<-1
> [   27.185218] kworker/-121       3.... 27095346us : intel_engine_stop_cs: 0000:00:02.0 bcs0: 
> [   27.185259] kworker/-121       3.... 27096347us : intel_engine_stop_cs: 0000:00:02.0 bcs0: timed out on STOP_RING -> IDLE
> [   27.185304] kworker/-121       3.... 27096367us : __intel_gt_reset: 0000:00:02.0 engine_mask=2
> [   27.185345] kworker/-121       3.... 27097297us : intel_engine_cancel_stop_cs: 0000:00:02.0 bcs0: 

I see what happened here; it quietly slipped by.

The reset failed. Since the engine was not actually reset, the inflight
tracking stays intact, which is why we immediately attempt the reset
again.

We don't have the immediate fallback to a device reset here as we are in
the atomic engine-reset path.

Tricky.

> [   27.185388] kworker/-121       3.... 27097299us : execlists_reset_finish: 0000:00:02.0 bcs0: depth->1
> [   27.185440] kworker/-121       3d..2 27097348us : __i915_schedule: 0000:00:02.0 bcs0: bumping queue-priority-hint:1025 for rq:13:20, inflight:1c:47 prio 0
> [   27.185485] kworker/-121       3..s1 27097350us : execlists_reset: 0000:00:02.0 bcs0: reset for preemption time out
> [   27.185528] kworker/-121       3d.s2 27097454us : active_context: 0000:00:02.0 bcs0: ccid found at active:0
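
The sequence described in this reply (a failed engine reset leaves the
inflight tracking intact, so the preempt timeout triggers another reset
straight away) can be illustrated with a toy model. This is only a sketch:
none of the `toy_` names are i915 symbols, and the real driver state is far
richer than the three fields assumed here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of an engine; not i915 code. */
struct toy_engine {
	int inflight;         /* requests still tracked as executing */
	bool preempt_expired; /* preempt timeout is considered expired */
	int resets_attempted; /* how many engine resets were tried */
};

/*
 * Try to reset the engine. hw_ok models whether the hardware reset
 * succeeded. On failure nothing is cleaned up, so the inflight
 * tracking is left exactly as it was.
 */
static bool toy_engine_reset(struct toy_engine *e, bool hw_ok)
{
	e->resets_attempted++;
	if (!hw_ok)
		return false;

	e->inflight = 0;
	e->preempt_expired = false;
	return true;
}

/* A reset is requested while work is inflight and the timeout has expired. */
static bool toy_needs_reset(const struct toy_engine *e)
{
	return e->inflight > 0 && e->preempt_expired;
}
```

In this model, calling toy_engine_reset() with hw_ok == false leaves
toy_needs_reset() returning true, which mirrors the back-to-back
"reset for preemption time out" lines in the trace.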