[PATCH] drm/i915/execlists: Reset ring registers on rebinding contexts

* [PATCH] drm/i915/execlists: Reset ring registers on rebinding contexts
@ 2018-03-27 21:01 Chris Wilson
  2018-03-27 21:54 ` ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Chris Wilson @ 2018-03-27 21:01 UTC (permalink / raw)
  To: intel-gfx

Tvrtko uncovered a fun issue with recovering from a wedged device. In his
tests, he wedged the driver by injecting an unrecoverable hang whilst a
batch was spinning. As we reset the GPU in the middle of the spinner,
when resumed it would continue on from the next instruction in the ring
and write its breadcrumb. However, on wedging we updated our
bookkeeping to indicate that the GPU had completed executing and would
restart from after the breadcrumb; so the emission of the stale
breadcrumb from before the reset came as a bit of a surprise.

A simple fix is, when rebinding the context into the GPU, to update
the ring register state in the context image to match our bookkeeping.
We already have to update the RING_START and RING_TAIL, so updating
RING_HEAD as well is trivial. This works because whenever we unbind the
context, we keep the bookkeeping in check; and on wedging we unbind all
contexts.
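
As a rough illustration of the fix described above, here is a toy model
in C. Every name in it (toy_ring, toy_context, toy_wedge, toy_context_pin,
toy_stale_bytes) is hypothetical and stands in for the driver's real
structures; it ignores ring wrap-around and only shows why copying the
bookkeeping head into the saved register state on rebinding removes the
stale span.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slots in the saved register state; not the i915 layout. */
enum { TOY_RING_HEAD, TOY_RING_TAIL, TOY_RING_START, TOY_NUM_REGS };

struct toy_ring {
	uint32_t start; /* address of the ring buffer */
	uint32_t head;  /* bookkeeping: where execution should resume */
	uint32_t tail;  /* bookkeeping: end of the emitted commands */
};

struct toy_context {
	struct toy_ring ring;
	uint32_t reg_state[TOY_NUM_REGS]; /* values the hardware reloads */
};

/* On wedging, the bookkeeping is advanced as if everything had completed. */
static void toy_wedge(struct toy_context *ce)
{
	ce->ring.head = ce->ring.tail;
}

/*
 * Rebinding the context. With reset_head == 0 only the ring start is
 * refreshed; with reset_head != 0 the saved head is also overwritten
 * from the bookkeeping, which is the idea behind the one-line patch.
 */
static void toy_context_pin(struct toy_context *ce, int reset_head)
{
	ce->reg_state[TOY_RING_START] = ce->ring.start;
	if (reset_head)
		ce->reg_state[TOY_RING_HEAD] = ce->ring.head;
}

/* Bytes of old commands the hardware would replay (no wrap handling). */
static uint32_t toy_stale_bytes(const struct toy_context *ce)
{
	return ce->ring.head - ce->reg_state[TOY_RING_HEAD];
}
```

In this model, a context wedged while its saved head sits in the middle
of a request reports a non-zero stale span when pinned without the head
reset, and zero when pinned with it.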

Testcase: igt/gem_eio
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ba7f7831f934..654634254b64 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1272,6 +1272,7 @@ execlists_context_pin(struct intel_engine_cs *engine,
 	ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
 	ce->lrc_reg_state[CTX_RING_BUFFER_START+1] =
 		i915_ggtt_offset(ce->ring->vma);
+	ce->lrc_reg_state[CTX_RING_HEAD+1] = ce->ring->head;

 	ce->state->obj->pin_global++;
 	i915_gem_context_get(ctx);
-- 
2.16.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* ✗ Fi.CI.CHECKPATCH: warning for drm/i915/execlists: Reset ring registers on rebinding contexts
  2018-03-27 21:01 [PATCH] drm/i915/execlists: Reset ring registers on rebinding contexts Chris Wilson
@ 2018-03-27 21:54 ` Patchwork
  2018-03-27 22:10 ` ✓ Fi.CI.BAT: success " Patchwork
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Patchwork @ 2018-03-27 21:54 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/execlists: Reset ring registers on rebinding contexts
URL   : https://patchwork.freedesktop.org/series/40763/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
ad09e66d7bde drm/i915/execlists: Reset ring registers on rebinding contexts
-:36: CHECK:SPACING: spaces preferred around that '+' (ctx:VxV)
#36: FILE: drivers/gpu/drm/i915/intel_lrc.c:1275:
+	ce->lrc_reg_state[CTX_RING_HEAD+1] = ce->ring->head;
 	                               ^

total: 0 errors, 0 warnings, 1 checks, 7 lines checked

* ✓ Fi.CI.BAT: success for drm/i915/execlists: Reset ring registers on rebinding contexts
  2018-03-27 21:01 [PATCH] drm/i915/execlists: Reset ring registers on rebinding contexts Chris Wilson
  2018-03-27 21:54 ` ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
@ 2018-03-27 22:10 ` Patchwork
  2018-03-28  7:07 ` ✗ Fi.CI.IGT: failure " Patchwork
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Patchwork @ 2018-03-27 22:10 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/execlists: Reset ring registers on rebinding contexts
URL   : https://patchwork.freedesktop.org/series/40763/
State : success

== Summary ==

Series 40763v1 drm/i915/execlists: Reset ring registers on rebinding contexts
https://patchwork.freedesktop.org/api/1.0/series/40763/revisions/1/mbox/

---- Known issues:

Test kms_chamelium:
        Subgroup dp-edid-read:
                pass       -> FAIL       (fi-kbl-7500u) fdo#102505
Test kms_flip:
        Subgroup basic-flip-vs-wf_vblank:
                pass       -> FAIL       (fi-skl-6770hq) fdo#100368
Test kms_pipe_crc_basic:
        Subgroup read-crc-pipe-c-frame-sequence:
                pass       -> FAIL       (fi-skl-6770hq) fdo#103481

fdo#102505 https://bugs.freedesktop.org/show_bug.cgi?id=102505
fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
fdo#103481 https://bugs.freedesktop.org/show_bug.cgi?id=103481

fi-bdw-5557u     total:285  pass:264  dwarn:0   dfail:0   fail:0   skip:21  time:432s
fi-bdw-gvtdvm    total:285  pass:261  dwarn:0   dfail:0   fail:0   skip:24  time:447s
fi-blb-e6850     total:285  pass:220  dwarn:1   dfail:0   fail:0   skip:64  time:381s
fi-bsw-n3050     total:285  pass:239  dwarn:0   dfail:0   fail:0   skip:46  time:540s
fi-bwr-2160      total:285  pass:180  dwarn:0   dfail:0   fail:0   skip:105 time:297s
fi-bxt-j4205     total:285  pass:256  dwarn:0   dfail:0   fail:0   skip:29  time:514s
fi-byt-j1900     total:285  pass:250  dwarn:0   dfail:0   fail:0   skip:35  time:522s
fi-byt-n2820     total:285  pass:246  dwarn:0   dfail:0   fail:0   skip:39  time:512s
fi-cfl-8700k     total:285  pass:257  dwarn:0   dfail:0   fail:0   skip:28  time:413s
fi-cfl-s3        total:285  pass:259  dwarn:0   dfail:0   fail:0   skip:26  time:570s
fi-cfl-u         total:285  pass:259  dwarn:0   dfail:0   fail:0   skip:26  time:513s
fi-cnl-y3        total:285  pass:259  dwarn:0   dfail:0   fail:0   skip:26  time:590s
fi-elk-e7500     total:285  pass:225  dwarn:1   dfail:0   fail:0   skip:59  time:428s
fi-gdg-551       total:285  pass:176  dwarn:0   dfail:0   fail:1   skip:108 time:326s
fi-glk-1         total:285  pass:257  dwarn:0   dfail:0   fail:0   skip:28  time:540s
fi-hsw-4770      total:285  pass:258  dwarn:0   dfail:0   fail:0   skip:27  time:405s
fi-ilk-650       total:285  pass:225  dwarn:0   dfail:0   fail:0   skip:60  time:425s
fi-ivb-3520m     total:285  pass:256  dwarn:0   dfail:0   fail:0   skip:29  time:470s
fi-ivb-3770      total:285  pass:252  dwarn:0   dfail:0   fail:0   skip:33  time:433s
fi-kbl-7500u     total:285  pass:259  dwarn:1   dfail:0   fail:1   skip:24  time:472s
fi-kbl-7567u     total:285  pass:265  dwarn:0   dfail:0   fail:0   skip:20  time:469s
fi-kbl-r         total:285  pass:258  dwarn:0   dfail:0   fail:0   skip:27  time:515s
fi-pnv-d510      total:285  pass:219  dwarn:1   dfail:0   fail:0   skip:65  time:662s
fi-skl-6260u     total:285  pass:265  dwarn:0   dfail:0   fail:0   skip:20  time:442s
fi-skl-6600u     total:285  pass:258  dwarn:0   dfail:0   fail:0   skip:27  time:534s
fi-skl-6700k2    total:285  pass:261  dwarn:0   dfail:0   fail:0   skip:24  time:503s
fi-skl-6770hq    total:285  pass:263  dwarn:0   dfail:0   fail:2   skip:20  time:492s
fi-skl-guc       total:285  pass:257  dwarn:0   dfail:0   fail:0   skip:28  time:432s
fi-skl-gvtdvm    total:285  pass:262  dwarn:0   dfail:0   fail:0   skip:23  time:444s
fi-snb-2520m     total:285  pass:245  dwarn:0   dfail:0   fail:0   skip:40  time:594s
Blacklisted hosts:
fi-cnl-psr       total:285  pass:256  dwarn:3   dfail:0   fail:0   skip:26  time:544s
fi-glk-j4005     total:285  pass:256  dwarn:0   dfail:0   fail:0   skip:29  time:486s

0539b52e05cd0abe697d45f2a2373ec42af7ebcb drm-tip: 2018y-03m-27d-18h-45m-40s UTC integration manifest
ad09e66d7bde drm/i915/execlists: Reset ring registers on rebinding contexts

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8508/issues.html

* ✗ Fi.CI.IGT: failure for drm/i915/execlists: Reset ring registers on rebinding contexts
  2018-03-27 21:01 [PATCH] drm/i915/execlists: Reset ring registers on rebinding contexts Chris Wilson
  2018-03-27 21:54 ` ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
  2018-03-27 22:10 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-03-28  7:07 ` Patchwork
  2018-03-28 19:30   ` Chris Wilson
  2018-03-28  7:58 ` [PATCH] " Mika Kuoppala
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: Patchwork @ 2018-03-28  7:07 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/execlists: Reset ring registers on rebinding contexts
URL   : https://patchwork.freedesktop.org/series/40763/
State : failure

== Summary ==

---- Possible new issues:

Test kms_flip:
        Subgroup flip-vs-blocking-wf-vblank:
                pass       -> FAIL       (shard-hsw)

---- Known issues:

Test kms_cursor_crc:
        Subgroup cursor-128x128-suspend:
                dmesg-warn -> PASS       (shard-snb) fdo#102365
        Subgroup cursor-64x64-suspend:
                pass       -> INCOMPLETE (shard-hsw) fdo#103540
Test kms_cursor_legacy:
        Subgroup flip-vs-cursor-atomic:
                pass       -> FAIL       (shard-hsw) fdo#102670
Test kms_flip:
        Subgroup 2x-dpms-vs-vblank-race-interruptible:
                fail       -> PASS       (shard-hsw) fdo#103060
        Subgroup 2x-flip-vs-expired-vblank-interruptible:
                fail       -> PASS       (shard-hsw) fdo#102887 +1
        Subgroup plain-flip-fb-recreate-interruptible:
                pass       -> FAIL       (shard-hsw) fdo#100368
Test kms_plane:
        Subgroup plane-panning-bottom-right-pipe-a-planes:
                pass       -> FAIL       (shard-apl) fdo#103166
Test kms_rotation_crc:
        Subgroup sprite-rotation-180:
                pass       -> FAIL       (shard-snb) fdo#103925
Test perf:
        Subgroup polling:
                fail       -> PASS       (shard-hsw) fdo#102252

fdo#102365 https://bugs.freedesktop.org/show_bug.cgi?id=102365
fdo#103540 https://bugs.freedesktop.org/show_bug.cgi?id=103540
fdo#102670 https://bugs.freedesktop.org/show_bug.cgi?id=102670
fdo#103060 https://bugs.freedesktop.org/show_bug.cgi?id=103060
fdo#102887 https://bugs.freedesktop.org/show_bug.cgi?id=102887
fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
fdo#103166 https://bugs.freedesktop.org/show_bug.cgi?id=103166
fdo#103925 https://bugs.freedesktop.org/show_bug.cgi?id=103925
fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252

shard-apl        total:3495 pass:1830 dwarn:1   dfail:0   fail:8   skip:1655 time:12914s
shard-hsw        total:3478 pass:1773 dwarn:1   dfail:0   fail:4   skip:1698 time:10973s
shard-snb        total:3495 pass:1373 dwarn:1   dfail:0   fail:4   skip:2117 time:6966s
Blacklisted hosts:
shard-kbl        total:3475 pass:1943 dwarn:2   dfail:0   fail:9   skip:1520 time:9497s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8508/shards.html

* Re: [PATCH] drm/i915/execlists: Reset ring registers on rebinding contexts
  2018-03-27 21:01 [PATCH] drm/i915/execlists: Reset ring registers on rebinding contexts Chris Wilson
                   ` (2 preceding siblings ...)
  2018-03-28  7:07 ` ✗ Fi.CI.IGT: failure " Patchwork
@ 2018-03-28  7:58 ` Mika Kuoppala
  2018-03-28  8:32   ` Chris Wilson
  2018-03-28 10:27 ` Tvrtko Ursulin
  2018-03-28 16:26 ` Tvrtko Ursulin
  5 siblings, 1 reply; 10+ messages in thread
From: Mika Kuoppala @ 2018-03-28  7:58 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Tvrtko uncovered a fun issue with recovering from a wedged device. In his
> tests, he wedged the driver by injecting an unrecoverable hang whilst a
> batch was spinning. As we reset the GPU in the middle of the spinner,
> when resumed it would continue on from the next instruction in the ring
> and write its breadcrumb. However, on wedging we updated our
> bookkeeping to indicate that the GPU had completed executing and would
> restart from after the breadcrumb; so the emission of the stale
> breadcrumb from before the reset came as a bit of a surprise.
>

OK, trying to make sense of the above and how the wedging works.
Here are my assertions.

The spinning batch was never found to be guilty of anything.
On wedge we fast forwarded all engine seqnos to be what
was last submitted.
We did hw reset.
On context image, the RING_HEAD was pointing to bb start
of spin batch (or the instruction after it)
On resubmitting the context, we saw a seqno write from pre
reset era.

So this doesn't affect only spinning batches but any busy
batch that was running while we wedged?

-Mika

> A simple fix is, when rebinding the context into the GPU, to update
> the ring register state in the context image to match our bookkeeping.
> We already have to update the RING_START and RING_TAIL, so updating
> RING_HEAD as well is trivial. This works because whenever we unbind the
> context, we keep the bookkeeping in check; and on wedging we unbind all
> contexts.
>
> Testcase: igt/gem_eio
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index ba7f7831f934..654634254b64 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1272,6 +1272,7 @@ execlists_context_pin(struct intel_engine_cs *engine,
>  	ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
>  	ce->lrc_reg_state[CTX_RING_BUFFER_START+1] =
>  		i915_ggtt_offset(ce->ring->vma);
> +	ce->lrc_reg_state[CTX_RING_HEAD+1] = ce->ring->head;
>  
>  	ce->state->obj->pin_global++;
>  	i915_gem_context_get(ctx);
> -- 
> 2.16.3

* Re: [PATCH] drm/i915/execlists: Reset ring registers on rebinding contexts
  2018-03-28  7:58 ` [PATCH] " Mika Kuoppala
@ 2018-03-28  8:32   ` Chris Wilson
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Wilson @ 2018-03-28  8:32 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2018-03-28 08:58:38)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Tvrtko uncovered a fun issue with recovering from a wedged device. In his
> > tests, he wedged the driver by injecting an unrecoverable hang whilst a
> > batch was spinning. As we reset the GPU in the middle of the spinner,
> > when resumed it would continue on from the next instruction in the ring
> > and write its breadcrumb. However, on wedging we updated our
> > bookkeeping to indicate that the GPU had completed executing and would
> > restart from after the breadcrumb; so the emission of the stale
> > breadcrumb from before the reset came as a bit of a surprise.
> >
> 
> Ok trying to make sense of the above and how the wedging works.
> Here is my assertions.
> 
> The spinning batch was never found to be guilty of anything.

It was definitely guilty.

> On wedge we fast forwarded all engine seqnos to be what
> was last submitted.

Correct.

> We did hw reset.

Correct.

> On context image, the RING_HEAD was pointing to bb start
> of spin batch (or the instruction after it)

Instruction after.

> On resubmitting the context, we saw a seqno write from pre
> reset era.

Correct.
 
> So this doesn't affect only spinning batches but any busy
> batch that was running while we wedged?

Correct. Any execlists recovery from _wedged_ would be prone to hitting
this bug. Legacy submission already applies the ring register reset on
recovery.

Thinking of which, if we could, we should ban all contexts on wedging?
Or at least process the ban accounting for a failed reset. That sounds
more plausible (set_wedge() is a nasty lockless affair).
-Chris
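
The sequence confirmed above can be sketched as a toy replay in C. All
names here (toy_cmd, toy_replay) are hypothetical and the ring is
modelled as a flat array of commands with no seqno wrap; it only shows
that resuming from the saved head, the instruction after the batch
start, re-emits a breadcrumb the driver already counted as complete,
while resuming from the bookkeeping head does not.

```c
#include <assert.h>
#include <stdint.h>

enum toy_op { TOY_BATCH_START, TOY_BREADCRUMB };

struct toy_cmd {
	enum toy_op op;
	uint32_t seqno; /* request this command belongs to */
};

/*
 * Walk the ring from head to tail and count breadcrumb writes whose
 * seqno the driver already treats as completed (seqno <= completed).
 * Each such write is a stale breadcrumb from before the reset.
 */
static unsigned int toy_replay(const struct toy_cmd *ring,
			       unsigned int head, unsigned int tail,
			       uint32_t completed)
{
	unsigned int stale = 0;
	unsigned int i;

	for (i = head; i < tail; i++) {
		if (ring[i].op == TOY_BREADCRUMB && ring[i].seqno <= completed)
			stale++;
	}
	return stale;
}
```

For a ring holding request 1 (the hung batch) followed by request 2
(submitted after recovery), with seqno 1 fast-forwarded as complete,
replaying from index 1 counts one stale write and replaying from index 2
counts none.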

* Re: [PATCH] drm/i915/execlists: Reset ring registers on rebinding contexts
  2018-03-27 21:01 [PATCH] drm/i915/execlists: Reset ring registers on rebinding contexts Chris Wilson
                   ` (3 preceding siblings ...)
  2018-03-28  7:58 ` [PATCH] " Mika Kuoppala
@ 2018-03-28 10:27 ` Tvrtko Ursulin
  2018-03-28 16:26 ` Tvrtko Ursulin
  5 siblings, 0 replies; 10+ messages in thread
From: Tvrtko Ursulin @ 2018-03-28 10:27 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 27/03/2018 22:01, Chris Wilson wrote:
> Tvrtko uncovered a fun issue with recovering from a wedged device. In his
> tests, he wedged the driver by injecting an unrecoverable hang whilst a
> batch was spinning. As we reset the GPU in the middle of the spinner,
> when resumed it would continue on from the next instruction in the ring
> and write its breadcrumb. However, on wedging we updated our
> bookkeeping to indicate that the GPU had completed executing and would
> restart from after the breadcrumb; so the emission of the stale
> breadcrumb from before the reset came as a bit of a surprise.
> 
> A simple fix is, when rebinding the context into the GPU, to update
> the ring register state in the context image to match our bookkeeping.
> We already have to update the RING_START and RING_TAIL, so updating
> RING_HEAD as well is trivial. This works because whenever we unbind the
> context, we keep the bookkeeping in check; and on wedging we unbind all
> contexts.
> 
> Testcase: igt/gem_eio
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Tested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

More tags when I am done reading the code - since this is very interesting.

Regards,

Tvrtko

> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index ba7f7831f934..654634254b64 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1272,6 +1272,7 @@ execlists_context_pin(struct intel_engine_cs *engine,
>   	ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
>   	ce->lrc_reg_state[CTX_RING_BUFFER_START+1] =
>   		i915_ggtt_offset(ce->ring->vma);
> +	ce->lrc_reg_state[CTX_RING_HEAD+1] = ce->ring->head;
>   
>   	ce->state->obj->pin_global++;
>   	i915_gem_context_get(ctx);
> 

* Re: [PATCH] drm/i915/execlists: Reset ring registers on rebinding contexts
  2018-03-27 21:01 [PATCH] drm/i915/execlists: Reset ring registers on rebinding contexts Chris Wilson
                   ` (4 preceding siblings ...)
  2018-03-28 10:27 ` Tvrtko Ursulin
@ 2018-03-28 16:26 ` Tvrtko Ursulin
  2018-03-28 16:36   ` Chris Wilson
  5 siblings, 1 reply; 10+ messages in thread
From: Tvrtko Ursulin @ 2018-03-28 16:26 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 27/03/2018 22:01, Chris Wilson wrote:
> Tvrtko uncovered a fun issue with recovering from a wedged device. In his
> tests, he wedged the driver by injecting an unrecoverable hang whilst a
> batch was spinning. As we reset the GPU in the middle of the spinner,
> when resumed it would continue on from the next instruction in the ring
> and write its breadcrumb. However, on wedging we updated our
> bookkeeping to indicate that the GPU had completed executing and would
> restart from after the breadcrumb; so the emission of the stale
> breadcrumb from before the reset came as a bit of a surprise.
> 
> A simple fix is, when rebinding the context into the GPU, to update
> the ring register state in the context image to match our bookkeeping.
> We already have to update the RING_START and RING_TAIL, so updating
> RING_HEAD as well is trivial. This works because whenever we unbind the
> context, we keep the bookkeeping in check; and on wedging we unbind all
> contexts.
> 
> Testcase: igt/gem_eio
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index ba7f7831f934..654634254b64 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1272,6 +1272,7 @@ execlists_context_pin(struct intel_engine_cs *engine,
>   	ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
>   	ce->lrc_reg_state[CTX_RING_BUFFER_START+1] =
>   		i915_ggtt_offset(ce->ring->vma);
> +	ce->lrc_reg_state[CTX_RING_HEAD+1] = ce->ring->head;
>   
>   	ce->state->obj->pin_global++;
>   	i915_gem_context_get(ctx);
> 

After quite some amount of walking through the code, looking at traces
and chatting on IRC:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH] drm/i915/execlists: Reset ring registers on rebinding contexts
  2018-03-28 16:26 ` Tvrtko Ursulin
@ 2018-03-28 16:36   ` Chris Wilson
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Wilson @ 2018-03-28 16:36 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-03-28 17:26:37)
> 
> On 27/03/2018 22:01, Chris Wilson wrote:
> > Tvrtko uncovered a fun issue with recovering from a wedged device. In his
> > tests, he wedged the driver by injecting an unrecoverable hang whilst a
> > batch was spinning. As we reset the GPU in the middle of the spinner,
> > when resumed it would continue on from the next instruction in the ring
> > and write its breadcrumb. However, on wedging we updated our
> > bookkeeping to indicate that the GPU had completed executing and would
> > restart from after the breadcrumb; so the emission of the stale
> > breadcrumb from before the reset came as a bit of a surprise.
> > 
> > A simple fix is, when rebinding the context into the GPU, to update
> > the ring register state in the context image to match our bookkeeping.
> > We already have to update the RING_START and RING_TAIL, so updating
> > RING_HEAD as well is trivial. This works because whenever we unbind the
> > context, we keep the bookkeeping in check; and on wedging we unbind all
> > contexts.
> > 
> > Testcase: igt/gem_eio
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/intel_lrc.c | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index ba7f7831f934..654634254b64 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -1272,6 +1272,7 @@ execlists_context_pin(struct intel_engine_cs *engine,
> >       ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
> >       ce->lrc_reg_state[CTX_RING_BUFFER_START+1] =
> >               i915_ggtt_offset(ce->ring->vma);
> > +     ce->lrc_reg_state[CTX_RING_HEAD+1] = ce->ring->head;
> >   
> >       ce->state->obj->pin_global++;
> >       i915_gem_context_get(ctx);
> > 
> 
> After quite some amount of walking through the code, looking at traces
> and chatting on IRC:
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

I feel like it is one of those that is going to be asked about in 6
months time and I'll have to admit the shameful secret. Smoke and
mirrors, smoke and mirrors.
-Chris

* Re: ✗ Fi.CI.IGT: failure for drm/i915/execlists: Reset ring registers on rebinding contexts
  2018-03-28  7:07 ` ✗ Fi.CI.IGT: failure " Patchwork
@ 2018-03-28 19:30   ` Chris Wilson
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Wilson @ 2018-03-28 19:30 UTC (permalink / raw)
  To: Patchwork; +Cc: intel-gfx

Quoting Patchwork (2018-03-28 08:07:45)
> == Series Details ==
> 
> Series: drm/i915/execlists: Reset ring registers on rebinding contexts
> URL   : https://patchwork.freedesktop.org/series/40763/
> State : failure
> 
> == Summary ==
> 
> ---- Possible new issues:
> 
> Test kms_flip:
>         Subgroup flip-vs-blocking-wf-vblank:
>                 pass       -> FAIL       (shard-hsw)

Tvrtko, thank you for such an interesting bug, now please stop messing
around with gem_eio ;)

Pushed,
-Chris
