[PATCH 1/2] drm/i915/execlists: Wrap tail pointer after reset tweaking

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after reset tweaking
@ 2017-03-27  3:28 ` Chris Wilson
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-03-27  3:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson, Mika Kuoppala, # v4 . 10+

If the request->wa_tail is 0 (because it landed exactly on the end of
the ringbuffer), when we reconstruct request->tail following a reset we
fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
never able to catch up with RING_TAIL and the GPU spins endlessly. If
the ring contains a couple of breadcrumbs, even our hangcheck is unable
to catch the busy-looping as the ACTHD and seqno continually advance.

Fixes: a3aabe86a340 ("drm/i915/execlists: Reinitialise context image after GPU hang")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
---
 drivers/gpu/drm/i915/intel_lrc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3fdabba0a32d..c8dd848d2ebe 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1302,6 +1302,7 @@ static void reset_common_ring(struct intel_engine_cs *engine,
 
 	/* Reset WaIdleLiteRestore:bdw,skl as well */
 	request->tail = request->wa_tail - WA_TAIL_DWORDS * sizeof(u32);
+	request->tail &= request->ring->size - 1;
 	GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after reset tweaking
@ 2017-03-27  3:28 ` Chris Wilson
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-03-27  3:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala, # v4 . 10+

If the request->wa_tail is 0 (because it landed exactly on the end of
the ringbuffer), when we reconstruct request->tail following a reset we
fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
never able to catch up with RING_TAIL and the GPU spins endlessly. If
the ring contains a couple of breadcrumbs, even our hangcheck is unable
to catch the busy-looping as the ACTHD and seqno continually advance.

Fixes: a3aabe86a340 ("drm/i915/execlists: Reinitialise context image after GPU hang")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
---
 drivers/gpu/drm/i915/intel_lrc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3fdabba0a32d..c8dd848d2ebe 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1302,6 +1302,7 @@ static void reset_common_ring(struct intel_engine_cs *engine,
 
 	/* Reset WaIdleLiteRestore:bdw,skl as well */
 	request->tail = request->wa_tail - WA_TAIL_DWORDS * sizeof(u32);
+	request->tail &= request->ring->size - 1;
 	GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
 }
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] drm/i915: Assert that the request->tail fits within the ring
  2017-03-27  3:28 ` Chris Wilson
  (?)
@ 2017-03-27  3:28 ` Chris Wilson
  -1 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-03-27  3:28 UTC (permalink / raw)
  To: intel-gfx

In addition to being qword-aligned, the RING_TAIL offset must be within
the ring!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 4 ++++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c8dd848d2ebe..3b84fbb7da5d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -327,6 +327,7 @@ static u64 execlists_update_context(struct drm_i915_gem_request *rq)
 	u32 *reg_state = ce->lrc_reg_state;
 
 	GEM_BUG_ON(!IS_ALIGNED(rq->tail, 8));
+	GEM_BUG_ON(rq->tail >= rq->ring->size);
 	reg_state[CTX_RING_TAIL+1] = rq->tail;
 
 	/* True 32b PPGTT with dynamic page allocation: update PDP
@@ -1304,6 +1305,7 @@ static void reset_common_ring(struct intel_engine_cs *engine,
 	request->tail = request->wa_tail - WA_TAIL_DWORDS * sizeof(u32);
 	request->tail &= request->ring->size - 1;
 	GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
+	GEM_BUG_ON(request->tail >= request->ring->size);
 }
 
 static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
@@ -1515,6 +1517,7 @@ static void gen8_emit_breadcrumb(struct drm_i915_gem_request *request, u32 *cs)
 	*cs++ = MI_NOOP;
 	request->tail = intel_ring_offset(request, cs);
 	GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
+	GEM_BUG_ON(request->tail >= request->ring->size);
 
 	gen8_emit_wa_tail(request, cs);
 }
@@ -1543,6 +1546,7 @@ static void gen8_emit_breadcrumb_render(struct drm_i915_gem_request *request,
 	*cs++ = MI_NOOP;
 	request->tail = intel_ring_offset(request, cs);
 	GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
+	GEM_BUG_ON(request->tail >= request->ring->size);
 
 	gen8_emit_wa_tail(request, cs);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 4729ac7ac122..47921dcbedb3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -775,6 +775,7 @@ static void i9xx_submit_request(struct drm_i915_gem_request *request)
 	i915_gem_request_submit(request);
 
 	GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
+	GEM_BUG_ON(request->tail >= request->ring->size);
 	I915_WRITE_TAIL(request->engine, request->tail);
 }
 
@@ -787,6 +788,7 @@ static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req, u32 *cs)
 
 	req->tail = intel_ring_offset(req, cs);
 	GEM_BUG_ON(!IS_ALIGNED(req->tail, 8));
+	GEM_BUG_ON(req->tail >= req->ring->size);
 }
 
 static const int i9xx_emit_breadcrumb_sz = 4;
@@ -826,6 +828,7 @@ static void gen8_render_emit_breadcrumb(struct drm_i915_gem_request *req,
 
 	req->tail = intel_ring_offset(req, cs);
 	GEM_BUG_ON(!IS_ALIGNED(req->tail, 8));
+	GEM_BUG_ON(req->tail >= req->ring->size);
 }
 
 static const int gen8_render_emit_breadcrumb_sz = 8;
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [1/2] drm/i915/execlists: Wrap tail pointer after reset tweaking
  2017-03-27  3:28 ` Chris Wilson
  (?)
  (?)
@ 2017-03-27  3:45 ` Patchwork
  -1 siblings, 0 replies; 12+ messages in thread
From: Patchwork @ 2017-03-27  3:45 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/2] drm/i915/execlists: Wrap tail pointer after reset tweaking
URL   : https://patchwork.freedesktop.org/series/21907/
State : success

== Summary ==

Series 21907v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/21907/revisions/1/mbox/

Test gem_exec_fence:
        Subgroup await-hang-default:
                pass       -> INCOMPLETE (fi-ilk-650) fdo#100144
Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-a:
                dmesg-warn -> PASS       (fi-byt-n2820) fdo#100126

fdo#100144 https://bugs.freedesktop.org/show_bug.cgi?id=100144
fdo#100126 https://bugs.freedesktop.org/show_bug.cgi?id=100126

fi-bdw-5557u     total:278  pass:267  dwarn:0   dfail:0   fail:0   skip:11  time: 455s
fi-bdw-gvtdvm    total:278  pass:256  dwarn:8   dfail:0   fail:0   skip:14  time: 456s
fi-bsw-n3050     total:278  pass:239  dwarn:0   dfail:0   fail:0   skip:39  time: 580s
fi-bxt-j4205     total:278  pass:259  dwarn:0   dfail:0   fail:0   skip:19  time: 540s
fi-bxt-t5700     total:278  pass:258  dwarn:0   dfail:0   fail:0   skip:20  time: 567s
fi-byt-j1900     total:278  pass:251  dwarn:0   dfail:0   fail:0   skip:27  time: 507s
fi-byt-n2820     total:278  pass:247  dwarn:0   dfail:0   fail:0   skip:31  time: 511s
fi-hsw-4770      total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time: 438s
fi-hsw-4770r     total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time: 440s
fi-ilk-650       total:48   pass:27   dwarn:0   dfail:0   fail:0   skip:20  time: 0s
fi-ivb-3520m     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time: 515s
fi-ivb-3770      total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time: 496s
fi-kbl-7500u     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time: 485s
fi-skl-6260u     total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time: 487s
fi-skl-6700hq    total:278  pass:261  dwarn:0   dfail:0   fail:0   skip:17  time: 603s
fi-skl-6700k     total:278  pass:256  dwarn:4   dfail:0   fail:0   skip:18  time: 495s
fi-skl-6770hq    total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time: 532s
fi-skl-gvtdvm    total:278  pass:265  dwarn:0   dfail:0   fail:0   skip:13  time: 464s
fi-snb-2520m     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time: 554s
fi-snb-2600      total:278  pass:249  dwarn:0   dfail:0   fail:0   skip:29  time: 414s

295dfed68b4cc63cdc1747a915e0099a32bf0955 drm-tip: 2017y-03m-26d-01h-26m-45s UTC integration manifest
06fca80 drm/i915: Assert that the request->tail fits within the ring
5022d1b drm/i915/execlists: Wrap tail pointer after reset tweaking

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4309/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after        reset tweaking
  2017-03-27  3:28 ` Chris Wilson
@ 2017-03-27 10:44   ` Mika Kuoppala
  -1 siblings, 0 replies; 12+ messages in thread
From: Mika Kuoppala @ 2017-03-27 10:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: # v4 . 10+

Chris Wilson <chris@chris-wilson.co.uk> writes:

> If the request->wa_tail is 0 (because it landed exactly on the end of
> the ringbuffer), when we reconstruct request->tail following a reset we
> fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
> never able to catch up with RING_TAIL and the GPU spins endlessly. If
> the ring contains a couple of breadcrumbs, even our hangcheck is unable
> to catch the busy-looping as the ACTHD and seqno continually advance.

Tail is past ring size (on hw) and the ring contents has seqno writes.
So we will replay the ring contents over and over and seqno advances
and wraps back to the first breadcrumbs in ring?

> Fixes: a3aabe86a340 ("drm/i915/execlists: Reinitialise context image after GPU hang")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: <stable@vger.kernel.org> # v4.10+
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 3fdabba0a32d..c8dd848d2ebe 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1302,6 +1302,7 @@ static void reset_common_ring(struct intel_engine_cs *engine,
>  
>  	/* Reset WaIdleLiteRestore:bdw,skl as well */
>  	request->tail = request->wa_tail - WA_TAIL_DWORDS * sizeof(u32);
> +	request->tail &= request->ring->size - 1;
>  	GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
>  }
>  
> -- 
> 2.11.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after reset tweaking
@ 2017-03-27 10:44   ` Mika Kuoppala
  0 siblings, 0 replies; 12+ messages in thread
From: Mika Kuoppala @ 2017-03-27 10:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: # v4 . 10+

Chris Wilson <chris@chris-wilson.co.uk> writes:

> If the request->wa_tail is 0 (because it landed exactly on the end of
> the ringbuffer), when we reconstruct request->tail following a reset we
> fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
> never able to catch up with RING_TAIL and the GPU spins endlessly. If
> the ring contains a couple of breadcrumbs, even our hangcheck is unable
> to catch the busy-looping as the ACTHD and seqno continually advance.

Tail is past ring size (on hw) and the ring contents has seqno writes.
So we will replay the ring contents over and over and seqno advances
and wraps back to the first breadcrumbs in ring?

> Fixes: a3aabe86a340 ("drm/i915/execlists: Reinitialise context image after GPU hang")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: <stable@vger.kernel.org> # v4.10+
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 3fdabba0a32d..c8dd848d2ebe 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1302,6 +1302,7 @@ static void reset_common_ring(struct intel_engine_cs *engine,
>  
>  	/* Reset WaIdleLiteRestore:bdw,skl as well */
>  	request->tail = request->wa_tail - WA_TAIL_DWORDS * sizeof(u32);
> +	request->tail &= request->ring->size - 1;
>  	GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
>  }
>  
> -- 
> 2.11.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after  reset tweaking
  2017-03-27 10:44   ` Mika Kuoppala
@ 2017-03-27 11:00     ` Chris Wilson
  -1 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-03-27 11:00 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, # v4 . 10+

On Mon, Mar 27, 2017 at 01:44:00PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > If the request->wa_tail is 0 (because it landed exactly on the end of
> > the ringbuffer), when we reconstruct request->tail following a reset we
> > fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
> > never able to catch up with RING_TAIL and the GPU spins endlessly. If
> > the ring contains a couple of breadcrumbs, even our hangcheck is unable
> > to catch the busy-looping as the ACTHD and seqno continually advance.
> 
> Tail is past ring size (on hw) and the ring contents has seqno writes.
> So we will replay the ring contents over and over and seqno advances
> and wraps back to the first breadcrumbs in ring?

Yup. It was most confusing to watch. The execlist_port[] was static,
RING_START was static, yet the seqno kept changing. I felt like I was
hallucinating. That or insomnia.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after reset tweaking
@ 2017-03-27 11:00     ` Chris Wilson
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-03-27 11:00 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, # v4 . 10+

On Mon, Mar 27, 2017 at 01:44:00PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > If the request->wa_tail is 0 (because it landed exactly on the end of
> > the ringbuffer), when we reconstruct request->tail following a reset we
> > fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
> > never able to catch up with RING_TAIL and the GPU spins endlessly. If
> > the ring contains a couple of breadcrumbs, even our hangcheck is unable
> > to catch the busy-looping as the ACTHD and seqno continually advance.
> 
> Tail is past ring size (on hw) and the ring contents has seqno writes.
> So we will replay the ring contents over and over and seqno advances
> and wraps back to the first breadcrumbs in ring?

Yup. It was most confusing to watch. The execlist_port[] was static,
RING_START was static, yet the seqno kept changing. I felt like I was
hallucinating. That or insomnia.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after        reset tweaking
  2017-03-27 11:00     ` Chris Wilson
@ 2017-03-27 11:07       ` Mika Kuoppala
  -1 siblings, 0 replies; 12+ messages in thread
From: Mika Kuoppala @ 2017-03-27 11:07 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, # v4 . 10+

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Mon, Mar 27, 2017 at 01:44:00PM +0300, Mika Kuoppala wrote:
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > If the request->wa_tail is 0 (because it landed exactly on the end of
>> > the ringbuffer), when we reconstruct request->tail following a reset we
>> > fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
>> > never able to catch up with RING_TAIL and the GPU spins endlessly. If
>> > the ring contains a couple of breadcrumbs, even our hangcheck is unable
>> > to catch the busy-looping as the ACTHD and seqno continually advance.
>> 
>> Tail is past ring size (on hw) and the ring contents has seqno writes.
>> So we will replay the ring contents over and over and seqno advances
>> and wraps back to the first breadcrumbs in ring?
>
> Yup. It was most confusing to watch. The execlist_port[] was static,
> RING_START was static, yet the seqno kept changing. I felt like I was
> hallucinating. That or insomnia.

/o\

When we reset_common_ring() it is always after a hw reset. So the
'last' in sense of hardware's lrc contexts doesn't mean much.

So can we actually get rid of the tail trickery as for first
request after reset, as the lite restore can't happen and
should not matter?

-Mika

> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after reset tweaking
@ 2017-03-27 11:07       ` Mika Kuoppala
  0 siblings, 0 replies; 12+ messages in thread
From: Mika Kuoppala @ 2017-03-27 11:07 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, # v4 . 10+

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Mon, Mar 27, 2017 at 01:44:00PM +0300, Mika Kuoppala wrote:
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > If the request->wa_tail is 0 (because it landed exactly on the end of
>> > the ringbuffer), when we reconstruct request->tail following a reset we
>> > fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
>> > never able to catch up with RING_TAIL and the GPU spins endlessly. If
>> > the ring contains a couple of breadcrumbs, even our hangcheck is unable
>> > to catch the busy-looping as the ACTHD and seqno continually advance.
>> 
>> Tail is past ring size (on hw) and the ring contents has seqno writes.
>> So we will replay the ring contents over and over and seqno advances
>> and wraps back to the first breadcrumbs in ring?
>
> Yup. It was most confusing to watch. The execlist_port[] was static,
> RING_START was static, yet the seqno kept changing. I felt like I was
> hallucinating. That or insomnia.

/o\

When we reset_common_ring() it is always after a hw reset. So the
'last' in sense of hardware's lrc contexts doesn't mean much.

So can we actually get rid of the tail trickery as for first
request after reset, as the lite restore can't happen and
should not matter?

-Mika

> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after  reset tweaking
  2017-03-27 11:07       ` Mika Kuoppala
@ 2017-03-27 11:19         ` Chris Wilson
  -1 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-03-27 11:19 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, # v4 . 10+

On Mon, Mar 27, 2017 at 02:07:09PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > On Mon, Mar 27, 2017 at 01:44:00PM +0300, Mika Kuoppala wrote:
> >> Chris Wilson <chris@chris-wilson.co.uk> writes:
> >> 
> >> > If the request->wa_tail is 0 (because it landed exactly on the end of
> >> > the ringbuffer), when we reconstruct request->tail following a reset we
> >> > fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
> >> > never able to catch up with RING_TAIL and the GPU spins endlessly. If
> >> > the ring contains a couple of breadcrumbs, even our hangcheck is unable
> >> > to catch the busy-looping as the ACTHD and seqno continually advance.
> >> 
> >> Tail is past ring size (on hw) and the ring contents has seqno writes.
> >> So we will replay the ring contents over and over and seqno advances
> >> and wraps back to the first breadcrumbs in ring?
> >
> > Yup. It was most confusing to watch. The execlist_port[] was static,
> > RING_START was static, yet the seqno kept changing. I felt like I was
> > hallucinating. That or insomnia.
> 
> /o\
> 
> When we reset_common_ring() it is always after a hw reset. So the
> 'last' in sense of hardware's lrc contexts doesn't mean much.
> 
> So can we actually get rid of the tail trickery as for first
> request after reset, as the lite restore can't happen and
> should not matter?

So move handling the rare case into the latency sensitive hotpath? ;)

The complaint I feel is that we don't have a great interface, otoh this
manipulation is currently a one-off.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after reset tweaking
@ 2017-03-27 11:19         ` Chris Wilson
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-03-27 11:19 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, # v4 . 10+

On Mon, Mar 27, 2017 at 02:07:09PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > On Mon, Mar 27, 2017 at 01:44:00PM +0300, Mika Kuoppala wrote:
> >> Chris Wilson <chris@chris-wilson.co.uk> writes:
> >> 
> >> > If the request->wa_tail is 0 (because it landed exactly on the end of
> >> > the ringbuffer), when we reconstruct request->tail following a reset we
> >> > fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
> >> > never able to catch up with RING_TAIL and the GPU spins endlessly. If
> >> > the ring contains a couple of breadcrumbs, even our hangcheck is unable
> >> > to catch the busy-looping as the ACTHD and seqno continually advance.
> >> 
> >> Tail is past ring size (on hw) and the ring contents has seqno writes.
> >> So we will replay the ring contents over and over and seqno advances
> >> and wraps back to the first breadcrumbs in ring?
> >
> > Yup. It was most confusing to watch. The execlist_port[] was static,
> > RING_START was static, yet the seqno kept changing. I felt like I was
> > hallucinating. That or insomnia.
> 
> /o\
> 
> When we reset_common_ring() it is always after a hw reset. So the
> 'last' in sense of hardware's lrc contexts doesn't mean much.
> 
> So can we actually get rid of the tail trickery as for first
> request after reset, as the lite restore can't happen and
> should not matter?

So move handling the rare case into the latency sensitive hotpath? ;)

The complaint I feel is that we don't have a great interface, otoh this
manipulation is currently a one-off.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-03-27 11:20 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-27  3:28 [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after reset tweaking Chris Wilson
2017-03-27  3:28 ` Chris Wilson
2017-03-27  3:28 ` [PATCH 2/2] drm/i915: Assert that the request->tail fits within the ring Chris Wilson
2017-03-27  3:45 ` ✓ Fi.CI.BAT: success for series starting with [1/2] drm/i915/execlists: Wrap tail pointer after reset tweaking Patchwork
2017-03-27 10:44 ` [Intel-gfx] [PATCH 1/2] " Mika Kuoppala
2017-03-27 10:44   ` Mika Kuoppala
2017-03-27 11:00   ` [Intel-gfx] " Chris Wilson
2017-03-27 11:00     ` Chris Wilson
2017-03-27 11:07     ` [Intel-gfx] " Mika Kuoppala
2017-03-27 11:07       ` Mika Kuoppala
2017-03-27 11:19       ` [Intel-gfx] " Chris Wilson
2017-03-27 11:19         ` Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.