All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/i915: fix up semaphore_waits_for
@ 2014-03-14 23:08 Daniel Vetter
  2014-03-14 23:08 ` [PATCH 2/2] drm/i915: Add FIXME for bdw semaphore detection in hancheck Daniel Vetter
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Daniel Vetter @ 2014-03-14 23:08 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Ben Widawsky, Mika Kuoppala

There's an entire pile of issues in here:

- Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
  offset of the batch buffer when a batch is executed. Semaphores are
  always emitted to the main ring, so we always want to look at that.

- Mask the obtained HEAD pointer with the actual ring size, which is
  much smaller. Together with the above issue this resulted us in
  trying to dereference a pointer way outside of the ring mmio
  mapping. The resulting invalid access in interrupt context
  (hangcheck is executed from timers) lead to a full blown kernel
  panic. The fbcon panic handler then tried to frob our driver harder,
  resulting in a full machine hang at least on my snb here where I've
  stumbled over this.

- Handle ring wrapping correctly and be a bit more explicit about how
  many dwords we're scanning. We probably should also scan more than
  just 4 ...

- Space out some of teh computations for readability.

This prevents hard-hangs on my snb here. Verdict from QA is still
pending, but the symptoms match.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74100
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_irq.c | 38 ++++++++++++++++++++++++++------------
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index be2713f12e08..473372a6c97d 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2529,29 +2529,43 @@ static struct intel_ring_buffer *
 semaphore_waits_for(struct intel_ring_buffer *ring, u32 *seqno)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-	u32 cmd, ipehr, acthd, acthd_min;
+	u32 cmd, ipehr, head;
+	int i;
 
 	ipehr = I915_READ(RING_IPEHR(ring->mmio_base));
 	if ((ipehr & ~(0x3 << 16)) !=
 	    (MI_SEMAPHORE_MBOX | MI_SEMAPHORE_COMPARE | MI_SEMAPHORE_REGISTER))
 		return NULL;
 
-	/* ACTHD is likely pointing to the dword after the actual command,
-	 * so scan backwards until we find the MBOX.
+	/*
+	 * HEAD is likely pointing to the dword after the actual command,
+	 * so scan backwards until we find the MBOX. But limit it to just 3
+	 * dwords. Note that we don't care about ACTHD here since that might
+	 * point at at batch, and semaphores are always emitted into the
+	 * ringbuffer itself.
 	 */
-	acthd = intel_ring_get_active_head(ring) & HEAD_ADDR;
-	acthd_min = max((int)acthd - 3 * 4, 0);
-	do {
-		cmd = ioread32(ring->virtual_start + acthd);
+	head = I915_READ_HEAD(ring) & HEAD_ADDR;
+
+	for (i = 4; i; --i) {
+		/*
+		 * Be paranoid and presume the hw has gone off into the wild -
+		 * our ring is smaller than what the hardware (and hence
+		 * HEAD_ADDR) allows. Also handles wrap-around.
+		 */
+		head &= ring->size - 1;
+
+		/* This here seems to blow up */
+		cmd = ioread32(ring->virtual_start + head);
 		if (cmd == ipehr)
 			break;
 
-		acthd -= 4;
-		if (acthd < acthd_min)
-			return NULL;
-	} while (1);
+		head -= 4;
+	}
+
+	if (!i)
+		return NULL;
 
-	*seqno = ioread32(ring->virtual_start+acthd+4)+1;
+	*seqno = ioread32(ring->virtual_start + head + 4) + 1;
 	return &dev_priv->ring[(ring->id + (((ipehr >> 17) & 1) + 1)) % 3];
 }
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/2] drm/i915: Add FIXME for bdw semaphore detection in hancheck
  2014-03-14 23:08 [PATCH 1/2] drm/i915: fix up semaphore_waits_for Daniel Vetter
@ 2014-03-14 23:08 ` Daniel Vetter
  2014-03-15  0:38   ` Ben Widawsky
  2014-03-15 15:46 ` [PATCH 1/2] drm/i915: fix up semaphore_waits_for Chris Wilson
  2014-03-21 17:33 ` Mika Kuoppala
  2 siblings, 1 reply; 16+ messages in thread
From: Daniel Vetter @ 2014-03-14 23:08 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Ben Widawsky

Currently not an issue since we don't emit sempahores, but better
not forget about those.

As a little prep work extract the ipehr decoding for cleaner control
flow. And apply a bit of polish.

Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_irq.c | 20 ++++++++++++++++++--
 drivers/gpu/drm/i915/i915_reg.h |  3 ++-
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 473372a6c97d..0f3a6d791502 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2525,6 +2525,23 @@ ring_idle(struct intel_ring_buffer *ring, u32 seqno)
 		i915_seqno_passed(seqno, ring_last_seqno(ring)));
 }
 
+static bool
+ipehr_is_semaphore_wait(struct drm_device *dev, u32 ipehr)
+{
+	if (INTEL_INFO(dev)->gen >= 8) {
+		/*
+		 * FIXME: gen8 semaphore support - currently we don't emit
+		 * semaphores on bdw anyway, but this needs to be addressed when
+		 * we merge that code.
+		 */
+		return false;
+	} else {
+		ipehr &= ~MI_SEMAPHORE_SYNC_MASK;
+		return ipehr == (MI_SEMAPHORE_MBOX | MI_SEMAPHORE_COMPARE |
+				 MI_SEMAPHORE_REGISTER);
+	}
+}
+
 static struct intel_ring_buffer *
 semaphore_waits_for(struct intel_ring_buffer *ring, u32 *seqno)
 {
@@ -2533,8 +2550,7 @@ semaphore_waits_for(struct intel_ring_buffer *ring, u32 *seqno)
 	int i;
 
 	ipehr = I915_READ(RING_IPEHR(ring->mmio_base));
-	if ((ipehr & ~(0x3 << 16)) !=
-	    (MI_SEMAPHORE_MBOX | MI_SEMAPHORE_COMPARE | MI_SEMAPHORE_REGISTER))
+	if (!ipehr_is_semaphore_wait(ring->dev, ipehr))
 		return NULL;
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 146609ab42bb..23267859156f 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -244,7 +244,8 @@
 #define   MI_SEMAPHORE_SYNC_BVE	    (0<<16) /* VECS wait for BCS  (VEBSYNC) */
 #define   MI_SEMAPHORE_SYNC_VVE	    (1<<16) /* VECS wait for VCS  (VEVSYNC) */
 #define   MI_SEMAPHORE_SYNC_RVE	    (2<<16) /* VECS wait for RCS  (VERSYNC) */
-#define   MI_SEMAPHORE_SYNC_INVALID  (3<<16)
+#define   MI_SEMAPHORE_SYNC_INVALID (3<<16)
+#define   MI_SEMAPHORE_SYNC_MASK    (3<<16)
 #define MI_SET_CONTEXT		MI_INSTR(0x18, 0)
 #define   MI_MM_SPACE_GTT		(1<<8)
 #define   MI_MM_SPACE_PHYSICAL		(0<<8)
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] drm/i915: Add FIXME for bdw semaphore detection in hancheck
  2014-03-14 23:08 ` [PATCH 2/2] drm/i915: Add FIXME for bdw semaphore detection in hancheck Daniel Vetter
@ 2014-03-15  0:38   ` Ben Widawsky
  2014-03-15 11:44     ` Daniel Vetter
  0 siblings, 1 reply; 16+ messages in thread
From: Ben Widawsky @ 2014-03-15  0:38 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel Graphics Development

On Sat, Mar 15, 2014 at 12:08:56AM +0100, Daniel Vetter wrote:
> Currently not an issue since we don't emit sempahores, but better
> not forget about those.
> 
> As a little prep work extract the ipehr decoding for cleaner control
> flow. And apply a bit of polish.
> 
> Cc: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>

I've made a note to add the relevant BDW patch already.

one comment below.
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 20 ++++++++++++++++++--
>  drivers/gpu/drm/i915/i915_reg.h |  3 ++-
>  2 files changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 473372a6c97d..0f3a6d791502 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2525,6 +2525,23 @@ ring_idle(struct intel_ring_buffer *ring, u32 seqno)
>  		i915_seqno_passed(seqno, ring_last_seqno(ring)));
>  }
>  
> +static bool
> +ipehr_is_semaphore_wait(struct drm_device *dev, u32 ipehr)
> +{
> +	if (INTEL_INFO(dev)->gen >= 8) {
> +		/*
> +		 * FIXME: gen8 semaphore support - currently we don't emit
> +		 * semaphores on bdw anyway, but this needs to be addressed when
> +		 * we merge that code.
> +		 */
> +		return false;
> +	} else {

If you wanted to be paranoid:
WARN_ON((ipehr & MI_SEMAPHORE_SYNC_MASK) == MI_SEMAPHORE_SYNC_INVALID)

> +		ipehr &= ~MI_SEMAPHORE_SYNC_MASK;
> +		return ipehr == (MI_SEMAPHORE_MBOX | MI_SEMAPHORE_COMPARE |
> +				 MI_SEMAPHORE_REGISTER);
> +	}
> +}
> +
>  static struct intel_ring_buffer *
>  semaphore_waits_for(struct intel_ring_buffer *ring, u32 *seqno)
>  {
> @@ -2533,8 +2550,7 @@ semaphore_waits_for(struct intel_ring_buffer *ring, u32 *seqno)
>  	int i;
>  
>  	ipehr = I915_READ(RING_IPEHR(ring->mmio_base));
> -	if ((ipehr & ~(0x3 << 16)) !=
> -	    (MI_SEMAPHORE_MBOX | MI_SEMAPHORE_COMPARE | MI_SEMAPHORE_REGISTER))
> +	if (!ipehr_is_semaphore_wait(ring->dev, ipehr))
>  		return NULL;
>  
>  	/*
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 146609ab42bb..23267859156f 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -244,7 +244,8 @@
>  #define   MI_SEMAPHORE_SYNC_BVE	    (0<<16) /* VECS wait for BCS  (VEBSYNC) */
>  #define   MI_SEMAPHORE_SYNC_VVE	    (1<<16) /* VECS wait for VCS  (VEVSYNC) */
>  #define   MI_SEMAPHORE_SYNC_RVE	    (2<<16) /* VECS wait for RCS  (VERSYNC) */
> -#define   MI_SEMAPHORE_SYNC_INVALID  (3<<16)
> +#define   MI_SEMAPHORE_SYNC_INVALID (3<<16)
> +#define   MI_SEMAPHORE_SYNC_MASK    (3<<16)
>  #define MI_SET_CONTEXT		MI_INSTR(0x18, 0)
>  #define   MI_MM_SPACE_GTT		(1<<8)
>  #define   MI_MM_SPACE_PHYSICAL		(0<<8)
> -- 
> 1.8.1.4
> 

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] drm/i915: Add FIXME for bdw semaphore detection in hancheck
  2014-03-15  0:38   ` Ben Widawsky
@ 2014-03-15 11:44     ` Daniel Vetter
  2014-03-18  9:26       ` [PATCH] drm/i915: make semaphore signaller detection more robust Daniel Vetter
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Vetter @ 2014-03-15 11:44 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Daniel Vetter, Intel Graphics Development

On Fri, Mar 14, 2014 at 05:38:55PM -0700, Ben Widawsky wrote:
> On Sat, Mar 15, 2014 at 12:08:56AM +0100, Daniel Vetter wrote:
> > Currently not an issue since we don't emit sempahores, but better
> > not forget about those.
> > 
> > As a little prep work extract the ipehr decoding for cleaner control
> > flow. And apply a bit of polish.
> > 
> > Cc: Ben Widawsky <ben@bwidawsk.net>
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
> 
> I've made a note to add the relevant BDW patch already.

Thanks. I'll pull it in as soon as the previous bugfix patch is reviewed
by Mika/Chris. Btw I've almost written the bdw patch myself (it's just two
lines or so + adjusting the number of dwords we scan back) but then
realized that the #defines I'm adding will needlessly conflict with your
series, so let it be.

> one comment below.
> > ---
> >  drivers/gpu/drm/i915/i915_irq.c | 20 ++++++++++++++++++--
> >  drivers/gpu/drm/i915/i915_reg.h |  3 ++-
> >  2 files changed, 20 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index 473372a6c97d..0f3a6d791502 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -2525,6 +2525,23 @@ ring_idle(struct intel_ring_buffer *ring, u32 seqno)
> >  		i915_seqno_passed(seqno, ring_last_seqno(ring)));
> >  }
> >  
> > +static bool
> > +ipehr_is_semaphore_wait(struct drm_device *dev, u32 ipehr)
> > +{
> > +	if (INTEL_INFO(dev)->gen >= 8) {
> > +		/*
> > +		 * FIXME: gen8 semaphore support - currently we don't emit
> > +		 * semaphores on bdw anyway, but this needs to be addressed when
> > +		 * we merge that code.
> > +		 */
> > +		return false;
> > +	} else {
> 
> If you wanted to be paranoid:
> WARN_ON((ipehr & MI_SEMAPHORE_SYNC_MASK) == MI_SEMAPHORE_SYNC_INVALID)

Good idea but wrong spot. At the end of semaphore_waits_for we use the
SYNC_MASK to compute the ring pointer of the signaller. That's completely
unchecked code (and full of magic stuff). I'll throw another patch on top
to extract this and add a few paranoid checks in the spirit of your
suggestion above.

Cheers, Daniel

> 
> > +		ipehr &= ~MI_SEMAPHORE_SYNC_MASK;
> > +		return ipehr == (MI_SEMAPHORE_MBOX | MI_SEMAPHORE_COMPARE |
> > +				 MI_SEMAPHORE_REGISTER);
> > +	}
> > +}
> > +
> >  static struct intel_ring_buffer *
> >  semaphore_waits_for(struct intel_ring_buffer *ring, u32 *seqno)
> >  {
> > @@ -2533,8 +2550,7 @@ semaphore_waits_for(struct intel_ring_buffer *ring, u32 *seqno)
> >  	int i;
> >  
> >  	ipehr = I915_READ(RING_IPEHR(ring->mmio_base));
> > -	if ((ipehr & ~(0x3 << 16)) !=
> > -	    (MI_SEMAPHORE_MBOX | MI_SEMAPHORE_COMPARE | MI_SEMAPHORE_REGISTER))
> > +	if (!ipehr_is_semaphore_wait(ring->dev, ipehr))
> >  		return NULL;
> >  
> >  	/*
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > index 146609ab42bb..23267859156f 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -244,7 +244,8 @@
> >  #define   MI_SEMAPHORE_SYNC_BVE	    (0<<16) /* VECS wait for BCS  (VEBSYNC) */
> >  #define   MI_SEMAPHORE_SYNC_VVE	    (1<<16) /* VECS wait for VCS  (VEVSYNC) */
> >  #define   MI_SEMAPHORE_SYNC_RVE	    (2<<16) /* VECS wait for RCS  (VERSYNC) */
> > -#define   MI_SEMAPHORE_SYNC_INVALID  (3<<16)
> > +#define   MI_SEMAPHORE_SYNC_INVALID (3<<16)
> > +#define   MI_SEMAPHORE_SYNC_MASK    (3<<16)
> >  #define MI_SET_CONTEXT		MI_INSTR(0x18, 0)
> >  #define   MI_MM_SPACE_GTT		(1<<8)
> >  #define   MI_MM_SPACE_PHYSICAL		(0<<8)
> > -- 
> > 1.8.1.4
> > 
> 
> -- 
> Ben Widawsky, Intel Open Source Technology Center

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] drm/i915: fix up semaphore_waits_for
  2014-03-14 23:08 [PATCH 1/2] drm/i915: fix up semaphore_waits_for Daniel Vetter
  2014-03-14 23:08 ` [PATCH 2/2] drm/i915: Add FIXME for bdw semaphore detection in hancheck Daniel Vetter
@ 2014-03-15 15:46 ` Chris Wilson
  2014-03-15 16:31   ` Daniel Vetter
  2014-03-21 17:33 ` Mika Kuoppala
  2 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2014-03-15 15:46 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel Graphics Development, Ben Widawsky, Mika Kuoppala

On Sat, Mar 15, 2014 at 12:08:55AM +0100, Daniel Vetter wrote:
> There's an entire pile of issues in here:
> 
> - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
>   offset of the batch buffer when a batch is executed. Semaphores are
>   always emitted to the main ring, so we always want to look at that.

True, nice catch that would explain a hard hang quite neatly if we tried
to read from beyond the aperture.
 
> - Mask the obtained HEAD pointer with the actual ring size, which is
>   much smaller. Together with the above issue this resulted us in
>   trying to dereference a pointer way outside of the ring mmio
>   mapping. The resulting invalid access in interrupt context
>   (hangcheck is executed from timers) lead to a full blown kernel
>   panic. The fbcon panic handler then tried to frob our driver harder,
>   resulting in a full machine hang at least on my snb here where I've
>   stumbled over this.
> 
> - Handle ring wrapping correctly and be a bit more explicit about how
>   many dwords we're scanning. We probably should also scan more than
>   just 4 ...

You can ignore ring wrapping as valid commands cannot wrap, hence the
formulation of acthd_min.

 /*
  * Be paranoid and presume the hw has gone off into the wild -
  * our ring is smaller than what the hardware (and hence
  * HEAD_ADDR) allows.
  */
 head = I915_READ_HEAD(ring) & (ring->size - 1);
 head_min = max((int)head - 3 * 4, 0);
 while (head >= head_min) {
	cmd = ioread32(ring->virtual_start + head);
	if (cmd == ipehr)
		break;
	head -= 4;
 }

> +		/*
> +		 * Be paranoid and presume the hw has gone off into the wild -
> +		 * our ring is smaller than what the hardware (and hence
> +		 * HEAD_ADDR) allows. Also handles wrap-around.
> +		 */
> +		head &= ring->size - 1;
> +
> +		/* This here seems to blow up */
> +		cmd = ioread32(ring->virtual_start + head);
So what does this mean?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] drm/i915: fix up semaphore_waits_for
  2014-03-15 15:46 ` [PATCH 1/2] drm/i915: fix up semaphore_waits_for Chris Wilson
@ 2014-03-15 16:31   ` Daniel Vetter
  2014-03-15 23:13     ` Chris Wilson
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Vetter @ 2014-03-15 16:31 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Intel Graphics Development,
	Mika Kuoppala, Ben Widawsky

On Sat, Mar 15, 2014 at 4:46 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Sat, Mar 15, 2014 at 12:08:55AM +0100, Daniel Vetter wrote:
>> There's an entire pile of issues in here:
>>
>> - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
>>   offset of the batch buffer when a batch is executed. Semaphores are
>>   always emitted to the main ring, so we always want to look at that.
>
> True, nice catch that would explain a hard hang quite neatly if we tried
> to read from beyond the aperture.
>
>> - Mask the obtained HEAD pointer with the actual ring size, which is
>>   much smaller. Together with the above issue this resulted us in
>>   trying to dereference a pointer way outside of the ring mmio
>>   mapping. The resulting invalid access in interrupt context
>>   (hangcheck is executed from timers) lead to a full blown kernel
>>   panic. The fbcon panic handler then tried to frob our driver harder,
>>   resulting in a full machine hang at least on my snb here where I've
>>   stumbled over this.
>>
>> - Handle ring wrapping correctly and be a bit more explicit about how
>>   many dwords we're scanning. We probably should also scan more than
>>   just 4 ...
>
> You can ignore ring wrapping as valid commands cannot wrap, hence the
> formulation of acthd_min.

Well I've thought that HEAD usually points at the next command. I'm
not sure about the exact semantics at wrap-around time but since
implementing it was trivially I've figured it's better to be paranoid.

>  /*
>   * Be paranoid and presume the hw has gone off into the wild -
>   * our ring is smaller than what the hardware (and hence
>   * HEAD_ADDR) allows.
>   */
>  head = I915_READ_HEAD(ring) & (ring->size - 1);
>  head_min = max((int)head - 3 * 4, 0);
>  while (head >= head_min) {
>         cmd = ioread32(ring->virtual_start + head);
>         if (cmd == ipehr)
>                 break;
>         head -= 4;
>  }

Yeah, this would also fix the immediate bug at hand. But my general
impression of this code is that it's trusting reconstructed state way
too much. Hence why I've gone for the more wordy version of firsting
oring out the right field from the register and then for restricting
with the size of the ring. I'll do a follow-up patch to give the the
same treatment to the semaphore signaller computation.

>> +             /*
>> +              * Be paranoid and presume the hw has gone off into the wild -
>> +              * our ring is smaller than what the hardware (and hence
>> +              * HEAD_ADDR) allows. Also handles wrap-around.
>> +              */
>> +             head &= ring->size - 1;
>> +
>> +             /* This here seems to blow up */
>> +             cmd = ioread32(ring->virtual_start + head);
> So what does this mean?

Oops, the comment was just a note from my oops decoding. The ioread32
is where we've blown. The comment should be removed since it's not
useful outside of my debug session.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] drm/i915: fix up semaphore_waits_for
  2014-03-15 16:31   ` Daniel Vetter
@ 2014-03-15 23:13     ` Chris Wilson
  0 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2014-03-15 23:13 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel Graphics Development, Ben Widawsky, Mika Kuoppala

On Sat, Mar 15, 2014 at 05:31:04PM +0100, Daniel Vetter wrote:
> On Sat, Mar 15, 2014 at 4:46 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > On Sat, Mar 15, 2014 at 12:08:55AM +0100, Daniel Vetter wrote:
> >> There's an entire pile of issues in here:
> >>
> >> - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
> >>   offset of the batch buffer when a batch is executed. Semaphores are
> >>   always emitted to the main ring, so we always want to look at that.
> >
> > True, nice catch that would explain a hard hang quite neatly if we tried
> > to read from beyond the aperture.
> >
> >> - Mask the obtained HEAD pointer with the actual ring size, which is
> >>   much smaller. Together with the above issue this resulted us in
> >>   trying to dereference a pointer way outside of the ring mmio
> >>   mapping. The resulting invalid access in interrupt context
> >>   (hangcheck is executed from timers) lead to a full blown kernel
> >>   panic. The fbcon panic handler then tried to frob our driver harder,
> >>   resulting in a full machine hang at least on my snb here where I've
> >>   stumbled over this.
> >>
> >> - Handle ring wrapping correctly and be a bit more explicit about how
> >>   many dwords we're scanning. We probably should also scan more than
> >>   just 4 ...
> >
> > You can ignore ring wrapping as valid commands cannot wrap, hence the
> > formulation of acthd_min.
> 
> Well I've thought that HEAD usually points at the next command. I'm
> not sure about the exact semantics at wrap-around time but since
> implementing it was trivially I've figured it's better to be paranoid.

Nah, it points to ring->size upon parsing a command that exactly ends
upon the wrap around, But your point stands.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] drm/i915: make semaphore signaller detection more robust
  2014-03-15 11:44     ` Daniel Vetter
@ 2014-03-18  9:26       ` Daniel Vetter
  2014-03-19 15:05         ` Chris Wilson
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Vetter @ 2014-03-18  9:26 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Ben Widawsky, Mika Kuoppala

Extract all this logic into a new helper function
semaphore_wait_to_signaller_ring because:

- The current code has way too much magic.

- The current code doesn't look at bi16, which encodes VECS signallers
  on HSW. Those are just added after the fact, so can't be encoded in
  a neat formula.

- It blindly trust the hardware for the dev_priv->ring array
  derefence, which given that we have a gpu hang at hand is scary. The
  current logic can't blow up since it limits its value range
  sufficiently, but that's a bit too tricky to rely on in my opinion.
  Especially when we start to add bdw support.

- I'm not a big fan of the explicit ring->semaphore_register list, but
  I think it's more robust to use the same mapping both when
  constructing the semaphore commands and when decoding them.

- Finally add a FIXME comment about lack of broadwell support here,
  like in the earlier ipehr semaphore cmd detection function.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_irq.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 0f3a6d791502..98f95ca77246 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2543,6 +2543,39 @@ ipehr_is_semaphore_wait(struct drm_device *dev, u32 ipehr)
 }
 
 static struct intel_ring_buffer *
+semaphore_wait_to_signaller_ring(struct intel_ring_buffer *ring, u32 ipehr)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct intel_ring_buffer *signaller;
+	int i;
+
+	if (INTEL_INFO(dev_priv->dev)->gen >= 8) {
+		/*
+		 * FIXME: gen8 semaphore support - currently we don't emit
+		 * semaphores on bdw anyway, but this needs to be addressed when
+		 * we merge that code.
+		 */
+		return NULL;
+	} else {
+		u32 sync_bits = ipehr & MI_SEMAPHORE_SYNC_MASK;
+
+		for_each_ring(signaller, dev_priv, i) {
+			if(ring == signaller)
+				continue;
+
+			if (sync_bits ==
+			    signaller->semaphore_register[ring->id])
+				return signaller;
+		}
+	}
+
+	DRM_ERROR("No signaller ring found for ring %i, ipehr 0x%08x\n",
+		  ring->id, ipehr);
+
+	return NULL;
+}
+
+static struct intel_ring_buffer *
 semaphore_waits_for(struct intel_ring_buffer *ring, u32 *seqno)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
@@ -2582,7 +2615,7 @@ semaphore_waits_for(struct intel_ring_buffer *ring, u32 *seqno)
 		return NULL;
 
 	*seqno = ioread32(ring->virtual_start + head + 4) + 1;
-	return &dev_priv->ring[(ring->id + (((ipehr >> 17) & 1) + 1)) % 3];
+	return semaphore_wait_to_signaller_ring(ring, ipehr);
 }
 
 static int semaphore_passed(struct intel_ring_buffer *ring)
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] drm/i915: make semaphore signaller detection more robust
  2014-03-18  9:26       ` [PATCH] drm/i915: make semaphore signaller detection more robust Daniel Vetter
@ 2014-03-19 15:05         ` Chris Wilson
  2014-03-19 15:35           ` Daniel Vetter
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2014-03-19 15:05 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel Graphics Development, Ben Widawsky, Mika Kuoppala

On Tue, Mar 18, 2014 at 10:26:04AM +0100, Daniel Vetter wrote:
> Extract all this logic into a new helper function
> semaphore_wait_to_signaller_ring because:
> 
> - The current code has way too much magic.
> 
> - The current code doesn't look at bi16, which encodes VECS signallers
>   on HSW. Those are just added after the fact, so can't be encoded in
>   a neat formula.
> 
> - It blindly trust the hardware for the dev_priv->ring array
>   derefence, which given that we have a gpu hang at hand is scary. The
>   current logic can't blow up since it limits its value range
>   sufficiently, but that's a bit too tricky to rely on in my opinion.
>   Especially when we start to add bdw support.

No, it does not blindly trust. What you just mean here is that to extend
it for hsw+ it is simpler to reimplement differently.
 
> - I'm not a big fan of the explicit ring->semaphore_register list, but
>   I think it's more robust to use the same mapping both when
>   constructing the semaphore commands and when decoding them.
> 
> - Finally add a FIXME comment about lack of broadwell support here,
>   like in the earlier ipehr semaphore cmd detection function.
> 
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Ben Widawsky <ben@bwidawsk.net>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Other than the digression above, where did MI_SEMAPHORE_SYNC_MASK spring
from? Early patch?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] drm/i915: make semaphore signaller detection more robust
  2014-03-19 15:05         ` Chris Wilson
@ 2014-03-19 15:35           ` Daniel Vetter
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Vetter @ 2014-03-19 15:35 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Intel Graphics Development,
	Mika Kuoppala, Ben Widawsky

On Wed, Mar 19, 2014 at 4:05 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Tue, Mar 18, 2014 at 10:26:04AM +0100, Daniel Vetter wrote:
>> Extract all this logic into a new helper function
>> semaphore_wait_to_signaller_ring because:
>>
>> - The current code has way too much magic.
>>
>> - The current code doesn't look at bi16, which encodes VECS signallers
>>   on HSW. Those are just added after the fact, so can't be encoded in
>>   a neat formula.
>>
>> - It blindly trust the hardware for the dev_priv->ring array
>>   derefence, which given that we have a gpu hang at hand is scary. The
>>   current logic can't blow up since it limits its value range
>>   sufficiently, but that's a bit too tricky to rely on in my opinion.
>>   Especially when we start to add bdw support.
>
> No, it does not blindly trust. What you just mean here is that to extend
> it for hsw+ it is simpler to reimplement differently.

Note that I say that the current logic can't blow up due to the
limited value range ... I can drop the "blindly trust" and leave it at
"too tricky" when merging.

>> - I'm not a big fan of the explicit ring->semaphore_register list, but
>>   I think it's more robust to use the same mapping both when
>>   constructing the semaphore commands and when decoding them.
>>
>> - Finally add a FIXME comment about lack of broadwell support here,
>>   like in the earlier ipehr semaphore cmd detection function.
>>
>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> Cc: Ben Widawsky <ben@bwidawsk.net>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> Other than the digression above, where did MI_SEMAPHORE_SYNC_MASK spring
> from? Early patch?

Yeah, the two patches this one here should be in-reply-to have that #define.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] drm/i915: fix up semaphore_waits_for
  2014-03-14 23:08 [PATCH 1/2] drm/i915: fix up semaphore_waits_for Daniel Vetter
  2014-03-14 23:08 ` [PATCH 2/2] drm/i915: Add FIXME for bdw semaphore detection in hancheck Daniel Vetter
  2014-03-15 15:46 ` [PATCH 1/2] drm/i915: fix up semaphore_waits_for Chris Wilson
@ 2014-03-21 17:33 ` Mika Kuoppala
  2014-03-22 17:52   ` Daniel Vetter
  2 siblings, 1 reply; 16+ messages in thread
From: Mika Kuoppala @ 2014-03-21 17:33 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Ben Widawsky

Hi,

Daniel Vetter <daniel.vetter@ffwll.ch> writes:

> There's an entire pile of issues in here:
>
> - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
>   offset of the batch buffer when a batch is executed. Semaphores are
>   always emitted to the main ring, so we always want to look at that.

The ipehr check should make sure that we are at the ring and
acthd should not be at batch.

Regardless I agree that RING_HEAD is much more safer. When I have
tried to get bottom of the snb reset hang, I have noticed that
after reset the acthd is sometimes 0x0c even tho head is 0x00,
on snb.

> - Mask the obtained HEAD pointer with the actual ring size, which is
>   much smaller. Together with the above issue this resulted us in
>   trying to dereference a pointer way outside of the ring mmio
>   mapping. The resulting invalid access in interrupt context
>   (hangcheck is executed from timers) lead to a full blown kernel
>   panic. The fbcon panic handler then tried to frob our driver harder,
>   resulting in a full machine hang at least on my snb here where I've
>   stumbled over this.
>
> - Handle ring wrapping correctly and be a bit more explicit about how
>   many dwords we're scanning. We probably should also scan more than
>   just 4 ...

ipehr dont update on MI_NOOPS and the head should point to
the extra MI_NOOP after semaphore sequence. So 4 should be
enough. Perhaps revisit this when bdw gains semaphores.

> - Space out some of teh computations for readability.
>
> This prevents hard-hangs on my snb here. Verdict from QA is still
> pending, but the symptoms match.
>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Ben Widawsky <ben@bwidawsk.net>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74100

The hard hangs are not so frequent with this patch but
they are still there. This patch should take care of
the oops we have been seeing, but there is still
something else to be found until #74100 can be marked as
fixed.

Very often after reset, when igt has pushed the quietence
batches into rings, blitter and video ring heads gets
moved properly but all updates to hardware status page and to
the sync registers are missing. And result is hang by hangcheck.
Repeat enough times and it is a hard hang.

> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Please remove the blowup comment and also update the
bugzilla tag. 

Patches 1-2 and the followup one are

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_irq.c | 38 ++++++++++++++++++++++++++------------
>  1 file changed, 26 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index be2713f12e08..473372a6c97d 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2529,29 +2529,43 @@ static struct intel_ring_buffer *
>  semaphore_waits_for(struct intel_ring_buffer *ring, u32 *seqno)
>  {
>  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -	u32 cmd, ipehr, acthd, acthd_min;
> +	u32 cmd, ipehr, head;
> +	int i;
>  
>  	ipehr = I915_READ(RING_IPEHR(ring->mmio_base));
>  	if ((ipehr & ~(0x3 << 16)) !=
>  	    (MI_SEMAPHORE_MBOX | MI_SEMAPHORE_COMPARE | MI_SEMAPHORE_REGISTER))
>  		return NULL;
>  
> -	/* ACTHD is likely pointing to the dword after the actual command,
> -	 * so scan backwards until we find the MBOX.
> +	/*
> +	 * HEAD is likely pointing to the dword after the actual command,
> +	 * so scan backwards until we find the MBOX. But limit it to just 3
> +	 * dwords. Note that we don't care about ACTHD here since that might
> +	 * point at at batch, and semaphores are always emitted into the
> +	 * ringbuffer itself.
>  	 */
> -	acthd = intel_ring_get_active_head(ring) & HEAD_ADDR;
> -	acthd_min = max((int)acthd - 3 * 4, 0);
> -	do {
> -		cmd = ioread32(ring->virtual_start + acthd);
> +	head = I915_READ_HEAD(ring) & HEAD_ADDR;
> +
> +	for (i = 4; i; --i) {
> +		/*
> +		 * Be paranoid and presume the hw has gone off into the wild -
> +		 * our ring is smaller than what the hardware (and hence
> +		 * HEAD_ADDR) allows. Also handles wrap-around.
> +		 */
> +		head &= ring->size - 1;
> +
> +		/* This here seems to blow up */
> +		cmd = ioread32(ring->virtual_start + head);
>  		if (cmd == ipehr)
>  			break;
>  
> -		acthd -= 4;
> -		if (acthd < acthd_min)
> -			return NULL;
> -	} while (1);
> +		head -= 4;
> +	}
> +
> +	if (!i)
> +		return NULL;
>  
> -	*seqno = ioread32(ring->virtual_start+acthd+4)+1;
> +	*seqno = ioread32(ring->virtual_start + head + 4) + 1;
>  	return &dev_priv->ring[(ring->id + (((ipehr >> 17) & 1) + 1)) % 3];
>  }
>  
> -- 
> 1.8.1.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] drm/i915: fix up semaphore_waits_for
  2014-03-21 17:33 ` Mika Kuoppala
@ 2014-03-22 17:52   ` Daniel Vetter
  2014-03-24 22:37     ` Ben Widawsky
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Vetter @ 2014-03-22 17:52 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: Daniel Vetter, Intel Graphics Development, Ben Widawsky

On Fri, Mar 21, 2014 at 07:33:59PM +0200, Mika Kuoppala wrote:
> Hi,
> 
> Daniel Vetter <daniel.vetter@ffwll.ch> writes:
> 
> > There's an entire pile of issues in here:
> >
> > - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
> >   offset of the batch buffer when a batch is executed. Semaphores are
> >   always emitted to the main ring, so we always want to look at that.
> 
> The ipehr check should make sure that we are at the ring and
> acthd should not be at batch.
> 
> Regardless I agree that RING_HEAD is much more safer. When I have
> tried to get bottom of the snb reset hang, I have noticed that
> after reset the acthd is sometimes 0x0c even tho head is 0x00,
> on snb.

Hm, should we maybe mask out the lower bits, too? If you can confirm this,
can you please add a follow-up patch?

> > - Mask the obtained HEAD pointer with the actual ring size, which is
> >   much smaller. Together with the above issue this resulted us in
> >   trying to dereference a pointer way outside of the ring mmio
> >   mapping. The resulting invalid access in interrupt context
> >   (hangcheck is executed from timers) lead to a full blown kernel
> >   panic. The fbcon panic handler then tried to frob our driver harder,
> >   resulting in a full machine hang at least on my snb here where I've
> >   stumbled over this.
> >
> > - Handle ring wrapping correctly and be a bit more explicit about how
> >   many dwords we're scanning. We probably should also scan more than
> >   just 4 ...
> 
> ipehr dont update on MI_NOOPS and the head should point to
> the extra MI_NOOP after semaphore sequence. So 4 should be
> enough. Perhaps revisit this when bdw gains semaphores.

Yeah, Chris also mentioned that the HEAD should point at one dword past
the end of the ring, so should even work when there are no MI_NOOPs. In
any case this code is more robust in case hw engineers suddenly change the
behaviour.

> > - Space out some of teh computations for readability.
> >
> > This prevents hard-hangs on my snb here. Verdict from QA is still
> > pending, but the symptoms match.
> >
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > Cc: Ben Widawsky <ben@bwidawsk.net>
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74100
> 
> The hard hangs are not so frequent with this patch but
> they are still there. This patch should take care of
> the oops we have been seeing, but there is still
> something else to be found until #74100 can be marked as
> fixed.
> 
> Very often after reset, when igt has pushed the quietence
> batches into rings, blitter and video ring heads gets
> moved properly but all updates to hardware status page and to
> the sync registers are missing. And result is hang by hangcheck.
> Repeat enough times and it is a hard hang.
> 
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Please remove the blowup comment and also update the
> bugzilla tag. 

Yeah, QA also says that it doesn't fix the hard hangs, only seems to
reduce them a bit on certain setups. I've updated the commit message.

btw are you testing with FBDEV=n? The lack of a fbcon panic handler should
greatly increase the chances that the last few message get correctly
transmitted through other means like netconsole.

> Patches 1-2 and the followup one are
> 
> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

Thanks for the review, patches merged.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] drm/i915: fix up semaphore_waits_for
  2014-03-22 17:52   ` Daniel Vetter
@ 2014-03-24 22:37     ` Ben Widawsky
  2014-03-25  9:42       ` Daniel Vetter
  0 siblings, 1 reply; 16+ messages in thread
From: Ben Widawsky @ 2014-03-24 22:37 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel Graphics Development, Daniel Vetter

On Sat, Mar 22, 2014 at 06:52:25PM +0100, Daniel Vetter wrote:
> On Fri, Mar 21, 2014 at 07:33:59PM +0200, Mika Kuoppala wrote:
> > Hi,
> > 
> > Daniel Vetter <daniel.vetter@ffwll.ch> writes:
> > 
> > > There's an entire pile of issues in here:
> > >
> > > - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
> > >   offset of the batch buffer when a batch is executed. Semaphores are
> > >   always emitted to the main ring, so we always want to look at that.
> > 
> > The ipehr check should make sure that we are at the ring and
> > acthd should not be at batch.
> > 
> > Regardless I agree that RING_HEAD is much more safer. When I have
> > tried to get bottom of the snb reset hang, I have noticed that
> > after reset the acthd is sometimes 0x0c even tho head is 0x00,
> > on snb.
> 
> Hm, should we maybe mask out the lower bits, too? If you can confirm this,
> can you please add a follow-up patch?
> 
> > > - Mask the obtained HEAD pointer with the actual ring size, which is
> > >   much smaller. Together with the above issue this resulted us in
> > >   trying to dereference a pointer way outside of the ring mmio
> > >   mapping. The resulting invalid access in interrupt context
> > >   (hangcheck is executed from timers) lead to a full blown kernel
> > >   panic. The fbcon panic handler then tried to frob our driver harder,
> > >   resulting in a full machine hang at least on my snb here where I've
> > >   stumbled over this.
> > >
> > > - Handle ring wrapping correctly and be a bit more explicit about how
> > >   many dwords we're scanning. We probably should also scan more than
> > >   just 4 ...
> > 
> > ipehr dont update on MI_NOOPS and the head should point to
> > the extra MI_NOOP after semaphore sequence. So 4 should be
> > enough. Perhaps revisit this when bdw gains semaphores.
> 
> Yeah, Chris also mentioned that the HEAD should point at one dword past
> the end of the ring, so should even work when there are no MI_NOOPs. In
> any case this code is more robust in case hw engineers suddenly change the
> behaviour.
> 
> > > - Space out some of teh computations for readability.
> > >
> > > This prevents hard-hangs on my snb here. Verdict from QA is still
> > > pending, but the symptoms match.
> > >
> > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > Cc: Ben Widawsky <ben@bwidawsk.net>
> > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74100
> > 
> > The hard hangs are not so frequent with this patch but
> > they are still there. This patch should take care of
> > the oops we have been seeing, but there is still
> > something else to be found until #74100 can be marked as
> > fixed.
> > 
> > Very often after reset, when igt has pushed the quietence
> > batches into rings, blitter and video ring heads gets
> > moved properly but all updates to hardware status page and to
> > the sync registers are missing. And result is hang by hangcheck.
> > Repeat enough times and it is a hard hang.
> > 
> > > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > 
> > Please remove the blowup comment and also update the
> > bugzilla tag. 
> 
> Yeah, QA also says that it doesn't fix the hard hangs, only seems to
> reduce them a bit on certain setups. I've updated the commit message.
> 
> btw are you testing with FBDEV=n? The lack of a fbcon panic handler should
> greatly increase the chances that the last few message get correctly
> transmitted through other means like netconsole.
> 
> > Patches 1-2 and the followup one are
> > 
> > Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
> 
> Thanks for the review, patches merged.
> -Daniel

Patch 2 was trivial to implement for gen8. This functionality is a lot
less trivial. I volunteered to do patch 2, are you going to port this to
gen8?

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] drm/i915: fix up semaphore_waits_for
  2014-03-24 22:37     ` Ben Widawsky
@ 2014-03-25  9:42       ` Daniel Vetter
  2014-03-25 14:54         ` Ben Widawsky
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Vetter @ 2014-03-25  9:42 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel Graphics Development, Daniel Vetter

On Mon, Mar 24, 2014 at 03:37:12PM -0700, Ben Widawsky wrote:
> On Sat, Mar 22, 2014 at 06:52:25PM +0100, Daniel Vetter wrote:
> > On Fri, Mar 21, 2014 at 07:33:59PM +0200, Mika Kuoppala wrote:
> > > Hi,
> > > 
> > > Daniel Vetter <daniel.vetter@ffwll.ch> writes:
> > > 
> > > > There's an entire pile of issues in here:
> > > >
> > > > - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
> > > >   offset of the batch buffer when a batch is executed. Semaphores are
> > > >   always emitted to the main ring, so we always want to look at that.
> > > 
> > > The ipehr check should make sure that we are at the ring and
> > > acthd should not be at batch.
> > > 
> > > Regardless I agree that RING_HEAD is much more safer. When I have
> > > tried to get bottom of the snb reset hang, I have noticed that
> > > after reset the acthd is sometimes 0x0c even tho head is 0x00,
> > > on snb.
> > 
> > Hm, should we maybe mask out the lower bits, too? If you can confirm this,
> > can you please add a follow-up patch?
> > 
> > > > - Mask the obtained HEAD pointer with the actual ring size, which is
> > > >   much smaller. Together with the above issue this resulted us in
> > > >   trying to dereference a pointer way outside of the ring mmio
> > > >   mapping. The resulting invalid access in interrupt context
> > > >   (hangcheck is executed from timers) lead to a full blown kernel
> > > >   panic. The fbcon panic handler then tried to frob our driver harder,
> > > >   resulting in a full machine hang at least on my snb here where I've
> > > >   stumbled over this.
> > > >
> > > > - Handle ring wrapping correctly and be a bit more explicit about how
> > > >   many dwords we're scanning. We probably should also scan more than
> > > >   just 4 ...
> > > 
> > > ipehr dont update on MI_NOOPS and the head should point to
> > > the extra MI_NOOP after semaphore sequence. So 4 should be
> > > enough. Perhaps revisit this when bdw gains semaphores.
> > 
> > Yeah, Chris also mentioned that the HEAD should point at one dword past
> > the end of the ring, so should even work when there are no MI_NOOPs. In
> > any case this code is more robust in case hw engineers suddenly change the
> > behaviour.
> > 
> > > > - Space out some of teh computations for readability.
> > > >
> > > > This prevents hard-hangs on my snb here. Verdict from QA is still
> > > > pending, but the symptoms match.
> > > >
> > > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > > Cc: Ben Widawsky <ben@bwidawsk.net>
> > > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74100
> > > 
> > > The hard hangs are not so frequent with this patch but
> > > they are still there. This patch should take care of
> > > the oops we have been seeing, but there is still
> > > something else to be found until #74100 can be marked as
> > > fixed.
> > > 
> > > Very often after reset, when igt has pushed the quietence
> > > batches into rings, blitter and video ring heads gets
> > > moved properly but all updates to hardware status page and to
> > > the sync registers are missing. And result is hang by hangcheck.
> > > Repeat enough times and it is a hard hang.
> > > 
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > 
> > > Please remove the blowup comment and also update the
> > > bugzilla tag. 
> > 
> > Yeah, QA also says that it doesn't fix the hard hangs, only seems to
> > reduce them a bit on certain setups. I've updated the commit message.
> > 
> > btw are you testing with FBDEV=n? The lack of a fbcon panic handler should
> > greatly increase the chances that the last few message get correctly
> > transmitted through other means like netconsole.
> > 
> > > Patches 1-2 and the followup one are
> > > 
> > > Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
> > 
> > Thanks for the review, patches merged.
> > -Daniel
> 
> Patch 2 was trivial to implement for gen8. This functionality is a lot
> less trivial. I volunteered to do patch 2, are you going to port this to
> gen8?

If you fill out the bdw pieces for patch 2&3 the only thing you need to
change here is to adjsut the number of dwords we walk backwards to make
sure that the bdw semaphore cmd still fits. Or at least that's been my
idea behind all this, maybe I've overlooked something.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] drm/i915: fix up semaphore_waits_for
  2014-03-25  9:42       ` Daniel Vetter
@ 2014-03-25 14:54         ` Ben Widawsky
  2014-03-25 15:24           ` Daniel Vetter
  0 siblings, 1 reply; 16+ messages in thread
From: Ben Widawsky @ 2014-03-25 14:54 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel Graphics Development, Daniel Vetter

On Tue, Mar 25, 2014 at 10:42:09AM +0100, Daniel Vetter wrote:
> On Mon, Mar 24, 2014 at 03:37:12PM -0700, Ben Widawsky wrote:
> > On Sat, Mar 22, 2014 at 06:52:25PM +0100, Daniel Vetter wrote:
> > > On Fri, Mar 21, 2014 at 07:33:59PM +0200, Mika Kuoppala wrote:
> > > > Hi,
> > > > 
> > > > Daniel Vetter <daniel.vetter@ffwll.ch> writes:
> > > > 
> > > > > There's an entire pile of issues in here:
> > > > >
> > > > > - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
> > > > >   offset of the batch buffer when a batch is executed. Semaphores are
> > > > >   always emitted to the main ring, so we always want to look at that.
> > > > 
> > > > The ipehr check should make sure that we are at the ring and
> > > > acthd should not be at batch.
> > > > 
> > > > Regardless I agree that RING_HEAD is much more safer. When I have
> > > > tried to get bottom of the snb reset hang, I have noticed that
> > > > after reset the acthd is sometimes 0x0c even tho head is 0x00,
> > > > on snb.
> > > 
> > > Hm, should we maybe mask out the lower bits, too? If you can confirm this,
> > > can you please add a follow-up patch?
> > > 
> > > > > - Mask the obtained HEAD pointer with the actual ring size, which is
> > > > >   much smaller. Together with the above issue this resulted us in
> > > > >   trying to dereference a pointer way outside of the ring mmio
> > > > >   mapping. The resulting invalid access in interrupt context
> > > > >   (hangcheck is executed from timers) lead to a full blown kernel
> > > > >   panic. The fbcon panic handler then tried to frob our driver harder,
> > > > >   resulting in a full machine hang at least on my snb here where I've
> > > > >   stumbled over this.
> > > > >
> > > > > - Handle ring wrapping correctly and be a bit more explicit about how
> > > > >   many dwords we're scanning. We probably should also scan more than
> > > > >   just 4 ...
> > > > 
> > > > ipehr dont update on MI_NOOPS and the head should point to
> > > > the extra MI_NOOP after semaphore sequence. So 4 should be
> > > > enough. Perhaps revisit this when bdw gains semaphores.
> > > 
> > > Yeah, Chris also mentioned that the HEAD should point at one dword past
> > > the end of the ring, so should even work when there are no MI_NOOPs. In
> > > any case this code is more robust in case hw engineers suddenly change the
> > > behaviour.
> > > 
> > > > > - Space out some of teh computations for readability.
> > > > >
> > > > > This prevents hard-hangs on my snb here. Verdict from QA is still
> > > > > pending, but the symptoms match.
> > > > >
> > > > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > > > Cc: Ben Widawsky <ben@bwidawsk.net>
> > > > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74100
> > > > 
> > > > The hard hangs are not so frequent with this patch but
> > > > they are still there. This patch should take care of
> > > > the oops we have been seeing, but there is still
> > > > something else to be found until #74100 can be marked as
> > > > fixed.
> > > > 
> > > > Very often after reset, when igt has pushed the quietence
> > > > batches into rings, blitter and video ring heads gets
> > > > moved properly but all updates to hardware status page and to
> > > > the sync registers are missing. And result is hang by hangcheck.
> > > > Repeat enough times and it is a hard hang.
> > > > 
> > > > > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > 
> > > > Please remove the blowup comment and also update the
> > > > bugzilla tag. 
> > > 
> > > Yeah, QA also says that it doesn't fix the hard hangs, only seems to
> > > reduce them a bit on certain setups. I've updated the commit message.
> > > 
> > > btw are you testing with FBDEV=n? The lack of a fbcon panic handler should
> > > greatly increase the chances that the last few message get correctly
> > > transmitted through other means like netconsole.
> > > 
> > > > Patches 1-2 and the followup one are
> > > > 
> > > > Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
> > > 
> > > Thanks for the review, patches merged.
> > > -Daniel
> > 
> > Patch 2 was trivial to implement for gen8. This functionality is a lot
> > less trivial. I volunteered to do patch 2, are you going to port this to
> > gen8?
> 
> If you fill out the bdw pieces for patch 2&3 the only thing you need to
> change here is to adjsut the number of dwords we walk backwards to make
> sure that the bdw semaphore cmd still fits. Or at least that's been my
> idea behind all this, maybe I've overlooked something.
> -Daniel

I don't think it's quite that easy, but I took it as a, "yes." Which one
is patch 3?

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] drm/i915: fix up semaphore_waits_for
  2014-03-25 14:54         ` Ben Widawsky
@ 2014-03-25 15:24           ` Daniel Vetter
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Vetter @ 2014-03-25 15:24 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel Graphics Development

On Tue, Mar 25, 2014 at 3:54 PM, Ben Widawsky <ben@bwidawsk.net> wrote:
> I don't think it's quite that easy, but I took it as a, "yes." Which one
> is patch 3?

"drm/i915: make semaphore signaller detection more robust" - it was a
follow-up patch in this thread, hence no 3/3. If the relevant bits on
bdw have moved out of the the cmd dword I guess we need to keep a
cache of the relevant dword and supply it to the function. Was too
lazy to check that though.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-03-25 15:24 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-14 23:08 [PATCH 1/2] drm/i915: fix up semaphore_waits_for Daniel Vetter
2014-03-14 23:08 ` [PATCH 2/2] drm/i915: Add FIXME for bdw semaphore detection in hancheck Daniel Vetter
2014-03-15  0:38   ` Ben Widawsky
2014-03-15 11:44     ` Daniel Vetter
2014-03-18  9:26       ` [PATCH] drm/i915: make semaphore signaller detection more robust Daniel Vetter
2014-03-19 15:05         ` Chris Wilson
2014-03-19 15:35           ` Daniel Vetter
2014-03-15 15:46 ` [PATCH 1/2] drm/i915: fix up semaphore_waits_for Chris Wilson
2014-03-15 16:31   ` Daniel Vetter
2014-03-15 23:13     ` Chris Wilson
2014-03-21 17:33 ` Mika Kuoppala
2014-03-22 17:52   ` Daniel Vetter
2014-03-24 22:37     ` Ben Widawsky
2014-03-25  9:42       ` Daniel Vetter
2014-03-25 14:54         ` Ben Widawsky
2014-03-25 15:24           ` Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.