All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/i915: Detect page faults during hangcheck
@ 2014-12-05 14:15 Chris Wilson
  2014-12-05 14:15 ` [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state Chris Wilson
  2014-12-05 21:03 ` [PATCH 1/2] drm/i915: Detect page faults during hangcheck Daniel Vetter
  0 siblings, 2 replies; 16+ messages in thread
From: Chris Wilson @ 2014-12-05 14:15 UTC (permalink / raw)
  To: intel-gfx

On Sandybridge+, the GPU provides the ERROR register for detecting page
faults. Hook this up to our hangcheck so that we can dump the error
state soon after such an event occurs. This would be better inside an
interrupt handler, but it serves a purpose here as it detects that our
initial context setup is invalid...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c     | 5 +++++
 drivers/gpu/drm/i915/intel_uncore.c | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 7913a72ce30a..eb2149b941e4 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2969,6 +2969,11 @@ static void i915_hangcheck_elapsed(unsigned long data)
 	if (!i915.enable_hangcheck)
 		return;
 
+	if (INTEL_INFO(dev_priv)->gen >= 6 && I915_READ(ERROR_GEN6)) {
+		i915_handle_error(dev, false, "GPU reported a page fault");
+		I915_WRITE(ERROR_GEN6, 0);
+	}
+
 	for_each_ring(ring, dev_priv, i) {
 		u64 acthd;
 		u32 seqno;
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 0655b44651a7..67f2e24c5bc5 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -535,6 +535,8 @@ static void __intel_uncore_early_sanitize(struct drm_device *dev,
 	if (IS_GEN6(dev) || IS_GEN7(dev))
 		__raw_i915_write32(dev_priv, GTFIFODBG,
 				   __raw_i915_read32(dev_priv, GTFIFODBG));
+	if (INTEL_INFO(dev)->gen >= 6)
+		__raw_i915_write32(dev_priv, ERROR_GEN6, 0);
 
 	intel_uncore_forcewake_reset(dev, restore_forcewake);
 }
-- 
2.1.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state
  2014-12-05 14:15 [PATCH 1/2] drm/i915: Detect page faults during hangcheck Chris Wilson
@ 2014-12-05 14:15 ` Chris Wilson
  2014-12-05 14:31   ` Ville Syrjälä
  2015-01-20 14:55   ` Mika Kuoppala
  2014-12-05 21:03 ` [PATCH 1/2] drm/i915: Detect page faults during hangcheck Daniel Vetter
  1 sibling, 2 replies; 16+ messages in thread
From: Chris Wilson @ 2014-12-05 14:15 UTC (permalink / raw)
  To: intel-gfx

Currently we initialise the rings, add the first context switch to the
ring and execute our golden state then enable (aliasing or full) ppgtt.
However, as we enable ppgtt using direct MMIO but load the PD using
MI_LRI, we end up executing the context switch and golden render state
with an invalid PD generating page faults. To solve this issue, first do
the ppgtt PD setup, then set the default context and write the commands
to run the render state into the ring, before we activate the ring. This
allows us to be sure that the register state is valid before we begin
execution.

This was spotted when writing the seqno/request conversion, but only with
the ERROR capture did I realise that it was a necessity now.

RFC: cleanup the error handling in i915_gem_init_hw.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c         | 20 ++++++++++----------
 drivers/gpu/drm/i915/intel_ringbuffer.c |  9 ++++++---
 2 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c1c11418231b..c13842d3cbc9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev)
 	 */
 	init_unused_rings(dev);
 
-	for_each_ring(ring, dev_priv, i) {
-		ret = ring->init_hw(ring);
-		if (ret)
-			return ret;
-	}
-
 	for (i = 0; i < NUM_L3_SLICES(dev); i++)
 		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
 
+	ret = i915_ppgtt_init_hw(dev);
+	if (ret && ret != -EIO) {
+		DRM_ERROR("PPGTT enable failed %d\n", ret);
+		i915_gem_cleanup_ringbuffer(dev);
+	}
+
 	/*
 	 * XXX: Contexts should only be initialized once. Doing a switch to the
 	 * default context switch however is something we'd like to do after
@@ -4820,10 +4820,10 @@ i915_gem_init_hw(struct drm_device *dev)
 		return ret;
 	}
 
-	ret = i915_ppgtt_init_hw(dev);
-	if (ret && ret != -EIO) {
-		DRM_ERROR("PPGTT enable failed %d\n", ret);
-		i915_gem_cleanup_ringbuffer(dev);
+	for_each_ring(ring, dev_priv, i) {
+		ret = ring->init_hw(ring);
+		if (ret)
+			return ret;
 	}
 
 	return ret;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index de38b04135d2..95ebce73d46d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -585,6 +585,9 @@ static int init_ring_common(struct intel_engine_cs *ring)
 	I915_WRITE_HEAD(ring, 0);
 	(void)I915_READ_HEAD(ring);
 
+	I915_WRITE_HEAD(ring, ringbuf->head);
+	I915_WRITE_TAIL(ring, ringbuf->head);
+
 	I915_WRITE_CTL(ring,
 			((ringbuf->size - PAGE_SIZE) & RING_NR_PAGES)
 			| RING_VALID);
@@ -592,7 +595,7 @@ static int init_ring_common(struct intel_engine_cs *ring)
 	/* If the head is still not zero, the ring is dead */
 	if (wait_for((I915_READ_CTL(ring) & RING_VALID) != 0 &&
 		     I915_READ_START(ring) == i915_gem_obj_ggtt_offset(obj) &&
-		     (I915_READ_HEAD(ring) & HEAD_ADDR) == 0, 50)) {
+		     (I915_READ_HEAD(ring) & HEAD_ADDR) == ringbuf->head, 50)) {
 		DRM_ERROR("%s initialization failed "
 			  "ctl %08x (valid? %d) head %08x tail %08x start %08x [expected %08lx]\n",
 			  ring->name,
@@ -603,9 +606,9 @@ static int init_ring_common(struct intel_engine_cs *ring)
 		goto out;
 	}
 
+	I915_WRITE_TAIL(ring, ringbuf->tail);
+
 	ringbuf->last_retired_head = -1;
-	ringbuf->head = I915_READ_HEAD(ring);
-	ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
 	intel_ring_update_space(ringbuf);
 
 	memset(&ring->hangcheck, 0, sizeof(ring->hangcheck));
-- 
2.1.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state
  2014-12-05 14:15 ` [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state Chris Wilson
@ 2014-12-05 14:31   ` Ville Syrjälä
  2014-12-05 14:38     ` Chris Wilson
  2015-01-20 14:55   ` Mika Kuoppala
  1 sibling, 1 reply; 16+ messages in thread
From: Ville Syrjälä @ 2014-12-05 14:31 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote:
> Currently we initialise the rings, add the first context switch to the
> ring and execute our golden state then enable (aliasing or full) ppgtt.
> However, as we enable ppgtt using direct MMIO but load the PD using
> MI_LRI, we end up executing the context switch and golden render state
> with an invalid PD generating page faults. To solve this issue, first do
> the ppgtt PD setup, then set the default context and write the commands
> to run the render state into the ring, before we activate the ring. This
> allows us to be sure that the register state is valid before we begin
> execution.
> 
> This was spotted when writing the seqno/request conversion, but only with
> the ERROR capture did I realise that it was a necessity now.
> 
> RFC: cleanup the error handling in i915_gem_init_hw.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem.c         | 20 ++++++++++----------
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  9 ++++++---
>  2 files changed, 16 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c1c11418231b..c13842d3cbc9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev)
>  	 */
>  	init_unused_rings(dev);
>  
> -	for_each_ring(ring, dev_priv, i) {
> -		ret = ring->init_hw(ring);
> -		if (ret)
> -			return ret;
> -	}
> -
>  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
>  		i915_gem_l3_remap(&dev_priv->ring[RCS], i);

This is going to assume ring->head/tail are already valid?

>  
> +	ret = i915_ppgtt_init_hw(dev);
> +	if (ret && ret != -EIO) {
> +		DRM_ERROR("PPGTT enable failed %d\n", ret);
> +		i915_gem_cleanup_ringbuffer(dev);
> +	}
> +
>  	/*
>  	 * XXX: Contexts should only be initialized once. Doing a switch to the
>  	 * default context switch however is something we'd like to do after
> @@ -4820,10 +4820,10 @@ i915_gem_init_hw(struct drm_device *dev)
>  		return ret;
>  	}
>  
> -	ret = i915_ppgtt_init_hw(dev);
> -	if (ret && ret != -EIO) {
> -		DRM_ERROR("PPGTT enable failed %d\n", ret);
> -		i915_gem_cleanup_ringbuffer(dev);
> +	for_each_ring(ring, dev_priv, i) {
> +		ret = ring->init_hw(ring);
> +		if (ret)
> +			return ret;
>  	}
>  
>  	return ret;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index de38b04135d2..95ebce73d46d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -585,6 +585,9 @@ static int init_ring_common(struct intel_engine_cs *ring)
>  	I915_WRITE_HEAD(ring, 0);
>  	(void)I915_READ_HEAD(ring);
>  
> +	I915_WRITE_HEAD(ring, ringbuf->head);
> +	I915_WRITE_TAIL(ring, ringbuf->head);

Here too.

So what happens on GPU reset?

> +
>  	I915_WRITE_CTL(ring,
>  			((ringbuf->size - PAGE_SIZE) & RING_NR_PAGES)
>  			| RING_VALID);
> @@ -592,7 +595,7 @@ static int init_ring_common(struct intel_engine_cs *ring)
>  	/* If the head is still not zero, the ring is dead */
>  	if (wait_for((I915_READ_CTL(ring) & RING_VALID) != 0 &&
>  		     I915_READ_START(ring) == i915_gem_obj_ggtt_offset(obj) &&
> -		     (I915_READ_HEAD(ring) & HEAD_ADDR) == 0, 50)) {
> +		     (I915_READ_HEAD(ring) & HEAD_ADDR) == ringbuf->head, 50)) {
>  		DRM_ERROR("%s initialization failed "
>  			  "ctl %08x (valid? %d) head %08x tail %08x start %08x [expected %08lx]\n",
>  			  ring->name,
> @@ -603,9 +606,9 @@ static int init_ring_common(struct intel_engine_cs *ring)
>  		goto out;
>  	}
>  
> +	I915_WRITE_TAIL(ring, ringbuf->tail);
> +
>  	ringbuf->last_retired_head = -1;
> -	ringbuf->head = I915_READ_HEAD(ring);
> -	ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
>  	intel_ring_update_space(ringbuf);
>  
>  	memset(&ring->hangcheck, 0, sizeof(ring->hangcheck));
> -- 
> 2.1.3
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state
  2014-12-05 14:31   ` Ville Syrjälä
@ 2014-12-05 14:38     ` Chris Wilson
  2014-12-05 14:53       ` Ville Syrjälä
  2014-12-05 14:58       ` Daniel Vetter
  0 siblings, 2 replies; 16+ messages in thread
From: Chris Wilson @ 2014-12-05 14:38 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote:
> On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote:
> > Currently we initialise the rings, add the first context switch to the
> > ring and execute our golden state then enable (aliasing or full) ppgtt.
> > However, as we enable ppgtt using direct MMIO but load the PD using
> > MI_LRI, we end up executing the context switch and golden render state
> > with an invalid PD generating page faults. To solve this issue, first do
> > the ppgtt PD setup, then set the default context and write the commands
> > to run the render state into the ring, before we activate the ring. This
> > allows us to be sure that the register state is valid before we begin
> > execution.
> > 
> > This was spotted when writing the seqno/request conversion, but only with
> > the ERROR capture did I realise that it was a necessity now.
> > 
> > RFC: cleanup the error handling in i915_gem_init_hw.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c         | 20 ++++++++++----------
> >  drivers/gpu/drm/i915/intel_ringbuffer.c |  9 ++++++---
> >  2 files changed, 16 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index c1c11418231b..c13842d3cbc9 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev)
> >  	 */
> >  	init_unused_rings(dev);
> >  
> > -	for_each_ring(ring, dev_priv, i) {
> > -		ret = ring->init_hw(ring);
> > -		if (ret)
> > -			return ret;
> > -	}
> > -
> >  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> >  		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
> 
> This is going to assume ring->head/tail are already valid?

We write into the ring obj, not the ring itself, which should be setup
during the various intel_init_engine, i.e. the backing storage is
independent of the actual registers.

If you take it to the logical conclusion, each operation on the ring is
just a transaction that is buffered until it is commited to the
hardware, or rolled back in its entirety if aborted. We can then queue
up multiple transactions before first enabling the hardware, or equally
preserve such transactions across a GPU reset.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state
  2014-12-05 14:38     ` Chris Wilson
@ 2014-12-05 14:53       ` Ville Syrjälä
  2014-12-05 16:26         ` Chris Wilson
  2014-12-05 14:58       ` Daniel Vetter
  1 sibling, 1 reply; 16+ messages in thread
From: Ville Syrjälä @ 2014-12-05 14:53 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On Fri, Dec 05, 2014 at 02:38:46PM +0000, Chris Wilson wrote:
> On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote:
> > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote:
> > > Currently we initialise the rings, add the first context switch to the
> > > ring and execute our golden state then enable (aliasing or full) ppgtt.
> > > However, as we enable ppgtt using direct MMIO but load the PD using
> > > MI_LRI, we end up executing the context switch and golden render state
> > > with an invalid PD generating page faults. To solve this issue, first do
> > > the ppgtt PD setup, then set the default context and write the commands
> > > to run the render state into the ring, before we activate the ring. This
> > > allows us to be sure that the register state is valid before we begin
> > > execution.
> > > 
> > > This was spotted when writing the seqno/request conversion, but only with
> > > the ERROR capture did I realise that it was a necessity now.
> > > 
> > > RFC: cleanup the error handling in i915_gem_init_hw.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem.c         | 20 ++++++++++----------
> > >  drivers/gpu/drm/i915/intel_ringbuffer.c |  9 ++++++---
> > >  2 files changed, 16 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index c1c11418231b..c13842d3cbc9 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev)
> > >  	 */
> > >  	init_unused_rings(dev);
> > >  
> > > -	for_each_ring(ring, dev_priv, i) {
> > > -		ret = ring->init_hw(ring);
> > > -		if (ret)
> > > -			return ret;
> > > -	}
> > > -
> > >  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> > >  		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
> > 
> > This is going to assume ring->head/tail are already valid?
> 
> We write into the ring obj, not the ring itself, which should be setup
> during the various intel_init_engine, i.e. the backing storage is
> independent of the actual registers.

I mean the software shadows, not the registers themselves. When the GPU
hangs I expect rign->head != ring->tail. So what makes those two identical
again after the GPU reset?

> 
> If you take it to the logical conclusion, each operation on the ring is
> just a transaction that is buffered until it is commited to the
> hardware, or rolled back in its entirety if aborted. We can then queue
> up multiple transactions before first enabling the hardware, or equally
> preserve such transactions across a GPU reset.
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state
  2014-12-05 14:38     ` Chris Wilson
  2014-12-05 14:53       ` Ville Syrjälä
@ 2014-12-05 14:58       ` Daniel Vetter
  2014-12-05 16:23         ` Chris Wilson
  1 sibling, 1 reply; 16+ messages in thread
From: Daniel Vetter @ 2014-12-05 14:58 UTC (permalink / raw)
  To: Chris Wilson, Ville Syrjälä, intel-gfx

On Fri, Dec 05, 2014 at 02:38:46PM +0000, Chris Wilson wrote:
> On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote:
> > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote:
> > > Currently we initialise the rings, add the first context switch to the
> > > ring and execute our golden state then enable (aliasing or full) ppgtt.
> > > However, as we enable ppgtt using direct MMIO but load the PD using
> > > MI_LRI, we end up executing the context switch and golden render state
> > > with an invalid PD generating page faults. To solve this issue, first do
> > > the ppgtt PD setup, then set the default context and write the commands
> > > to run the render state into the ring, before we activate the ring. This
> > > allows us to be sure that the register state is valid before we begin
> > > execution.
> > > 
> > > This was spotted when writing the seqno/request conversion, but only with
> > > the ERROR capture did I realise that it was a necessity now.
> > > 
> > > RFC: cleanup the error handling in i915_gem_init_hw.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem.c         | 20 ++++++++++----------
> > >  drivers/gpu/drm/i915/intel_ringbuffer.c |  9 ++++++---
> > >  2 files changed, 16 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index c1c11418231b..c13842d3cbc9 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev)
> > >  	 */
> > >  	init_unused_rings(dev);
> > >  
> > > -	for_each_ring(ring, dev_priv, i) {
> > > -		ret = ring->init_hw(ring);
> > > -		if (ret)
> > > -			return ret;
> > > -	}
> > > -
> > >  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> > >  		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
> > 
> > This is going to assume ring->head/tail are already valid?
> 
> We write into the ring obj, not the ring itself, which should be setup
> during the various intel_init_engine, i.e. the backing storage is
> independent of the actual registers.

But there's still intel_ring_advance which calls ->write_tail all over the
place. So we drop all these mmio writes into nirvana since we'll reset the
ring later on?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state
  2014-12-05 14:58       ` Daniel Vetter
@ 2014-12-05 16:23         ` Chris Wilson
  2014-12-05 21:01           ` Daniel Vetter
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2014-12-05 16:23 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Dec 05, 2014 at 03:58:40PM +0100, Daniel Vetter wrote:
> On Fri, Dec 05, 2014 at 02:38:46PM +0000, Chris Wilson wrote:
> > On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote:
> > > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote:
> > > > Currently we initialise the rings, add the first context switch to the
> > > > ring and execute our golden state then enable (aliasing or full) ppgtt.
> > > > However, as we enable ppgtt using direct MMIO but load the PD using
> > > > MI_LRI, we end up executing the context switch and golden render state
> > > > with an invalid PD generating page faults. To solve this issue, first do
> > > > the ppgtt PD setup, then set the default context and write the commands
> > > > to run the render state into the ring, before we activate the ring. This
> > > > allows us to be sure that the register state is valid before we begin
> > > > execution.
> > > > 
> > > > This was spotted when writing the seqno/request conversion, but only with
> > > > the ERROR capture did I realise that it was a necessity now.
> > > > 
> > > > RFC: cleanup the error handling in i915_gem_init_hw.
> > > > 
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_gem.c         | 20 ++++++++++----------
> > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |  9 ++++++---
> > > >  2 files changed, 16 insertions(+), 13 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > index c1c11418231b..c13842d3cbc9 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev)
> > > >  	 */
> > > >  	init_unused_rings(dev);
> > > >  
> > > > -	for_each_ring(ring, dev_priv, i) {
> > > > -		ret = ring->init_hw(ring);
> > > > -		if (ret)
> > > > -			return ret;
> > > > -	}
> > > > -
> > > >  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> > > >  		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
> > > 
> > > This is going to assume ring->head/tail are already valid?
> > 
> > We write into the ring obj, not the ring itself, which should be setup
> > during the various intel_init_engine, i.e. the backing storage is
> > independent of the actual registers.
> 
> But there's still intel_ring_advance which calls ->write_tail all over the
> place. So we drop all these mmio writes into nirvana since we'll reset the
> ring later on?

intel_ring_advance() doesn't do the register update, it just updates
ring->tail. And even if it did, whilst the ring is disabled, nobody is
listening, right?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state
  2014-12-05 14:53       ` Ville Syrjälä
@ 2014-12-05 16:26         ` Chris Wilson
  2014-12-05 17:03           ` Ville Syrjälä
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2014-12-05 16:26 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Fri, Dec 05, 2014 at 04:53:49PM +0200, Ville Syrjälä wrote:
> On Fri, Dec 05, 2014 at 02:38:46PM +0000, Chris Wilson wrote:
> > On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote:
> > > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote:
> > > > Currently we initialise the rings, add the first context switch to the
> > > > ring and execute our golden state then enable (aliasing or full) ppgtt.
> > > > However, as we enable ppgtt using direct MMIO but load the PD using
> > > > MI_LRI, we end up executing the context switch and golden render state
> > > > with an invalid PD generating page faults. To solve this issue, first do
> > > > the ppgtt PD setup, then set the default context and write the commands
> > > > to run the render state into the ring, before we activate the ring. This
> > > > allows us to be sure that the register state is valid before we begin
> > > > execution.
> > > > 
> > > > This was spotted when writing the seqno/request conversion, but only with
> > > > the ERROR capture did I realise that it was a necessity now.
> > > > 
> > > > RFC: cleanup the error handling in i915_gem_init_hw.
> > > > 
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_gem.c         | 20 ++++++++++----------
> > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |  9 ++++++---
> > > >  2 files changed, 16 insertions(+), 13 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > index c1c11418231b..c13842d3cbc9 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev)
> > > >  	 */
> > > >  	init_unused_rings(dev);
> > > >  
> > > > -	for_each_ring(ring, dev_priv, i) {
> > > > -		ret = ring->init_hw(ring);
> > > > -		if (ret)
> > > > -			return ret;
> > > > -	}
> > > > -
> > > >  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> > > >  		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
> > > 
> > > This is going to assume ring->head/tail are already valid?
> > 
> > We write into the ring obj, not the ring itself, which should be setup
> > during the various intel_init_engine, i.e. the backing storage is
> > independent of the actual registers.
> 
> I mean the software shadows, not the registers themselves. When the GPU
> hangs I expect rign->head != ring->tail. So what makes those two identical
> again after the GPU reset?

Why would they be equal after reset? At the moment, we discard all
outstanding requests which makes them equal. We have been discussing the
need to execute pending batches from non-guitly Batches for yonks, in
which case we would not expect ring->head == ring->tail after a reset.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state
  2014-12-05 16:26         ` Chris Wilson
@ 2014-12-05 17:03           ` Ville Syrjälä
  2015-01-21 10:12             ` Chris Wilson
  0 siblings, 1 reply; 16+ messages in thread
From: Ville Syrjälä @ 2014-12-05 17:03 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On Fri, Dec 05, 2014 at 04:26:23PM +0000, Chris Wilson wrote:
> On Fri, Dec 05, 2014 at 04:53:49PM +0200, Ville Syrjälä wrote:
> > On Fri, Dec 05, 2014 at 02:38:46PM +0000, Chris Wilson wrote:
> > > On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote:
> > > > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote:
> > > > > Currently we initialise the rings, add the first context switch to the
> > > > > ring and execute our golden state then enable (aliasing or full) ppgtt.
> > > > > However, as we enable ppgtt using direct MMIO but load the PD using
> > > > > MI_LRI, we end up executing the context switch and golden render state
> > > > > with an invalid PD generating page faults. To solve this issue, first do
> > > > > the ppgtt PD setup, then set the default context and write the commands
> > > > > to run the render state into the ring, before we activate the ring. This
> > > > > allows us to be sure that the register state is valid before we begin
> > > > > execution.
> > > > > 
> > > > > This was spotted when writing the seqno/request conversion, but only with
> > > > > the ERROR capture did I realise that it was a necessity now.
> > > > > 
> > > > > RFC: cleanup the error handling in i915_gem_init_hw.
> > > > > 
> > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > ---
> > > > >  drivers/gpu/drm/i915/i915_gem.c         | 20 ++++++++++----------
> > > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |  9 ++++++---
> > > > >  2 files changed, 16 insertions(+), 13 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > > index c1c11418231b..c13842d3cbc9 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev)
> > > > >  	 */
> > > > >  	init_unused_rings(dev);
> > > > >  
> > > > > -	for_each_ring(ring, dev_priv, i) {
> > > > > -		ret = ring->init_hw(ring);
> > > > > -		if (ret)
> > > > > -			return ret;
> > > > > -	}
> > > > > -
> > > > >  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> > > > >  		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
> > > > 
> > > > This is going to assume ring->head/tail are already valid?
> > > 
> > > We write into the ring obj, not the ring itself, which should be setup
> > > during the various intel_init_engine, i.e. the backing storage is
> > > independent of the actual registers.
> > 
> > I mean the software shadows, not the registers themselves. When the GPU
> > hangs I expect rign->head != ring->tail. So what makes those two identical
> > again after the GPU reset?
> 
> Why would they be equal after reset?

They wouldn't. That's my whole point.

> At the moment, we discard all
> outstanding requests which makes them equal.

OK, so you're saying somehwere we set them to some sane values. But
I don't see any obvious code for that. The obvious code was in
init_ring_common(), but you just removed said code in this patch.

So unless I'm totally misreading the code, ->head and ->tail will be
left at their pre-reset values, then i915_gem_l3_remap() will pile on
some new stuff at ->tail, and then later you restart the ring from
->head, which would still include all the discarded junk from before
the reset.

> We have been discussing the
> need to execute pending batches from non-guitly Batches for yonks, in
> which case we would not expect ring->head == ring->tail after a reset.

Yeah, but let's not confuse the discussion with advanced topics.

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state
  2014-12-05 16:23         ` Chris Wilson
@ 2014-12-05 21:01           ` Daniel Vetter
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Vetter @ 2014-12-05 21:01 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Ville Syrjälä, intel-gfx

On Fri, Dec 05, 2014 at 04:23:56PM +0000, Chris Wilson wrote:
> On Fri, Dec 05, 2014 at 03:58:40PM +0100, Daniel Vetter wrote:
> > On Fri, Dec 05, 2014 at 02:38:46PM +0000, Chris Wilson wrote:
> > > On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote:
> > > > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote:
> > > > > Currently we initialise the rings, add the first context switch to the
> > > > > ring and execute our golden state then enable (aliasing or full) ppgtt.
> > > > > However, as we enable ppgtt using direct MMIO but load the PD using
> > > > > MI_LRI, we end up executing the context switch and golden render state
> > > > > with an invalid PD generating page faults. To solve this issue, first do
> > > > > the ppgtt PD setup, then set the default context and write the commands
> > > > > to run the render state into the ring, before we activate the ring. This
> > > > > allows us to be sure that the register state is valid before we begin
> > > > > execution.
> > > > > 
> > > > > This was spotted when writing the seqno/request conversion, but only with
> > > > > the ERROR capture did I realise that it was a necessity now.
> > > > > 
> > > > > RFC: cleanup the error handling in i915_gem_init_hw.
> > > > > 
> > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > ---
> > > > >  drivers/gpu/drm/i915/i915_gem.c         | 20 ++++++++++----------
> > > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |  9 ++++++---
> > > > >  2 files changed, 16 insertions(+), 13 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > > index c1c11418231b..c13842d3cbc9 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev)
> > > > >  	 */
> > > > >  	init_unused_rings(dev);
> > > > >  
> > > > > -	for_each_ring(ring, dev_priv, i) {
> > > > > -		ret = ring->init_hw(ring);
> > > > > -		if (ret)
> > > > > -			return ret;
> > > > > -	}
> > > > > -
> > > > >  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> > > > >  		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
> > > > 
> > > > This is going to assume ring->head/tail are already valid?
> > > 
> > > We write into the ring obj, not the ring itself, which should be setup
> > > during the various intel_init_engine, i.e. the backing storage is
> > > independent of the actual registers.
> > 
> > But there's still intel_ring_advance which calls ->write_tail all over the
> > place. So we drop all these mmio writes into nirvana since we'll reset the
> > ring later on?
> 
> intel_ring_advance() doesn't do the register update, it just updates
> ring->tail. And even if it did, whilst the ring is disabled, nobody is
> listening, right?

Oh right, mixed that up. So all well from my pov for this patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] drm/i915: Detect page faults during hangcheck
  2014-12-05 14:15 [PATCH 1/2] drm/i915: Detect page faults during hangcheck Chris Wilson
  2014-12-05 14:15 ` [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state Chris Wilson
@ 2014-12-05 21:03 ` Daniel Vetter
  2015-01-21 10:05   ` Chris Wilson
  1 sibling, 1 reply; 16+ messages in thread
From: Daniel Vetter @ 2014-12-05 21:03 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Dec 05, 2014 at 02:15:21PM +0000, Chris Wilson wrote:
> On Sandybridge+, the GPU provides the ERROR register for detecting page
> faults. Hook this up to our hangcheck so that we can dump the error
> state soon after such an event occurs. This would be better inside an
> interrupt handler, but it serves a purpose here as it detects that our
> initial context setup is invalid...
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_irq.c     | 5 +++++
>  drivers/gpu/drm/i915/intel_uncore.c | 2 ++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 7913a72ce30a..eb2149b941e4 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2969,6 +2969,11 @@ static void i915_hangcheck_elapsed(unsigned long data)
>  	if (!i915.enable_hangcheck)
>  		return;
>  
> +	if (INTEL_INFO(dev_priv)->gen >= 6 && I915_READ(ERROR_GEN6)) {
> +		i915_handle_error(dev, false, "GPU reported a page fault");

Is the full hangcheck state actually useful for debugging these
pagefaults? The gpu doesn't seem to fall over completely, so I guess ACTHD
and friends are somewhere in nirvana.

Enabling the interrupts would definitely be useful though. I think all the
handler code is written already (perhaps a few missing drm_debug lines),
it's just that we don't enable these interrupt sources by default.
-Daniel

> +		I915_WRITE(ERROR_GEN6, 0);
> +	}
> +
>  	for_each_ring(ring, dev_priv, i) {
>  		u64 acthd;
>  		u32 seqno;
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 0655b44651a7..67f2e24c5bc5 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -535,6 +535,8 @@ static void __intel_uncore_early_sanitize(struct drm_device *dev,
>  	if (IS_GEN6(dev) || IS_GEN7(dev))
>  		__raw_i915_write32(dev_priv, GTFIFODBG,
>  				   __raw_i915_read32(dev_priv, GTFIFODBG));
> +	if (INTEL_INFO(dev)->gen >= 6)
> +		__raw_i915_write32(dev_priv, ERROR_GEN6, 0);
>  
>  	intel_uncore_forcewake_reset(dev, restore_forcewake);
>  }
> -- 
> 2.1.3
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state
  2014-12-05 14:15 ` [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state Chris Wilson
  2014-12-05 14:31   ` Ville Syrjälä
@ 2015-01-20 14:55   ` Mika Kuoppala
  2015-01-21 10:09     ` Chris Wilson
  1 sibling, 1 reply; 16+ messages in thread
From: Mika Kuoppala @ 2015-01-20 14:55 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Currently we initialise the rings, add the first context switch to the
> ring and execute our golden state then enable (aliasing or full) ppgtt.
> However, as we enable ppgtt using direct MMIO but load the PD using
> MI_LRI, we end up executing the context switch and golden render state
> with an invalid PD generating page faults. To solve this issue, first do
> the ppgtt PD setup, then set the default context and write the commands
> to run the render state into the ring, before we activate the ring. This
> allows us to be sure that the register state is valid before we begin
> execution.
>
> This was spotted when writing the seqno/request conversion, but only with
> the ERROR capture did I realise that it was a necessity now.
>
> RFC: cleanup the error handling in i915_gem_init_hw.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem.c         | 20 ++++++++++----------
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  9 ++++++---
>  2 files changed, 16 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c1c11418231b..c13842d3cbc9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev)
>  	 */
>  	init_unused_rings(dev);
>  
> -	for_each_ring(ring, dev_priv, i) {
> -		ret = ring->init_hw(ring);
> -		if (ret)
> -			return ret;
> -	}
> -
>  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
>  		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
>  

Should we move this after the ppgtt has been init? (or alternatively
let do_switch handle it)

> +	ret = i915_ppgtt_init_hw(dev);
> +	if (ret && ret != -EIO) {
> +		DRM_ERROR("PPGTT enable failed %d\n", ret);
> +		i915_gem_cleanup_ringbuffer(dev);
> +	}
> +
>  	/*
>  	 * XXX: Contexts should only be initialized once. Doing a switch to the
>  	 * default context switch however is something we'd like to do after
> @@ -4820,10 +4820,10 @@ i915_gem_init_hw(struct drm_device *dev)
>  		return ret;
>  	}

The comment needs tweaking as it reads 

/* Context switching requires rings (for * the do_switch), but before
 * enabling PPGTT. So don't move this."
*/

If I read the patch correctly you want to keep the ring head/tail
bookkeeping intact so that you can push the ctx_switch into the buffer
before the actual ring hw is activated?

Does this then enable you to get rid of the whole pre emitting of
l3remap stuff and just let the do_switch handle l3remap?

-Mika

> -	ret = i915_ppgtt_init_hw(dev);
> -	if (ret && ret != -EIO) {
> -		DRM_ERROR("PPGTT enable failed %d\n", ret);
> -		i915_gem_cleanup_ringbuffer(dev);
> +	for_each_ring(ring, dev_priv, i) {
> +		ret = ring->init_hw(ring);
> +		if (ret)
> +			return ret;
>  	}
>  
>  	return ret;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index de38b04135d2..95ebce73d46d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -585,6 +585,9 @@ static int init_ring_common(struct intel_engine_cs *ring)
>  	I915_WRITE_HEAD(ring, 0);
>  	(void)I915_READ_HEAD(ring);
>  
> +	I915_WRITE_HEAD(ring, ringbuf->head);
> +	I915_WRITE_TAIL(ring, ringbuf->head);
> +
>  	I915_WRITE_CTL(ring,
>  			((ringbuf->size - PAGE_SIZE) & RING_NR_PAGES)
>  			| RING_VALID);
> @@ -592,7 +595,7 @@ static int init_ring_common(struct intel_engine_cs *ring)
>  	/* If the head is still not zero, the ring is dead */
>  	if (wait_for((I915_READ_CTL(ring) & RING_VALID) != 0 &&
>  		     I915_READ_START(ring) == i915_gem_obj_ggtt_offset(obj) &&
> -		     (I915_READ_HEAD(ring) & HEAD_ADDR) == 0, 50)) {
> +		     (I915_READ_HEAD(ring) & HEAD_ADDR) == ringbuf->head, 50)) {
>  		DRM_ERROR("%s initialization failed "
>  			  "ctl %08x (valid? %d) head %08x tail %08x start %08x [expected %08lx]\n",
>  			  ring->name,
> @@ -603,9 +606,9 @@ static int init_ring_common(struct intel_engine_cs *ring)
>  		goto out;
>  	}
>  
> +	I915_WRITE_TAIL(ring, ringbuf->tail);
> +
>  	ringbuf->last_retired_head = -1;
> -	ringbuf->head = I915_READ_HEAD(ring);
> -	ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
>  	intel_ring_update_space(ringbuf);
>  
>  	memset(&ring->hangcheck, 0, sizeof(ring->hangcheck));
> -- 
> 2.1.3
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] drm/i915: Detect page faults during hangcheck
  2014-12-05 21:03 ` [PATCH 1/2] drm/i915: Detect page faults during hangcheck Daniel Vetter
@ 2015-01-21 10:05   ` Chris Wilson
  0 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2015-01-21 10:05 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Dec 05, 2014 at 10:03:42PM +0100, Daniel Vetter wrote:
> On Fri, Dec 05, 2014 at 02:15:21PM +0000, Chris Wilson wrote:
> > On Sandybridge+, the GPU provides the ERROR register for detecting page
> > faults. Hook this up to our hangcheck so that we can dump the error
> > state soon after such an event occurs. This would be better inside an
> > interrupt handler, but it serves a purpose here as it detects that our
> > initial context setup is invalid...
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_irq.c     | 5 +++++
> >  drivers/gpu/drm/i915/intel_uncore.c | 2 ++
> >  2 files changed, 7 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index 7913a72ce30a..eb2149b941e4 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -2969,6 +2969,11 @@ static void i915_hangcheck_elapsed(unsigned long data)
> >  	if (!i915.enable_hangcheck)
> >  		return;
> >  
> > +	if (INTEL_INFO(dev_priv)->gen >= 6 && I915_READ(ERROR_GEN6)) {
> > +		i915_handle_error(dev, false, "GPU reported a page fault");
> 
> Is the full hangcheck state actually useful for debugging these
> pagefaults? The gpu doesn't seem to fall over completely, so I guess ACTHD
> and friends are somewhere in nirvana.

Yes, it was. In this case ACTHD wasn't in nirvana, but close inspection
showed that the environment must have changed since the execution of the
batch and the GPU fault.
 
> Enabling the interrupts would definitely be useful though. I think all the
> handler code is written already (perhaps a few missing drm_debug lines),
> it's just that we don't enable these interrupt sources by default.

That would be even better.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state
  2015-01-20 14:55   ` Mika Kuoppala
@ 2015-01-21 10:09     ` Chris Wilson
  0 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2015-01-21 10:09 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

On Tue, Jan 20, 2015 at 04:55:56PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Currently we initialise the rings, add the first context switch to the
> > ring and execute our golden state then enable (aliasing or full) ppgtt.
> > However, as we enable ppgtt using direct MMIO but load the PD using
> > MI_LRI, we end up executing the context switch and golden render state
> > with an invalid PD generating page faults. To solve this issue, first do
> > the ppgtt PD setup, then set the default context and write the commands
> > to run the render state into the ring, before we activate the ring. This
> > allows us to be sure that the register state is valid before we begin
> > execution.
> >
> > This was spotted when writing the seqno/request conversion, but only with
> > the ERROR capture did I realise that it was a necessity now.
> >
> > RFC: cleanup the error handling in i915_gem_init_hw.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c         | 20 ++++++++++----------
> >  drivers/gpu/drm/i915/intel_ringbuffer.c |  9 ++++++---
> >  2 files changed, 16 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index c1c11418231b..c13842d3cbc9 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev)
> >  	 */
> >  	init_unused_rings(dev);
> >  
> > -	for_each_ring(ring, dev_priv, i) {
> > -		ret = ring->init_hw(ring);
> > -		if (ret)
> > -			return ret;
> > -	}
> > -
> >  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> >  		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
> >  
> 
> Should we move this after the ppgtt has been init? (or alternatively
> let do_switch handle it)

I did the latter. We treat it as part of the context state.
 
> > +	ret = i915_ppgtt_init_hw(dev);
> > +	if (ret && ret != -EIO) {
> > +		DRM_ERROR("PPGTT enable failed %d\n", ret);
> > +		i915_gem_cleanup_ringbuffer(dev);
> > +	}
> > +
> >  	/*
> >  	 * XXX: Contexts should only be initialized once. Doing a switch to the
> >  	 * default context switch however is something we'd like to do after
> > @@ -4820,10 +4820,10 @@ i915_gem_init_hw(struct drm_device *dev)
> >  		return ret;
> >  	}
> 
> The comment needs tweaking as it reads 
> 
> /* Context switching requires rings (for * the do_switch), but before
>  * enabling PPGTT. So don't move this."
> */
> 
> If I read the patch correctly you want to keep the ring head/tail
> bookkeeping intact so that you can push the ctx_switch into the buffer
> before the actual ring hw is activated?

Yes.
 
> Does this then enable you to get rid of the whole pre emitting of
> l3remap stuff and just let the do_switch handle l3remap?

Yes. I even tried to get you to read patches that did that several
months ago...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state
  2014-12-05 17:03           ` Ville Syrjälä
@ 2015-01-21 10:12             ` Chris Wilson
  2015-01-21 11:21               ` Ville Syrjälä
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2015-01-21 10:12 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Fri, Dec 05, 2014 at 07:03:42PM +0200, Ville Syrjälä wrote:
> On Fri, Dec 05, 2014 at 04:26:23PM +0000, Chris Wilson wrote:
> > On Fri, Dec 05, 2014 at 04:53:49PM +0200, Ville Syrjälä wrote:
> > > On Fri, Dec 05, 2014 at 02:38:46PM +0000, Chris Wilson wrote:
> > > > On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote:
> > > > > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote:
> > > > > > Currently we initialise the rings, add the first context switch to the
> > > > > > ring and execute our golden state then enable (aliasing or full) ppgtt.
> > > > > > However, as we enable ppgtt using direct MMIO but load the PD using
> > > > > > MI_LRI, we end up executing the context switch and golden render state
> > > > > > with an invalid PD generating page faults. To solve this issue, first do
> > > > > > the ppgtt PD setup, then set the default context and write the commands
> > > > > > to run the render state into the ring, before we activate the ring. This
> > > > > > allows us to be sure that the register state is valid before we begin
> > > > > > execution.
> > > > > > 
> > > > > > This was spotted when writing the seqno/request conversion, but only with
> > > > > > the ERROR capture did I realise that it was a necessity now.
> > > > > > 
> > > > > > RFC: cleanup the error handling in i915_gem_init_hw.
> > > > > > 
> > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > > ---
> > > > > >  drivers/gpu/drm/i915/i915_gem.c         | 20 ++++++++++----------
> > > > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |  9 ++++++---
> > > > > >  2 files changed, 16 insertions(+), 13 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > > > index c1c11418231b..c13842d3cbc9 100644
> > > > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev)
> > > > > >  	 */
> > > > > >  	init_unused_rings(dev);
> > > > > >  
> > > > > > -	for_each_ring(ring, dev_priv, i) {
> > > > > > -		ret = ring->init_hw(ring);
> > > > > > -		if (ret)
> > > > > > -			return ret;
> > > > > > -	}
> > > > > > -
> > > > > >  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> > > > > >  		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
> > > > > 
> > > > > This is going to assume ring->head/tail are already valid?
> > > > 
> > > > We write into the ring obj, not the ring itself, which should be setup
> > > > during the various intel_init_engine, i.e. the backing storage is
> > > > independent of the actual registers.
> > > 
> > > I mean the software shadows, not the registers themselves. When the GPU
> > > hangs I expect rign->head != ring->tail. So what makes those two identical
> > > again after the GPU reset?
> > 
> > Why would they be equal after reset?
> 
> They wouldn't. That's my whole point.
> 
> > At the moment, we discard all
> > outstanding requests which makes them equal.
> 
> OK, so you're saying somehwere we set them to some sane values. But
> I don't see any obvious code for that. The obvious code was in
> init_ring_common(), but you just removed said code in this patch.

When we clear the request list, we should be updating the position of
last known head, which should then be used to set reset the ring->head
(which in turn is used to reprogram the hw on reset/enable).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state
  2015-01-21 10:12             ` Chris Wilson
@ 2015-01-21 11:21               ` Ville Syrjälä
  0 siblings, 0 replies; 16+ messages in thread
From: Ville Syrjälä @ 2015-01-21 11:21 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On Wed, Jan 21, 2015 at 10:12:12AM +0000, Chris Wilson wrote:
> On Fri, Dec 05, 2014 at 07:03:42PM +0200, Ville Syrjälä wrote:
> > On Fri, Dec 05, 2014 at 04:26:23PM +0000, Chris Wilson wrote:
> > > On Fri, Dec 05, 2014 at 04:53:49PM +0200, Ville Syrjälä wrote:
> > > > On Fri, Dec 05, 2014 at 02:38:46PM +0000, Chris Wilson wrote:
> > > > > On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote:
> > > > > > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote:
> > > > > > > Currently we initialise the rings, add the first context switch to the
> > > > > > > ring and execute our golden state then enable (aliasing or full) ppgtt.
> > > > > > > However, as we enable ppgtt using direct MMIO but load the PD using
> > > > > > > MI_LRI, we end up executing the context switch and golden render state
> > > > > > > with an invalid PD generating page faults. To solve this issue, first do
> > > > > > > the ppgtt PD setup, then set the default context and write the commands
> > > > > > > to run the render state into the ring, before we activate the ring. This
> > > > > > > allows us to be sure that the register state is valid before we begin
> > > > > > > execution.
> > > > > > > 
> > > > > > > This was spotted when writing the seqno/request conversion, but only with
> > > > > > > the ERROR capture did I realise that it was a necessity now.
> > > > > > > 
> > > > > > > RFC: cleanup the error handling in i915_gem_init_hw.
> > > > > > > 
> > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/i915/i915_gem.c         | 20 ++++++++++----------
> > > > > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |  9 ++++++---
> > > > > > >  2 files changed, 16 insertions(+), 13 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > > > > index c1c11418231b..c13842d3cbc9 100644
> > > > > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > > > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev)
> > > > > > >  	 */
> > > > > > >  	init_unused_rings(dev);
> > > > > > >  
> > > > > > > -	for_each_ring(ring, dev_priv, i) {
> > > > > > > -		ret = ring->init_hw(ring);
> > > > > > > -		if (ret)
> > > > > > > -			return ret;
> > > > > > > -	}
> > > > > > > -
> > > > > > >  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> > > > > > >  		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
> > > > > > 
> > > > > > This is going to assume ring->head/tail are already valid?
> > > > > 
> > > > > We write into the ring obj, not the ring itself, which should be setup
> > > > > during the various intel_init_engine, i.e. the backing storage is
> > > > > independent of the actual registers.
> > > > 
> > > > I mean the software shadows, not the registers themselves. When the GPU
> > > > hangs I expect rign->head != ring->tail. So what makes those two identical
> > > > again after the GPU reset?
> > > 
> > > Why would they be equal after reset?
> > 
> > They wouldn't. That's my whole point.
> > 
> > > At the moment, we discard all
> > > outstanding requests which makes them equal.
> > 
> > OK, so you're saying somehwere we set them to some sane values. But
> > I don't see any obvious code for that. The obvious code was in
> > init_ring_common(), but you just removed said code in this patch.
> 
> When we clear the request list, we should be updating the position of
> last known head, which should then be used to set reset the ring->head
> (which in turn is used to reprogram the hw on reset/enable).

It's been a while, but IIRC the problem seemed to me that we don't
call i915_gem_retire_requests_ring() on reset and hence never update
ring->last_retired_head. And even if we did, we don't seem to call
intel_ring_update_space() until the end of init_ring_common() so we
wouldn't do the ring->head=ring->last_retired_head assignment either.
Thus ring->head would be left as its pre-reset value.

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-01-21 11:21 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-05 14:15 [PATCH 1/2] drm/i915: Detect page faults during hangcheck Chris Wilson
2014-12-05 14:15 ` [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state Chris Wilson
2014-12-05 14:31   ` Ville Syrjälä
2014-12-05 14:38     ` Chris Wilson
2014-12-05 14:53       ` Ville Syrjälä
2014-12-05 16:26         ` Chris Wilson
2014-12-05 17:03           ` Ville Syrjälä
2015-01-21 10:12             ` Chris Wilson
2015-01-21 11:21               ` Ville Syrjälä
2014-12-05 14:58       ` Daniel Vetter
2014-12-05 16:23         ` Chris Wilson
2014-12-05 21:01           ` Daniel Vetter
2015-01-20 14:55   ` Mika Kuoppala
2015-01-21 10:09     ` Chris Wilson
2014-12-05 21:03 ` [PATCH 1/2] drm/i915: Detect page faults during hangcheck Daniel Vetter
2015-01-21 10:05   ` Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.