* [PATCH 1/2] drm/i915: Detect page faults during hangcheck @ 2014-12-05 14:15 Chris Wilson 2014-12-05 14:15 ` [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state Chris Wilson 2014-12-05 21:03 ` [PATCH 1/2] drm/i915: Detect page faults during hangcheck Daniel Vetter 0 siblings, 2 replies; 16+ messages in thread From: Chris Wilson @ 2014-12-05 14:15 UTC (permalink / raw) To: intel-gfx On Sandybridge+, the GPU provides the ERROR register for detecting page faults. Hook this up to our hangcheck so that we can dump the error state soon after such an event occurs. This would be better inside an interrupt handler, but it serves a purpose here as it detects that our initial context setup is invalid... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/i915_irq.c | 5 +++++ drivers/gpu/drm/i915/intel_uncore.c | 2 ++ 2 files changed, 7 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 7913a72ce30a..eb2149b941e4 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2969,6 +2969,11 @@ static void i915_hangcheck_elapsed(unsigned long data) if (!i915.enable_hangcheck) return; + if (INTEL_INFO(dev_priv)->gen >= 6 && I915_READ(ERROR_GEN6)) { + i915_handle_error(dev, false, "GPU reported a page fault"); + I915_WRITE(ERROR_GEN6, 0); + } + for_each_ring(ring, dev_priv, i) { u64 acthd; u32 seqno; diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 0655b44651a7..67f2e24c5bc5 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -535,6 +535,8 @@ static void __intel_uncore_early_sanitize(struct drm_device *dev, if (IS_GEN6(dev) || IS_GEN7(dev)) __raw_i915_write32(dev_priv, GTFIFODBG, __raw_i915_read32(dev_priv, GTFIFODBG)); + if (INTEL_INFO(dev)->gen >= 6) + __raw_i915_write32(dev_priv, ERROR_GEN6, 0); intel_uncore_forcewake_reset(dev, restore_forcewake); } -- 2.1.3 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state 2014-12-05 14:15 [PATCH 1/2] drm/i915: Detect page faults during hangcheck Chris Wilson @ 2014-12-05 14:15 ` Chris Wilson 2014-12-05 14:31 ` Ville Syrjälä 2015-01-20 14:55 ` Mika Kuoppala 2014-12-05 21:03 ` [PATCH 1/2] drm/i915: Detect page faults during hangcheck Daniel Vetter 1 sibling, 2 replies; 16+ messages in thread From: Chris Wilson @ 2014-12-05 14:15 UTC (permalink / raw) To: intel-gfx Currently we initialise the rings, add the first context switch to the ring and execute our golden state then enable (aliasing or full) ppgtt. However, as we enable ppgtt using direct MMIO but load the PD using MI_LRI, we end up executing the context switch and golden render state with an invalid PD generating page faults. To solve this issue, first do the ppgtt PD setup, then set the default context and write the commands to run the render state into the ring, before we activate the ring. This allows us to be sure that the register state is valid before we begin execution. This was spotted when writing the seqno/request conversion, but only with the ERROR capture did I realise that it was a necessity now. RFC: cleanup the error handling in i915_gem_init_hw. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/i915_gem.c | 20 ++++++++++---------- drivers/gpu/drm/i915/intel_ringbuffer.c | 9 ++++++--- 2 files changed, 16 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index c1c11418231b..c13842d3cbc9 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev) */ init_unused_rings(dev); - for_each_ring(ring, dev_priv, i) { - ret = ring->init_hw(ring); - if (ret) - return ret; - } - for (i = 0; i < NUM_L3_SLICES(dev); i++) i915_gem_l3_remap(&dev_priv->ring[RCS], i); + ret = i915_ppgtt_init_hw(dev); + if (ret && ret != -EIO) { + DRM_ERROR("PPGTT enable failed %d\n", ret); + i915_gem_cleanup_ringbuffer(dev); + } + /* * XXX: Contexts should only be initialized once. Doing a switch to the * default context switch however is something we'd like to do after @@ -4820,10 +4820,10 @@ i915_gem_init_hw(struct drm_device *dev) return ret; } - ret = i915_ppgtt_init_hw(dev); - if (ret && ret != -EIO) { - DRM_ERROR("PPGTT enable failed %d\n", ret); - i915_gem_cleanup_ringbuffer(dev); + for_each_ring(ring, dev_priv, i) { + ret = ring->init_hw(ring); + if (ret) + return ret; } return ret; diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index de38b04135d2..95ebce73d46d 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -585,6 +585,9 @@ static int init_ring_common(struct intel_engine_cs *ring) I915_WRITE_HEAD(ring, 0); (void)I915_READ_HEAD(ring); + I915_WRITE_HEAD(ring, ringbuf->head); + I915_WRITE_TAIL(ring, ringbuf->head); + I915_WRITE_CTL(ring, ((ringbuf->size - PAGE_SIZE) & RING_NR_PAGES) | RING_VALID); @@ -592,7 +595,7 @@ static int init_ring_common(struct intel_engine_cs *ring) /* If the head is still not zero, the ring is dead */ if (wait_for((I915_READ_CTL(ring) & RING_VALID) != 0 && I915_READ_START(ring) == i915_gem_obj_ggtt_offset(obj) && - (I915_READ_HEAD(ring) & HEAD_ADDR) == 0, 50)) { + (I915_READ_HEAD(ring) & HEAD_ADDR) == ringbuf->head, 50)) { DRM_ERROR("%s initialization failed " "ctl %08x (valid? %d) head %08x tail %08x start %08x [expected %08lx]\n", ring->name, @@ -603,9 +606,9 @@ static int init_ring_common(struct intel_engine_cs *ring) goto out; } + I915_WRITE_TAIL(ring, ringbuf->tail); + ringbuf->last_retired_head = -1; - ringbuf->head = I915_READ_HEAD(ring); - ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR; intel_ring_update_space(ringbuf); memset(&ring->hangcheck, 0, sizeof(ring->hangcheck)); -- 2.1.3 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state 2014-12-05 14:15 ` [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state Chris Wilson @ 2014-12-05 14:31 ` Ville Syrjälä 2014-12-05 14:38 ` Chris Wilson 2015-01-20 14:55 ` Mika Kuoppala 1 sibling, 1 reply; 16+ messages in thread From: Ville Syrjälä @ 2014-12-05 14:31 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote: > Currently we initialise the rings, add the first context switch to the > ring and execute our golden state then enable (aliasing or full) ppgtt. > However, as we enable ppgtt using direct MMIO but load the PD using > MI_LRI, we end up executing the context switch and golden render state > with an invalid PD generating page faults. To solve this issue, first do > the ppgtt PD setup, then set the default context and write the commands > to run the render state into the ring, before we activate the ring. This > allows us to be sure that the register state is valid before we begin > execution. > > This was spotted when writing the seqno/request conversion, but only with > the ERROR capture did I realise that it was a necessity now. > > RFC: cleanup the error handling in i915_gem_init_hw. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/i915_gem.c | 20 ++++++++++---------- > drivers/gpu/drm/i915/intel_ringbuffer.c | 9 ++++++--- > 2 files changed, 16 insertions(+), 13 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index c1c11418231b..c13842d3cbc9 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev) > */ > init_unused_rings(dev); > > - for_each_ring(ring, dev_priv, i) { > - ret = ring->init_hw(ring); > - if (ret) > - return ret; > - } > - > for (i = 0; i < NUM_L3_SLICES(dev); i++) > i915_gem_l3_remap(&dev_priv->ring[RCS], i); This is going to assume ring->head/tail are already valid? > > + ret = i915_ppgtt_init_hw(dev); > + if (ret && ret != -EIO) { > + DRM_ERROR("PPGTT enable failed %d\n", ret); > + i915_gem_cleanup_ringbuffer(dev); > + } > + > /* > * XXX: Contexts should only be initialized once. Doing a switch to the > * default context switch however is something we'd like to do after > @@ -4820,10 +4820,10 @@ i915_gem_init_hw(struct drm_device *dev) > return ret; > } > > - ret = i915_ppgtt_init_hw(dev); > - if (ret && ret != -EIO) { > - DRM_ERROR("PPGTT enable failed %d\n", ret); > - i915_gem_cleanup_ringbuffer(dev); > + for_each_ring(ring, dev_priv, i) { > + ret = ring->init_hw(ring); > + if (ret) > + return ret; > } > > return ret; > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > index de38b04135d2..95ebce73d46d 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > @@ -585,6 +585,9 @@ static int init_ring_common(struct intel_engine_cs *ring) > I915_WRITE_HEAD(ring, 0); > (void)I915_READ_HEAD(ring); > > + I915_WRITE_HEAD(ring, ringbuf->head); > + I915_WRITE_TAIL(ring, ringbuf->head); Here too. So what happens on GPU reset? > + > I915_WRITE_CTL(ring, > ((ringbuf->size - PAGE_SIZE) & RING_NR_PAGES) > | RING_VALID); > @@ -592,7 +595,7 @@ static int init_ring_common(struct intel_engine_cs *ring) > /* If the head is still not zero, the ring is dead */ > if (wait_for((I915_READ_CTL(ring) & RING_VALID) != 0 && > I915_READ_START(ring) == i915_gem_obj_ggtt_offset(obj) && > - (I915_READ_HEAD(ring) & HEAD_ADDR) == 0, 50)) { > + (I915_READ_HEAD(ring) & HEAD_ADDR) == ringbuf->head, 50)) { > DRM_ERROR("%s initialization failed " > "ctl %08x (valid? %d) head %08x tail %08x start %08x [expected %08lx]\n", > ring->name, > @@ -603,9 +606,9 @@ static int init_ring_common(struct intel_engine_cs *ring) > goto out; > } > > + I915_WRITE_TAIL(ring, ringbuf->tail); > + > ringbuf->last_retired_head = -1; > - ringbuf->head = I915_READ_HEAD(ring); > - ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR; > intel_ring_update_space(ringbuf); > > memset(&ring->hangcheck, 0, sizeof(ring->hangcheck)); > -- > 2.1.3 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Ville Syrjälä Intel OTC _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state 2014-12-05 14:31 ` Ville Syrjälä @ 2014-12-05 14:38 ` Chris Wilson 2014-12-05 14:53 ` Ville Syrjälä 2014-12-05 14:58 ` Daniel Vetter 0 siblings, 2 replies; 16+ messages in thread From: Chris Wilson @ 2014-12-05 14:38 UTC (permalink / raw) To: Ville Syrjälä; +Cc: intel-gfx On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote: > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote: > > Currently we initialise the rings, add the first context switch to the > > ring and execute our golden state then enable (aliasing or full) ppgtt. > > However, as we enable ppgtt using direct MMIO but load the PD using > > MI_LRI, we end up executing the context switch and golden render state > > with an invalid PD generating page faults. To solve this issue, first do > > the ppgtt PD setup, then set the default context and write the commands > > to run the render state into the ring, before we activate the ring. This > > allows us to be sure that the register state is valid before we begin > > execution. > > > > This was spotted when writing the seqno/request conversion, but only with > > the ERROR capture did I realise that it was a necessity now. > > > > RFC: cleanup the error handling in i915_gem_init_hw. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > --- > > drivers/gpu/drm/i915/i915_gem.c | 20 ++++++++++---------- > > drivers/gpu/drm/i915/intel_ringbuffer.c | 9 ++++++--- > > 2 files changed, 16 insertions(+), 13 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > index c1c11418231b..c13842d3cbc9 100644 > > --- a/drivers/gpu/drm/i915/i915_gem.c > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev) > > */ > > init_unused_rings(dev); > > > > - for_each_ring(ring, dev_priv, i) { > > - ret = ring->init_hw(ring); > > - if (ret) > > - return ret; > > - } > > - > > for (i = 0; i < NUM_L3_SLICES(dev); i++) > > i915_gem_l3_remap(&dev_priv->ring[RCS], i); > > This is going to assume ring->head/tail are already valid? We write into the ring obj, not the ring itself, which should be setup during the various intel_init_engine, i.e. the backing storage is independent of the actual registers. If you take it to the logical conclusion, each operation on the ring is just a transaction that is buffered until it is commited to the hardware, or rolled back in its entirety if aborted. We can then queue up multiple transactions before first enabling the hardware, or equally preserve such transactions across a GPU reset. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state 2014-12-05 14:38 ` Chris Wilson @ 2014-12-05 14:53 ` Ville Syrjälä 2014-12-05 16:26 ` Chris Wilson 2014-12-05 14:58 ` Daniel Vetter 1 sibling, 1 reply; 16+ messages in thread From: Ville Syrjälä @ 2014-12-05 14:53 UTC (permalink / raw) To: Chris Wilson, intel-gfx On Fri, Dec 05, 2014 at 02:38:46PM +0000, Chris Wilson wrote: > On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote: > > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote: > > > Currently we initialise the rings, add the first context switch to the > > > ring and execute our golden state then enable (aliasing or full) ppgtt. > > > However, as we enable ppgtt using direct MMIO but load the PD using > > > MI_LRI, we end up executing the context switch and golden render state > > > with an invalid PD generating page faults. To solve this issue, first do > > > the ppgtt PD setup, then set the default context and write the commands > > > to run the render state into the ring, before we activate the ring. This > > > allows us to be sure that the register state is valid before we begin > > > execution. > > > > > > This was spotted when writing the seqno/request conversion, but only with > > > the ERROR capture did I realise that it was a necessity now. > > > > > > RFC: cleanup the error handling in i915_gem_init_hw. > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > --- > > > drivers/gpu/drm/i915/i915_gem.c | 20 ++++++++++---------- > > > drivers/gpu/drm/i915/intel_ringbuffer.c | 9 ++++++--- > > > 2 files changed, 16 insertions(+), 13 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > > index c1c11418231b..c13842d3cbc9 100644 > > > --- a/drivers/gpu/drm/i915/i915_gem.c > > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev) > > > */ > > > init_unused_rings(dev); > > > > > > - for_each_ring(ring, dev_priv, i) { > > > - ret = ring->init_hw(ring); > > > - if (ret) > > > - return ret; > > > - } > > > - > > > for (i = 0; i < NUM_L3_SLICES(dev); i++) > > > i915_gem_l3_remap(&dev_priv->ring[RCS], i); > > > > This is going to assume ring->head/tail are already valid? > > We write into the ring obj, not the ring itself, which should be setup > during the various intel_init_engine, i.e. the backing storage is > independent of the actual registers. I mean the software shadows, not the registers themselves. When the GPU hangs I expect rign->head != ring->tail. So what makes those two identical again after the GPU reset? > > If you take it to the logical conclusion, each operation on the ring is > just a transaction that is buffered until it is commited to the > hardware, or rolled back in its entirety if aborted. We can then queue > up multiple transactions before first enabling the hardware, or equally > preserve such transactions across a GPU reset. > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre -- Ville Syrjälä Intel OTC _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state 2014-12-05 14:53 ` Ville Syrjälä @ 2014-12-05 16:26 ` Chris Wilson 2014-12-05 17:03 ` Ville Syrjälä 0 siblings, 1 reply; 16+ messages in thread From: Chris Wilson @ 2014-12-05 16:26 UTC (permalink / raw) To: Ville Syrjälä; +Cc: intel-gfx On Fri, Dec 05, 2014 at 04:53:49PM +0200, Ville Syrjälä wrote: > On Fri, Dec 05, 2014 at 02:38:46PM +0000, Chris Wilson wrote: > > On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote: > > > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote: > > > > Currently we initialise the rings, add the first context switch to the > > > > ring and execute our golden state then enable (aliasing or full) ppgtt. > > > > However, as we enable ppgtt using direct MMIO but load the PD using > > > > MI_LRI, we end up executing the context switch and golden render state > > > > with an invalid PD generating page faults. To solve this issue, first do > > > > the ppgtt PD setup, then set the default context and write the commands > > > > to run the render state into the ring, before we activate the ring. This > > > > allows us to be sure that the register state is valid before we begin > > > > execution. > > > > > > > > This was spotted when writing the seqno/request conversion, but only with > > > > the ERROR capture did I realise that it was a necessity now. > > > > > > > > RFC: cleanup the error handling in i915_gem_init_hw. > > > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > --- > > > > drivers/gpu/drm/i915/i915_gem.c | 20 ++++++++++---------- > > > > drivers/gpu/drm/i915/intel_ringbuffer.c | 9 ++++++--- > > > > 2 files changed, 16 insertions(+), 13 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > > > index c1c11418231b..c13842d3cbc9 100644 > > > > --- a/drivers/gpu/drm/i915/i915_gem.c > > > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev) > > > > */ > > > > init_unused_rings(dev); > > > > > > > > - for_each_ring(ring, dev_priv, i) { > > > > - ret = ring->init_hw(ring); > > > > - if (ret) > > > > - return ret; > > > > - } > > > > - > > > > for (i = 0; i < NUM_L3_SLICES(dev); i++) > > > > i915_gem_l3_remap(&dev_priv->ring[RCS], i); > > > > > > This is going to assume ring->head/tail are already valid? > > > > We write into the ring obj, not the ring itself, which should be setup > > during the various intel_init_engine, i.e. the backing storage is > > independent of the actual registers. > > I mean the software shadows, not the registers themselves. When the GPU > hangs I expect rign->head != ring->tail. So what makes those two identical > again after the GPU reset? Why would they be equal after reset? At the moment, we discard all outstanding requests which makes them equal. We have been discussing the need to execute pending batches from non-guitly Batches for yonks, in which case we would not expect ring->head == ring->tail after a reset. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state 2014-12-05 16:26 ` Chris Wilson @ 2014-12-05 17:03 ` Ville Syrjälä 2015-01-21 10:12 ` Chris Wilson 0 siblings, 1 reply; 16+ messages in thread From: Ville Syrjälä @ 2014-12-05 17:03 UTC (permalink / raw) To: Chris Wilson, intel-gfx On Fri, Dec 05, 2014 at 04:26:23PM +0000, Chris Wilson wrote: > On Fri, Dec 05, 2014 at 04:53:49PM +0200, Ville Syrjälä wrote: > > On Fri, Dec 05, 2014 at 02:38:46PM +0000, Chris Wilson wrote: > > > On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote: > > > > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote: > > > > > Currently we initialise the rings, add the first context switch to the > > > > > ring and execute our golden state then enable (aliasing or full) ppgtt. > > > > > However, as we enable ppgtt using direct MMIO but load the PD using > > > > > MI_LRI, we end up executing the context switch and golden render state > > > > > with an invalid PD generating page faults. To solve this issue, first do > > > > > the ppgtt PD setup, then set the default context and write the commands > > > > > to run the render state into the ring, before we activate the ring. This > > > > > allows us to be sure that the register state is valid before we begin > > > > > execution. > > > > > > > > > > This was spotted when writing the seqno/request conversion, but only with > > > > > the ERROR capture did I realise that it was a necessity now. > > > > > > > > > > RFC: cleanup the error handling in i915_gem_init_hw. > > > > > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > > --- > > > > > drivers/gpu/drm/i915/i915_gem.c | 20 ++++++++++---------- > > > > > drivers/gpu/drm/i915/intel_ringbuffer.c | 9 ++++++--- > > > > > 2 files changed, 16 insertions(+), 13 deletions(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > > > > index c1c11418231b..c13842d3cbc9 100644 > > > > > --- a/drivers/gpu/drm/i915/i915_gem.c > > > > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > > > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev) > > > > > */ > > > > > init_unused_rings(dev); > > > > > > > > > > - for_each_ring(ring, dev_priv, i) { > > > > > - ret = ring->init_hw(ring); > > > > > - if (ret) > > > > > - return ret; > > > > > - } > > > > > - > > > > > for (i = 0; i < NUM_L3_SLICES(dev); i++) > > > > > i915_gem_l3_remap(&dev_priv->ring[RCS], i); > > > > > > > > This is going to assume ring->head/tail are already valid? > > > > > > We write into the ring obj, not the ring itself, which should be setup > > > during the various intel_init_engine, i.e. the backing storage is > > > independent of the actual registers. > > > > I mean the software shadows, not the registers themselves. When the GPU > > hangs I expect rign->head != ring->tail. So what makes those two identical > > again after the GPU reset? > > Why would they be equal after reset? They wouldn't. That's my whole point. > At the moment, we discard all > outstanding requests which makes them equal. OK, so you're saying somehwere we set them to some sane values. But I don't see any obvious code for that. The obvious code was in init_ring_common(), but you just removed said code in this patch. So unless I'm totally misreading the code, ->head and ->tail will be left at their pre-reset values, then i915_gem_l3_remap() will pile on some new stuff at ->tail, and then later you restart the ring from ->head, which would still include all the discarded junk from before the reset. > We have been discussing the > need to execute pending batches from non-guitly Batches for yonks, in > which case we would not expect ring->head == ring->tail after a reset. Yeah, but let's not confuse the discussion with advanced topics. -- Ville Syrjälä Intel OTC _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state 2014-12-05 17:03 ` Ville Syrjälä @ 2015-01-21 10:12 ` Chris Wilson 2015-01-21 11:21 ` Ville Syrjälä 0 siblings, 1 reply; 16+ messages in thread From: Chris Wilson @ 2015-01-21 10:12 UTC (permalink / raw) To: Ville Syrjälä; +Cc: intel-gfx On Fri, Dec 05, 2014 at 07:03:42PM +0200, Ville Syrjälä wrote: > On Fri, Dec 05, 2014 at 04:26:23PM +0000, Chris Wilson wrote: > > On Fri, Dec 05, 2014 at 04:53:49PM +0200, Ville Syrjälä wrote: > > > On Fri, Dec 05, 2014 at 02:38:46PM +0000, Chris Wilson wrote: > > > > On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote: > > > > > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote: > > > > > > Currently we initialise the rings, add the first context switch to the > > > > > > ring and execute our golden state then enable (aliasing or full) ppgtt. > > > > > > However, as we enable ppgtt using direct MMIO but load the PD using > > > > > > MI_LRI, we end up executing the context switch and golden render state > > > > > > with an invalid PD generating page faults. To solve this issue, first do > > > > > > the ppgtt PD setup, then set the default context and write the commands > > > > > > to run the render state into the ring, before we activate the ring. This > > > > > > allows us to be sure that the register state is valid before we begin > > > > > > execution. > > > > > > > > > > > > This was spotted when writing the seqno/request conversion, but only with > > > > > > the ERROR capture did I realise that it was a necessity now. > > > > > > > > > > > > RFC: cleanup the error handling in i915_gem_init_hw. > > > > > > > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > > > --- > > > > > > drivers/gpu/drm/i915/i915_gem.c | 20 ++++++++++---------- > > > > > > drivers/gpu/drm/i915/intel_ringbuffer.c | 9 ++++++--- > > > > > > 2 files changed, 16 insertions(+), 13 deletions(-) > > > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > > > > > index c1c11418231b..c13842d3cbc9 100644 > > > > > > --- a/drivers/gpu/drm/i915/i915_gem.c > > > > > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > > > > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev) > > > > > > */ > > > > > > init_unused_rings(dev); > > > > > > > > > > > > - for_each_ring(ring, dev_priv, i) { > > > > > > - ret = ring->init_hw(ring); > > > > > > - if (ret) > > > > > > - return ret; > > > > > > - } > > > > > > - > > > > > > for (i = 0; i < NUM_L3_SLICES(dev); i++) > > > > > > i915_gem_l3_remap(&dev_priv->ring[RCS], i); > > > > > > > > > > This is going to assume ring->head/tail are already valid? > > > > > > > > We write into the ring obj, not the ring itself, which should be setup > > > > during the various intel_init_engine, i.e. the backing storage is > > > > independent of the actual registers. > > > > > > I mean the software shadows, not the registers themselves. When the GPU > > > hangs I expect rign->head != ring->tail. So what makes those two identical > > > again after the GPU reset? > > > > Why would they be equal after reset? > > They wouldn't. That's my whole point. > > > At the moment, we discard all > > outstanding requests which makes them equal. > > OK, so you're saying somehwere we set them to some sane values. But > I don't see any obvious code for that. The obvious code was in > init_ring_common(), but you just removed said code in this patch. When we clear the request list, we should be updating the position of last known head, which should then be used to set reset the ring->head (which in turn is used to reprogram the hw on reset/enable). -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state 2015-01-21 10:12 ` Chris Wilson @ 2015-01-21 11:21 ` Ville Syrjälä 0 siblings, 0 replies; 16+ messages in thread From: Ville Syrjälä @ 2015-01-21 11:21 UTC (permalink / raw) To: Chris Wilson, intel-gfx On Wed, Jan 21, 2015 at 10:12:12AM +0000, Chris Wilson wrote: > On Fri, Dec 05, 2014 at 07:03:42PM +0200, Ville Syrjälä wrote: > > On Fri, Dec 05, 2014 at 04:26:23PM +0000, Chris Wilson wrote: > > > On Fri, Dec 05, 2014 at 04:53:49PM +0200, Ville Syrjälä wrote: > > > > On Fri, Dec 05, 2014 at 02:38:46PM +0000, Chris Wilson wrote: > > > > > On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote: > > > > > > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote: > > > > > > > Currently we initialise the rings, add the first context switch to the > > > > > > > ring and execute our golden state then enable (aliasing or full) ppgtt. > > > > > > > However, as we enable ppgtt using direct MMIO but load the PD using > > > > > > > MI_LRI, we end up executing the context switch and golden render state > > > > > > > with an invalid PD generating page faults. To solve this issue, first do > > > > > > > the ppgtt PD setup, then set the default context and write the commands > > > > > > > to run the render state into the ring, before we activate the ring. This > > > > > > > allows us to be sure that the register state is valid before we begin > > > > > > > execution. > > > > > > > > > > > > > > This was spotted when writing the seqno/request conversion, but only with > > > > > > > the ERROR capture did I realise that it was a necessity now. > > > > > > > > > > > > > > RFC: cleanup the error handling in i915_gem_init_hw. > > > > > > > > > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > > > > --- > > > > > > > drivers/gpu/drm/i915/i915_gem.c | 20 ++++++++++---------- > > > > > > > drivers/gpu/drm/i915/intel_ringbuffer.c | 9 ++++++--- > > > > > > > 2 files changed, 16 insertions(+), 13 deletions(-) > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > > > > > > index c1c11418231b..c13842d3cbc9 100644 > > > > > > > --- a/drivers/gpu/drm/i915/i915_gem.c > > > > > > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > > > > > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev) > > > > > > > */ > > > > > > > init_unused_rings(dev); > > > > > > > > > > > > > > - for_each_ring(ring, dev_priv, i) { > > > > > > > - ret = ring->init_hw(ring); > > > > > > > - if (ret) > > > > > > > - return ret; > > > > > > > - } > > > > > > > - > > > > > > > for (i = 0; i < NUM_L3_SLICES(dev); i++) > > > > > > > i915_gem_l3_remap(&dev_priv->ring[RCS], i); > > > > > > > > > > > > This is going to assume ring->head/tail are already valid? > > > > > > > > > > We write into the ring obj, not the ring itself, which should be setup > > > > > during the various intel_init_engine, i.e. the backing storage is > > > > > independent of the actual registers. > > > > > > > > I mean the software shadows, not the registers themselves. When the GPU > > > > hangs I expect rign->head != ring->tail. So what makes those two identical > > > > again after the GPU reset? > > > > > > Why would they be equal after reset? > > > > They wouldn't. That's my whole point. > > > > > At the moment, we discard all > > > outstanding requests which makes them equal. > > > > OK, so you're saying somehwere we set them to some sane values. But > > I don't see any obvious code for that. The obvious code was in > > init_ring_common(), but you just removed said code in this patch. > > When we clear the request list, we should be updating the position of > last known head, which should then be used to set reset the ring->head > (which in turn is used to reprogram the hw on reset/enable). It's been a while, but IIRC the problem seemed to me that we don't call i915_gem_retire_requests_ring() on reset and hence never update ring->last_retired_head. And even if we did, we don't seem to call intel_ring_update_space() until the end of init_ring_common() so we wouldn't do the ring->head=ring->last_retired_head assignment either. Thus ring->head would be left as its pre-reset value. -- Ville Syrjälä Intel OTC _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state 2014-12-05 14:38 ` Chris Wilson 2014-12-05 14:53 ` Ville Syrjälä @ 2014-12-05 14:58 ` Daniel Vetter 2014-12-05 16:23 ` Chris Wilson 1 sibling, 1 reply; 16+ messages in thread From: Daniel Vetter @ 2014-12-05 14:58 UTC (permalink / raw) To: Chris Wilson, Ville Syrjälä, intel-gfx On Fri, Dec 05, 2014 at 02:38:46PM +0000, Chris Wilson wrote: > On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote: > > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote: > > > Currently we initialise the rings, add the first context switch to the > > > ring and execute our golden state then enable (aliasing or full) ppgtt. > > > However, as we enable ppgtt using direct MMIO but load the PD using > > > MI_LRI, we end up executing the context switch and golden render state > > > with an invalid PD generating page faults. To solve this issue, first do > > > the ppgtt PD setup, then set the default context and write the commands > > > to run the render state into the ring, before we activate the ring. This > > > allows us to be sure that the register state is valid before we begin > > > execution. > > > > > > This was spotted when writing the seqno/request conversion, but only with > > > the ERROR capture did I realise that it was a necessity now. > > > > > > RFC: cleanup the error handling in i915_gem_init_hw. > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > --- > > > drivers/gpu/drm/i915/i915_gem.c | 20 ++++++++++---------- > > > drivers/gpu/drm/i915/intel_ringbuffer.c | 9 ++++++--- > > > 2 files changed, 16 insertions(+), 13 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > > index c1c11418231b..c13842d3cbc9 100644 > > > --- a/drivers/gpu/drm/i915/i915_gem.c > > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev) > > > */ > > > init_unused_rings(dev); > > > > > > - for_each_ring(ring, dev_priv, i) { > > > - ret = ring->init_hw(ring); > > > - if (ret) > > > - return ret; > > > - } > > > - > > > for (i = 0; i < NUM_L3_SLICES(dev); i++) > > > i915_gem_l3_remap(&dev_priv->ring[RCS], i); > > > > This is going to assume ring->head/tail are already valid? > > We write into the ring obj, not the ring itself, which should be setup > during the various intel_init_engine, i.e. the backing storage is > independent of the actual registers. But there's still intel_ring_advance which calls ->write_tail all over the place. So we drop all these mmio writes into nirvana since we'll reset the ring later on? -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state 2014-12-05 14:58 ` Daniel Vetter @ 2014-12-05 16:23 ` Chris Wilson 2014-12-05 21:01 ` Daniel Vetter 0 siblings, 1 reply; 16+ messages in thread From: Chris Wilson @ 2014-12-05 16:23 UTC (permalink / raw) To: Daniel Vetter; +Cc: intel-gfx On Fri, Dec 05, 2014 at 03:58:40PM +0100, Daniel Vetter wrote: > On Fri, Dec 05, 2014 at 02:38:46PM +0000, Chris Wilson wrote: > > On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote: > > > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote: > > > > Currently we initialise the rings, add the first context switch to the > > > > ring and execute our golden state then enable (aliasing or full) ppgtt. > > > > However, as we enable ppgtt using direct MMIO but load the PD using > > > > MI_LRI, we end up executing the context switch and golden render state > > > > with an invalid PD generating page faults. To solve this issue, first do > > > > the ppgtt PD setup, then set the default context and write the commands > > > > to run the render state into the ring, before we activate the ring. This > > > > allows us to be sure that the register state is valid before we begin > > > > execution. > > > > > > > > This was spotted when writing the seqno/request conversion, but only with > > > > the ERROR capture did I realise that it was a necessity now. > > > > > > > > RFC: cleanup the error handling in i915_gem_init_hw. > > > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > --- > > > > drivers/gpu/drm/i915/i915_gem.c | 20 ++++++++++---------- > > > > drivers/gpu/drm/i915/intel_ringbuffer.c | 9 ++++++--- > > > > 2 files changed, 16 insertions(+), 13 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > > > index c1c11418231b..c13842d3cbc9 100644 > > > > --- a/drivers/gpu/drm/i915/i915_gem.c > > > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev) > > > > */ > > > > init_unused_rings(dev); > > > > > > > > - for_each_ring(ring, dev_priv, i) { > > > > - ret = ring->init_hw(ring); > > > > - if (ret) > > > > - return ret; > > > > - } > > > > - > > > > for (i = 0; i < NUM_L3_SLICES(dev); i++) > > > > i915_gem_l3_remap(&dev_priv->ring[RCS], i); > > > > > > This is going to assume ring->head/tail are already valid? > > > > We write into the ring obj, not the ring itself, which should be setup > > during the various intel_init_engine, i.e. the backing storage is > > independent of the actual registers. > > But there's still intel_ring_advance which calls ->write_tail all over the > place. So we drop all these mmio writes into nirvana since we'll reset the > ring later on? intel_ring_advance() doesn't do the register update, it just updates ring->tail. And even if it did, whilst the ring is disabled, nobody is listening, right? -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state 2014-12-05 16:23 ` Chris Wilson @ 2014-12-05 21:01 ` Daniel Vetter 0 siblings, 0 replies; 16+ messages in thread From: Daniel Vetter @ 2014-12-05 21:01 UTC (permalink / raw) To: Chris Wilson, Daniel Vetter, Ville Syrjälä, intel-gfx On Fri, Dec 05, 2014 at 04:23:56PM +0000, Chris Wilson wrote: > On Fri, Dec 05, 2014 at 03:58:40PM +0100, Daniel Vetter wrote: > > On Fri, Dec 05, 2014 at 02:38:46PM +0000, Chris Wilson wrote: > > > On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote: > > > > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote: > > > > > Currently we initialise the rings, add the first context switch to the > > > > > ring and execute our golden state then enable (aliasing or full) ppgtt. > > > > > However, as we enable ppgtt using direct MMIO but load the PD using > > > > > MI_LRI, we end up executing the context switch and golden render state > > > > > with an invalid PD generating page faults. To solve this issue, first do > > > > > the ppgtt PD setup, then set the default context and write the commands > > > > > to run the render state into the ring, before we activate the ring. This > > > > > allows us to be sure that the register state is valid before we begin > > > > > execution. > > > > > > > > > > This was spotted when writing the seqno/request conversion, but only with > > > > > the ERROR capture did I realise that it was a necessity now. > > > > > > > > > > RFC: cleanup the error handling in i915_gem_init_hw. > > > > > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > > --- > > > > > drivers/gpu/drm/i915/i915_gem.c | 20 ++++++++++---------- > > > > > drivers/gpu/drm/i915/intel_ringbuffer.c | 9 ++++++--- > > > > > 2 files changed, 16 insertions(+), 13 deletions(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > > > > index c1c11418231b..c13842d3cbc9 100644 > > > > > --- a/drivers/gpu/drm/i915/i915_gem.c > > > > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > > > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev) > > > > > */ > > > > > init_unused_rings(dev); > > > > > > > > > > - for_each_ring(ring, dev_priv, i) { > > > > > - ret = ring->init_hw(ring); > > > > > - if (ret) > > > > > - return ret; > > > > > - } > > > > > - > > > > > for (i = 0; i < NUM_L3_SLICES(dev); i++) > > > > > i915_gem_l3_remap(&dev_priv->ring[RCS], i); > > > > > > > > This is going to assume ring->head/tail are already valid? > > > > > > We write into the ring obj, not the ring itself, which should be setup > > > during the various intel_init_engine, i.e. the backing storage is > > > independent of the actual registers. > > > > But there's still intel_ring_advance which calls ->write_tail all over the > > place. So we drop all these mmio writes into nirvana since we'll reset the > > ring later on? > > intel_ring_advance() doesn't do the register update, it just updates > ring->tail. And even if it did, whilst the ring is disabled, nobody is > listening, right? Oh right, mixed that up. So all well from my pov for this patch. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state 2014-12-05 14:15 ` [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state Chris Wilson 2014-12-05 14:31 ` Ville Syrjälä @ 2015-01-20 14:55 ` Mika Kuoppala 2015-01-21 10:09 ` Chris Wilson 1 sibling, 1 reply; 16+ messages in thread From: Mika Kuoppala @ 2015-01-20 14:55 UTC (permalink / raw) To: Chris Wilson, intel-gfx Chris Wilson <chris@chris-wilson.co.uk> writes: > Currently we initialise the rings, add the first context switch to the > ring and execute our golden state then enable (aliasing or full) ppgtt. > However, as we enable ppgtt using direct MMIO but load the PD using > MI_LRI, we end up executing the context switch and golden render state > with an invalid PD generating page faults. To solve this issue, first do > the ppgtt PD setup, then set the default context and write the commands > to run the render state into the ring, before we activate the ring. This > allows us to be sure that the register state is valid before we begin > execution. > > This was spotted when writing the seqno/request conversion, but only with > the ERROR capture did I realise that it was a necessity now. > > RFC: cleanup the error handling in i915_gem_init_hw. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/i915_gem.c | 20 ++++++++++---------- > drivers/gpu/drm/i915/intel_ringbuffer.c | 9 ++++++--- > 2 files changed, 16 insertions(+), 13 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index c1c11418231b..c13842d3cbc9 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev) > */ > init_unused_rings(dev); > > - for_each_ring(ring, dev_priv, i) { > - ret = ring->init_hw(ring); > - if (ret) > - return ret; > - } > - > for (i = 0; i < NUM_L3_SLICES(dev); i++) > i915_gem_l3_remap(&dev_priv->ring[RCS], i); > Should we move this after the ppgtt has been init? (or alternatively let do_switch handle it) > + ret = i915_ppgtt_init_hw(dev); > + if (ret && ret != -EIO) { > + DRM_ERROR("PPGTT enable failed %d\n", ret); > + i915_gem_cleanup_ringbuffer(dev); > + } > + > /* > * XXX: Contexts should only be initialized once. Doing a switch to the > * default context switch however is something we'd like to do after > @@ -4820,10 +4820,10 @@ i915_gem_init_hw(struct drm_device *dev) > return ret; > } The comment needs tweaking as it reads /* Context switching requires rings (for * the do_switch), but before * enabling PPGTT. So don't move this." */ If I read the patch correctly you want to keep the ring head/tail bookkeeping intact so that you can push the ctx_switch into the buffer before the actual ring hw is activated? Does this then enable you to get rid of the whole pre emitting of l3remap stuff and just let the do_switch handle l3remap? -Mika > - ret = i915_ppgtt_init_hw(dev); > - if (ret && ret != -EIO) { > - DRM_ERROR("PPGTT enable failed %d\n", ret); > - i915_gem_cleanup_ringbuffer(dev); > + for_each_ring(ring, dev_priv, i) { > + ret = ring->init_hw(ring); > + if (ret) > + return ret; > } > > return ret; > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > index de38b04135d2..95ebce73d46d 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > @@ -585,6 +585,9 @@ static int init_ring_common(struct intel_engine_cs *ring) > I915_WRITE_HEAD(ring, 0); > (void)I915_READ_HEAD(ring); > > + I915_WRITE_HEAD(ring, ringbuf->head); > + I915_WRITE_TAIL(ring, ringbuf->head); > + > I915_WRITE_CTL(ring, > ((ringbuf->size - PAGE_SIZE) & RING_NR_PAGES) > | RING_VALID); > @@ -592,7 +595,7 @@ static int init_ring_common(struct intel_engine_cs *ring) > /* If the head is still not zero, the ring is dead */ > if (wait_for((I915_READ_CTL(ring) & RING_VALID) != 0 && > I915_READ_START(ring) == i915_gem_obj_ggtt_offset(obj) && > - (I915_READ_HEAD(ring) & HEAD_ADDR) == 0, 50)) { > + (I915_READ_HEAD(ring) & HEAD_ADDR) == ringbuf->head, 50)) { > DRM_ERROR("%s initialization failed " > "ctl %08x (valid? %d) head %08x tail %08x start %08x [expected %08lx]\n", > ring->name, > @@ -603,9 +606,9 @@ static int init_ring_common(struct intel_engine_cs *ring) > goto out; > } > > + I915_WRITE_TAIL(ring, ringbuf->tail); > + > ringbuf->last_retired_head = -1; > - ringbuf->head = I915_READ_HEAD(ring); > - ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR; > intel_ring_update_space(ringbuf); > > memset(&ring->hangcheck, 0, sizeof(ring->hangcheck)); > -- > 2.1.3 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state 2015-01-20 14:55 ` Mika Kuoppala @ 2015-01-21 10:09 ` Chris Wilson 0 siblings, 0 replies; 16+ messages in thread From: Chris Wilson @ 2015-01-21 10:09 UTC (permalink / raw) To: Mika Kuoppala; +Cc: intel-gfx On Tue, Jan 20, 2015 at 04:55:56PM +0200, Mika Kuoppala wrote: > Chris Wilson <chris@chris-wilson.co.uk> writes: > > > Currently we initialise the rings, add the first context switch to the > > ring and execute our golden state then enable (aliasing or full) ppgtt. > > However, as we enable ppgtt using direct MMIO but load the PD using > > MI_LRI, we end up executing the context switch and golden render state > > with an invalid PD generating page faults. To solve this issue, first do > > the ppgtt PD setup, then set the default context and write the commands > > to run the render state into the ring, before we activate the ring. This > > allows us to be sure that the register state is valid before we begin > > execution. > > > > This was spotted when writing the seqno/request conversion, but only with > > the ERROR capture did I realise that it was a necessity now. > > > > RFC: cleanup the error handling in i915_gem_init_hw. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > --- > > drivers/gpu/drm/i915/i915_gem.c | 20 ++++++++++---------- > > drivers/gpu/drm/i915/intel_ringbuffer.c | 9 ++++++--- > > 2 files changed, 16 insertions(+), 13 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > index c1c11418231b..c13842d3cbc9 100644 > > --- a/drivers/gpu/drm/i915/i915_gem.c > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev) > > */ > > init_unused_rings(dev); > > > > - for_each_ring(ring, dev_priv, i) { > > - ret = ring->init_hw(ring); > > - if (ret) > > - return ret; > > - } > > - > > for (i = 0; i < NUM_L3_SLICES(dev); i++) > > i915_gem_l3_remap(&dev_priv->ring[RCS], i); > > > > Should we move this after the ppgtt has been init? (or alternatively > let do_switch handle it) I did the latter. We treat it as part of the context state. > > + ret = i915_ppgtt_init_hw(dev); > > + if (ret && ret != -EIO) { > > + DRM_ERROR("PPGTT enable failed %d\n", ret); > > + i915_gem_cleanup_ringbuffer(dev); > > + } > > + > > /* > > * XXX: Contexts should only be initialized once. Doing a switch to the > > * default context switch however is something we'd like to do after > > @@ -4820,10 +4820,10 @@ i915_gem_init_hw(struct drm_device *dev) > > return ret; > > } > > The comment needs tweaking as it reads > > /* Context switching requires rings (for * the do_switch), but before > * enabling PPGTT. So don't move this." > */ > > If I read the patch correctly you want to keep the ring head/tail > bookkeeping intact so that you can push the ctx_switch into the buffer > before the actual ring hw is activated? Yes. > Does this then enable you to get rid of the whole pre emitting of > l3remap stuff and just let the do_switch handle l3remap? Yes. I even tried to get you to read patches that did that several months ago... -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] drm/i915: Detect page faults during hangcheck 2014-12-05 14:15 [PATCH 1/2] drm/i915: Detect page faults during hangcheck Chris Wilson 2014-12-05 14:15 ` [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state Chris Wilson @ 2014-12-05 21:03 ` Daniel Vetter 2015-01-21 10:05 ` Chris Wilson 1 sibling, 1 reply; 16+ messages in thread From: Daniel Vetter @ 2014-12-05 21:03 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx On Fri, Dec 05, 2014 at 02:15:21PM +0000, Chris Wilson wrote: > On Sandybridge+, the GPU provides the ERROR register for detecting page > faults. Hook this up to our hangcheck so that we can dump the error > state soon after such an event occurs. This would be better inside an > interrupt handler, but it serves a purpose here as it detects that our > initial context setup is invalid... > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/i915_irq.c | 5 +++++ > drivers/gpu/drm/i915/intel_uncore.c | 2 ++ > 2 files changed, 7 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c > index 7913a72ce30a..eb2149b941e4 100644 > --- a/drivers/gpu/drm/i915/i915_irq.c > +++ b/drivers/gpu/drm/i915/i915_irq.c > @@ -2969,6 +2969,11 @@ static void i915_hangcheck_elapsed(unsigned long data) > if (!i915.enable_hangcheck) > return; > > + if (INTEL_INFO(dev_priv)->gen >= 6 && I915_READ(ERROR_GEN6)) { > + i915_handle_error(dev, false, "GPU reported a page fault"); Is the full hangcheck state actually useful for debugging these pagefaults? The gpu doesn't seem to fall over completely, so I guess ACTHD and friends are somewhere in nirvana. Enabling the interrupts would definitely be useful though. I think all the handler code is written already (perhaps a few missing drm_debug lines), it's just that we don't enable these interrupt sources by default. -Daniel > + I915_WRITE(ERROR_GEN6, 0); > + } > + > for_each_ring(ring, dev_priv, i) { > u64 acthd; > u32 seqno; > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c > index 0655b44651a7..67f2e24c5bc5 100644 > --- a/drivers/gpu/drm/i915/intel_uncore.c > +++ b/drivers/gpu/drm/i915/intel_uncore.c > @@ -535,6 +535,8 @@ static void __intel_uncore_early_sanitize(struct drm_device *dev, > if (IS_GEN6(dev) || IS_GEN7(dev)) > __raw_i915_write32(dev_priv, GTFIFODBG, > __raw_i915_read32(dev_priv, GTFIFODBG)); > + if (INTEL_INFO(dev)->gen >= 6) > + __raw_i915_write32(dev_priv, ERROR_GEN6, 0); > > intel_uncore_forcewake_reset(dev, restore_forcewake); > } > -- > 2.1.3 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] drm/i915: Detect page faults during hangcheck 2014-12-05 21:03 ` [PATCH 1/2] drm/i915: Detect page faults during hangcheck Daniel Vetter @ 2015-01-21 10:05 ` Chris Wilson 0 siblings, 0 replies; 16+ messages in thread From: Chris Wilson @ 2015-01-21 10:05 UTC (permalink / raw) To: Daniel Vetter; +Cc: intel-gfx On Fri, Dec 05, 2014 at 10:03:42PM +0100, Daniel Vetter wrote: > On Fri, Dec 05, 2014 at 02:15:21PM +0000, Chris Wilson wrote: > > On Sandybridge+, the GPU provides the ERROR register for detecting page > > faults. Hook this up to our hangcheck so that we can dump the error > > state soon after such an event occurs. This would be better inside an > > interrupt handler, but it serves a purpose here as it detects that our > > initial context setup is invalid... > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > --- > > drivers/gpu/drm/i915/i915_irq.c | 5 +++++ > > drivers/gpu/drm/i915/intel_uncore.c | 2 ++ > > 2 files changed, 7 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c > > index 7913a72ce30a..eb2149b941e4 100644 > > --- a/drivers/gpu/drm/i915/i915_irq.c > > +++ b/drivers/gpu/drm/i915/i915_irq.c > > @@ -2969,6 +2969,11 @@ static void i915_hangcheck_elapsed(unsigned long data) > > if (!i915.enable_hangcheck) > > return; > > > > + if (INTEL_INFO(dev_priv)->gen >= 6 && I915_READ(ERROR_GEN6)) { > > + i915_handle_error(dev, false, "GPU reported a page fault"); > > Is the full hangcheck state actually useful for debugging these > pagefaults? The gpu doesn't seem to fall over completely, so I guess ACTHD > and friends are somewhere in nirvana. Yes, it was. In this case ACTHD wasn't in nirvana, but close inspection showed that the environment must have changed since the execution of the batch and the GPU fault. > Enabling the interrupts would definitely be useful though. I think all the > handler code is written already (perhaps a few missing drm_debug lines), > it's just that we don't enable these interrupt sources by default. That would be even better. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2015-01-21 11:21 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-12-05 14:15 [PATCH 1/2] drm/i915: Detect page faults during hangcheck Chris Wilson 2014-12-05 14:15 ` [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state Chris Wilson 2014-12-05 14:31 ` Ville Syrjälä 2014-12-05 14:38 ` Chris Wilson 2014-12-05 14:53 ` Ville Syrjälä 2014-12-05 16:26 ` Chris Wilson 2014-12-05 17:03 ` Ville Syrjälä 2015-01-21 10:12 ` Chris Wilson 2015-01-21 11:21 ` Ville Syrjälä 2014-12-05 14:58 ` Daniel Vetter 2014-12-05 16:23 ` Chris Wilson 2014-12-05 21:01 ` Daniel Vetter 2015-01-20 14:55 ` Mika Kuoppala 2015-01-21 10:09 ` Chris Wilson 2014-12-05 21:03 ` [PATCH 1/2] drm/i915: Detect page faults during hangcheck Daniel Vetter 2015-01-21 10:05 ` Chris Wilson
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.