* [PATCH 1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+
@ 2017-02-15 11:36 Chris Wilson
  2017-02-15 11:36 ` [PATCH 2/2] drm/i915: Share the heavyweight sync-flush irq barrier with gen5 Chris Wilson
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Chris Wilson @ 2017-02-15 11:36 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

I tried lightening the barrier in commit 9b9ed3093613 ("drm/i915: Remove
forcewake dance from seqno/irq barrier on legacy gen6+") but that
appears slightly on the optimistic side as we detect a very rare missed
interrupt on a few machines. This patch goes super heavy with a
force-waked sync-flush dance (enforcing a delay as we perform a
round-trip with the GPU during which it flushes its write caches - these
should already be flushed just prior to the interrupt and so the flush
should be fairly light as we respond to the interrupt). The saving grace
to using such a heavy barrier is that we only apply it once following an
interrupt, and only if the expected seqno hasn't landed.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99816
Fixes: 9b9ed3093613 ("drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 40 +++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 69c30a76d395..d1f408938479 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1492,25 +1492,26 @@ static void
 gen6_seqno_barrier(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-
-	/* Workaround to force correct ordering between irq and seqno writes on
-	 * ivb (and maybe also on snb) by reading from a CS register (like
-	 * ACTHD) before reading the status page.
-	 *
-	 * Note that this effectively stalls the read by the time it takes to
-	 * do a memory transaction, which more or less ensures that the write
-	 * from the GPU has sufficient time to invalidate the CPU cacheline.
-	 * Alternatively we could delay the interrupt from the CS ring to give
-	 * the write time to land, but that would incur a delay after every
-	 * batch i.e. much more frequent than a delay when waiting for the
-	 * interrupt (with the same net latency).
+	i915_reg_t reg = RING_INSTPM(engine->mmio_base);
+
+	/* Tell the gpu to flush its render cache (which should mostly be
+	 * emptied as we flushed it before the interrupt) and wait for it
+	 * to signal completion. This imposes a round trip that is at least
+	 * as long as the memory access to memory from the GPU and the
+	 * uncached mmio - that should be sufficient for the outstanding
+	 * write of the seqno to land!
 	 *
-	 * Also note that to prevent whole machine hangs on gen7, we have to
-	 * take the spinlock to guard against concurrent cacheline access.
+	 * Unfortunately, this requires the GPU to be kicked out of rc6 as
+	 * we wait (both incurring extra delay in acquiring the wakeref) and
+	 * extra power.
 	 */
-	spin_lock_irq(&dev_priv->uncore.lock);
-	POSTING_READ_FW(RING_ACTHD(engine->mmio_base));
-	spin_unlock_irq(&dev_priv->uncore.lock);
+
+	intel_uncore_forcewake_get(dev_priv, engine->fw_domains);
+	I915_WRITE_FW(reg, _MASKED_BIT_ENABLE(INSTPM_SYNC_FLUSH));
+	WARN_ON_ONCE(intel_wait_for_register_fw(dev_priv,
+						reg, INSTPM_SYNC_FLUSH, 0,
+						10));
+	intel_uncore_forcewake_put(dev_priv, engine->fw_domains);
 }
 
 static void
@@ -2593,6 +2594,11 @@ static void intel_ring_init_irq(struct drm_i915_private *dev_priv,
 {
 	engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT << engine->irq_shift;
 
+	engine->fw_domains =
+		intel_uncore_forcewake_for_reg(dev_priv,
+					       RING_INSTPM(engine->mmio_base),
+					       FW_REG_WRITE | FW_REG_READ);
+
 	if (INTEL_GEN(dev_priv) >= 8) {
 		engine->irq_enable = gen8_irq_enable;
 		engine->irq_disable = gen8_irq_disable;
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 9+ messages in thread
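
To make the gating described in the commit message concrete, here is a
minimal, illustrative-only sketch: i915_seqno_passed(),
intel_engine_get_seqno() and engine->irq_seqno_barrier are names from the
i915 driver of this era, but the wrapper function and its exact placement
in the wait path are assumptions for illustration, not the driver's actual
code.

/* Illustrative gating only - not the real i915 wait path.  The heavy
 * barrier is invoked solely when an interrupt has been handled but the
 * expected seqno is still not visible in the status page.
 */
static bool seqno_complete_after_irq(struct intel_engine_cs *engine, u32 seqno)
{
	if (i915_seqno_passed(intel_engine_get_seqno(engine), seqno))
		return true;

	/* The seqno write has not landed yet: pay for the heavyweight
	 * barrier, then look again.
	 */
	if (engine->irq_seqno_barrier)
		engine->irq_seqno_barrier(engine);

	return i915_seqno_passed(intel_engine_get_seqno(engine), seqno);
}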

* [PATCH 2/2] drm/i915: Share the heavyweight sync-flush irq barrier with gen5
  2017-02-15 11:36 [PATCH 1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+ Chris Wilson
@ 2017-02-15 11:36 ` Chris Wilson
  2017-02-15 11:59 ` [PATCH 1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+ Ville Syrjälä
  2017-02-15 13:52 ` ✗ Fi.CI.BAT: failure for series starting with [1/2] " Patchwork
  2 siblings, 0 replies; 9+ messages in thread
From: Chris Wilson @ 2017-02-15 11:36 UTC (permalink / raw)
  To: intel-gfx

Currently we use an empirically derived sleep to wait for the seqno to
become coherent following an interrupt on Ironlake. However, that is
occasionally too short and we get a missed-interrupt warning from CI. We
can use the sync-flush dance now employed for gen6 for a more realistic
delay, and hopefully kill 2 birds with one barrier.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 22 ++--------------------
 1 file changed, 2 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index d1f408938479..9673eaf05ec9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1473,24 +1473,6 @@ gen6_ring_sync_to(struct drm_i915_gem_request *req,
 static void
 gen5_seqno_barrier(struct intel_engine_cs *engine)
 {
-	/* MI_STORE are internally buffered by the GPU and not flushed
-	 * either by MI_FLUSH or SyncFlush or any other combination of
-	 * MI commands.
-	 *
-	 * "Only the submission of the store operation is guaranteed.
-	 * The write result will be complete (coherent) some time later
-	 * (this is practically a finite period but there is no guaranteed
-	 * latency)."
-	 *
-	 * Empirically, we observe that we need a delay of at least 75us to
-	 * be sure that the seqno write is visible by the CPU.
-	 */
-	usleep_range(125, 250);
-}
-
-static void
-gen6_seqno_barrier(struct intel_engine_cs *engine)
-{
 	struct drm_i915_private *dev_priv = engine->i915;
 	i915_reg_t reg = RING_INSTPM(engine->mmio_base);
 
@@ -2602,11 +2584,11 @@ static void intel_ring_init_irq(struct drm_i915_private *dev_priv,
 	if (INTEL_GEN(dev_priv) >= 8) {
 		engine->irq_enable = gen8_irq_enable;
 		engine->irq_disable = gen8_irq_disable;
-		engine->irq_seqno_barrier = gen6_seqno_barrier;
+		engine->irq_seqno_barrier = gen5_seqno_barrier;
 	} else if (INTEL_GEN(dev_priv) >= 6) {
 		engine->irq_enable = gen6_irq_enable;
 		engine->irq_disable = gen6_irq_disable;
-		engine->irq_seqno_barrier = gen6_seqno_barrier;
+		engine->irq_seqno_barrier = gen5_seqno_barrier;
 	} else if (INTEL_GEN(dev_priv) >= 5) {
 		engine->irq_enable = gen5_irq_enable;
 		engine->irq_disable = gen5_irq_disable;
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+
  2017-02-15 11:36 [PATCH 1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+ Chris Wilson
  2017-02-15 11:36 ` [PATCH 2/2] drm/i915: Share the heavyweight sync-flush irq barrier with gen5 Chris Wilson
@ 2017-02-15 11:59 ` Ville Syrjälä
  2017-02-15 12:15   ` Chris Wilson
  2017-02-15 13:52 ` ✗ Fi.CI.BAT: failure for series starting with [1/2] " Patchwork
  2 siblings, 1 reply; 9+ messages in thread
From: Ville Syrjälä @ 2017-02-15 11:59 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, Mika Kuoppala

On Wed, Feb 15, 2017 at 11:36:08AM +0000, Chris Wilson wrote:
> I tried lightening the barrier in commit 9b9ed3093613 ("drm/i915: Remove
> forcewake dance from seqno/irq barrier on legacy gen6+") but that
> appears slightly on the optimistic side as we detect a very rare missed
> interrupt on a few machines. This patch goes super heavy with a
> force-waked sync-flush dance (enforcing a delay as we perform a
> round-trip with the GPU during which it flushes its write caches - these
> should already be flushed just prior to the interrupt and so the flush
> should be fairly light as we respond to the interrupt). The saving grace
> to using such a heavy barrier is that we only apply it once following an
> interrupt, and only if the expected seqno hasn't landed.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99816
> Fixes: 9b9ed3093613 ("drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 40 +++++++++++++++++++--------------
>  1 file changed, 23 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 69c30a76d395..d1f408938479 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1492,25 +1492,26 @@ static void
>  gen6_seqno_barrier(struct intel_engine_cs *engine)
>  {
>  	struct drm_i915_private *dev_priv = engine->i915;
> -
> -	/* Workaround to force correct ordering between irq and seqno writes on
> -	 * ivb (and maybe also on snb) by reading from a CS register (like
> -	 * ACTHD) before reading the status page.
> -	 *
> -	 * Note that this effectively stalls the read by the time it takes to
> -	 * do a memory transaction, which more or less ensures that the write
> -	 * from the GPU has sufficient time to invalidate the CPU cacheline.
> -	 * Alternatively we could delay the interrupt from the CS ring to give
> -	 * the write time to land, but that would incur a delay after every
> -	 * batch i.e. much more frequent than a delay when waiting for the
> -	 * interrupt (with the same net latency).
> +	i915_reg_t reg = RING_INSTPM(engine->mmio_base);
> +
> +	/* Tell the gpu to flush its render cache (which should mostly be
> +	 * emptied as we flushed it before the interrupt) and wait for it
> +	 * to signal completion. This imposes a round trip that is at least
> +	 * as long as the memory access to memory from the GPU and the
> +	 * uncached mmio - that should be sufficient for the outstanding
> +	 * write of the seqno to land!
>  	 *
> -	 * Also note that to prevent whole machine hangs on gen7, we have to
> -	 * take the spinlock to guard against concurrent cacheline access.
> +	 * Unfortunately, this requires the GPU to be kicked out of rc6 as
> +	 * we wait (both incurring extra delay in acquiring the wakeref) and
> +	 * extra power.
>  	 */
> -	spin_lock_irq(&dev_priv->uncore.lock);
> -	POSTING_READ_FW(RING_ACTHD(engine->mmio_base));
> -	spin_unlock_irq(&dev_priv->uncore.lock);
> +
> +	intel_uncore_forcewake_get(dev_priv, engine->fw_domains);
> +	I915_WRITE_FW(reg, _MASKED_BIT_ENABLE(INSTPM_SYNC_FLUSH));
> +	WARN_ON_ONCE(intel_wait_for_register_fw(dev_priv,
> +						reg, INSTPM_SYNC_FLUSH, 0,
> +						10));
> +	intel_uncore_forcewake_put(dev_priv, engine->fw_domains);

Why do I have a nagging feeling that sync flush is somehow busted? Maybe
I'm confusing it with the operation flush thing, which we disable.

Hmm. The spec does say sync flush should only be issued after stopping the
rings.

If we were to use it, couldn't we also tell it to write the status into
the HWS and poll that instead?

>  }
>  
>  static void
> @@ -2593,6 +2594,11 @@ static void intel_ring_init_irq(struct drm_i915_private *dev_priv,
>  {
>  	engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT << engine->irq_shift;
>  
> +	engine->fw_domains =
> +		intel_uncore_forcewake_for_reg(dev_priv,
> +					       RING_INSTPM(engine->mmio_base),
> +					       FW_REG_WRITE | FW_REG_READ);
> +
>  	if (INTEL_GEN(dev_priv) >= 8) {
>  		engine->irq_enable = gen8_irq_enable;
>  		engine->irq_disable = gen8_irq_disable;
> -- 
> 2.11.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+
  2017-02-15 11:59 ` [PATCH 1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+ Ville Syrjälä
@ 2017-02-15 12:15   ` Chris Wilson
  2017-02-15 12:32     ` Ville Syrjälä
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Wilson @ 2017-02-15 12:15 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx, Mika Kuoppala

On Wed, Feb 15, 2017 at 01:59:59PM +0200, Ville Syrjälä wrote:
> On Wed, Feb 15, 2017 at 11:36:08AM +0000, Chris Wilson wrote:
> > I tried lightening the barrier in commit 9b9ed3093613 ("drm/i915: Remove
> > forcewake dance from seqno/irq barrier on legacy gen6+") but that
> > appears slightly on the optimistic side as we detect a very rare missed
> > interrupt on a few machines. This patch goes super heavy with a
> > force-waked sync-flush dance (enforcing a delay as we perform a
> > round-trip with the GPU during which it flushes its write caches - these
> > should already be flushed just prior to the interrupt and so the flush
> > should be fairly light as we respond to the interrupt). The saving grace
> > to using such a heavy barrier is that we only apply it once following an
> > interrupt, and only if the expected seqno hasn't landed.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99816
> > Fixes: 9b9ed3093613 ("drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 40 +++++++++++++++++++--------------
> >  1 file changed, 23 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 69c30a76d395..d1f408938479 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1492,25 +1492,26 @@ static void
> >  gen6_seqno_barrier(struct intel_engine_cs *engine)
> >  {
> >  	struct drm_i915_private *dev_priv = engine->i915;
> > -
> > -	/* Workaround to force correct ordering between irq and seqno writes on
> > -	 * ivb (and maybe also on snb) by reading from a CS register (like
> > -	 * ACTHD) before reading the status page.
> > -	 *
> > -	 * Note that this effectively stalls the read by the time it takes to
> > -	 * do a memory transaction, which more or less ensures that the write
> > -	 * from the GPU has sufficient time to invalidate the CPU cacheline.
> > -	 * Alternatively we could delay the interrupt from the CS ring to give
> > -	 * the write time to land, but that would incur a delay after every
> > -	 * batch i.e. much more frequent than a delay when waiting for the
> > -	 * interrupt (with the same net latency).
> > +	i915_reg_t reg = RING_INSTPM(engine->mmio_base);
> > +
> > +	/* Tell the gpu to flush its render cache (which should mostly be
> > +	 * emptied as we flushed it before the interrupt) and wait for it
> > +	 * to signal completion. This imposes a round trip that is at least
> > +	 * as long as the memory access to memory from the GPU and the
> > +	 * uncached mmio - that should be sufficient for the outstanding
> > +	 * write of the seqno to land!
> >  	 *
> > -	 * Also note that to prevent whole machine hangs on gen7, we have to
> > -	 * take the spinlock to guard against concurrent cacheline access.
> > +	 * Unfortunately, this requires the GPU to be kicked out of rc6 as
> > +	 * we wait (both incurring extra delay in acquiring the wakeref) and
> > +	 * extra power.
> >  	 */
> > -	spin_lock_irq(&dev_priv->uncore.lock);
> > -	POSTING_READ_FW(RING_ACTHD(engine->mmio_base));
> > -	spin_unlock_irq(&dev_priv->uncore.lock);
> > +
> > +	intel_uncore_forcewake_get(dev_priv, engine->fw_domains);
> > +	I915_WRITE_FW(reg, _MASKED_BIT_ENABLE(INSTPM_SYNC_FLUSH));
> > +	WARN_ON_ONCE(intel_wait_for_register_fw(dev_priv,
> > +						reg, INSTPM_SYNC_FLUSH, 0,
> > +						10));
> > +	intel_uncore_forcewake_put(dev_priv, engine->fw_domains);
> 
> Why do I have a nagging feeling that sync flush is somehow busted? Maybe
> I'm confusing it with the operation flush thing, which we disable.

Note we are not using it as an actual flush, just as a means for a round
trip with the hw. Using it as an estimate of the GPU's round trip to memory
is a bonus, but afaik the MI_STORE_DATA itself is independent of the
caches and even SyncFlush.
 
> Hmm. The spec does say sync flush should only be issued after stopping the
> rings.

I was working on the basis that if you wanted to guarantee that all
writes have been flushed and no new ones created, you would have to stop
the rings (and that was what the spec meant).

> If we were to use it, couldn't we also tell it to write the status into
> the HWS and poll that instead?

Hmm, that's a reasonable idea. We can assume that the writes into the
HWS (the seqno, then the round trip) are ordered, so polling the
HWS for the out-of-band semaphore should give us the right timing.

Now I have to remember what writes into and what masks the HWS...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread
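
A rough sketch of the HWS-polling variant floated in this exchange, under
heavy assumptions: RING_INSTPM(), INSTPM_SYNC_FLUSH, I915_WRITE_FW() and
the forcewake calls are taken from the patch, and intel_read_status_page()
is the driver's existing status-page accessor, but HWS_SYNC_FLUSH_DW (the
status-page dword the completion report would land in) and how that dword
gets armed or cleared are exactly the open questions in the thread, so
they are placeholders rather than a working recipe.

/* Hypothetical variant of gen6_seqno_barrier() that polls the hardware
 * status page for the sync-flush report instead of re-reading INSTPM.
 * HWS_SYNC_FLUSH_DW is a placeholder index; which dword (if any) the
 * hardware reports into, and how it is cleared beforehand, is left
 * open in the discussion above.
 */
#define HWS_SYNC_FLUSH_DW 0x20 /* placeholder, not a real offset */

static void gen6_seqno_barrier_hws(struct intel_engine_cs *engine)
{
	struct drm_i915_private *dev_priv = engine->i915;
	unsigned int retries = 10;

	intel_uncore_forcewake_get(dev_priv, engine->fw_domains);

	/* Kick off the sync flush exactly as the patch does ... */
	I915_WRITE_FW(RING_INSTPM(engine->mmio_base),
		      _MASKED_BIT_ENABLE(INSTPM_SYNC_FLUSH));

	/* ... but wait for the report to become visible in the status
	 * page.  The seqno write and the sync-flush report are assumed
	 * to be ordered, so once the report is seen the seqno must have
	 * landed as well.
	 */
	while (!intel_read_status_page(engine, HWS_SYNC_FLUSH_DW) &&
	       retries--)
		udelay(1);

	intel_uncore_forcewake_put(dev_priv, engine->fw_domains);
}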

* Re: [PATCH 1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+
  2017-02-15 12:15   ` Chris Wilson
@ 2017-02-15 12:32     ` Ville Syrjälä
  2017-02-15 16:00       ` Chris Wilson
  0 siblings, 1 reply; 9+ messages in thread
From: Ville Syrjälä @ 2017-02-15 12:32 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Mika Kuoppala

On Wed, Feb 15, 2017 at 12:15:43PM +0000, Chris Wilson wrote:
> On Wed, Feb 15, 2017 at 01:59:59PM +0200, Ville Syrjälä wrote:
> > On Wed, Feb 15, 2017 at 11:36:08AM +0000, Chris Wilson wrote:
> > > I tried lightening the barrier in commit 9b9ed3093613 ("drm/i915: Remove
> > > forcewake dance from seqno/irq barrier on legacy gen6+") but that
> > > appears slightly on the optimistic side as we detect a very rare missed
> > > interrupt on a few machines. This patch goes super heavy with a
> > > force-waked sync-flush dance (enforcing a delay as we perform a
> > > round-trip with the GPU during which it flushes its write caches - these
> > > should already be flushed just prior to the interrupt and so the flush
> > > should be fairly light as we respond to the interrupt). The saving grace
> > > to using such a heavy barrier is that we only apply it once following an
> > > interrupt, and only if the expected seqno hasn't landed.
> > > 
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99816
> > > Fixes: 9b9ed3093613 ("drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+")
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 40 +++++++++++++++++++--------------
> > >  1 file changed, 23 insertions(+), 17 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > index 69c30a76d395..d1f408938479 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > @@ -1492,25 +1492,26 @@ static void
> > >  gen6_seqno_barrier(struct intel_engine_cs *engine)
> > >  {
> > >  	struct drm_i915_private *dev_priv = engine->i915;
> > > -
> > > -	/* Workaround to force correct ordering between irq and seqno writes on
> > > -	 * ivb (and maybe also on snb) by reading from a CS register (like
> > > -	 * ACTHD) before reading the status page.
> > > -	 *
> > > -	 * Note that this effectively stalls the read by the time it takes to
> > > -	 * do a memory transaction, which more or less ensures that the write
> > > -	 * from the GPU has sufficient time to invalidate the CPU cacheline.
> > > -	 * Alternatively we could delay the interrupt from the CS ring to give
> > > -	 * the write time to land, but that would incur a delay after every
> > > -	 * batch i.e. much more frequent than a delay when waiting for the
> > > -	 * interrupt (with the same net latency).
> > > +	i915_reg_t reg = RING_INSTPM(engine->mmio_base);
> > > +
> > > +	/* Tell the gpu to flush its render cache (which should mostly be
> > > +	 * emptied as we flushed it before the interrupt) and wait for it
> > > +	 * to signal completion. This imposes a round trip that is at least
> > > +	 * as long as the memory access to memory from the GPU and the
> > > +	 * uncached mmio - that should be sufficient for the outstanding
> > > +	 * write of the seqno to land!
> > >  	 *
> > > -	 * Also note that to prevent whole machine hangs on gen7, we have to
> > > -	 * take the spinlock to guard against concurrent cacheline access.
> > > +	 * Unfortunately, this requires the GPU to be kicked out of rc6 as
> > > +	 * we wait (both incurring extra delay in acquiring the wakeref) and
> > > +	 * extra power.
> > >  	 */
> > > -	spin_lock_irq(&dev_priv->uncore.lock);
> > > -	POSTING_READ_FW(RING_ACTHD(engine->mmio_base));
> > > -	spin_unlock_irq(&dev_priv->uncore.lock);
> > > +
> > > +	intel_uncore_forcewake_get(dev_priv, engine->fw_domains);
> > > +	I915_WRITE_FW(reg, _MASKED_BIT_ENABLE(INSTPM_SYNC_FLUSH));
> > > +	WARN_ON_ONCE(intel_wait_for_register_fw(dev_priv,
> > > +						reg, INSTPM_SYNC_FLUSH, 0,
> > > +						10));
> > > +	intel_uncore_forcewake_put(dev_priv, engine->fw_domains);
> > 
> > Why do I have a nagging feeling that sync flush is somehow busted? Maybe
> > I'm confusing it with the operation flush thing, which we disable.
> 
> Note we are not using it as an actual flush, just a means for a round
> trip with hw. Using it as an estimate for the roundtrip of GPU with memory
> is a bonus, but afaik the MI_STORE_DATA itself is independent of the
> caches and even SyncFlush.
>  
> > Hmm. The spec does say sync flush should only be issued after stopping the
> > rings.
> 
> I was working on the basis that if you wanted to guarrantee that all
> writes have been flushed and no new ones created, you would have to stop
> the rings (and that was what the spec meant).
> 
> > If we were to use it, couldn't we also tell it to write the status into
> > the HWS and poll that instead?
> 
> Hmm, that's a reasonable idea. We can assume that the writes into the
> HWS (seqno, then the round trip) are ordered so by polling on the
> HWS for the out-of-band semaphore should give us the right timing.
> 
> Now I have to remember what writes into and what masks the HWS...

Speaking of which, our HWSTAM stuff in the irq code looks quite
fubar. We're unmasking all sorts of weird bits on pre-snb platforms,
and that happens well before we set up the HWS IIRC. And on snb+
we aren't doing anything apparently, so we just hope that everything
starts out masked I guess. The execlist code does explicitly mask
everything.

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+
  2017-02-15 11:36 [PATCH 1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+ Chris Wilson
  2017-02-15 11:36 ` [PATCH 2/2] drm/i915: Share the heavyweight sync-flush irq barrier with gen5 Chris Wilson
  2017-02-15 11:59 ` [PATCH 1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+ Ville Syrjälä
@ 2017-02-15 13:52 ` Patchwork
  2017-02-15 16:03   ` Chris Wilson
  2 siblings, 1 reply; 9+ messages in thread
From: Patchwork @ 2017-02-15 13:52 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+
URL   : https://patchwork.freedesktop.org/series/19697/
State : failure

== Summary ==

Series 19697v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/19697/revisions/1/mbox/

Test gem_ringfill:
        Subgroup basic-default-hang:
                pass       -> DMESG-WARN (fi-snb-2600)
                pass       -> DMESG-WARN (fi-ivb-3520m)
                pass       -> DMESG-WARN (fi-ilk-650)
                pass       -> DMESG-WARN (fi-byt-n2820)
                pass       -> DMESG-WARN (fi-hsw-4770)
                pass       -> DMESG-WARN (fi-hsw-4770r)
                pass       -> DMESG-WARN (fi-snb-2520m)
                pass       -> DMESG-WARN (fi-byt-j1900)
                pass       -> DMESG-WARN (fi-ivb-3770)
Test gem_sync:
        Subgroup basic-store-each:
                pass       -> FAIL       (fi-ilk-650)

fi-bdw-5557u     total:252  pass:241  dwarn:0   dfail:0   fail:0   skip:11 
fi-bsw-n3050     total:252  pass:213  dwarn:0   dfail:0   fail:0   skip:39 
fi-bxt-j4205     total:252  pass:233  dwarn:0   dfail:0   fail:0   skip:19 
fi-bxt-t5700     total:83   pass:70   dwarn:0   dfail:0   fail:0   skip:12 
fi-byt-j1900     total:252  pass:224  dwarn:1   dfail:0   fail:0   skip:27 
fi-byt-n2820     total:252  pass:220  dwarn:1   dfail:0   fail:0   skip:31 
fi-hsw-4770      total:252  pass:235  dwarn:1   dfail:0   fail:0   skip:16 
fi-hsw-4770r     total:252  pass:235  dwarn:1   dfail:0   fail:0   skip:16 
fi-ilk-650       total:252  pass:200  dwarn:1   dfail:0   fail:1   skip:50 
fi-ivb-3520m     total:252  pass:233  dwarn:1   dfail:0   fail:0   skip:18 
fi-ivb-3770      total:252  pass:233  dwarn:1   dfail:0   fail:0   skip:18 
fi-kbl-7500u     total:252  pass:234  dwarn:0   dfail:0   fail:0   skip:18 
fi-skl-6260u     total:252  pass:242  dwarn:0   dfail:0   fail:0   skip:10 
fi-skl-6700hq    total:252  pass:235  dwarn:0   dfail:0   fail:0   skip:17 
fi-skl-6700k     total:252  pass:230  dwarn:4   dfail:0   fail:0   skip:18 
fi-skl-6770hq    total:252  pass:242  dwarn:0   dfail:0   fail:0   skip:10 
fi-snb-2520m     total:252  pass:223  dwarn:1   dfail:0   fail:0   skip:28 
fi-snb-2600      total:252  pass:222  dwarn:1   dfail:0   fail:0   skip:29 

78640dff39ba539190c519d5b210c9e28fb886bb drm-tip: 2017y-02m-15d-10h-08m-38s UTC integration manifest
eabc4fa drm/i915: Share the heavyweight sync-flush irq barrier with gen5
e8e3958 drm/i915: Use a heavyweight irq-seqno barrier for gen6+

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_3822/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+
  2017-02-15 12:32     ` Ville Syrjälä
@ 2017-02-15 16:00       ` Chris Wilson
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Wilson @ 2017-02-15 16:00 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx, Mika Kuoppala

On Wed, Feb 15, 2017 at 02:32:18PM +0200, Ville Syrjälä wrote:
> On Wed, Feb 15, 2017 at 12:15:43PM +0000, Chris Wilson wrote:
> > On Wed, Feb 15, 2017 at 01:59:59PM +0200, Ville Syrjälä wrote:
> > > On Wed, Feb 15, 2017 at 11:36:08AM +0000, Chris Wilson wrote:
> > > > I tried lightening the barrier in commit 9b9ed3093613 ("drm/i915: Remove
> > > > forcewake dance from seqno/irq barrier on legacy gen6+") but that
> > > > appears slightly on the optimistic side as we detect a very rare missed
> > > > interrupt on a few machines. This patch goes super heavy with a
> > > > force-waked sync-flush dance (enforcing a delay as we perform a
> > > > round-trip with the GPU during which it flushes its write caches - these
> > > > should already be flushed just prior to the interrupt and so the flush
> > > > should be fairly light as we respond to the interrupt). The saving grace
> > > > to using such a heavy barrier is that we only apply it once following an
> > > > interrupt, and only if the expected seqno hasn't landed.
> > > > 
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99816
> > > > Fixes: 9b9ed3093613 ("drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+")
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 40 +++++++++++++++++++--------------
> > > >  1 file changed, 23 insertions(+), 17 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > index 69c30a76d395..d1f408938479 100644
> > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > @@ -1492,25 +1492,26 @@ static void
> > > >  gen6_seqno_barrier(struct intel_engine_cs *engine)
> > > >  {
> > > >  	struct drm_i915_private *dev_priv = engine->i915;
> > > > -
> > > > -	/* Workaround to force correct ordering between irq and seqno writes on
> > > > -	 * ivb (and maybe also on snb) by reading from a CS register (like
> > > > -	 * ACTHD) before reading the status page.
> > > > -	 *
> > > > -	 * Note that this effectively stalls the read by the time it takes to
> > > > -	 * do a memory transaction, which more or less ensures that the write
> > > > -	 * from the GPU has sufficient time to invalidate the CPU cacheline.
> > > > -	 * Alternatively we could delay the interrupt from the CS ring to give
> > > > -	 * the write time to land, but that would incur a delay after every
> > > > -	 * batch i.e. much more frequent than a delay when waiting for the
> > > > -	 * interrupt (with the same net latency).
> > > > +	i915_reg_t reg = RING_INSTPM(engine->mmio_base);
> > > > +
> > > > +	/* Tell the gpu to flush its render cache (which should mostly be
> > > > +	 * emptied as we flushed it before the interrupt) and wait for it
> > > > +	 * to signal completion. This imposes a round trip that is at least
> > > > +	 * as long as the memory access to memory from the GPU and the
> > > > +	 * uncached mmio - that should be sufficient for the outstanding
> > > > +	 * write of the seqno to land!
> > > >  	 *
> > > > -	 * Also note that to prevent whole machine hangs on gen7, we have to
> > > > -	 * take the spinlock to guard against concurrent cacheline access.
> > > > +	 * Unfortunately, this requires the GPU to be kicked out of rc6 as
> > > > +	 * we wait (both incurring extra delay in acquiring the wakeref) and
> > > > +	 * extra power.
> > > >  	 */
> > > > -	spin_lock_irq(&dev_priv->uncore.lock);
> > > > -	POSTING_READ_FW(RING_ACTHD(engine->mmio_base));
> > > > -	spin_unlock_irq(&dev_priv->uncore.lock);
> > > > +
> > > > +	intel_uncore_forcewake_get(dev_priv, engine->fw_domains);
> > > > +	I915_WRITE_FW(reg, _MASKED_BIT_ENABLE(INSTPM_SYNC_FLUSH));
> > > > +	WARN_ON_ONCE(intel_wait_for_register_fw(dev_priv,
> > > > +						reg, INSTPM_SYNC_FLUSH, 0,
> > > > +						10));
> > > > +	intel_uncore_forcewake_put(dev_priv, engine->fw_domains);
> > > 
> > > Why do I have a nagging feeling that sync flush is somehow busted? Maybe
> > > I'm confusing it with the operation flush thing, which we disable.
> > 
> > Note we are not using it as an actual flush, just a means for a round
> > trip with hw. Using it as an estimate for the roundtrip of GPU with memory
> > is a bonus, but afaik the MI_STORE_DATA itself is independent of the
> > caches and even SyncFlush.
> >  
> > > Hmm. The spec does say sync flush should only be issued after stopping the
> > > rings.
> > 
> > I was working on the basis that if you wanted to guarrantee that all
> > writes have been flushed and no new ones created, you would have to stop
> > the rings (and that was what the spec meant).
> > 
> > > If we were to use it, couldn't we also tell it to write the status into
> > > the HWS and poll that instead?
> > 
> > Hmm, that's a reasonable idea. We can assume that the writes into the
> > HWS (seqno, then the round trip) are ordered so by polling on the
> > HWS for the out-of-band semaphore should give us the right timing.
> > 
> > Now I have to remember what writes into and what masks the HWS...
> 
> Speaking of which, our HWSTAM stuff in the irq code looks quite
> fubar. We're unmasking all sorts of weird bits on pre-snb platforms,
> and that happens well before we setup up the HWS IIRC. And on snb+
> we aren't doing anything apparently, so we just hope that everything
> starts out masked I guess. The execlist code does explicitly mask
> everything.

Annoyingly the method changes between gen2-gen5 and gen6+. Before snb,
sync status (in hwstam) gets toggled on every sync flush. After snb, it
requires explicit clearing after processing the interrupt. That, and I
just can't get it to work on anything other than the render ring.

Probing INSTPM, on the other hand, works for both (gen5+). Doing an uncached
mmio read on INSTPM is at least as good as what we had before, and now we
have the extra delay for forcewake! (The main benefit of solving HWSTAM
would be to eliminate the extra forcewake.)

Atm I'm liking the INSTPM polling much more than broken attempts to use
HWS.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+
  2017-02-15 13:52 ` ✗ Fi.CI.BAT: failure for series starting with [1/2] " Patchwork
@ 2017-02-15 16:03   ` Chris Wilson
  2017-02-15 16:09     ` Chris Wilson
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Wilson @ 2017-02-15 16:03 UTC (permalink / raw)
  To: intel-gfx

On Wed, Feb 15, 2017 at 01:52:12PM -0000, Patchwork wrote:
> == Series Details ==
> 
> Series: series starting with [1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+
> URL   : https://patchwork.freedesktop.org/series/19697/
> State : failure
> 
> == Summary ==
> 
> Series 19697v1 Series without cover letter
> https://patchwork.freedesktop.org/api/1.0/series/19697/revisions/1/mbox/
> 
> Test gem_sync:
>         Subgroup basic-store-each:
>                 pass       -> FAIL       (fi-ilk-650)

But that rules out this method as being the "ideal" delay. Back to the
drawing board.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+
  2017-02-15 16:03   ` Chris Wilson
@ 2017-02-15 16:09     ` Chris Wilson
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Wilson @ 2017-02-15 16:09 UTC (permalink / raw)
  To: intel-gfx

On Wed, Feb 15, 2017 at 04:03:33PM +0000, Chris Wilson wrote:
> On Wed, Feb 15, 2017 at 01:52:12PM -0000, Patchwork wrote:
> > == Series Details ==
> > 
> > Series: series starting with [1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+
> > URL   : https://patchwork.freedesktop.org/series/19697/
> > State : failure
> > 
> > == Summary ==
> > 
> > Series 19697v1 Series without cover letter
> > https://patchwork.freedesktop.org/api/1.0/series/19697/revisions/1/mbox/
> > 
> > Test gem_sync:
> >         Subgroup basic-store-each:
> >                 pass       -> FAIL       (fi-ilk-650)
> 
> But that rules out this method as being the "ideal" delay. Back to the
> drawing board.

And the failure on the hang test tells me that the SyncFlush only occurs
between CS steps (and not inside batches). As aptly demonstrated by the
hanging batch, that is itself less than ideal.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-02-15 16:09 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-15 11:36 [PATCH 1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+ Chris Wilson
2017-02-15 11:36 ` [PATCH 2/2] drm/i915: Share the heavyweight sync-flush irq barrier with gen5 Chris Wilson
2017-02-15 11:59 ` [PATCH 1/2] drm/i915: Use a heavyweight irq-seqno barrier for gen6+ Ville Syrjälä
2017-02-15 12:15   ` Chris Wilson
2017-02-15 12:32     ` Ville Syrjälä
2017-02-15 16:00       ` Chris Wilson
2017-02-15 13:52 ` ✗ Fi.CI.BAT: failure for series starting with [1/2] " Patchwork
2017-02-15 16:03   ` Chris Wilson
2017-02-15 16:09     ` Chris Wilson
