All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Reduce number of register access during IVB+ interrupt handling
@ 2013-09-23 10:25 Chris Wilson
  2013-09-23 12:57 ` Paulo Zanoni
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2013-09-23 10:25 UTC (permalink / raw)
  To: intel-gfx

Register access is particularly obnoxious on Sandybridge and later due to
the extra work we must do around every read or write. The effect is
magnified on Haswell, as we have per-operation sanity checking
magnifying the number of reads and writes.

Interrupt handling is supposed to be fast, yet due to the sanity checks
around the register accesss it is not as fast as it could be. If we look
closer, most of the common register operations are reading values we
already know, and redundant flushes. Eliminate these by storing the
desired values rather than reading them back during the interrupt
routine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h |  4 ++--
 drivers/gpu/drm/i915/i915_irq.c | 22 ++++++++--------------
 2 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index db8e4d0..f182a23 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1259,8 +1259,8 @@ typedef struct drm_i915_private {
 	/* DPIO indirect register protection */
 	struct mutex dpio_lock;
 
-	/** Cached value of IMR to avoid reads in updating the bitfield */
-	u32 irq_mask;
+	/** Cached value of IMR/IER to avoid reads in updating the bitfield */
+	u32 irq_mask, irq_enable;
 	u32 gt_irq_mask;
 	u32 pm_irq_mask;
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index d9eebca..76725c6 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1428,7 +1428,7 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
 {
 	struct drm_device *dev = (struct drm_device *) arg;
 	drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private;
-	u32 de_iir, gt_iir, de_ier, sde_ier = 0;
+	u32 de_iir, gt_iir;
 	irqreturn_t ret = IRQ_NONE;
 
 	atomic_inc(&dev_priv->irq_received);
@@ -1438,20 +1438,15 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
 	intel_uncore_check_errors(dev);
 
 	/* disable master interrupt before clearing iir  */
-	de_ier = I915_READ(DEIER);
-	I915_WRITE(DEIER, de_ier & ~DE_MASTER_IRQ_CONTROL);
-	POSTING_READ(DEIER);
+	I915_WRITE(DEIER, dev_priv->irq_enable & ~DE_MASTER_IRQ_CONTROL);
 
 	/* Disable south interrupts. We'll only write to SDEIIR once, so further
 	 * interrupts will will be stored on its back queue, and then we'll be
 	 * able to process them after we restore SDEIER (as soon as we restore
 	 * it, we'll get an interrupt if SDEIIR still has something to process
 	 * due to its back queue). */
-	if (!HAS_PCH_NOP(dev)) {
-		sde_ier = I915_READ(SDEIER);
+	if (!HAS_PCH_NOP(dev))
 		I915_WRITE(SDEIER, 0);
-		POSTING_READ(SDEIER);
-	}
 
 	gt_iir = I915_READ(GTIIR);
 	if (gt_iir) {
@@ -1482,12 +1477,10 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
 		}
 	}
 
-	I915_WRITE(DEIER, de_ier);
+	if (!HAS_PCH_NOP(dev))
+		I915_WRITE(SDEIER, ~0);
+	I915_WRITE(DEIER, dev_priv->irq_enable);
 	POSTING_READ(DEIER);
-	if (!HAS_PCH_NOP(dev)) {
-		I915_WRITE(SDEIER, sde_ier);
-		POSTING_READ(SDEIER);
-	}
 
 	return ret;
 }
@@ -2343,11 +2336,12 @@ static int ironlake_irq_postinstall(struct drm_device *dev)
 	}
 
 	dev_priv->irq_mask = ~display_mask;
+	dev_priv->irq_enable = display_mask | extra_mask;
 
 	/* should always can generate irq */
 	I915_WRITE(DEIIR, I915_READ(DEIIR));
 	I915_WRITE(DEIMR, dev_priv->irq_mask);
-	I915_WRITE(DEIER, display_mask | extra_mask);
+	I915_WRITE(DEIER, dev_priv->irq_enable);
 	POSTING_READ(DEIER);
 
 	gen5_gt_irq_postinstall(dev);
-- 
1.8.4.rc3

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] drm/i915: Reduce number of register access during IVB+ interrupt handling
  2013-09-23 10:25 [PATCH] drm/i915: Reduce number of register access during IVB+ interrupt handling Chris Wilson
@ 2013-09-23 12:57 ` Paulo Zanoni
  2013-09-23 13:06   ` Chris Wilson
  0 siblings, 1 reply; 5+ messages in thread
From: Paulo Zanoni @ 2013-09-23 12:57 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

2013/9/23 Chris Wilson <chris@chris-wilson.co.uk>:
> Register access is particularly obnoxious on Sandybridge and later due to
> the extra work we must do around every read or write. The effect is
> magnified on Haswell, as we have per-operation sanity checking
> magnifying the number of reads and writes.
>
> Interrupt handling is supposed to be fast, yet due to the sanity checks
> around the register accesss it is not as fast as it could be. If we look
> closer, most of the common register operations are reading values we
> already know, and redundant flushes. Eliminate these by storing the
> desired values rather than reading them back during the interrupt
> routine.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_drv.h |  4 ++--
>  drivers/gpu/drm/i915/i915_irq.c | 22 ++++++++--------------
>  2 files changed, 10 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index db8e4d0..f182a23 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1259,8 +1259,8 @@ typedef struct drm_i915_private {
>         /* DPIO indirect register protection */
>         struct mutex dpio_lock;
>
> -       /** Cached value of IMR to avoid reads in updating the bitfield */
> -       u32 irq_mask;
> +       /** Cached value of IMR/IER to avoid reads in updating the bitfield */
> +       u32 irq_mask, irq_enable;
>         u32 gt_irq_mask;
>         u32 pm_irq_mask;
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index d9eebca..76725c6 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1428,7 +1428,7 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
>  {
>         struct drm_device *dev = (struct drm_device *) arg;
>         drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private;
> -       u32 de_iir, gt_iir, de_ier, sde_ier = 0;
> +       u32 de_iir, gt_iir;
>         irqreturn_t ret = IRQ_NONE;
>
>         atomic_inc(&dev_priv->irq_received);
> @@ -1438,20 +1438,15 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
>         intel_uncore_check_errors(dev);
>
>         /* disable master interrupt before clearing iir  */
> -       de_ier = I915_READ(DEIER);
> -       I915_WRITE(DEIER, de_ier & ~DE_MASTER_IRQ_CONTROL);
> -       POSTING_READ(DEIER);
> +       I915_WRITE(DEIER, dev_priv->irq_enable & ~DE_MASTER_IRQ_CONTROL);
>
>         /* Disable south interrupts. We'll only write to SDEIIR once, so further
>          * interrupts will will be stored on its back queue, and then we'll be
>          * able to process them after we restore SDEIER (as soon as we restore
>          * it, we'll get an interrupt if SDEIIR still has something to process
>          * due to its back queue). */
> -       if (!HAS_PCH_NOP(dev)) {
> -               sde_ier = I915_READ(SDEIER);
> +       if (!HAS_PCH_NOP(dev))
>                 I915_WRITE(SDEIER, 0);
> -               POSTING_READ(SDEIER);
> -       }
>
>         gt_iir = I915_READ(GTIIR);
>         if (gt_iir) {
> @@ -1482,12 +1477,10 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
>                 }
>         }
>
> -       I915_WRITE(DEIER, de_ier);
> +       if (!HAS_PCH_NOP(dev))
> +               I915_WRITE(SDEIER, ~0);

Please don't change the relative order of DEIER and SDEIER. As far as
I remember, this order is required to prevent the complete stop of
SDEIER interrupts when we're getting too many of them (e.g.,
underruns).


> +       I915_WRITE(DEIER, dev_priv->irq_enable);
>         POSTING_READ(DEIER);
> -       if (!HAS_PCH_NOP(dev)) {
> -               I915_WRITE(SDEIER, sde_ier);
> -               POSTING_READ(SDEIER);
> -       }
>
>         return ret;
>  }
> @@ -2343,11 +2336,12 @@ static int ironlake_irq_postinstall(struct drm_device *dev)
>         }
>
>         dev_priv->irq_mask = ~display_mask;
> +       dev_priv->irq_enable = display_mask | extra_mask;
>
>         /* should always can generate irq */
>         I915_WRITE(DEIIR, I915_READ(DEIIR));
>         I915_WRITE(DEIMR, dev_priv->irq_mask);
> -       I915_WRITE(DEIER, display_mask | extra_mask);
> +       I915_WRITE(DEIER, dev_priv->irq_enable);
>         POSTING_READ(DEIER);
>
>         gen5_gt_irq_postinstall(dev);
> --
> 1.8.4.rc3
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx



-- 
Paulo Zanoni

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] drm/i915: Reduce number of register access during IVB+ interrupt handling
  2013-09-23 12:57 ` Paulo Zanoni
@ 2013-09-23 13:06   ` Chris Wilson
  2013-09-23 14:39     ` Paulo Zanoni
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2013-09-23 13:06 UTC (permalink / raw)
  To: Paulo Zanoni; +Cc: Intel Graphics Development

On Mon, Sep 23, 2013 at 09:57:37AM -0300, Paulo Zanoni wrote:
> 2013/9/23 Chris Wilson <chris@chris-wilson.co.uk>:
> > Register access is particularly obnoxious on Sandybridge and later due to
> > the extra work we must do around every read or write. The effect is
> > magnified on Haswell, as we have per-operation sanity checking
> > magnifying the number of reads and writes.
> >
> > Interrupt handling is supposed to be fast, yet due to the sanity checks
> > around the register accesss it is not as fast as it could be. If we look
> > closer, most of the common register operations are reading values we
> > already know, and redundant flushes. Eliminate these by storing the
> > desired values rather than reading them back during the interrupt
> > routine.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h |  4 ++--
> >  drivers/gpu/drm/i915/i915_irq.c | 22 ++++++++--------------
> >  2 files changed, 10 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index db8e4d0..f182a23 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1259,8 +1259,8 @@ typedef struct drm_i915_private {
> >         /* DPIO indirect register protection */
> >         struct mutex dpio_lock;
> >
> > -       /** Cached value of IMR to avoid reads in updating the bitfield */
> > -       u32 irq_mask;
> > +       /** Cached value of IMR/IER to avoid reads in updating the bitfield */
> > +       u32 irq_mask, irq_enable;
> >         u32 gt_irq_mask;
> >         u32 pm_irq_mask;
> >
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index d9eebca..76725c6 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -1428,7 +1428,7 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
> >  {
> >         struct drm_device *dev = (struct drm_device *) arg;
> >         drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private;
> > -       u32 de_iir, gt_iir, de_ier, sde_ier = 0;
> > +       u32 de_iir, gt_iir;
> >         irqreturn_t ret = IRQ_NONE;
> >
> >         atomic_inc(&dev_priv->irq_received);
> > @@ -1438,20 +1438,15 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
> >         intel_uncore_check_errors(dev);
> >
> >         /* disable master interrupt before clearing iir  */
> > -       de_ier = I915_READ(DEIER);
> > -       I915_WRITE(DEIER, de_ier & ~DE_MASTER_IRQ_CONTROL);
> > -       POSTING_READ(DEIER);
> > +       I915_WRITE(DEIER, dev_priv->irq_enable & ~DE_MASTER_IRQ_CONTROL);
> >
> >         /* Disable south interrupts. We'll only write to SDEIIR once, so further
> >          * interrupts will will be stored on its back queue, and then we'll be
> >          * able to process them after we restore SDEIER (as soon as we restore
> >          * it, we'll get an interrupt if SDEIIR still has something to process
> >          * due to its back queue). */
> > -       if (!HAS_PCH_NOP(dev)) {
> > -               sde_ier = I915_READ(SDEIER);
> > +       if (!HAS_PCH_NOP(dev))
> >                 I915_WRITE(SDEIER, 0);
> > -               POSTING_READ(SDEIER);
> > -       }
> >
> >         gt_iir = I915_READ(GTIIR);
> >         if (gt_iir) {
> > @@ -1482,12 +1477,10 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
> >                 }
> >         }
> >
> > -       I915_WRITE(DEIER, de_ier);
> > +       if (!HAS_PCH_NOP(dev))
> > +               I915_WRITE(SDEIER, ~0);
> 
> Please don't change the relative order of DEIER and SDEIER. As far as
> I remember, this order is required to prevent the complete stop of
> SDEIER interrupts when we're getting too many of them (e.g.,
> underruns).

The comments refer to the ordering of DEIIR vs SDEIIR, not IER and in
particular the master-irq enable.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] drm/i915: Reduce number of register access during IVB+ interrupt handling
  2013-09-23 13:06   ` Chris Wilson
@ 2013-09-23 14:39     ` Paulo Zanoni
  2013-09-23 15:07       ` Chris Wilson
  0 siblings, 1 reply; 5+ messages in thread
From: Paulo Zanoni @ 2013-09-23 14:39 UTC (permalink / raw)
  To: Chris Wilson, Paulo Zanoni, Intel Graphics Development

2013/9/23 Chris Wilson <chris@chris-wilson.co.uk>:
> On Mon, Sep 23, 2013 at 09:57:37AM -0300, Paulo Zanoni wrote:
>> 2013/9/23 Chris Wilson <chris@chris-wilson.co.uk>:
>> > Register access is particularly obnoxious on Sandybridge and later due to
>> > the extra work we must do around every read or write. The effect is
>> > magnified on Haswell, as we have per-operation sanity checking
>> > magnifying the number of reads and writes.
>> >
>> > Interrupt handling is supposed to be fast, yet due to the sanity checks
>> > around the register accesss it is not as fast as it could be. If we look
>> > closer, most of the common register operations are reading values we
>> > already know, and redundant flushes. Eliminate these by storing the
>> > desired values rather than reading them back during the interrupt
>> > routine.
>> >
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > ---
>> >  drivers/gpu/drm/i915/i915_drv.h |  4 ++--
>> >  drivers/gpu/drm/i915/i915_irq.c | 22 ++++++++--------------
>> >  2 files changed, 10 insertions(+), 16 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> > index db8e4d0..f182a23 100644
>> > --- a/drivers/gpu/drm/i915/i915_drv.h
>> > +++ b/drivers/gpu/drm/i915/i915_drv.h
>> > @@ -1259,8 +1259,8 @@ typedef struct drm_i915_private {
>> >         /* DPIO indirect register protection */
>> >         struct mutex dpio_lock;
>> >
>> > -       /** Cached value of IMR to avoid reads in updating the bitfield */
>> > -       u32 irq_mask;
>> > +       /** Cached value of IMR/IER to avoid reads in updating the bitfield */
>> > +       u32 irq_mask, irq_enable;
>> >         u32 gt_irq_mask;
>> >         u32 pm_irq_mask;
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>> > index d9eebca..76725c6 100644
>> > --- a/drivers/gpu/drm/i915/i915_irq.c
>> > +++ b/drivers/gpu/drm/i915/i915_irq.c
>> > @@ -1428,7 +1428,7 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
>> >  {
>> >         struct drm_device *dev = (struct drm_device *) arg;
>> >         drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private;
>> > -       u32 de_iir, gt_iir, de_ier, sde_ier = 0;
>> > +       u32 de_iir, gt_iir;
>> >         irqreturn_t ret = IRQ_NONE;
>> >
>> >         atomic_inc(&dev_priv->irq_received);
>> > @@ -1438,20 +1438,15 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
>> >         intel_uncore_check_errors(dev);
>> >
>> >         /* disable master interrupt before clearing iir  */
>> > -       de_ier = I915_READ(DEIER);
>> > -       I915_WRITE(DEIER, de_ier & ~DE_MASTER_IRQ_CONTROL);
>> > -       POSTING_READ(DEIER);
>> > +       I915_WRITE(DEIER, dev_priv->irq_enable & ~DE_MASTER_IRQ_CONTROL);
>> >
>> >         /* Disable south interrupts. We'll only write to SDEIIR once, so further
>> >          * interrupts will will be stored on its back queue, and then we'll be
>> >          * able to process them after we restore SDEIER (as soon as we restore
>> >          * it, we'll get an interrupt if SDEIIR still has something to process
>> >          * due to its back queue). */
>> > -       if (!HAS_PCH_NOP(dev)) {
>> > -               sde_ier = I915_READ(SDEIER);
>> > +       if (!HAS_PCH_NOP(dev))
>> >                 I915_WRITE(SDEIER, 0);
>> > -               POSTING_READ(SDEIER);
>> > -       }
>> >
>> >         gt_iir = I915_READ(GTIIR);
>> >         if (gt_iir) {
>> > @@ -1482,12 +1477,10 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
>> >                 }
>> >         }
>> >
>> > -       I915_WRITE(DEIER, de_ier);
>> > +       if (!HAS_PCH_NOP(dev))
>> > +               I915_WRITE(SDEIER, ~0);
>>
>> Please don't change the relative order of DEIER and SDEIER. As far as
>> I remember, this order is required to prevent the complete stop of
>> SDEIER interrupts when we're getting too many of them (e.g.,
>> underruns).
>
> The comments refer to the ordering of DEIIR vs SDEIIR, not IER and in
> particular the master-irq enable.

Maybe we need better comments. I did these things in the beginning of
the year, so my memory is not exactly precise. But as far as I
remember, the idea of restoring SDEIER after we restore DEIER is that
with this, any pending south interrupts would propagate to the north
DEIMR bit, follow the interrupt route, then trigger a new interrupt,
which would call the interrupt handler again. If we restore SDEIER
first we won't trigger an interrupt (since the master bit in DEIER is
disabled) and we'll get stuck because the DEIER bit responsible for
the south interrupts only triggers an interrupt when it flips to 1,
but on this case it's already 1.

There's this email where I talk about this:
http://lists.freedesktop.org/archives/intel-gfx/2013-February/024962.html


But maybe I was wrong. Still, inverting the order should probably be
on its own patch due to the risks.


And since we're trying to save reads/writes, perhaps another idea
would be to only clear SDEIER after we read DEIIR and confirm that we
do have a south interrupt pending.


Also, just out of curiosity: do we get any noticeable performance
difference on any workload? Depending on the impact I'd even vote to
disable-by-default some of the debugging features.

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre



-- 
Paulo Zanoni

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] drm/i915: Reduce number of register access during IVB+ interrupt handling
  2013-09-23 14:39     ` Paulo Zanoni
@ 2013-09-23 15:07       ` Chris Wilson
  0 siblings, 0 replies; 5+ messages in thread
From: Chris Wilson @ 2013-09-23 15:07 UTC (permalink / raw)
  To: Paulo Zanoni; +Cc: Intel Graphics Development

On Mon, Sep 23, 2013 at 11:39:11AM -0300, Paulo Zanoni wrote:
> Also, just out of curiosity: do we get any noticeable performance
> difference on any workload? Depending on the impact I'd even vote to
> disable-by-default some of the debugging features.

The tests are typically GPU bound, we are waiting upon breadcrumb irqs
after all. So impact on total throughput is marginal (<1%), but our
interrupt routines dominant the CPU profiles, which is helped but the
unclaimed check is a really, really, slow read and still masks the
useful work.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-09-23 15:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-23 10:25 [PATCH] drm/i915: Reduce number of register access during IVB+ interrupt handling Chris Wilson
2013-09-23 12:57 ` Paulo Zanoni
2013-09-23 13:06   ` Chris Wilson
2013-09-23 14:39     ` Paulo Zanoni
2013-09-23 15:07       ` Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.