* [PATCH 1/3] drm/i915: close PM interrupt masking races in the irq handler
@ 2011-09-08 12:00 Daniel Vetter
2011-09-08 12:00 ` [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func Daniel Vetter
` (4 more replies)
0 siblings, 5 replies; 19+ messages in thread
From: Daniel Vetter @ 2011-09-08 12:00 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter, Ben Widawsky
Quoting Chris Wilson's more concise description:
"Ah I think I see the problem. As you point out we only mask the current
interrupt received, so that if we have a task pending (and so IMR != 0) we
actually unmask the pending interrupt and so could receive it again before the
tasklet is finally kicked off by the grumpy scheduler."
We need the hw to issue PM interrupts A, B, A while the scheduler hates us
and refuses to run the rps work item. On receiving the second PM interrupt A
we hit the WARN because
dev_priv->pm_iir == PM_A | PM_B
Also add a posting read as suggested by Chris to ensure proper ordering of the
writes to PMIMR and PMIIR. Just in case somebody weakens write ordering.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
---
drivers/gpu/drm/i915/i915_irq.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 9cbb0cd..2fdd9f9 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -536,8 +536,9 @@ static irqreturn_t ivybridge_irq_handler(DRM_IRQ_ARGS)
unsigned long flags;
spin_lock_irqsave(&dev_priv->rps_lock, flags);
WARN(dev_priv->pm_iir & pm_iir, "Missed a PM interrupt\n");
- I915_WRITE(GEN6_PMIMR, pm_iir);
dev_priv->pm_iir |= pm_iir;
+ I915_WRITE(GEN6_PMIMR, dev_priv->pm_iir);
+ POSTING_READ(GEN6_PMIMR);
spin_unlock_irqrestore(&dev_priv->rps_lock, flags);
queue_work(dev_priv->wq, &dev_priv->rps_work);
}
@@ -649,8 +650,9 @@ static irqreturn_t ironlake_irq_handler(DRM_IRQ_ARGS)
unsigned long flags;
spin_lock_irqsave(&dev_priv->rps_lock, flags);
WARN(dev_priv->pm_iir & pm_iir, "Missed a PM interrupt\n");
- I915_WRITE(GEN6_PMIMR, pm_iir);
dev_priv->pm_iir |= pm_iir;
+ I915_WRITE(GEN6_PMIMR, dev_priv->pm_iir);
+ POSTING_READ(GEN6_PMIMR);
spin_unlock_irqrestore(&dev_priv->rps_lock, flags);
queue_work(dev_priv->wq, &dev_priv->rps_work);
}
--
1.7.6
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func
2011-09-08 12:00 [PATCH 1/3] drm/i915: close PM interrupt masking races in the irq handler Daniel Vetter
@ 2011-09-08 12:00 ` Daniel Vetter
2011-09-08 15:19 ` Ben Widawsky
2011-09-08 15:25 ` [PATCH] " Daniel Vetter
2011-09-08 12:00 ` [PATCH 3/3] drm/i915: properly cancel rps_work on module unload v2 Daniel Vetter
` (3 subsequent siblings)
4 siblings, 2 replies; 19+ messages in thread
From: Daniel Vetter @ 2011-09-08 12:00 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter, Ben Widawsky
This patch closes the following race:
We get a PM interrupt A, mask it, set dev_priv->pm_iir = PM_A and kick off the
work item. Scheduler isn't grumpy, so the work queue takes rps_lock,
grabs pm_iir = dev_priv->pm_iir and pm_imr = READ(PMIMR). Note that
pm_imr == pm_iir because we've just masked the interrupt we've got.
Now hw sends out PM interrupt B (not masked), we process it and mask
it. Later on the irq handler also clears PMIIR.
Then the work item proceeds and at the end clears PMIMR. Because
(local) pm_imr == pm_iir we have
pm_imr & ~pm_iir == 0
so all interrupts are enabled.
Hardware is still interrupt-happy, and sends out a new PM interrupt B.
PMIMR doesn't mask B (it does not mask anything), PMIIR is cleared, so
we get it and hit the WARN in the interrupt handler (because
dev_priv->pm_iir == PM_B).
That's why I've moved the
WRITE(PMIMR, 0)
up under the protection of the rps_lock. And write an unconditional 0
to PMIMR, because that's what we'll do anyway.
This race becomes much more likely because we can arbitrarily extend
the window by grabbing dev->struct_mutex right after the irq handler
has processed the first PM_B interrupt.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ben Widawsky <ben@bwidawsk.net>
---
drivers/gpu/drm/i915/i915_irq.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 2fdd9f9..21ebcbd 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -383,6 +383,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
pm_iir = dev_priv->pm_iir;
dev_priv->pm_iir = 0;
pm_imr = I915_READ(GEN6_PMIMR);
+ I915_WRITE(GEN6_PMIMR, 0);
spin_unlock_irq(&dev_priv->rps_lock);
if (!pm_iir)
@@ -420,7 +421,6 @@ static void gen6_pm_rps_work(struct work_struct *work)
* an *extremely* unlikely race with gen6_rps_enable() that is prevented
* by holding struct_mutex for the duration of the write.
*/
- I915_WRITE(GEN6_PMIMR, pm_imr & ~pm_iir);
mutex_unlock(&dev_priv->dev->struct_mutex);
}
--
1.7.6
* Re: [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func
2011-09-08 12:00 ` [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func Daniel Vetter
@ 2011-09-08 15:19 ` Ben Widawsky
2011-09-08 15:25 ` [PATCH] " Daniel Vetter
1 sibling, 0 replies; 19+ messages in thread
From: Ben Widawsky @ 2011-09-08 15:19 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx
On Thu, Sep 08, 2011 at 02:00:21PM +0200, Daniel Vetter wrote:
> This patch closes the following race:
>
> We get a PM interrupt A, mask it, set dev_priv->pm_iir = PM_A and kick off the
> work item. Scheduler isn't grumpy, so the work queue takes rps_lock,
> grabs pm_iir = dev_priv->pm_iir and pm_imr = READ(PMIMR). Note that
> pm_imr == pm_iir because we've just masked the interrupt we've got.
>
> Now hw sends out PM interrupt B (not masked), we process it and mask
> it. Later on the irq handler also clears PMIIR.
>
> Then the work item proceeds and at the end clears PMIMR. Because
> (local) pm_imr == pm_iir we have
> pm_imr & ~pm_iir == 0
> so all interrupts are enabled.
>
> Hardware is still interrupt-happy, and sends out a new PM interrupt B.
> PMIMR doesn't mask B (it does not mask anything), PMIIR is cleared, so
> we get it and hit the WARN in the interrupt handler (because
> dev_priv->pm_iir == PM_B).
>
> That's why I've moved the
> WRITE(PMIMR, 0)
> up under the protection of the rps_lock. And write an unconditional 0
> to PMIMR, because that's what we'll do anyway.
>
> This race becomes much more likely because we can arbitrarily extend
> the window by grabbing dev->struct_mutex right after the irq handler
> has processed the first PM_B interrupt.
>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
* [PATCH] drm/i915: close PM interrupt masking races in the rps work func
2011-09-08 12:00 ` [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func Daniel Vetter
2011-09-08 15:19 ` Ben Widawsky
@ 2011-09-08 15:25 ` Daniel Vetter
2011-09-08 20:49 ` Chris Wilson
1 sibling, 1 reply; 19+ messages in thread
From: Daniel Vetter @ 2011-09-08 15:25 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter, Ben Widawsky
This patch closes the following race:
We get a PM interrupt A, mask it, set dev_priv->pm_iir = PM_A and kick off the
work item. Scheduler isn't grumpy, so the work queue takes rps_lock,
grabs pm_iir = dev_priv->pm_iir and pm_imr = READ(PMIMR). Note that
pm_imr == pm_iir because we've just masked the interrupt we've got.
Now hw sends out PM interrupt B (not masked), we process it and mask
it. Later on the irq handler also clears PMIIR.
Then the work item proceeds and at the end clears PMIMR. Because
(local) pm_imr == pm_iir we have
pm_imr & ~pm_iir == 0
so all interrupts are enabled.
Hardware is still interrupt-happy, and sends out a new PM interrupt B.
PMIMR doesn't mask B (it does not mask anything), PMIIR is cleared, so
we get it and hit the WARN in the interrupt handler (because
dev_priv->pm_iir == PM_B).
That's why I've moved the
WRITE(PMIMR, 0)
up under the protection of the rps_lock. And write an unconditional 0
to PMIMR, because that's what we'll do anyway.
This race becomes much more likely because we can arbitrarily extend
the window by grabbing dev->struct_mutex right after the irq handler
has processed the first PM_B interrupt.
v2: Chris Wilson pointed out some dead code and a now misleading
comment to kill.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
---
drivers/gpu/drm/i915/i915_irq.c | 10 ++--------
1 files changed, 2 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 2fdd9f9..d5a2188 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -377,12 +377,12 @@ static void gen6_pm_rps_work(struct work_struct *work)
drm_i915_private_t *dev_priv = container_of(work, drm_i915_private_t,
rps_work);
u8 new_delay = dev_priv->cur_delay;
- u32 pm_iir, pm_imr;
+ u32 pm_iir;
spin_lock_irq(&dev_priv->rps_lock);
pm_iir = dev_priv->pm_iir;
dev_priv->pm_iir = 0;
- pm_imr = I915_READ(GEN6_PMIMR);
+ I915_WRITE(GEN6_PMIMR, 0);
spin_unlock_irq(&dev_priv->rps_lock);
if (!pm_iir)
@@ -415,12 +415,6 @@ static void gen6_pm_rps_work(struct work_struct *work)
gen6_set_rps(dev_priv->dev, new_delay);
dev_priv->cur_delay = new_delay;
- /*
- * rps_lock not held here because clearing is non-destructive. There is
- * an *extremely* unlikely race with gen6_rps_enable() that is prevented
- * by holding struct_mutex for the duration of the write.
- */
- I915_WRITE(GEN6_PMIMR, pm_imr & ~pm_iir);
mutex_unlock(&dev_priv->dev->struct_mutex);
}
--
1.7.6
* Re: [PATCH] drm/i915: close PM interrupt masking races in the rps work func
2011-09-08 15:25 ` [PATCH] " Daniel Vetter
@ 2011-09-08 20:49 ` Chris Wilson
0 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2011-09-08 20:49 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter, Ben Widawsky
On Thu, 8 Sep 2011 17:25:05 +0200, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> This patch closes the following race:
>
> We get a PM interrupt A, mask it, set dev_priv->pm_iir = PM_A and kick off the
> work item. Scheduler isn't grumpy, so the work queue takes rps_lock,
> grabs pm_iir = dev_priv->pm_iir and pm_imr = READ(PMIMR). Note that
> pm_imr == pm_iir because we've just masked the interrupt we've got.
>
> Now hw sends out PM interrupt B (not masked), we process it and mask
> it. Later on the irq handler also clears PMIIR.
>
> Then the work item proceeds and at the end clears PMIMR. Because
> (local) pm_imr == pm_iir we have
> pm_imr & ~pm_iir == 0
> so all interrupts are enabled.
>
> Hardware is still interrupt-happy, and sends out a new PM interrupt B.
> PMIMR doesn't mask B (it does not mask anything), PMIIR is cleared, so
> we get it and hit the WARN in the interrupt handler (because
> dev_priv->pm_iir == PM_B).
>
> That's why I've moved the
> WRITE(PMIMR, 0)
> up under the protection of the rps_lock. And write an unconditional 0
> to PMIMR, because that's what we'll do anyway.
>
> This race becomes much more likely because we can arbitrarily extend
> the window by grabbing dev->struct_mutex right after the irq handler
> has processed the first PM_B interrupt.
>
> v2: Chris Wilson pointed out some dead code and a now misleading
> comment to kill.
>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
I've stared at this for a long time trying to find meaning in the warning
about having to do the IMR write under the mutex. Having found none, this
was also the patch that I wrote for myself to fix the bug identified by
Daniel.
In this case, the bug is simply that the value of PM_IMR may have changed
since we cached it in pm_imr (and similarly dev_priv->pm_iir will have the
corresponding bit now set), so we may lose an interrupt mask when writing
it back later. Again, this opens the possibility for a second interrupt
to arrive and hit the WARN (and kick off a redundant work function).
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* [PATCH 3/3] drm/i915: properly cancel rps_work on module unload v2
2011-09-08 12:00 [PATCH 1/3] drm/i915: close PM interrupt masking races in the irq handler Daniel Vetter
2011-09-08 12:00 ` [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func Daniel Vetter
@ 2011-09-08 12:00 ` Daniel Vetter
2011-09-08 20:51 ` Chris Wilson
2011-09-08 20:43 ` [PATCH 1/3] drm/i915: close PM interrupt masking races in the irq handler Chris Wilson
` (2 subsequent siblings)
4 siblings, 1 reply; 19+ messages in thread
From: Daniel Vetter @ 2011-09-08 12:00 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter, Ben Widawsky
The rps disabling code wasn't properly cancelling outstanding work
items. Also add a comment that explains why we're not racing with
the work item that could unmask interrupts - that piece of code
confused me quite a bit.
v2: Ben Widawsky pointed out that the first version of this patch would
deadlock (and had a few lesser problems). All corrected.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
---
drivers/gpu/drm/i915/intel_display.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 56a8554..2e425be 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -7484,6 +7484,10 @@ void gen6_disable_rps(struct drm_device *dev)
I915_WRITE(GEN6_RPNSWREQ, 1 << 31);
I915_WRITE(GEN6_PMINTRMSK, 0xffffffff);
I915_WRITE(GEN6_PMIER, 0);
+ /* Complete PM interrupt masking here doesn't race with the rps work
+ * item again unmasking PM interrupts because that is using a different
+ * register (PMIMR) to mask PM interrupts. The only risk is in leaving
+ * stale bits in PMIIR and PMIMR which gen6_enable_rps will clean up. */
spin_lock_irq(&dev_priv->rps_lock);
dev_priv->pm_iir = 0;
@@ -8478,6 +8482,7 @@ void intel_modeset_cleanup(struct drm_device *dev)
* enqueue unpin/hotplug work. */
drm_irq_uninstall(dev);
cancel_work_sync(&dev_priv->hotplug_work);
+ cancel_work_sync(&dev_priv->rps_work);
/* flush any delayed tasks or pending work */
flush_scheduled_work();
--
1.7.6
* Re: [PATCH 3/3] drm/i915: properly cancel rps_work on module unload v2
2011-09-08 12:00 ` [PATCH 3/3] drm/i915: properly cancel rps_work on module unload v2 Daniel Vetter
@ 2011-09-08 20:51 ` Chris Wilson
0 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2011-09-08 20:51 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter, Ben Widawsky
On Thu, 8 Sep 2011 14:00:22 +0200, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> The rps disabling code wasn't properly cancelling outstanding work
> items. Also add a comment that explains why we're not racing with
> the work item that could unmask interrupts - that piece of code
> confused me quite a bit.
>
> v2: Ben Widawsky pointed out that the first version of this patch would
> deadlock (and had a few lesser problems). All corrected.
>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
And since presumably the chip is in a completely different thermal
envelope and thawed, waking up the task function is pointless after
resume. ;-)
With the original deadlocks gone,
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 1/3] drm/i915: close PM interrupt masking races in the irq handler
2011-09-08 12:00 [PATCH 1/3] drm/i915: close PM interrupt masking races in the irq handler Daniel Vetter
2011-09-08 12:00 ` [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func Daniel Vetter
2011-09-08 12:00 ` [PATCH 3/3] drm/i915: properly cancel rps_work on module unload v2 Daniel Vetter
@ 2011-09-08 20:43 ` Chris Wilson
2011-09-23 16:08 ` Daniel Vetter
2012-04-15 9:06 ` Chris Wilson
4 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2011-09-08 20:43 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter, Ben Widawsky
On Thu, 8 Sep 2011 14:00:20 +0200, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> Quoting Chris Wilson's more concise description:
>
> "Ah I think I see the problem. As you point out we only mask the current
> interrupt received, so that if we have a task pending (and so IMR != 0) we
> actually unmask the pending interrupt and so could receive it again before the
> tasklet is finally kicked off by the grumpy scheduler."
>
> We need the hw to issue PM interrupts A, B, A while the scheduler hates us
> and refuses to run the rps work item. On receiving the second PM interrupt A
> we hit the WARN because
>
> dev_priv->pm_iir == PM_A | PM_B
>
> Also add a posting read as suggested by Chris to ensure proper ordering of the
> writes to PMIMR and PMIIR. Just in case somebody weakens write ordering.
>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The bug Daniel found here is not that we do the reg write two lines too
early, but that we were writing the wrong value into the IMR. The effect
was to unmask an already pending IRQ and so we could hit the WARN. Other
than the WARN, the only other side-effect would be that we would kick
off more work functions than required - the bug should not be affecting
system stability...
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 1/3] drm/i915: close PM interrupt masking races in the irq handler
2011-09-08 12:00 [PATCH 1/3] drm/i915: close PM interrupt masking races in the irq handler Daniel Vetter
` (2 preceding siblings ...)
2011-09-08 20:43 ` [PATCH 1/3] drm/i915: close PM interrupt masking races in the irq handler Chris Wilson
@ 2011-09-23 16:08 ` Daniel Vetter
2012-04-15 9:06 ` Chris Wilson
4 siblings, 0 replies; 19+ messages in thread
From: Daniel Vetter @ 2011-09-23 16:08 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter, Ben Widawsky
Hi Keith,
I think these 3 patches (note that 2/3 has a resend) are ready for -next.
I don't think we need this for -fixes or Cc:stable, imo worst case we miss
a PM interrupt and hit the WARN. Afaik only Jesse reported seeing that on
his ivybridge machine.
3/3 fixes an unload problem that might lead to an Oops, so that one would
make sense for -fixes. But imho module unload is a developer feature, so I
don't mind much.
Yours, Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
* Re: [PATCH 1/3] drm/i915: close PM interrupt masking races in the irq handler
2011-09-08 12:00 [PATCH 1/3] drm/i915: close PM interrupt masking races in the irq handler Daniel Vetter
` (3 preceding siblings ...)
2011-09-23 16:08 ` Daniel Vetter
@ 2012-04-15 9:06 ` Chris Wilson
4 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2012-04-15 9:06 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter, Ben Widawsky
Ping.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=43107
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH] drm/i915: Fix rps irq warning
@ 2011-09-04 15:49 Ben Widawsky
2011-09-04 15:34 ` [PATCH 0/3] slaughter rps races some more Daniel Vetter
0 siblings, 1 reply; 19+ messages in thread
From: Ben Widawsky @ 2011-09-04 15:49 UTC (permalink / raw)
To: intel-gfx
On Sun, 04 Sep 2011 10:03:21 +0100
Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Sat, 3 Sep 2011 20:24:15 -0700, Ben Widawsky <ben@bwidawsk.net>
> wrote:
> > I couldn't reproduce this one, but... Interrupt mask state is lost
> > if three interrupts occur before the workqueue has run.
> >
> > Should be straightforward to reproduce even without SMP. I'm pretty
> > sure Dan Vetter was trying to explain this to me, and I couldn't
> > get it. My solution I think is different than his though.
>
> This logic is now duplicated in ivybridge_irq_handler(). This simply
> fits the scenario Daniel described, whilst also fitting in with our
> understanding of IMR, IER and IIR. (A big assumption ;-)
>
> Reported-by: Soeren Sonnenburg <sonne@debian.org>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> -Chris
>
I completely screwed this one up because I was tired. Chris sent me this
exact patch, which I told him was wrong, and then I proceeded to rewrite
the same thing on my own. Chris, I suggest you resubmit what you sent me
privately with my Reviewed-by, and please accept my apology.
I am nak'ing my patch.
Ben
* [PATCH 0/3] slaughter rps races some more
2011-09-04 15:49 [PATCH] drm/i915: Fix rps irq warning Ben Widawsky
@ 2011-09-04 15:34 ` Daniel Vetter
2011-09-04 15:35 ` [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func Daniel Vetter
0 siblings, 1 reply; 19+ messages in thread
From: Daniel Vetter @ 2011-09-04 15:34 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter, Ben Widawsky
Hi all,
Let's have some new ideas for races around the rps code.
Not yet tested (I can't hit the WARN anyway in my testing), I've just
wanted to get this out for discussion.
Yours, Daniel
Daniel Vetter (3):
drm/i915: close PM interrupt masking races in the irq handler
drm/i915: close PM interrupt masking races in the rps work func
drm/i915: close rps work vs. rps disable races
drivers/gpu/drm/i915/i915_irq.c | 8 +++++---
drivers/gpu/drm/i915/intel_display.c | 8 +++++++-
2 files changed, 12 insertions(+), 4 deletions(-)
--
1.7.6
* [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func
2011-09-04 15:34 ` [PATCH 0/3] slaughter rps races some more Daniel Vetter
@ 2011-09-04 15:35 ` Daniel Vetter
2011-09-04 17:08 ` Ben Widawsky
0 siblings, 1 reply; 19+ messages in thread
From: Daniel Vetter @ 2011-09-04 15:35 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter, Ben Widawsky
This patch closes the following race:
We get a PM interrupt A, mask it, set dev_priv->pm_iir = PM_A and kick off the
work item. Scheduler isn't grumpy, so the work queue takes rps_lock,
grabs pm_iir = dev_priv->pm_iir and pm_imr = READ(PMIMR). Note that
pm_imr == pm_iir because we've just masked the interrupt we've got.
Now hw sends out PM interrupt B (not masked), we process it and mask
it. Later on the irq handler also clears PMIIR.
Then the work item proceeds and at the end clears PMIMR. Because
(local) pm_imr == pm_iir we have
pm_imr & ~pm_iir == 0
so all interrupts are enabled.
Hardware is still interrupt-happy, and sends out a new PM interrupt B.
PMIMR doesn't mask B (it does not mask anything), PMIIR is cleared, so
we get it and hit the WARN in the interrupt handler (because
dev_priv->pm_iir == PM_B).
That's why I've moved the
WRITE(PMIMR, 0)
up under the protection of the rps_lock. And write an unconditional 0
to PMIMR, because that's what we'll do anyway.
This race becomes much more likely because we can arbitrarily extend
the window by grabbing dev->struct_mutex right after the irq handler
has processed the first PM_B interrupt.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
drivers/gpu/drm/i915/i915_irq.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 2fdd9f9..21ebcbd 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -383,6 +383,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
pm_iir = dev_priv->pm_iir;
dev_priv->pm_iir = 0;
pm_imr = I915_READ(GEN6_PMIMR);
+ I915_WRITE(GEN6_PMIMR, 0);
spin_unlock_irq(&dev_priv->rps_lock);
if (!pm_iir)
@@ -420,7 +421,6 @@ static void gen6_pm_rps_work(struct work_struct *work)
* an *extremely* unlikely race with gen6_rps_enable() that is prevented
* by holding struct_mutex for the duration of the write.
*/
- I915_WRITE(GEN6_PMIMR, pm_imr & ~pm_iir);
mutex_unlock(&dev_priv->dev->struct_mutex);
}
--
1.7.6
* Re: [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func
2011-09-04 15:35 ` [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func Daniel Vetter
@ 2011-09-04 17:08 ` Ben Widawsky
2011-09-04 19:26 ` Daniel Vetter
0 siblings, 1 reply; 19+ messages in thread
From: Ben Widawsky @ 2011-09-04 17:08 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx
On Sun, 4 Sep 2011 17:35:01 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> This patch closes the following race:
>
> We get a PM interrupt A, mask it, set dev_priv->pm_iir = PM_A and kick
> off the work item. Scheduler isn't grumpy, so the work queue takes
> rps_lock, grabs pm_iir = dev_priv->pm_iir and pm_imr = READ(PMIMR).
> Note that pm_imr == pm_iir because we've just masked the interrupt
> we've got.
>
> Now hw sends out PM interrupt B (not masked), we process it and mask
> it. Later on the irq handler also clears PMIIR.
>
> Then the work item proceeds and at the end clears PMIMR. Because
> (local) pm_imr == pm_iir we have
> pm_imr & ~pm_iir == 0
> so all interrupts are enabled.
>
> Hardware is still interrupt-happy, and sends out a new PM interrupt B.
> PMIMR doesn't mask B (it does not mask anything), PMIIR is cleared, so
> we get it and hit the WARN in the interrupt handler (because
> dev_priv->pm_iir == PM_B).
>
> That's why I've moved the
> WRITE(PMIMR, 0)
> up under the protection of the rps_lock. And write an unconditional 0
> to PMIMR, because that's what we'll do anyway.
>
> This race becomes much more likely because we can arbitrarily extend
> the window by grabbing dev->struct_mutex right after the irq handler
> has processed the first PM_B interrupt.
>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
> drivers/gpu/drm/i915/i915_irq.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 2fdd9f9..21ebcbd 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -383,6 +383,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
> pm_iir = dev_priv->pm_iir;
> dev_priv->pm_iir = 0;
> pm_imr = I915_READ(GEN6_PMIMR);
> + I915_WRITE(GEN6_PMIMR, 0);
> spin_unlock_irq(&dev_priv->rps_lock);
>
> if (!pm_iir)
> @@ -420,7 +421,6 @@ static void gen6_pm_rps_work(struct work_struct *work)
> * an *extremely* unlikely race with gen6_rps_enable() that is prevented
> * by holding struct_mutex for the duration of the write.
> */
> - I915_WRITE(GEN6_PMIMR, pm_imr & ~pm_iir);
> mutex_unlock(&dev_priv->dev->struct_mutex);
> }
>
How about this:
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 55518e3..3bc1479 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -415,12 +415,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
gen6_set_rps(dev_priv->dev, new_delay);
dev_priv->cur_delay = new_delay;
- /*
- * rps_lock not held here because clearing is non-destructive. There is
- * an *extremely* unlikely race with gen6_rps_enable() that is prevented
- * by holding struct_mutex for the duration of the write.
- */
- I915_WRITE(GEN6_PMIMR, pm_imr & ~pm_iir);
+ I915_WRITE(GEN6_PMIMR, pm_imr & dev_priv->pm_iir);
mutex_unlock(&dev_priv->dev->struct_mutex);
}
Ben
* Re: [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func
2011-09-04 17:08 ` Ben Widawsky
@ 2011-09-04 19:26 ` Daniel Vetter
2011-09-04 19:56 ` Ben Widawsky
0 siblings, 1 reply; 19+ messages in thread
From: Daniel Vetter @ 2011-09-04 19:26 UTC (permalink / raw)
To: Ben Widawsky; +Cc: Daniel Vetter, intel-gfx
On Sun, Sep 04, 2011 at 10:08:17AM -0700, Ben Widawsky wrote:
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 55518e3..3bc1479 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -415,12 +415,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
> gen6_set_rps(dev_priv->dev, new_delay);
> dev_priv->cur_delay = new_delay;
>
> - /*
> - * rps_lock not held here because clearing is non-destructive. There is
> - * an *extremely* unlikely race with gen6_rps_enable() that is prevented
> - * by holding struct_mutex for the duration of the write.
> - */
> - I915_WRITE(GEN6_PMIMR, pm_imr & ~pm_iir);
> + I915_WRITE(GEN6_PMIMR, pm_imr & dev_priv->pm_iir);
> mutex_unlock(&dev_priv->dev->struct_mutex);
> }
For this to work we'd need to hold the rps_lock (to avoid racing with the
irq handler). But imo my approach is conceptually simpler: The work func
grabs all outstanding PM interrupts and then enables them again in one go
(protected by rps_lock). And because the dev_priv->wq workqueue is
single-threaded (no point in using multiple threads when all work items
grab dev->struct_mutex) we also cannot make a mess by running work items
in the wrong order (or in parallel).
-Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
* Re: [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func
2011-09-04 19:26 ` Daniel Vetter
@ 2011-09-04 19:56 ` Ben Widawsky
2011-09-04 20:10 ` Daniel Vetter
0 siblings, 1 reply; 19+ messages in thread
From: Ben Widawsky @ 2011-09-04 19:56 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Daniel Vetter, intel-gfx
On Sun, Sep 04, 2011 at 09:26:48PM +0200, Daniel Vetter wrote:
> On Sun, Sep 04, 2011 at 10:08:17AM -0700, Ben Widawsky wrote:
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index 55518e3..3bc1479 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -415,12 +415,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
> > gen6_set_rps(dev_priv->dev, new_delay);
> > dev_priv->cur_delay = new_delay;
> >
> > - /*
> > - * rps_lock not held here because clearing is non-destructive. There is
> > - * an *extremely* unlikely race with gen6_rps_enable() that is prevented
> > - * by holding struct_mutex for the duration of the write.
> > - */
> > - I915_WRITE(GEN6_PMIMR, pm_imr & ~pm_iir);
> > + I915_WRITE(GEN6_PMIMR, pm_imr & dev_priv->pm_iir);
> > mutex_unlock(&dev_priv->dev->struct_mutex);
> > }
>
> For this to work we'd need to hold the rps_lock (to avoid racing with the
> irq handler). But imo my approach is conceptually simpler: The work func
> grabs all outstanding PM interrupts and then enables them again in one go
> (protected by rps_lock).
I agree your approach is simpler, but I think we should really consider
whether my approach actually requires the lock. I *think* it doesn't. At
least in my head my patch should fix the error you spotted. I don't
know, maybe I need to think some more.
The reason I worked so hard to avoid doing it the way you did in my
original implementation is that I was trying really hard not to break the
cardinal rule about minimizing the time spent holding an irq-disabling
spinlock. To go with the other patch, you probably also want a
POSTING_READ before releasing the spin_lock (though I think this is being
a bit paranoid).
Again I need to think on this some more, but I'm also not terribly fond
of ever unconditionally setting a value to 0 as it tends to complicate
things when adding new readers or writers to the system.
In summary, let me think about whether my solution actually won't work
(feel free to contribute to my thinking), and if it doesn't then the
decision is easy.
Ben
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func
2011-09-04 19:56 ` Ben Widawsky
@ 2011-09-04 20:10 ` Daniel Vetter
2011-09-04 21:38 ` Ben Widawsky
0 siblings, 1 reply; 19+ messages in thread
From: Daniel Vetter @ 2011-09-04 20:10 UTC (permalink / raw)
To: Ben Widawsky; +Cc: Daniel Vetter, intel-gfx
On Sun, Sep 04, 2011 at 07:56:57PM +0000, Ben Widawsky wrote:
> On Sun, Sep 04, 2011 at 09:26:48PM +0200, Daniel Vetter wrote:
> > On Sun, Sep 04, 2011 at 10:08:17AM -0700, Ben Widawsky wrote:
> > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > > index 55518e3..3bc1479 100644
> > > --- a/drivers/gpu/drm/i915/i915_irq.c
> > > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > @@ -415,12 +415,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
> > > gen6_set_rps(dev_priv->dev, new_delay);
> > > dev_priv->cur_delay = new_delay;
> > >
> > > - /*
> > > - * rps_lock not held here because clearing is non-destructive. There is
> > > - * an *extremely* unlikely race with gen6_rps_enable() that is prevented
> > > - * by holding struct_mutex for the duration of the write.
> > > - */
> > > - I915_WRITE(GEN6_PMIMR, pm_imr & ~pm_iir);
> > > + I915_WRITE(GEN6_PMIMR, pm_imr & dev_priv->pm_iir);
> > > mutex_unlock(&dev_priv->dev->struct_mutex);
> > > }
> >
> > For this to work we'd need to hold the rps_lock (to avoid racing with the
> > irq handler). But imo my approach is conceptually simpler: The work func
> > grabs all outstanding PM interrupts and then enables them again in one go
> > (protected by rps_lock).
>
> I agree your approach is similar, but I think we should really consider
> whether my approach actually requires the lock. I *think* it doesn't. At
> least in my head my patch should fix the error you spotted. I don't
> know, maybe I need to think some more.
1. rps work reads dev_priv->pm_iir (anew in the line you've added).
2. irq handler runs, adds a new bit to dev_priv->pm_iir and sets PMIMR to
dev_priv->pm_iir (under irqsafe rps_lock).
3. rps work writes crap to PMIMR.
I.e. same race, you've just dramatically reduced the window ;-)
> The reason I worked so hard to avoid doing it the way you did in my
> original implementation is I was trying really hard to not break the
> cardinal rule about minimizing time holding spinlock_irqs. To go with
> the other patch, you probably want a POSTING_READ also before releasing
> the spin_lock (though I think this is being a bit paranoid).
The POSTING_READ was there to order the PMIMR write with the PMIIR write (both
in the irq handler). There's no such ordering here (and the irq handler
can't be interrupted) so I think we're safe.
-Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
* Re: [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func
2011-09-04 20:10 ` Daniel Vetter
@ 2011-09-04 21:38 ` Ben Widawsky
2011-09-05 6:38 ` Daniel Vetter
0 siblings, 1 reply; 19+ messages in thread
From: Ben Widawsky @ 2011-09-04 21:38 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Daniel Vetter, intel-gfx
On Sun, Sep 04, 2011 at 10:10:30PM +0200, Daniel Vetter wrote:
> On Sun, Sep 04, 2011 at 07:56:57PM +0000, Ben Widawsky wrote:
> > On Sun, Sep 04, 2011 at 09:26:48PM +0200, Daniel Vetter wrote:
> > > On Sun, Sep 04, 2011 at 10:08:17AM -0700, Ben Widawsky wrote:
> > > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > > > index 55518e3..3bc1479 100644
> > > > --- a/drivers/gpu/drm/i915/i915_irq.c
> > > > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > > @@ -415,12 +415,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
> > > > gen6_set_rps(dev_priv->dev, new_delay);
> > > > dev_priv->cur_delay = new_delay;
> > > >
> > > > - /*
> > > > - * rps_lock not held here because clearing is non-destructive. There is
> > > > - * an *extremely* unlikely race with gen6_rps_enable() that is prevented
> > > > - * by holding struct_mutex for the duration of the write.
> > > > - */
> > > > - I915_WRITE(GEN6_PMIMR, pm_imr & ~pm_iir);
> > > > + I915_WRITE(GEN6_PMIMR, pm_imr & dev_priv->pm_iir);
> > > > mutex_unlock(&dev_priv->dev->struct_mutex);
> > > > }
> > >
> > > For this to work we'd need to hold the rps_lock (to avoid racing with the
> > > irq handler). But imo my approach is conceptually simpler: The work func
> > > grabs all outstanding PM interrupts and then enables them again in one go
> > > (protected by rps_lock).
> >
> > I agree your approach is similar, but I think we should really consider
> > whether my approach actually requires the lock. I *think* it doesn't. At
> > least in my head my patch should fix the error you spotted. I don't
> > know, maybe I need to think some more.
>
> 1. rps work reads dev_priv->pm_iir (anew in the line you've added).
> 2. irq handler runs, adds a new bit to dev_priv->pm_iir and sets PMIMR to
> dev_priv->pm_iir (under irqsafe rps_lock).
> 3. rps work writes crap to PMIMR.
>
> I.e. same race, you've just dramatically reduced the window ;-)
>
> > The reason I worked so hard to avoid doing it the way you did in my
> > original implementation is I was trying really hard to not break the
> > cardinal rule about minimizing time holding spinlock_irqs. To go with
> > the other patch, you probably want a POSTING_READ also before releasing
> > the spin_lock (though I think this is being a bit paranoid).
>
> The POSTING_READ was there to order the PMIMR write with the PMIIR write (both
> in the irq handler). There's no such ordering here (and the irq handler
> can't be interrupted) so I think we're safe.
>
> -Daniel
Oops, you're totally right, I think I meant:
- I915_WRITE(GEN6_PMIMR, pm_imr & ~pm_iir);
+ I915_WRITE(GEN6_PMIMR, dev_priv->pm_iir);
With regard to the POSTING_READ, the concern I had was if the write
to IMR doesn't land before releasing the spinlock, but I don't feel like
addressing that concern anymore.
Ben
* Re: [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func
2011-09-04 21:38 ` Ben Widawsky
@ 2011-09-05 6:38 ` Daniel Vetter
2011-09-05 6:51 ` Ben Widawsky
0 siblings, 1 reply; 19+ messages in thread
From: Daniel Vetter @ 2011-09-05 6:38 UTC (permalink / raw)
To: Ben Widawsky; +Cc: Daniel Vetter, intel-gfx
On Sun, Sep 04, 2011 at 09:38:56PM +0000, Ben Widawsky wrote:
> Oops, you're totally right, I think I meant:
> - I915_WRITE(GEN6_PMIMR, pm_imr & ~pm_iir);
> + I915_WRITE(GEN6_PMIMR, dev_priv->pm_iir);
Imo still racy without the irqsafe rps_lock around it. gcc is free to
compile that into a separate load and store which the irq handler can get
in between and change dev_priv->pm_iir and PMIMR. The race is now only one
instruction wide, though ;-)
-Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
* Re: [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func
2011-09-05 6:38 ` Daniel Vetter
@ 2011-09-05 6:51 ` Ben Widawsky
2011-09-05 12:15 ` Chris Wilson
0 siblings, 1 reply; 19+ messages in thread
From: Ben Widawsky @ 2011-09-05 6:51 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Daniel Vetter, intel-gfx
On Mon, 5 Sep 2011 08:38:07 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:
> On Sun, Sep 04, 2011 at 09:38:56PM +0000, Ben Widawsky wrote:
> > Oops, you're totally right, I think I meant:
> > - I915_WRITE(GEN6_PMIMR, pm_imr & ~pm_iir);
> > + I915_WRITE(GEN6_PMIMR, dev_priv->pm_iir);
>
> Imo still racy without the irqsafe rps_lock around it. gcc is free to
> compile that into a separate load and store which the irq handler can
> get in between and change dev_priv->pm_iir and PMIMR. The race is now
> only one instruction wide, though ;-)
> -Daniel
You are absolutely correct. The modification to GEN6_PMIMR must be
within the protection of rps_lock.
Ben
* Re: [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func
2011-09-05 6:51 ` Ben Widawsky
@ 2011-09-05 12:15 ` Chris Wilson
0 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2011-09-05 12:15 UTC (permalink / raw)
To: Ben Widawsky, Daniel Vetter; +Cc: Daniel Vetter, intel-gfx
On Sun, 4 Sep 2011 23:51:52 -0700, Ben Widawsky <ben@bwidawsk.net> wrote:
> On Mon, 5 Sep 2011 08:38:07 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
>
> > On Sun, Sep 04, 2011 at 09:38:56PM +0000, Ben Widawsky wrote:
> > > Oops, you're totally right, I think I meant:
> > > - I915_WRITE(GEN6_PMIMR, pm_imr & ~pm_iir);
> > > + I915_WRITE(GEN6_PMIMR, dev_priv->pm_iir);
> >
> > Imo still racy without the irqsafe rps_lock around it. gcc is free to
> > compile that into a separate load and store which the irq handler can
> > get in between and change dev_priv->pm_iir and PMIMR. The race is now
> > only one instruction wide, though ;-)
> > -Daniel
>
> You are absolutely correct. The modification to GEN6_PMIMR must be
> within the protection of rps_lock.
Right. However, I don't see why we need to hold the mutex. Why
are we touching PMIMR in gen6_enable_rps()? Afaics, it is because
gen6_disable_rps() may leave a stale PMIMR (it sets dev_priv->pm_iir to
0, causing the existing work-handler to return early and not touch
PMIMR).
I believe everything is correct if we clear PMIMR on module load, remove
the setting of PMIMR from gen6_enable_rps() and clear PMIMR under the
rps lock at the top of the work handler (which both Daniel and I have
desired to do... ;-)
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
end of thread, other threads:[~2012-04-15 9:06 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-08 12:00 [PATCH 1/3] drm/i915: close PM interrupt masking races in the irq handler Daniel Vetter
2011-09-08 12:00 ` [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func Daniel Vetter
2011-09-08 15:19 ` Ben Widawsky
2011-09-08 15:25 ` [PATCH] " Daniel Vetter
2011-09-08 20:49 ` Chris Wilson
2011-09-08 12:00 ` [PATCH 3/3] drm/i915: properly cancel rps_work on module unload v2 Daniel Vetter
2011-09-08 20:51 ` Chris Wilson
2011-09-08 20:43 ` [PATCH 1/3] drm/i915: close PM interrupt masking races in the irq handler Chris Wilson
2011-09-23 16:08 ` Daniel Vetter
2012-04-15 9:06 ` Chris Wilson
-- strict thread matches above, loose matches on Subject: below --
2011-09-04 15:49 [PATCH] drm/i915: Fix rps irq warning Ben Widawsky
2011-09-04 15:34 ` [PATCH 0/3] slaughter rps races some more Daniel Vetter
2011-09-04 15:35 ` [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func Daniel Vetter
2011-09-04 17:08 ` Ben Widawsky
2011-09-04 19:26 ` Daniel Vetter
2011-09-04 19:56 ` Ben Widawsky
2011-09-04 20:10 ` Daniel Vetter
2011-09-04 21:38 ` Ben Widawsky
2011-09-05 6:38 ` Daniel Vetter
2011-09-05 6:51 ` Ben Widawsky
2011-09-05 12:15 ` Chris Wilson