From mboxrd@z Thu Jan  1 00:00:00 1970
From: Chris Wilson <chris@chris-wilson.co.uk>
Subject: Re: [PATCH] drm/i915: close PM interrupt masking races
	in the rps work func
Date: Thu, 08 Sep 2011 21:49:19 +0100
Message-ID: <d08817$1cnmhf@azsmga001.ch.intel.com>
References: <1315483222-2195-2-git-send-email-daniel.vetter@ffwll.ch>
	<1315495505-1919-1-git-send-email-daniel.vetter@ffwll.ch>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org>
Received: from mga03.intel.com (mga03.intel.com [143.182.124.21])
	by gabe.freedesktop.org (Postfix) with ESMTP id A9F576B503
	for <intel-gfx@lists.freedesktop.org>;
	Thu,  8 Sep 2011 13:49:22 -0700 (PDT)
In-Reply-To: <1315495505-1919-1-git-send-email-daniel.vetter@ffwll.ch>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
To: intel-gfx@lists.freedesktop.org
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>, Ben Widawsky <ben@bwidawsk.net>
List-Id: intel-gfx@lists.freedesktop.org

On Thu,  8 Sep 2011 17:25:05 +0200, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> This patch closes the following race:
> 
> We get a PM interrupt A, mask it, set dev_priv->iir = PM_A and kick of the
> work item. Scheduler isn't grumpy, so the work queue takes rps_lock,
> grabs pm_iir = dev_priv->pm_iir and pm_imr = READ(PMIMR). Note that
> pm_imr == pm_iir because we've just masked the interrupt we've got.
> 
> Now hw sends out PM interrupt B (not masked), we process it and mask
> it.  Later on the irq handler also clears PMIIR.
> 
> Then the work item proceeds and at the end clears PMIMR. Because
> (local) pm_imr == pm_iir we have
>         pm_imr & ~pm_iir == 0
> so all interrupts are enabled.
> 
> Hardware is still interrupt-happy, and sends out a new PM interrupt B.
> PMIMR doesn't mask B (it does not mask anything), PMIIR is cleared, so
> we get it and hit the WARN in the interrupt handler (because
> dev_priv->pm_iir == PM_B).
> 
> That's why I've moved the
>         WRITE(PMIMR, 0)
> up under the protection of the rps_lock. And write an uncoditional 0
> to PMIMR, because that's what we'll do anyway.
> 
> This races looks much more likely because we can arbitrarily extend
> the window by grabing dev->struct mutex right after the irq handler
> has processed the first PM_B interrupt.
> 
> v2: Chris Wilson pointed out some dead code and a now misleading
> comment to kill.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

I've stared at this a long time to find meaning to the warning about
having to do the IMR write under the mutex. Having found none, this was
also the patch that I wrote for myself to fix the bug identified by
Daniel.

In this case, the bug is simply that the value of PM_IMR may change since
caching it in pm_imr (and similarly dev_priv->pm_iir will have the
correspoding bit now set) and so we may loose an interrupt masking when
setting it later. Again, this opens the possibility for a second
interrupt to arrive and hit the WARN (and kick off a redundant work
function.)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre