From mboxrd@z Thu Jan 1 00:00:00 1970 From: Chris Wilson Subject: Re: [PATCH] drm/i915: close PM interrupt masking races in the rps work func Date: Thu, 08 Sep 2011 21:49:19 +0100 Message-ID: References: <1315483222-2195-2-git-send-email-daniel.vetter@ffwll.ch> <1315495505-1919-1-git-send-email-daniel.vetter@ffwll.ch> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mga03.intel.com (mga03.intel.com [143.182.124.21]) by gabe.freedesktop.org (Postfix) with ESMTP id A9F576B503 for ; Thu, 8 Sep 2011 13:49:22 -0700 (PDT) In-Reply-To: <1315495505-1919-1-git-send-email-daniel.vetter@ffwll.ch> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org To: intel-gfx@lists.freedesktop.org Cc: Daniel Vetter , Ben Widawsky List-Id: intel-gfx@lists.freedesktop.org On Thu, 8 Sep 2011 17:25:05 +0200, Daniel Vetter wrote: > This patch closes the following race: > > We get a PM interrupt A, mask it, set dev_priv->iir = PM_A and kick of the > work item. Scheduler isn't grumpy, so the work queue takes rps_lock, > grabs pm_iir = dev_priv->pm_iir and pm_imr = READ(PMIMR). Note that > pm_imr == pm_iir because we've just masked the interrupt we've got. > > Now hw sends out PM interrupt B (not masked), we process it and mask > it. Later on the irq handler also clears PMIIR. > > Then the work item proceeds and at the end clears PMIMR. Because > (local) pm_imr == pm_iir we have > pm_imr & ~pm_iir == 0 > so all interrupts are enabled. > > Hardware is still interrupt-happy, and sends out a new PM interrupt B. > PMIMR doesn't mask B (it does not mask anything), PMIIR is cleared, so > we get it and hit the WARN in the interrupt handler (because > dev_priv->pm_iir == PM_B). > > That's why I've moved the > WRITE(PMIMR, 0) > up under the protection of the rps_lock. And write an uncoditional 0 > to PMIMR, because that's what we'll do anyway. > > This races looks much more likely because we can arbitrarily extend > the window by grabing dev->struct mutex right after the irq handler > has processed the first PM_B interrupt. > > v2: Chris Wilson pointed out some dead code and a now misleading > comment to kill. > > Signed-off-by: Daniel Vetter > Reviewed-by: Ben Widawsky Reviewed-by: Chris Wilson I've stared at this a long time to find meaning to the warning about having to do the IMR write under the mutex. Having found none, this was also the patch that I wrote for myself to fix the bug identified by Daniel. In this case, the bug is simply that the value of PM_IMR may change since caching it in pm_imr (and similarly dev_priv->pm_iir will have the correspoding bit now set) and so we may loose an interrupt masking when setting it later. Again, this opens the possibility for a second interrupt to arrive and hit the WARN (and kick off a redundant work function.) -Chris -- Chris Wilson, Intel Open Source Technology Centre