From: Daniel Vetter
Subject: Re: [PATCH 4/7] drm/i915: no hangcheck when reset is in progress
Date: Tue, 16 Jul 2013 10:45:59 +0200
Message-ID: <20130716084559.GO5784@phenom.ffwll.local>
References: <1372861332-6308-1-git-send-email-mika.kuoppala@intel.com>
 <1372861332-6308-5-git-send-email-mika.kuoppala@intel.com>
In-Reply-To: <1372861332-6308-5-git-send-email-mika.kuoppala@intel.com>
To: Mika Kuoppala
Cc: intel-gfx@lists.freedesktop.org

On Wed, Jul 03, 2013 at 05:22:09PM +0300, Mika Kuoppala wrote:
> From: Mika Kuoppala
>
> The timer for hangchecking can run again before the previous
> reset it has triggered has been handled. This can corrupt
> the hangcheck state, as reset handling will access and write to
> the hangcheck data. To prevent this, avoid running the hangcheck
> logic while a reset is in progress.
>
> Signed-off-by: Mika Kuoppala
> Reviewed-by: Ben Widawsky
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index dc1b878..b0fec7f 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2447,6 +2447,9 @@ void i915_hangcheck_elapsed(unsigned long data)
>  	if (!i915_enable_hangcheck)
>  		return;
>
> +	if (i915_reset_in_progress(&dev_priv->gpu_error))
> +		return;

atomic_t accesses are weakly ordered (I know, we're on x86 here ...) and I
think we're missing the requisite barriers to keep races at bay.

I think wrapping up all access to the hangcheck stats (both from the
hangcheck timer and in the reset code and eventually in the readout code)
with an irq-save spinlock is the simpler and much more obviously correct
(and hence safer) option. Note that you can't optimize away the irq
save/restore stuff in the timer since we also call the hangcheck code from
hardirq context.

So I'll punt on this patch for now; I've merged all the others leading up
to it.

Thanks, Daniel

> +
>  	for_each_ring(ring, dev_priv, i) {
>  		u32 seqno, acthd;
>  		bool busy = true;
> --
> 1.7.9.5
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
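
For illustration, a minimal sketch of the irq-save spinlock scheme suggested
in the reply above. The lock name (hangcheck_lock) and the reset-side helper
are made up for this sketch and are not the actual i915 code; the point is
only that the hangcheck timer and the reset path serialize all access to the
hangcheck state under one lock, and that the timer must use the irqsave
variant because, as noted above, hangcheck is also invoked from hardirq
context.

#include <linux/spinlock.h>

/* Hypothetical lock guarding all per-ring hangcheck state. */
static DEFINE_SPINLOCK(hangcheck_lock);

void i915_hangcheck_elapsed(unsigned long data)
{
	struct drm_device *dev = (struct drm_device *)data;
	struct drm_i915_private *dev_priv = dev->dev_private;
	unsigned long flags;

	if (!i915_enable_hangcheck)
		return;

	/*
	 * Must be the irqsave variant: this code can also be reached from
	 * hardirq context, so the save/restore cannot be optimized away.
	 */
	spin_lock_irqsave(&hangcheck_lock, flags);
	/* ... read and update the per-ring hangcheck state here ... */
	spin_unlock_irqrestore(&hangcheck_lock, flags);
}

/*
 * Hypothetical helper for the reset side: it takes the same lock before
 * touching or reinitializing the hangcheck stats.
 */
static void reset_hangcheck_stats(struct drm_i915_private *dev_priv)
{
	unsigned long flags;

	spin_lock_irqsave(&hangcheck_lock, flags);
	/* ... clear/reinitialize the hangcheck state for each ring ... */
	spin_unlock_irqrestore(&hangcheck_lock, flags);
}

Any readout path (debugfs, error-state capture) would take the same lock,
matching the "eventually in the readout code" remark above.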