Intel-GFX Archive on lore.kernel.org
From: Ben Widawsky <ben@bwidawsk.net>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
	intel-gfx <intel-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] drm/i915: kicking rings considered harmful
Date: Tue, 27 Sep 2011 12:38:59 -0700
Message-ID: <20110927123859.5cd58ba8@bwidawsk.net>
In-Reply-To: <20110927180317.GC2785@phenom.ffwll.local>

On Tue, 27 Sep 2011 20:03:17 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Tue, Sep 27, 2011 at 06:31:59PM +0100, Chris Wilson wrote:
> > On Tue, 27 Sep 2011 09:46:14 -0700, Ben Widawsky <ben@bwidawsk.net> wrote:
> > > On Tue, 27 Sep 2011 12:03:22 +0200
> > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > 
> > > > On Mon, Sep 26, 2011 at 10:22:01PM -0700, Ben Widawsky wrote:
> > > > > On Mon, 26 Sep 2011 19:59:50 +0200
> > > > > Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > > > > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > > > > > > index da5d607..09c11e4 100644
> > > > > > > --- a/drivers/gpu/drm/i915/i915_irq.c
> > > > > > > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > > > > > @@ -1694,7 +1694,7 @@ void i915_hangcheck_elapsed(unsigned long data)
> > > > > > >  		if (dev_priv->hangcheck_count++ > 1) {
> > > > > > >  			DRM_ERROR("Hangcheck timer elapsed... GPU hung\n");
> > > > > > >  
> > > > > > > -			if (!IS_GEN2(dev)) {
> > > > > > > +			if (!IS_GEN2(dev) && i915_try_reset) {
> > > > > > >  				/* Is the chip hanging on a WAIT_FOR_EVENT?
> > > > > > >  				 * If so we can simply poke the RB_WAIT bit
> > > > > > >  				 * and break the hang. This should work on
> > > > > 
> > > > > I think you should also be able to accomplish the same thing
> > > > > with enable_hangcheck param. I had the same problem with the
> > > > > debugger :)
> > > > 
> > > > I agree. Iirc you have some patches floating in that area to make the
> > > > hangcheck a bit more robust. Can you maybe add this to that series and
> > > > (re-)submit?
> > > > 
> > > > Cheers, Daniel
> > > 
> > > While 9/10 times daniel > ben, I'm playing my 10% card here and
> > > suggesting that mixing the reset variable and ring kick is not the right
> > > way to go about this.
> > 
> > One purpose of the i915.reset parameter is to disable any automatic
> > attempts to recover from a hang condition so that the error state is not
> > misleading. So preventing the kick ring does help in that regard.
> > 
> > A second purpose is to prevent i915_reset() from causing havoc and hanging
> > the machine. Daniel is implying that kicking the rings is instrumental in
> > making matters worse. Again using i915.reset to prevent kicking the rings
> > fits in with that purpose.
> > 
> > Since I regard kicking rings as a form of reset, I don't see it as a
> > conflation of terms and so a valid use of i915.reset.
> 
> Couldn't have said it any better. The bad effect of kicking stuck rings
> is mostly that when we have a sync problem there's a decent chance
> somebody has written garbage into our batchbuffers. Continuously trying to
> execute said garbage is just tempting fate in the gpu's error resilience.
> -Daniel

If we do this we lose the possibility to kick rings, but not reset the
GPU (not that I find that terribly useful). If we do this, it does fire a
wq event, but I don't see a problem with that for this case.

I think I would rather do this:
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 012732b..803524e 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1698,6 +1698,10 @@ void i915_hangcheck_elapsed(unsigned long data)
                if (dev_priv->hangcheck_count++ > 1) {
                        DRM_ERROR("Hangcheck timer elapsed... GPU hung\n");
 
+                       /* Save off error state before kicking the rings and
+                        * possibly ruining the GPU state.
+                        */
+                       i915_handle_error(dev, true);
                        if (!IS_GEN2(dev)) {
                                /* Is the chip hanging on a WAIT_FOR_EVENT?
                                 * If so we can simply poke the RB_WAIT bit
@@ -1717,7 +1721,6 @@ void i915_hangcheck_elapsed(unsigned long data)
                                        goto repeat;
                        }
 
-                       i915_handle_error(dev, true);
                        return;
                }
        } else {
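
The ordering the patch above aims for can be sketched outside the kernel as a small userspace model. Everything here is a hypothetical stand-in, not i915 code: struct fake_gpu, FAKE_RB_WAIT (an arbitrary bit position), capture_error_state() (playing the role of i915_handle_error()) and kick_ring() are invented for illustration, and the real function retries via "goto repeat" rather than returning. The only point shown is that the snapshot is taken before the kick modifies the ring state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state a hang dump would record. */
struct fake_gpu {
	unsigned int ring_ctl;	/* models a ring control register */
	bool is_gen2;
};

/* Illustrative bit only; not the real RB_WAIT definition. */
#define FAKE_RB_WAIT (1u << 11)

struct fake_error_state {
	bool captured;
	unsigned int ring_ctl;
};

static struct fake_error_state saved;

/* Stand-in for saving the error state: only the first snapshot is kept. */
static void capture_error_state(const struct fake_gpu *gpu)
{
	assert(gpu != NULL);
	if (saved.captured)
		return;
	saved.captured = true;
	saved.ring_ctl = gpu->ring_ctl;
}

/* Stand-in for the ring kick: in this model it clears the wait bit. */
static bool kick_ring(struct fake_gpu *gpu)
{
	if (gpu->ring_ctl & FAKE_RB_WAIT) {
		gpu->ring_ctl &= ~FAKE_RB_WAIT;
		return true;
	}
	return false;
}

/*
 * Returns true when the GPU is declared hung, false when hangcheck
 * should simply run again later (not yet suspicious, or ring kicked).
 */
static bool hangcheck_elapsed(struct fake_gpu *gpu, int *hangcheck_count)
{
	if ((*hangcheck_count)++ > 1) {
		/* Snapshot first, so the dump reflects the pre-kick state. */
		capture_error_state(gpu);

		if (!gpu->is_gen2 && kick_ring(gpu))
			return false;

		return true;
	}
	return false;
}
```

Calling hangcheck_elapsed() on a fake_gpu whose ring_ctl has FAKE_RB_WAIT set leaves the bit recorded in saved.ring_ctl even though the kick clears it in the live state. With the capture placed after the kick, the snapshot would no longer show the stuck wait.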


Thread overview: 31+ messages
2011-05-13  5:38 [2.6.39 regression] hard lock when GNOME starts Andrew Lutomirski
2011-05-13 16:07 ` Andrew Lutomirski
2011-05-13 16:14   ` [PATCH] drm/i915: Revert i915.semaphore=1 default from 47ae63e0 Andy Lutomirski
2011-05-15 23:09     ` Keith Packard
2011-05-19 19:56     ` Keith Packard
2011-05-19 20:50       ` Andrew Lutomirski
2011-05-24 17:10         ` Andrew Lutomirski
2011-05-24 17:46           ` Keith Packard
2011-05-24 20:05           ` Ivan Bulatovic
2011-06-07  7:12         ` Eric Anholt
2011-06-10 14:06           ` Andrew Lutomirski
2011-08-22 16:53             ` Jesse Barnes
2011-08-31 18:24               ` Ben Widawsky
2011-08-31 18:30               ` Andrew Lutomirski
2011-08-31 19:07                 ` Keith Packard
2011-08-31 19:37                   ` Andrew Lutomirski
2011-09-26 17:59                     ` [PATCH] drm/i915: kicking rings considered harmful Daniel Vetter
2011-09-26 19:07                       ` Andrew Lutomirski
2011-09-27  9:57                         ` Daniel Vetter
2011-09-27  5:22                       ` Ben Widawsky
2011-09-27 10:03                         ` Daniel Vetter
2011-09-27 16:46                           ` Ben Widawsky
2011-09-27 17:31                             ` Chris Wilson
2011-09-27 18:03                               ` Daniel Vetter
2011-09-27 19:38                                 ` Ben Widawsky [this message]
2011-09-27 21:54                                   ` Chris Wilson
2011-09-28  1:34                                     ` Ben Widawsky
2011-09-28  8:47                                       ` Chris Wilson
2011-09-28  8:53                                         ` Daniel Vetter
2011-10-03 20:21                                           ` Andrew Lutomirski
2011-10-03 21:02                                             ` Daniel Vetter
