From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Vetter
Subject: Re: [PATCH] drm/i915: Rework GPU reset sequence to match driver load & thaw
Date: Wed, 30 Jul 2014 23:00:31 +0200
Message-ID: <20140730210030.GC8727@phenom.ffwll.local>
References: <1405523159-8502-1-git-send-email-alistair.mcaulay@intel.com>
 <20140726010528.GA11547@bwidawsk.net>
 <20140728092638.GA4747@phenom.ffwll.local>
 <20140729073633.GC21570@nuc-i3427.alporthouse.com>
 <20140729103242.GN4747@phenom.ffwll.local>
 <2F6A3166A8653C4D914E172D478C0F012E4E2519@IRSMSX105.ger.corp.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from mail-wi0-f178.google.com (mail-wi0-f178.google.com [209.85.212.178])
 by gabe.freedesktop.org (Postfix) with ESMTP id 6DEED6E419;
 Wed, 30 Jul 2014 14:00:23 -0700 (PDT)
Received: by mail-wi0-f178.google.com with SMTP id hi2so3040367wib.5;
 Wed, 30 Jul 2014 14:00:22 -0700 (PDT)
Content-Disposition: inline
In-Reply-To: <2F6A3166A8653C4D914E172D478C0F012E4E2519@IRSMSX105.ger.corp.intel.com>
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx"
To: "Mcaulay, Alistair"
Cc: Ben Widawsky, "intel-gfx@lists.freedesktop.org"
List-Id: intel-gfx@lists.freedesktop.org

On Wed, Jul 30, 2014 at 04:59:33PM +0000, Mcaulay, Alistair wrote:
> Hi Daniel,
>
> could you please be clearer on the change you mean. I think you mean something functionally equivalent to the code below, but done in a less hacky way.
> (This slight change has made no change to test results)
> Or is the idea to return at a different point to this?
> I couldn't find " dev_priv->mm.reload_in_reset or similar" in the code. The only thing I can find is error->reset_counter,
> which is used in check_wedge(). Bottom bit set means RESET_IN_PROGRESS, top bit means WEDGED

Well, what I meant is that you have to add a new dev_priv->mm.reload_in_reset flag.
And the below won't work, since everywhere except during a gpu reset we want
the -EAGAIN to reach callers. Actually it's really important that if we have
an -EAGAIN we don't eat it.

And I guess the check for mm.reload_in_reset should actually be in
i915_gem_check_wedge.
-Daniel

>
>
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1832,7 +1832,9 @@ int intel_ring_begin(struct intel_engine_cs *ring,
>
>  	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
>  				   dev_priv->mm.interruptible);
> -	if (ret)
> +
> +	/* -EAGAIN means a reset is in progress, it is Ok to return */
> +	if (ret == -EAGAIN)
> +		return 0;
> +	if (ret)
> +		return ret;
>
>  	ret = __intel_ring_prepare(ring, num_dwords * sizeof(uint32_t));
>
> Alistair.
>
> -----Original Message-----
> From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf Of Daniel Vetter
> Sent: Tuesday, July 29, 2014 11:33 AM
> To: Chris Wilson; Daniel Vetter; Ben Widawsky; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Rework GPU reset sequence to match driver load & thaw
>
> On Tue, Jul 29, 2014 at 08:36:33AM +0100, Chris Wilson wrote:
> > On Mon, Jul 28, 2014 at 11:26:38AM +0200, Daniel Vetter wrote:
> > > Oh, I guess that's the tricky bit why the old approach never worked
> > > - because reset_in_progress is set we failed the context/ppgtt
> > > loading through the rings and screwed up.
> > >
> > > Problem with your approach is that we want to bail out here if a
> > > reset is in progress, so we can't just eat the EAGAIN. If we do that
> > > we potentially deadlock or overflow the ring.
> > >
> > > I think we need a different hack here, and a few layers down (i.e.
> > > at the place where we actually generate that offending -EAGAIN).
> > >
> > > - Around the re-init sequence in the reset function we set
> > >   dev_priv->mm.reload_in_reset or similar. Since we hold
> > >   dev->struct_mutex no one will see that, as long as we never leak
> > >   it out of the critical section.
> > >
> > > - In the ring_begin code that checks for gpu hangs we ignore
> > >   reset_in_progress if this bit is set.
> > >
> > > - Both places need fairly big comments to explain what exactly is going
> > >   on.
> >
> > This is going from bad to worse. I think you can do better if you
> > looked at the problem afresh.
>
> Well we can't really reset reset_in_progress at that point, since not all reset is done yet. Especially the modeset stuff. So I don't think that reordering the reset sequence would get us out of this ugly spot. And I don't see any other solution really. Do you?
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
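
For illustration only, here is a rough standalone sketch of the idea
discussed in this thread: the wedge check itself, not its callers, decides
whether an in-progress reset should be reported. It is a simplified model and
not the driver's code. The struct, the flag macros and the helper names
(gpu_state, check_wedge, begin_reload, end_reload) are all hypothetical; only
the "bottom bit means reset in progress, top bit means wedged" encoding and
the reload_in_reset name come from the mails above. Locking is not modeled;
in the driver the flag would only be touched with dev->struct_mutex held.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical encoding, following the description in the thread. */
#define RESET_IN_PROGRESS_FLAG 0x1u        /* bottom bit of the counter */
#define WEDGED_FLAG            0x80000000u /* top bit of the counter */

/* Hypothetical stand-in for the relevant driver state. */
struct gpu_state {
	unsigned int reset_counter;
	bool reload_in_reset; /* proposed flag, set only around re-init */
};

/* Mark the start of the re-init sequence inside the reset function. */
static void begin_reload(struct gpu_state *s)
{
	assert(!s->reload_in_reset); /* must never nest or leak */
	s->reload_in_reset = true;
}

/* Mark the end of the re-init sequence, before the lock is dropped. */
static void end_reload(struct gpu_state *s)
{
	s->reload_in_reset = false;
}

/*
 * Simplified wedge check: -EAGAIN is suppressed only while the reset
 * code itself is reloading state. Every other caller still sees it, so
 * nobody outside the reset path can eat the error by accident.
 */
static int check_wedge(const struct gpu_state *s)
{
	if (s->reset_counter & WEDGED_FLAG)
		return -EIO;

	if (s->reset_counter & RESET_IN_PROGRESS_FLAG) {
		if (s->reload_in_reset)
			return 0;
		return -EAGAIN;
	}

	return 0;
}
```

Compared with filtering -EAGAIN in intel_ring_begin, this keeps the special
case in one place and leaves the error visible on every other path.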