From mboxrd@z Thu Jan 1 00:00:00 1970 From: Damien Lespiau Subject: Re: [PATCH 1/2] drm/i915: clear up wedged transitions Date: Wed, 5 Dec 2012 14:54:32 +0000 Message-ID: References: <1352909648-21514-1-git-send-email-daniel.vetter@ffwll.ch> <1352996243-15590-1-git-send-email-daniel.vetter@ffwll.ch> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from orsmga101.jf.intel.com (mga06.intel.com [134.134.136.21]) by gabe.freedesktop.org (Postfix) with ESMTP id 50E3FE651A for ; Wed, 5 Dec 2012 06:54:47 -0800 (PST) Received: by mail-ia0-f197.google.com with SMTP id x26so9741041iak.0 for ; Wed, 05 Dec 2012 06:54:33 -0800 (PST) In-Reply-To: <1352996243-15590-1-git-send-email-daniel.vetter@ffwll.ch> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org To: Daniel Vetter Cc: Intel Graphics Development List-Id: intel-gfx@lists.freedesktop.org On Thu, Nov 15, 2012 at 4:17 PM, Daniel Vetter wrote: > We have two important transitions of the wedged state in the current > code: > > - 0 -> 1: This means a hang has been detected, and signals to everyone > that they please get of any locks, so that the reset work item can > do its job. > > - 1 -> 0: The reset handler has completed. > > Now the last transition mixes up two states: "Reset completed and > successful" and "Reset failed". To distinguish these two we do some > tricks with the reset completion, but I simply could not convince > myself that this doesn't race under odd circumstances. > > Hence split this up, and add a new terminal state indicating that the > hw is gone for good. > > Also add explicit #defines for both states, update comments. > > v2: Split out the reset handling bugfix for the throttle ioctl. > > v3: s/tmp/wedged/ sugested by Chris Wilson. Also fixup up a rebase > error which prevented this patch from actually compiling. > > v4: To unify the wedged state with the reset counter, keep the > reset-in-progress state just as a flag. The terminally-wedged state is > now denoted with a big number. > > v5: Add a comment to the reset_counter special values explaining that > WEDGED & RESET_IN_PROGRESS needs to be true for the code to be > correct. > > v6: Fixup logic errors introduced with the wedged+reset_counter > unification. Since WEDGED implies reset-in-progress (in a way we're > terminally stuck in the dead-but-reset-not-completed state), we need > ensure that we check for this everywhere. The specific bug was in > wait_for_error, which would simply have timed out. > > v7: Extract an inline i915_reset_in_progress helper to make the code > more readable. Also annote the reset-in-progress case with an > unlikely, to help the compiler optimize the fastpath. Do the same for > the terminally wedged case with i915_terminally_wedged. > > Signed-Off-by: Daniel Vetter Right, so the usage of a wait queue makes the code quite a bit more understandable I had to scratch my head for quite a bit with the x->done poking. I think I'd have love to see the "completion -> wait_queue + 2 reset states" and the "gpu_error.wedge -> gpu_error.reset_counter" transitions as two different patches (well, I did because of the 2 versions sent, so all is good) Reviewed-by: Damien Lespiau -- Damien