From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ben Widawsky Subject: Re: force wake reference counting (another try) Date: Tue, 12 Apr 2011 09:30:22 -0700 Message-ID: <20110412163022.GA12791@snipes.kumite> References: <1302570079-17032-1-git-send-email-ben@bwidawsk.net> <1bdc18$k6h5tu@fmsmga002.fm.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from cloud01.chad-versace.us (184-106-247-128.static.cloud-ips.com [184.106.247.128]) by gabe.freedesktop.org (Postfix) with ESMTP id 060959E773 for ; Tue, 12 Apr 2011 09:32:20 -0700 (PDT) Content-Disposition: inline In-Reply-To: <1bdc18$k6h5tu@fmsmga002.fm.intel.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org To: Chris Wilson Cc: intel-gfx@lists.freedesktop.org List-Id: intel-gfx@lists.freedesktop.org On Tue, Apr 12, 2011 at 09:02:21AM +0100, Chris Wilson wrote: > On Mon, 11 Apr 2011 18:01:15 -0700, Ben Widawsky wrote: > > So once again, I expect this patch to potentially generate a lot of > > warnings, but I consider all of those warnings to be serious bugs for > > SNB. > > > > If anyone has clever ideas on how to handle this outside of what I've > > already mentioned, please let me know. > > > > I expect ongoing patches to fix these issues as they come up. > > Continuing the general discussion from IRC, we really do need to clarify > which locks we expect to be holding when reading different sets of > registers. Along with similar documentation over which parts of our > structures are covered by struct_mutex, mode_config.lock and the ensemble > of spinlocks. This task is not limited to just our driver, but we need to > review the core as well as looking at how to improve the locking overall. > (The clear example that something sucks is that probing outputs causes a > rendering stall (fortunately less noticeable on recent hardware where > hotplug is prevalent and the polling itself is much quicker, but still > present.) No doubt about it. It seems quite difficult to do with the amount of change from generation to generation. It looks like we even have register changes between steppings. > > The only concern I have with the extra warnings are if we leave false > positives in the code we submit upstream. Those warnings will quickly > become regression reports and angry users. Alas, no clever ideas. > -Chris I am going to spend at least a day tracking down, and hopefully fixing warnings if you agree with my next statement that it is in fact a problem. My hope is there aren't too many cases. The assertion I'm making is these are not false positives. I don't have data yet to suggest this is happening, and I don't know the code well enough to have a hunch one way or the other. However, here are two possible examples of why it's broken with both the current, and refcounted code. Current: Thread A acquires struct_mutex Thread A goes to wake up the GT (takes a while) Thread B acquires congif.mutex Thread A finishes wake up, does the read Thread B goes to wake up the GT (gets out fast because it's awake) Thread A finishes, puts the GT to sleep Thread B reads are potentially corrupted from here on Refcounted: Thread A acquires struct_mutex Thread A increments refcounts, starts waking GT Thread B acquires config.mutex Thread B goes to wake GT, refcount is 1, does the reads before GT is actually awake Ben