From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ben Widawsky <ben@bwidawsk.net>
Subject: Re: force wake reference counting (another try)
Date: Tue, 12 Apr 2011 09:30:22 -0700
Message-ID: <20110412163022.GA12791@snipes.kumite>
References: <1302570079-17032-1-git-send-email-ben@bwidawsk.net>
	<1bdc18$k6h5tu@fmsmga002.fm.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org>
Received: from cloud01.chad-versace.us (184-106-247-128.static.cloud-ips.com
	[184.106.247.128])
	by gabe.freedesktop.org (Postfix) with ESMTP id 060959E773
	for <intel-gfx@lists.freedesktop.org>;
	Tue, 12 Apr 2011 09:32:20 -0700 (PDT)
Content-Disposition: inline
In-Reply-To: <1bdc18$k6h5tu@fmsmga002.fm.intel.com>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org
List-Id: intel-gfx@lists.freedesktop.org

On Tue, Apr 12, 2011 at 09:02:21AM +0100, Chris Wilson wrote:
> On Mon, 11 Apr 2011 18:01:15 -0700, Ben Widawsky <ben@bwidawsk.net> wrote:
> > So once again, I expect this patch to potentially generate a lot of
> > warnings, but I consider all of those warnings to be serious bugs for
> > SNB.
> > 
> > If anyone has clever ideas on how to handle this outside of what I've
> > already mentioned, please let me know.
> > 
> > I expect ongoing patches to fix these issues as they come up.
> 
> Continuing the general discussion from IRC, we really do need to clarify
> which locks we expect to be holding when reading different sets of
> registers.  Along with similar documentation over which parts of our
> structures are covered by struct_mutex, mode_config.lock and the ensemble
> of spinlocks.  This task is not limited to just our driver, but we need to
> review the core as well as looking at how to improve the locking overall.
> (The clear example that something sucks is that probing outputs causes a
> rendering stall (fortunately less noticeable on recent hardware where
> hotplug is prevalent and the polling itself is much quicker, but still
> present.)

No doubt about it. It seems quite difficult to do with the amount of
change from generation to generation. It looks like we even have
register changes between steppings.

> 
> The only concern I have with the extra warnings are if we leave false
> positives in the code we submit upstream. Those warnings will quickly
> become regression reports and angry users. Alas, no clever ideas.
> -Chris

I am going to spend at least a day tracking down, and hopefully fixing
warnings if you agree with my next statement that it is in fact a
problem. My hope is there aren't too many cases.

The assertion I'm making is these are not false positives. I don't have
data yet to suggest this is happening, and I don't know the code well
enough to have a hunch one way or the other. However, here are two
possible examples of why it's broken with both the current, and
refcounted code.

Current:
Thread A acquires struct_mutex
Thread A goes to wake up the GT (takes a while)
Thread B acquires congif.mutex
Thread A finishes wake up, does the read
Thread B goes to wake up the GT (gets out fast because it's awake)
Thread A finishes, puts the GT to sleep
Thread B reads are potentially corrupted from here on

Refcounted:
Thread A acquires struct_mutex
Thread A increments refcounts, starts waking GT
Thread B acquires config.mutex
Thread B goes to wake GT, refcount is 1, does the reads before GT is
	actually awake

Ben