From: Ingo Molnar <mingo@kernel.org>
To: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mel Gorman <mgorman@suse.de>, "Shi, Alex" <alex.shi@intel.com>,
	Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michel Lespinasse <walken@google.com>,
	Davidlohr Bueso <davidlohr.bueso@hp.com>,
	"Wilcox, Matthew R" <matthew.r.wilcox@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Rik van Riel <riel@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm <linux-mm@kvack.org>
Subject: Re: Performance regression from switching lock to rw-sem for anon-vma tree
Date: Tue, 23 Jul 2013 11:45:13 +0200	[thread overview]
Message-ID: <20130723094513.GA24522@gmail.com> (raw)
In-Reply-To: <1373997195.22432.297.camel@schen9-DESK>


* Tim Chen <tim.c.chen@linux.intel.com> wrote:

> Ingo,
> 
> I tried MCS locking to order the writers but it didn't make much
> difference on my particular workload. After thinking about this some
> more, a likely explanation of the performance difference between mutex
> and rwsem is:
> 
> 1) Jobs acquiring a mutex put themselves on the wait list only after
> optimistic spinning fails.  That happens only 2% of the time on my test
> workload, so they access the wait list rarely.
> 
> 2) Jobs acquiring an rw-sem for write *always* put themselves on the
> wait list first, before trying lock stealing and optimistic spinning.
> This creates a bottleneck at the wait list, and also more cache bouncing.

Indeed ...

> One possible optimization is to delay putting the writer on the wait
> list until after optimistic spinning, but we may need to keep track of
> the number of writers waiting.  We could add a WAIT_BIAS to the count
> for each write waiter and remove the WAIT_BIAS each time a writer job
> completes.  This is tricky, as it changes the semantics of the count
> field and will likely require a number of changes to the rwsem code.
> Your thoughts on a better way to do this?
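
(For illustration, a minimal userspace sketch of the counting scheme Tim
describes above; the WAIT_BIAS value, struct my_rwsem and the helper
names are hypothetical stand-ins, not the kernel's actual rwsem code.)

#include <stdatomic.h>

/* Low bits count active lockers; each queued write waiter adds
 * WAIT_BIAS, so the high bits count the waiting writers.
 * Hypothetical encoding, for illustration only. */
#define WAIT_BIAS	0x10000L

struct my_rwsem {
	atomic_long count;
};

/* A writer announces itself before sleeping on the wait list ... */
static void writer_queue(struct my_rwsem *sem)
{
	atomic_fetch_add(&sem->count, WAIT_BIAS);
}

/* ... and drops its bias once it has taken (or given up on) the lock. */
static void writer_done(struct my_rwsem *sem)
{
	atomic_fetch_sub(&sem->count, WAIT_BIAS);
}

/* Spinners can then read off how many writers are queued, valid as
 * long as the number of active lockers stays below WAIT_BIAS. */
static long writers_waiting(struct my_rwsem *sem)
{
	return atomic_load(&sem->count) / WAIT_BIAS;
}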

Why not just try the delayed addition approach first? The spinning is
time-limited AFAICS, so we don't _have to_ recognize spinners as writers
per se - only once the spinning fails and the task wants to go on the
wait list. Am I missing something?

It will change the queuing patterns, and it might even change the
fairness balance - but it is a legit change otherwise, especially if it
helps performance.
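
To make that concrete, continuing the sketch above - a rough userspace
model of the delayed addition, where try_write_lock(), SPIN_LIMIT and
the busy-wait stand in for the kernel's owner-spinning and sleep/wakeup
logic (readers are omitted to keep the sketch small):

#include <stdbool.h>

#define SPIN_LIMIT	1000	/* stand-in for real spin heuristics */

/* Writer trylock: succeeds only when no locker is active, i.e. the
 * low bits of the count are zero. */
static bool try_write_lock(struct my_rwsem *sem)
{
	long c = atomic_load(&sem->count);

	if (c % WAIT_BIAS)	/* someone holds the lock */
		return false;
	return atomic_compare_exchange_strong(&sem->count, &c, c + 1);
}

static void write_unlock(struct my_rwsem *sem)
{
	atomic_fetch_sub(&sem->count, 1);
}

/* The delayed-addition slowpath: spin first, touch the wait list only
 * if spinning fails, so the common case generates no wait-list traffic
 * and no cache bouncing on the list head. */
static void write_lock_slowpath(struct my_rwsem *sem)
{
	for (int i = 0; i < SPIN_LIMIT; i++)
		if (try_write_lock(sem))
			return;		/* acquired while spinning */

	writer_queue(sem);		/* only failed spinners are counted */
	while (!try_write_lock(sem))
		;			/* a real rwsem would sleep here */
	writer_done(sem);
}

Note the fairness shift this illustrates: a late-arriving spinner can
now overtake writers that are already queued, which is exactly the
balance change mentioned above.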

Thanks,

	Ingo
