From: Ingo Molnar <mingo@kernel.org>
To: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mel Gorman <mgorman@suse.de>, "Shi, Alex" <alex.shi@intel.com>,
	Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michel Lespinasse <walken@google.com>,
	Davidlohr Bueso <davidlohr.bueso@hp.com>,
	"Wilcox, Matthew R" <matthew.r.wilcox@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Rik van Riel <riel@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm <linux-mm@kvack.org>
Subject: Re: Performance regression from switching lock to rw-sem for anon-vma tree
Date: Tue, 30 Jul 2013 21:24:55 +0200	[thread overview]
Message-ID: <20130730192455.GA22259@gmail.com> (raw)
In-Reply-To: <1375143209.22432.419.camel@schen9-DESK>


* Tim Chen <tim.c.chen@linux.intel.com> wrote:

> On Tue, 2013-07-23 at 11:53 +0200, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Tue, Jul 23, 2013 at 11:45:13AM +0200, Ingo Molnar wrote:
> > >
> > > > Why not just try the delayed addition approach first? The spinning is 
> > > > time limited AFAICS, so we don't _have to_ recognize those as writers 
> > > > per se, only if the spinning fails and it wants to go on the waitlist. 
> > > > Am I missing something?
> > > > 
> > > > It will change patterns, it might even change the fairness balance - 
> > > > but is a legit change otherwise, especially if it helps performance.
> > > 
> > > Be very careful here. Some people (XFS) have very specific needs. Walken 
> > > and dchinner had a longish discussion on this a while back.
> > 
> > Agreed - yet it's worth at least trying it out the quick way, 
> > to see the main effect and whether that explains the 
> > performance asymmetry, before investing more effort into it.
> > 
> 
> Ingo & Peter,
> 
> Here's a patch that moves the optimistic spinning for 
> writer lock acquisition to before the writer is put on 
> the wait list.  It gave me a 5% performance boost on my 
> exim mail server workload, recovering a good portion of 
> the 8% performance regression relative to the mutex 
> implementation.

Very nice detective work!

> I think we are on the right track. Let me know if there's 
> anything in the patch that may cause grief to XFS.
> 
> There is some further optimization possible.  We currently 
> take the failed path within __down_write whenever the 
> count field is nonzero. But if the old count was exactly 
> RWSEM_WAITING_BIAS, there were no active lockers, so we 
> could have stolen the lock when we performed our atomic 
> op on the count field in __down_write. Yet the current 
> code still takes the failed path.
> 
> I will combine this change and also Alex's patches on 
> rwsem together in a patchset later.
> 
> Your comments and thoughts are most welcomed.
> 
> Tim
> 
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>

> +config RWSEM_SPIN_ON_WRITE_OWNER
> +	bool "Optimistic spin write acquisition for writer owned rw-sem"
> +	default n
> +	depends on SMP
> +	help
> +	  Allows a writer to perform optimistic spinning if another writer owns
> +	  the read-write semaphore.  If the lock owner is running, it is likely
> +	  to release the lock soon. Spinning gives the writer a greater chance to
> +	  acquire the semaphore before being put to sleep.

The way you've worded this new Kconfig option makes it 
sound as if it's not just for testing, and I'm not a big 
believer in extra Kconfig degrees of freedom for 
scalability features of core locking primitives like 
rwsems, in production kernels ...

So the bad news is that such scalability optimizations 
really need to work for everyone.

The good news is that I don't think there's anything 
particularly controversial about making the rwsem write 
side perform just as well as mutexes - it would in fact be 
a very nice quality of implementation feature: it gives 
people freedom to switch between mutexes and rwsems without 
having to worry about scalability differences too much.

Once readers are mixed into the workload, can we still keep 
the XFS assumptions intact - or are they broken in some way?

We are spinning here, so we have full awareness of the 
state of the lock and can react to a new reader in very 
short order - so at a quick glance I don't see any 
fundamental difficulty in resolving it, beyond the SMOP 
aspect that is ... :-)

Thanks,

	Ingo

