* userspace rcu flavor improvements
From: Mathieu Desnoyers @ 2012-11-17 16:16 UTC
  To: lttng-dev, rp, Paul E . McKenney, Lai Jiangshan, Alan Stern

Here are a couple of improvements for all userspace RCU flavors. Many
thanks to Alan Stern for his suggestions.

Patch 8/8 is only done for qsbr so far, and proposed as RFC. I'd like to
try and benchmark other approaches to concurrent grace periods too.

Feedback is welcome,

Thanks,

Mathieu


* Re: userspace rcu flavor improvements
From: Mathieu Desnoyers @ 2012-11-19 17:05 UTC
  To: Paul E. McKenney; +Cc: lttng-dev, rp, Alan Stern

* Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> On Mon, Nov 19, 2012 at 03:52:18PM +0800, Lai Jiangshan wrote:
> > On 11/18/2012 12:16 AM, Mathieu Desnoyers wrote:
> > > Here are a couple of improvements for all userspace RCU flavors. Many
> > > thanks to Alan Stern for his suggestions.
> > 
> > It makes urcu like SRCU. (sync_rcu = check zero + flip + check zero)
> > If I have time, I may port more SRCU code to urcu.
> 
> I am sure that this is obvious to everyone, but I cannot help restating
> it.  There is one important difference between user code and kernel code,
> though.  In the kernel, we track by CPU, so one of SRCU's big jobs is
> to track multiple tasks using the same CPU.  This opens the possibility
> of preemption, which is one of the things that complicates SRCU's design.
> 
> In contrast, user-mode RCU tracks tasks without multiplexing.  This
> allows simplifications that are similar to those that could be achieved
> in the kernel if we were willing to disable preemption across the entire
> SRCU read-side critical section.
> 
> So although I am all for user-mode RCU taking advantage of any technology
> we have at hand, we do need to be careful to avoid needless complexity.

Very good point! Indeed, when considering modifications to URCU, I will
weigh the costs (-) against the benefits (+):

- Added complexity (verification cost),
+ Speedup,
+ Lower latency,
+ Better scalability,
+ Lower power consumption.

So yes, I'm all for improving URCU synchronisation, but I might be
reluctant to pull modifications that increase complexity significantly
without very significant benefits.

> 
> > > Patch 8/8 is only done for qsbr so far, and proposed as RFC. I'd like to
> > > try and benchmark other approaches to concurrent grace periods too.
> 
> The concurrent grace periods are the big win, in my opinion.  ;-)

I've done some basic benchmarking on the approach taken by patch 8/8,
and it leads to a significant scalability improvement and speedup,
e.g., on a 24-core AMD, with a write-heavy scenario (4 reader threads,
20 updater threads, each updater using synchronize_rcu()):

* Serialized grace periods:
./test_urcu_qsbr 4 20 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers  20 wdelay      0 nr_reads  20251412728 nr_writes      1826331 nr_ops  20253239059

* Batched grace periods:
./test_urcu_qsbr 4 20 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers  20 wdelay      0 nr_reads  15141994746 nr_writes      9382515 nr_ops  15151377261

That is a 9382515/1826331 = ~5.14x speedup in update throughput.

Of course, we can see that readers have slowed down (nr_reads drops by
about 25%), probably due to increased update traffic, given there is no
change to the read-side code whatsoever.
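
The ratios quoted above can be recomputed from the two SUMMARY lines. The
following is only a small sketch of that arithmetic: the helper names
(`ratio`, `write_speedup`, `read_ratio`) are made up for illustration and
are not part of test_urcu_qsbr.

```c
#include <assert.h>

/* Hypothetical helper: ratio of two operation counts. */
static double ratio(unsigned long long after, unsigned long long before)
{
	assert(before != 0);
	return (double)after / (double)before;
}

/* Counts copied from the SUMMARY lines (4 readers, 20 updaters, 20 s). */
static const unsigned long long serialized_writes = 1826331ULL;
static const unsigned long long batched_writes    = 9382515ULL;
static const unsigned long long serialized_reads  = 20251412728ULL;
static const unsigned long long batched_reads     = 15141994746ULL;

/* Update throughput gain from batching: roughly 5.137. */
static double write_speedup(void)
{
	return ratio(batched_writes, serialized_writes);
}

/* Fraction of read throughput retained with batching: a bit under 0.75. */
static double read_ratio(void)
{
	return ratio(batched_reads, serialized_reads);
}
```

Since both runs last the same 20 seconds, the ratio of raw counts is also
the ratio of throughputs.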

Now let's see the penalty of managing the stack for a single updater.
With 4 readers and a single updater:

* Serialized grace periods:
./test_urcu_qsbr 4 1 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers   1 wdelay      0 nr_reads  19240784755 nr_writes      2130839 nr_ops  19242915594

* Batched grace periods:
./test_urcu_qsbr 4 1 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers   1 wdelay      0 nr_reads  19160162768 nr_writes      2253068 nr_ops  19162415836

2253068 vs 2137036 -> a couple of runs show that this difference is lost
in the noise for a single updater.

So given that implementing a real "concurrent" approach for grace
periods would take a while and add a lot of complexity, I am tempted to
merge the batching approach, given that it does not add complexity to the
synchronization algorithm and already shows an interesting speedup.
Moreover, we can easily remove batching if it appears not to be needed
in the future.
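
To illustrate the general idea of batching, here is a minimal sketch in
which updaters that arrive while a grace period is running all share the
next one, so N concurrent callers cost far fewer than N grace periods.
It is not the code from patch 8/8: the patch manages a stack of waiters,
whereas this sketch assumes a plain mutex and condition variable, and
`struct gp_batcher`, `batched_synchronize()` and `fake_wait_for_readers()`
are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the real "wait until every reader has passed through a
 * quiescent state"; here it only counts how many grace periods ran. */
static unsigned long nr_grace_periods;

static void fake_wait_for_readers(void)
{
	nr_grace_periods++;	/* only one leader runs this at a time */
}

struct gp_batcher {
	pthread_mutex_t lock;
	pthread_cond_t done;
	unsigned long started;		/* grace periods begun so far */
	unsigned long completed;	/* grace periods finished so far */
	bool in_progress;		/* a leader is currently waiting */
};

#define GP_BATCHER_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0, false }

static void batched_synchronize(struct gp_batcher *b)
{
	unsigned long target;

	pthread_mutex_lock(&b->lock);
	/* A grace period already in flight began before this call, so it
	 * cannot be used: this caller needs the next one to complete. */
	target = b->started + 1;
	while (b->completed < target) {
		if (!b->in_progress) {
			/* Become the leader for everyone queued so far. */
			b->in_progress = true;
			b->started++;
			pthread_mutex_unlock(&b->lock);
			fake_wait_for_readers();
			pthread_mutex_lock(&b->lock);
			b->completed = b->started;
			b->in_progress = false;
			pthread_cond_broadcast(&b->done);
		} else {
			/* Someone else is leading: sleep until it finishes. */
			pthread_cond_wait(&b->done, &b->lock);
		}
		assert(b->completed <= b->started);
	}
	pthread_mutex_unlock(&b->lock);
}
```

With a single updater every call becomes its own leader, so the only
extra cost is the bookkeeping, which matches the single-updater numbers
above being within noise.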

Thoughts?

Thanks,

Mathieu


> 
> 							Thanx, Paul
> 
> > > Feedback is welcome,
> > > 
> > > Thanks,
> > > 
> > > Mathieu
> > > 
> > > 
> > 
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


* Re: userspace rcu flavor improvements
From: Paul E. McKenney @ 2012-11-19 16:23 UTC
  To: Lai Jiangshan; +Cc: Alan Stern, lttng-dev, rp, Mathieu Desnoyers

On Mon, Nov 19, 2012 at 03:52:18PM +0800, Lai Jiangshan wrote:
> On 11/18/2012 12:16 AM, Mathieu Desnoyers wrote:
> > Here are a couple of improvements for all userspace RCU flavors. Many
> > thanks to Alan Stern for his suggestions.
> 
> It makes urcu like SRCU. (sync_rcu = check zero + flip + check zero)
> If I have time, I may port more SRCU code to urcu.

I am sure that this is obvious to everyone, but I cannot help restating
it.  There is one important difference between user code and kernel code,
though.  In the kernel, we track by CPU, so one of SRCU's big jobs is
to track multiple tasks using the same CPU.  This opens the possibility
of preemption, which is one of the things that complicates SRCU's design.

In contrast, user-mode RCU tracks tasks without multiplexing.  This
allows simplifications that are similar to those that could be achieved
in the kernel if we were willing to disable preemption across the entire
SRCU read-side critical section.

So although I am all for user-mode RCU taking advantage of any technology
we have at hand, we do need to be careful to avoid needless complexity.

> > Patch 8/8 is only done for qsbr so far, and proposed as RFC. I'd like to
> > try and benchmark other approaches to concurrent grace periods too.

The concurrent grace periods are the big win, in my opinion.  ;-)

							Thanx, Paul

> > Feedback is welcome,
> > 
> > Thanks,
> > 
> > Mathieu
> > 
> > 
> 


* Re: userspace rcu flavor improvements
From: Mathieu Desnoyers @ 2012-11-19 15:18 UTC
  To: Lai Jiangshan; +Cc: Paul E . McKenney, lttng-dev, rp, Alan Stern

* Lai Jiangshan (laijs@cn.fujitsu.com) wrote:
> On 11/18/2012 12:16 AM, Mathieu Desnoyers wrote:
> > Here are a couple of improvements for all userspace RCU flavors. Many
> > thanks to Alan Stern for his suggestions.
> 
> 
> It makes urcu like SRCU. (sync_rcu = check zero + flip + check zero)

Good to know :)

> If I have time, I may port more SRCU code to urcu.

That will certainly be interesting.

Thanks!

Mathieu

> 
> > 
> > Patch 8/8 is only done for qsbr so far, and proposed as RFC. I'd like to
> > try and benchmark other approaches to concurrent grace periods too.
> > 
> > Feedback is welcome,
> > 
> > Thanks,
> > 
> > Mathieu
> > 
> > 
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


* Re: userspace rcu flavor improvements
From: Lai Jiangshan @ 2012-11-19  7:52 UTC
  To: Mathieu Desnoyers; +Cc: Paul E . McKenney, lttng-dev, rp, Alan Stern

On 11/18/2012 12:16 AM, Mathieu Desnoyers wrote:
> Here are a couple of improvements for all userspace RCU flavors. Many
> thanks to Alan Stern for his suggestions.


It makes urcu like SRCU. (sync_rcu = check zero + flip + check zero)
If I have time, I may port more SRCU code to urcu.
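
For readers unfamiliar with the "check zero + flip + check zero" shape,
a heavily simplified sketch follows. It assumes two global reader
counters, sequentially consistent C11 atomics, and updaters that are
already serialized by the caller. Neither SRCU nor urcu is implemented
this way (both use per-CPU or per-thread state and carefully placed
memory barriers), and all names below are hypothetical.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

struct two_phase_rcu {
	atomic_uint phase;	/* current phase: 0 or 1 */
	atomic_long nesting[2];	/* active readers per phase */
};

static void sketch_init(struct two_phase_rcu *s)
{
	atomic_init(&s->phase, 0);
	atomic_init(&s->nesting[0], 0);
	atomic_init(&s->nesting[1], 0);
}

/* Reader entry: returns the phase to hand back to sketch_read_unlock(). */
static unsigned int sketch_read_lock(struct two_phase_rcu *s)
{
	unsigned int idx = atomic_load(&s->phase);

	atomic_fetch_add(&s->nesting[idx], 1);
	return idx;
}

static void sketch_read_unlock(struct two_phase_rcu *s, unsigned int idx)
{
	long prev = atomic_fetch_sub(&s->nesting[idx], 1);

	assert(prev > 0);
	(void)prev;
}

static void wait_for_zero(atomic_long *count)
{
	while (atomic_load(count) != 0)
		sched_yield();
}

/* Assumption: the caller serializes updaters (e.g. with a mutex). */
static void sketch_synchronize(struct two_phase_rcu *s)
{
	unsigned int cur = atomic_load(&s->phase);

	/* check zero: drain stragglers still counted on the idle phase */
	wait_for_zero(&s->nesting[cur ^ 1]);
	/* flip: new readers now count themselves on the other phase */
	atomic_store(&s->phase, cur ^ 1);
	/* check zero: wait for readers that entered on the old phase */
	wait_for_zero(&s->nesting[cur]);
}
```

The first check matters because a reader may sample the phase, get
delayed, and only increment its counter after an earlier flip; draining
the idle counter before flipping keeps such a reader from being missed.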

> 
> Patch 8/8 is only done for qsbr so far, and proposed as RFC. I'd like to
> try and benchmark other approaches to concurrent grace periods too.
> 
> Feedback is welcome,
> 
> Thanks,
> 
> Mathieu
> 
> 

