* [PATCH] net: use synchronize_rcu_expedited()
From: Eric Dumazet @ 2011-05-24  9:07 UTC
  To: David Miller; +Cc: netdev, Paul E. McKenney

synchronize_rcu() is very slow in various situations (HZ=100,
CONFIG_NO_HZ=y, CONFIG_PREEMPT=n)

Extract from my (mostly idle) 8-core machine:

 synchronize_rcu() in 99985 us
 synchronize_rcu() in 79982 us
 synchronize_rcu() in 87612 us
 synchronize_rcu() in 79827 us
 synchronize_rcu() in 109860 us
 synchronize_rcu() in 98039 us
 synchronize_rcu() in 89841 us
 synchronize_rcu() in 79842 us
 synchronize_rcu() in 80151 us
 synchronize_rcu() in 119833 us
 synchronize_rcu() in 99858 us
 synchronize_rcu() in 73999 us
 synchronize_rcu() in 79855 us
 synchronize_rcu() in 79853 us


While holding the RTNL mutex, we would rather spend some CPU cycles than
block other processes waiting for this mutex for too long.

We also want to set up and dismantle network features as fast as possible
at boot/shutdown time.

This patch makes synchronize_net() call the expedited version if RTNL is
locked.

synchronize_rcu_expedited() typical delay is about 20 us on my machine.

 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 20 us
 synchronize_rcu_expedited() in 16 us
 synchronize_rcu_expedited() in 20 us
 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 18 us


Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Ben Greear <greearb@candelatech.com>
---
 net/core/dev.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index bcb05cb..ec11d75 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5954,7 +5954,10 @@ EXPORT_SYMBOL(free_netdev);
 void synchronize_net(void)
 {
 	might_sleep();
-	synchronize_rcu();
+	if (rtnl_is_locked())
+		synchronize_rcu_expedited();
+	else
+		synchronize_rcu();
 }
 EXPORT_SYMBOL(synchronize_net);
 




* Re: [PATCH] net: use synchronize_rcu_expedited()
From: Paul E. McKenney @ 2011-05-24 15:44 UTC
  To: Eric Dumazet; +Cc: David Miller, netdev

On Tue, May 24, 2011 at 11:07:32AM +0200, Eric Dumazet wrote:
> synchronize_rcu() is very slow in various situations (HZ=100,
> CONFIG_NO_HZ=y, CONFIG_PREEMPT=n)
> 
 ...
> synchronize_rcu_expedited() typical delay is about 20 us on my machine.
> 
>  synchronize_rcu_expedited() in 18 us
>  synchronize_rcu_expedited() in 18 us
>  synchronize_rcu_expedited() in 18 us
>  synchronize_rcu_expedited() in 18 us
>  synchronize_rcu_expedited() in 20 us
>  synchronize_rcu_expedited() in 16 us
>  synchronize_rcu_expedited() in 20 us
>  synchronize_rcu_expedited() in 18 us
>  synchronize_rcu_expedited() in 18 us

Cool!!!

Just out of curiosity, how many CPUs does your system have?

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>



* Re: [PATCH] net: use synchronize_rcu_expedited()
From: Eric Dumazet @ 2011-05-24 15:52 UTC
  To: paulmck; +Cc: David Miller, netdev

On Tuesday, 24 May 2011 at 08:44 -0700, Paul E. McKenney wrote:
> Cool!!!
> 
> Just out of curiosity, how many CPUs does your system have?

16 (2x4x2)  [ processor.max_cstate=1 ]

I am now trying to optimize rcu_barrier(). Do you have an idea for how to
get an expedited version of it as well?

In the following trace we can see 3 groups of callbacks, spaced one jiffy
apart (10 ms at HZ=100).

Maybe we can avoid sending a call_rcu() to a CPU that has no pending RCU
work?

[  835.189996] cpu0 synchronize_rcu_expedited() in 30 us 
   -> begin rcu_barrier() immediately
[  835.259702] cpu15 rcu_barrier_callback()
[  835.259705] cpu14 rcu_barrier_callback()
[  835.259708] cpu7 rcu_barrier_callback()
[  835.259711] cpu12 rcu_barrier_callback()
[  835.259714] cpu8 rcu_barrier_callback()
[  835.259716] cpu1 rcu_barrier_callback()
[  835.259719] cpu0 rcu_barrier_callback()

[  835.269691] cpu13 rcu_barrier_callback()
[  835.269695] cpu11 rcu_barrier_callback()
[  835.269698] cpu5 rcu_barrier_callback()
[  835.269700] cpu6 rcu_barrier_callback()
[  835.269702] cpu10 rcu_barrier_callback()
[  835.269705] cpu3 rcu_barrier_callback()
[  835.269707] cpu2 rcu_barrier_callback()

[  835.279687] cpu4 rcu_barrier_callback()
[  835.279689] cpu9 rcu_barrier_callback()
[  835.279744] cpu0 rcu_barrier() in 89499 us
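
For what it's worth, the spacing of the groups can be checked with a small
userspace program. This is only a sketch: the timestamps are copied from
the trace above and converted to microseconds, while find_bursts(), cb_us
and the half-jiffy threshold are names and choices made up for this
illustration, not anything in the kernel.

```c
#include <stddef.h>
#include <stdint.h>

#define HZ        100               /* tick rate used in the trace above */
#define JIFFY_US  (1000000 / HZ)    /* one jiffy in microseconds */

/* rcu_barrier_callback() timestamps from the trace, in microseconds. */
static const uint64_t cb_us[] = {
	835259702, 835259705, 835259708, 835259711,
	835259714, 835259716, 835259719,
	835269691, 835269695, 835269698, 835269700,
	835269702, 835269705, 835269707,
	835279687, 835279689,
};

/*
 * Split sorted timestamps into bursts. A new burst starts whenever the gap
 * to the previous timestamp exceeds half a jiffy (an arbitrary threshold).
 * The start of each burst is stored in starts[] (at most max entries);
 * the return value is the total number of bursts found.
 */
static size_t find_bursts(const uint64_t *ts, size_t n,
			  uint64_t *starts, size_t max)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (i == 0 || ts[i] - ts[i - 1] > JIFFY_US / 2) {
			if (count < max)
				starts[count] = ts[i];
			count++;
		}
	}
	return count;
}
```

Called on cb_us with room for a few entries in starts[], it reports three
bursts whose start times are a little under 10000 us apart, i.e. one jiffy
at HZ=100.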

Thanks




* Re: [PATCH] net: use synchronize_rcu_expedited()
From: David Miller @ 2011-05-24 17:28 UTC
  To: eric.dumazet; +Cc: netdev, paulmck

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 24 May 2011 11:07:32 +0200

> synchronize_rcu() is very slow in various situations (HZ=100,
> CONFIG_NO_HZ=y, CONFIG_PREEMPT=n)
> 
> Extract from my (mostly idle) 8 core machine :
 ...
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks Eric.


* Re: [PATCH] net: use synchronize_rcu_expedited()
From: Paul E. McKenney @ 2011-05-24 19:24 UTC
  To: Eric Dumazet; +Cc: David Miller, netdev

On Tue, May 24, 2011 at 05:52:44PM +0200, Eric Dumazet wrote:
> On Tuesday, 24 May 2011 at 08:44 -0700, Paul E. McKenney wrote:
> > Just out of curiosity, how many CPUs does your system have?
> 
> 16 (2x4x2)  [ processor.max_cstate=1 ]
> 
> I am now trying to optimize rcu_barrier(). Do you have an idea for how to
> get an expedited version of it as well?
> 
> In the following trace we can see 3 groups of callbacks, spaced one jiffy
> apart (10 ms at HZ=100).
> 
> Maybe we can avoid sending a call_rcu() to a CPU that has no pending RCU
> work?

Might make sense, though most of the gains would need to come from
kicking the grace-period machinery hard in order to make it go faster.

Interesting -- I will give this some thought.

							Thanx, Paul



* Re: [PATCH] net: use synchronize_rcu_expedited()
From: Eric Dumazet @ 2011-05-24 19:44 UTC
  To: paulmck; +Cc: David Miller, netdev

On Tuesday, 24 May 2011 at 12:24 -0700, Paul E. McKenney wrote:

> Might make sense, though most of the gains would need to come from
> kicking the grace-period machinery hard in order to make it go faster.
> 
> Interesting -- I will give this some thought.
> 

I am working on a final step: using a workqueue so that rcu_barrier()
is not done under RTNL, so it will no longer be a blocking point when
dismantling hundreds of devices per second...

Thanks




* Re: [PATCH] net: use synchronize_rcu_expedited()
From: Paul E. McKenney @ 2011-05-24 19:56 UTC
  To: Eric Dumazet; +Cc: David Miller, netdev

On Tue, May 24, 2011 at 09:44:45PM +0200, Eric Dumazet wrote:
> On Tuesday, 24 May 2011 at 12:24 -0700, Paul E. McKenney wrote:
> 
> > Might make sense, though most of the gains would need to come from
> > kicking the grace-period machinery hard in order to make it go faster.
> > 
> > Interesting -- I will give this some thought.
> 
> I am working on a final step: using a workqueue so that rcu_barrier()
> is not done under RTNL, so it will no longer be a blocking point when
> dismantling hundreds of devices per second...

OK, I will keep rcu_barrier_expedited() on the "might be useful" list,
but will keep to the current plan: finishing up rough edges on RCU
priority boosting, merging SRCU into TREE_RCU and TINY_RCU, and so forth.

But let me know if you do need it.

							Thanx, Paul
