linux-kernel.vger.kernel.org archive mirror
* Re: 16-CPU #s for lockfree rtcache (rt_rcu)
  2002-05-17 13:51 16-CPU #s for lockfree rtcache (rt_rcu) Dipankar Sarma
@ 2002-05-17 13:49 ` David S. Miller
  2002-05-17 16:14   ` Dipankar Sarma
From: David S. Miller @ 2002-05-17 13:49 UTC (permalink / raw)
  To: dipankar; +Cc: linux-kernel

   From: Dipankar Sarma <dipankar@in.ibm.com>
   Date: Fri, 17 May 2002 19:21:16 +0530
   
   2.5.3 : ip_route_output_key [c01bab8c]: 12166
   2.5.3+rt_rcu : ip_route_output_key [c01bb084]: 6027
   
Thanks for doing the testing.  Are you able to do this
test on some 4 or 8 processor non-NUMA system?

Basically, halving the profile hits for this function
is wonderful, and I'd love to see how much of this translates to a
non-NUMA system.

   I have seen moderately significant profile counts
   for ip_route_input() in preliminary webserver benchmark runs.
   It is not clear to me, however, that bucket-lock cache-line
   bouncing is the reason behind it; that needs more investigation.

This is where most of the routing-heavy work is done on
a web server, so this doesn't surprise me.  Once a packet
is input and routed, we have the destination entry and just grab a
reference to it, reusing it for output back to that remote host.


* 16-CPU #s for lockfree rtcache (rt_rcu)
@ 2002-05-17 13:51 Dipankar Sarma
  2002-05-17 13:49 ` David S. Miller
From: Dipankar Sarma @ 2002-05-17 13:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: davem

As promised, here is the kernprof data from the simulated
test suggested by Dave. The test uses 32 processes, and the destination
addresses for the rtcache lookups are random, changing after every 5 packets.
The measurements were done on a 16-CPU NUMA-Q.

2.5.3 : ip_route_output_key [c01bab8c]: 12166
2.5.3+rt_rcu : ip_route_output_key [c01bb084]: 6027

I have seen moderately significant profile counts
for ip_route_input() in preliminary webserver benchmark runs.
It is not clear to me, however, that bucket-lock cache-line
bouncing is the reason behind it; that needs more investigation.

Thanks
-- 
Dipankar Sarma  <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.


* Re: 16-CPU #s for lockfree rtcache (rt_rcu)
  2002-05-17 13:49 ` David S. Miller
@ 2002-05-17 16:14   ` Dipankar Sarma
  2002-05-17 16:46     ` David S. Miller
From: Dipankar Sarma @ 2002-05-17 16:14 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel

On Fri, May 17, 2002 at 06:49:21AM -0700, David S. Miller wrote:
>    From: Dipankar Sarma <dipankar@in.ibm.com>
>    Date: Fri, 17 May 2002 19:21:16 +0530
>    
>    2.5.3 : ip_route_output_key [c01bab8c]: 12166
>    2.5.3+rt_rcu : ip_route_output_key [c01bb084]: 6027
>    
> Thanks for doing the testing.  Are you able to do this
> test on some 4 or 8 processor non-NUMA system?

Yes, though it may not have been the same test. We have been trying various
configurations for this test. One fallout of using a large number of
destination addresses is frequent neighbour table garbage collection,
which results in a lot of lock contention. By slowing down
the packet rate and/or increasing the gc threshold, we can
avoid this. How realistic is that? If we avoid frequent
gc, we see better gains in route lookup. With frequent gc,
the speedup was about 22% on an 8-CPU SMP, IIRC. I will rerun
the tests tomorrow or Monday to get both sets of numbers for 8-CPU
SMP.

Thanks
-- 
Dipankar Sarma  <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.


* Re: 16-CPU #s for lockfree rtcache (rt_rcu)
  2002-05-17 16:14   ` Dipankar Sarma
@ 2002-05-17 16:46     ` David S. Miller
  2002-05-17 19:25       ` Andi Kleen
From: David S. Miller @ 2002-05-17 16:46 UTC (permalink / raw)
  To: dipankar; +Cc: linux-kernel

   From: Dipankar Sarma <dipankar@in.ibm.com>
   Date: Fri, 17 May 2002 21:44:33 +0530
   
   I will rerun the tests tomorrow or monday to get both sets of
   numbers for 8-cpu  SMP.

Provide the data; it will be interesting.


* Re: 16-CPU #s for lockfree rtcache (rt_rcu)
  2002-05-17 16:46     ` David S. Miller
@ 2002-05-17 19:25       ` Andi Kleen
  2002-05-17 19:25         ` David S. Miller
From: Andi Kleen @ 2002-05-17 19:25 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel, dipankar

"David S. Miller" <davem@redhat.com> writes:

>    From: Dipankar Sarma <dipankar@in.ibm.com>
>    Date: Fri, 17 May 2002 21:44:33 +0530
>    
>    I will rerun the tests tomorrow or monday to get both sets of
>    numbers for 8-cpu  SMP.
> 
> Provide the data, it will be interesting.

I bet the numbers would be much better if the x86 
do_gettimeofday() was converted to a lockless version first ...
Currently it is bouncing around its readlock for every incoming packet.

-Andi



* Re: 16-CPU #s for lockfree rtcache (rt_rcu)
  2002-05-17 19:25       ` Andi Kleen
@ 2002-05-17 19:25         ` David S. Miller
  2002-05-17 20:33           ` Dipankar Sarma
From: David S. Miller @ 2002-05-17 19:25 UTC (permalink / raw)
  To: ak; +Cc: linux-kernel, dipankar

   From: Andi Kleen <ak@muc.de>
   Date: 17 May 2002 21:25:16 +0200

   "David S. Miller" <davem@redhat.com> writes:
   
   > Provide the data, it will be interesting.
   
   I bet the numbers would be much better if the x86 
   do_gettimeofday() was converted to a lockless version first ...
   Currently it is bouncing around its readlock for every incoming packet.

That is true.  But right now we are trying to analyze the effects of
his patch all by itself.


* Re: 16-CPU #s for lockfree rtcache (rt_rcu)
  2002-05-17 19:25         ` David S. Miller
@ 2002-05-17 20:33           ` Dipankar Sarma
From: Dipankar Sarma @ 2002-05-17 20:33 UTC (permalink / raw)
  To: David S. Miller; +Cc: ak, linux-kernel

On Fri, May 17, 2002 at 12:25:19PM -0700, David S. Miller wrote:
>    From: Andi Kleen <ak@muc.de>
>    Date: 17 May 2002 21:25:16 +0200
> 
>    "David S. Miller" <davem@redhat.com> writes:
>    
>    > Provide the data, it will be interesting.
>    
>    I bet the numbers would be much better if the x86 
>    do_gettimeofday() was converted to a lockless version first ...
>    Currently it is bouncing around its readlock for every incoming packet.
> 
> That is true.  But right now we are trying to analyze the effects of
> his patch all by itself.

Yes, that is another problem that needs addressing.

BTW, do_gettimeofday() also shows up as moderately significant in profiles
of the 8-CPU webserver benchmark. I will address xtime_lock separately.

Thanks
-- 
Dipankar Sarma  <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

