* [PATCH] Limit size of route cache hash table
@ 2009-04-27  3:04 Anton Blanchard
  2009-04-27  5:17 ` Eric Dumazet
  2009-04-27  6:11 ` David Miller
  0 siblings, 2 replies; 11+ messages in thread
From: Anton Blanchard @ 2009-04-27  3:04 UTC (permalink / raw)
  To: netdev


Right now we have no upper limit on the size of the route cache hash table.
On a 128GB POWER6 box it ends up as 32MB:

    IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes)

It would be nice to cap this just for memory consumption reasons, but this
massive hashtable also causes a significant spike when measuring OS
jitter.

With a 32MB hashtable and 4 million entries, rt_worker_func is taking
5 ms to complete. On another system with more memory it's taking 14 ms.
Even though rt_worker_func does call cond_resched() to limit its impact,
in an HPC environment we want to keep all sources of OS jitter to a minimum.

With the patch applied we limit the number of entries to 64k, which
can still be overridden by using the rhash_entries boot option:

    IP route cache hash table entries: 65536 (order: 3, 524288 bytes)

With this patch rt_worker_func takes 0.060 ms on the same system.

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Is 64k a reasonable default for the limit?
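
The two log lines above can be cross-checked with a little arithmetic. The sketch below assumes an 8-byte hash bucket (33554432 / 4194304) and 64 KiB pages (which makes an order-9 allocation 32 MiB). Both figures are inferred from the log output rather than stated in this thread, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes occupied by a hash table of `entries` buckets,
 * each `bucket_bytes` wide (assumed 8 here, inferred from the log). */
static uint64_t table_bytes(uint64_t entries, uint64_t bucket_bytes)
{
    assert(bucket_bytes > 0);
    return entries * bucket_bytes;
}

/* Bytes in a page allocation of the given order: page_bytes * 2^order.
 * The page size (assumed 64 KiB here) is inferred, not stated. */
static uint64_t order_bytes(unsigned int order, uint64_t page_bytes)
{
    return page_bytes << order;
}
```

Under these assumptions, `table_bytes(4194304, 8)` and `order_bytes(9, 65536)` both give 33554432, and `table_bytes(65536, 8)` and `order_bytes(3, 65536)` both give 524288, which matches the two log lines.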

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c40debe..5064c26 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -3397,7 +3397,7 @@ int __init ip_rt_init(void)
 					0,
 					&rt_hash_log,
 					&rt_hash_mask,
-					0);
+					rhash_entries ? 0 : 64 * 1024);
 	memset(rt_hash_table, 0, (rt_hash_mask + 1) * sizeof(struct rt_hash_bucket));
 	rt_hash_lock_init();
 



* Re: [PATCH] Limit size of route cache hash table
  2009-04-27  3:04 [PATCH] Limit size of route cache hash table Anton Blanchard
@ 2009-04-27  5:17 ` Eric Dumazet
  2009-04-27  5:47   ` Anton Blanchard
  2009-04-27  6:11 ` David Miller
  1 sibling, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2009-04-27  5:17 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: netdev

Anton Blanchard wrote:
> Right now we have no upper limit on the size of the route cache hash table.
> On a 128GB POWER6 box it ends up as 32MB:
> 
>     IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes)
> 
> It would be nice to cap this just for memory consumption reasons, but this
> massive hashtable also causes a significant spike when measuring OS
> jitter.
> 
> With a 32MB hashtable and 4 million entries, rt_worker_func is taking
> 5 ms to complete. On another system with more memory it's taking 14 ms.
> Even though rt_worker_func does call cond_resched() to limit its impact,
> in an HPC environment we want to keep all sources of OS jitter to a minimum.

Then boot with rhash_entries=8000?
or 
echo 1 >/proc/sys/net/ipv4/route/gc_interval
> 
> With the patch applied we limit the number of entries to 64k, which
> can still be overridden by using the rhash_entries boot option:
> 
>     IP route cache hash table entries: 65536 (order: 3, 524288 bytes)
> 
> With this patch rt_worker_func takes 0.060 ms on the same system.
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
> 
> Is 64k a reasonable default for the limit?
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index c40debe..5064c26 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -3397,7 +3397,7 @@ int __init ip_rt_init(void)
>  					0,
>  					&rt_hash_log,
>  					&rt_hash_mask,
> -					0);
> +					rhash_entries ? 0 : 64 * 1024);
>  	memset(rt_hash_table, 0, (rt_hash_mask + 1) * sizeof(struct rt_hash_bucket));
>  	rt_hash_lock_init();
>  
> 


Sorry this limit is too small. Many of my customer machines would collapse.

Would it make sense to change ip_rt_gc_interval from 60 seconds
to 1 second on such machines? Dividing 5 ms by 60 gives about 83 us
per run, which should be acceptable.
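
The estimate above can be written out as follows. This is only a sketch of the arithmetic, assuming the work done per run scales linearly with the interval between runs; `per_run_us` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated cost of one run, in microseconds, when a scan that costs
 * `full_scan_us` at `old_interval_s` is instead split across runs that
 * fire every `new_interval_s` seconds (assumption: linear scaling). */
static uint64_t per_run_us(uint64_t full_scan_us,
                           uint64_t old_interval_s,
                           uint64_t new_interval_s)
{
    assert(old_interval_s > 0);
    return full_scan_us * new_interval_s / old_interval_s;
}
```

With the figures from this thread, `per_run_us(5000, 60, 1)` gives 83 (integer division), and the 14 ms case on the larger machine would come out at about 233 us per run.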




* Re: [PATCH] Limit size of route cache hash table
  2009-04-27  5:17 ` Eric Dumazet
@ 2009-04-27  5:47   ` Anton Blanchard
  2009-04-27  6:12     ` Eric Dumazet
  2009-04-27  6:35     ` David Miller
  0 siblings, 2 replies; 11+ messages in thread
From: Anton Blanchard @ 2009-04-27  5:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

 
Hi,

> Then boot with rhash_entries=8000?
> or 
> echo 1 >/proc/sys/net/ipv4/route/gc_interval

Yes we are hardwiring it for now.

> Sorry this limit is too small. Many of my customer machines would collapse.

So what would a reasonable upper limit be? Surely we should cap it at some
point?

Anton


* Re: [PATCH] Limit size of route cache hash table
  2009-04-27  3:04 [PATCH] Limit size of route cache hash table Anton Blanchard
  2009-04-27  5:17 ` Eric Dumazet
@ 2009-04-27  6:11 ` David Miller
  1 sibling, 0 replies; 11+ messages in thread
From: David Miller @ 2009-04-27  6:11 UTC (permalink / raw)
  To: anton; +Cc: netdev

From: Anton Blanchard <anton@samba.org>
Date: Mon, 27 Apr 2009 13:04:33 +1000

> Right now we have no upper limit on the size of the route cache hash table.
> On a 128GB POWER6 box it ends up as 32MB:
> 
>     IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes)

Pretty reasonable size for a machine with that much ram.  In fact
perhaps not large enough depending upon your workload. :-)

If it's not suitable for you, as Eric Dumazet explained, pass the
appropriate option on the kernel command line.


* Re: [PATCH] Limit size of route cache hash table
  2009-04-27  5:47   ` Anton Blanchard
@ 2009-04-27  6:12     ` Eric Dumazet
  2009-04-27  6:36       ` David Miller
  2009-04-27  6:35     ` David Miller
  1 sibling, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2009-04-27  6:12 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: netdev

Anton Blanchard wrote:
>  
> Hi,
> 
>> Then boot with rhash_entries=8000?
>> or 
>> echo 1 >/proc/sys/net/ipv4/route/gc_interval
> 
> Yes we are hardwiring it for now.
> 
>> Sorry this limit is too small. Many of my customer machines would collapse.
> 
> So what would a reasonable upper limit be? Surely we should cap it at some
> point?
> 

A similar patch was done for the size of the TCP hash table. It was something
like 512 * 1024, if I remember correctly. IMHO the same value would be fine for
the IP route cache.

Yes, this was commit:

commit 0ccfe61803ad24f1c0fe5e1f5ce840ff0f3d9660
Author: Jean Delvare <jdelvare@suse.de>
Date:   Tue Oct 30 00:59:25 2007 -0700

    [TCP]: Saner thash_entries default with much memory.

    On systems with a very large amount of memory, the heuristics in
    alloc_large_system_hash() result in a very large TCP established hash
    table: 16 millions of entries for a 128 GB ia64 system. This makes
    reading from /proc/net/tcp pretty slow (well over a second) and as a
    result netstat is slow on these machines. I know that /proc/net/tcp is
    deprecated in favor of tcp_diag, however at the moment netstat only
    knows of the former.

    I am skeptical that such a large TCP established hash is often needed.
    Just because a system has a lot of memory doesn't imply that it will
    have several millions of concurrent TCP connections. Thus I believe
    that we should put an arbitrary high limit to the size of the TCP
    established hash by default. Users who really need a bigger hash can
    always use the thash_entries boot parameter to get more.

    I propose 2 millions of entries as the arbitrary high limit. This
    makes /proc/net/tcp reasonably fast on the system in question (0.2 s)
    while being still large enough for me to be confident that network
    performance won't suffer.

    This is just one way to limit the hash size, there are others; I am not
    familiar enough with the TCP code to decide which is best. Thus, I
    would welcome the proposals of alternatives.

    [ 2 million is still too large, thus I've modified the limit in the
      change to be '512 * 1024'. -DaveM ]

    Signed-off-by: Jean Delvare <jdelvare@suse.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>



Thanks



* Re: [PATCH] Limit size of route cache hash table
  2009-04-27  5:47   ` Anton Blanchard
  2009-04-27  6:12     ` Eric Dumazet
@ 2009-04-27  6:35     ` David Miller
  1 sibling, 0 replies; 11+ messages in thread
From: David Miller @ 2009-04-27  6:35 UTC (permalink / raw)
  To: anton; +Cc: dada1, netdev

From: Anton Blanchard <anton@samba.org>
Date: Mon, 27 Apr 2009 15:47:02 +1000

>> Sorry this limit is too small. Many of my customer machines would collapse.
> 
> So what would a reasonable upper limit be? Surely we should cap it at some
> point?

Even 32MB is small for some people.

Like Eric, I think the current cap is just fine.  For special
situations, tweak settings :)


* Re: [PATCH] Limit size of route cache hash table
  2009-04-27  6:12     ` Eric Dumazet
@ 2009-04-27  6:36       ` David Miller
  2009-04-27  6:47         ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2009-04-27  6:36 UTC (permalink / raw)
  To: dada1; +Cc: anton, netdev

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Mon, 27 Apr 2009 08:12:21 +0200

> A similar patch was done for the size of TCP hash table. It was something
> like 512 * 1024 if I remember well. IMHO this same value would be fine for
> IP route cache.
> 
> Yes, this was commit :

Fair enough, I'd be OK with this if you'd be too :)


* Re: [PATCH] Limit size of route cache hash table
  2009-04-27  6:36       ` David Miller
@ 2009-04-27  6:47         ` Eric Dumazet
  2009-04-27 11:44           ` Anton Blanchard
  2009-04-27 11:50           ` Anton Blanchard
  0 siblings, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2009-04-27  6:47 UTC (permalink / raw)
  To: David Miller; +Cc: anton, netdev

David Miller wrote:
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Mon, 27 Apr 2009 08:12:21 +0200
> 
>> A similar patch was done for the size of TCP hash table. It was something
>> like 512 * 1024 if I remember well. IMHO this same value would be fine for
>> IP route cache.
>>
>> Yes, this was commit :
> 
> Fair enough, I'd be OK with this if you'd be too :)

Yes, since setups needing more than 512k slots (that is, 2 million entries at an
average of 4 entries per slot) can certainly boot with an override.
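
The override behaviour discussed here follows from the `rhash_entries ? 0 : <cap>` argument in the patch. The sketch below is a simplified model of that logic, not the kernel's alloc_large_system_hash(): the function name and `DEFAULT_CAP` are hypothetical, and it ignores power-of-two rounding and whatever memory-based ceiling the kernel applies when the limit argument is 0.

```c
#include <assert.h>

/* Default cap agreed on in this thread (512 * 1024 slots). */
#define DEFAULT_CAP (512UL * 1024)

/* Simplified model of the sizing decision.
 * rhash_entries: value of the boot option, 0 if not given.
 * auto_entries:  size the memory-based heuristic would pick on its own.
 * Returns the number of hash slots that would be used. */
static unsigned long route_hash_entries(unsigned long rhash_entries,
                                        unsigned long auto_entries)
{
    /* Same expression as in the patch: no cap when the user asked
     * for an explicit size, DEFAULT_CAP otherwise. */
    unsigned long limit = rhash_entries ? 0 : DEFAULT_CAP;
    unsigned long n = rhash_entries ? rhash_entries : auto_entries;

    assert(auto_entries > 0);
    if (limit != 0 && n > limit)
        n = limit;
    return n;
}
```

In this model, `route_hash_entries(0, 4194304)` is capped to 524288, while `route_hash_entries(1048576, 4194304)` keeps the explicitly requested 1048576 slots, which is the "boot with an override" case.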

Anton, please resubmit with this 512 k limit, and my :

Acked-by: Eric Dumazet <dada1@cosmosbay.com>

 thank you.



* Re: [PATCH] Limit size of route cache hash table
  2009-04-27  6:47         ` Eric Dumazet
@ 2009-04-27 11:44           ` Anton Blanchard
  2009-04-27 11:50           ` Anton Blanchard
  1 sibling, 0 replies; 11+ messages in thread
From: Anton Blanchard @ 2009-04-27 11:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev


> Yes, as setups needing more than 512.000 slots (and 2 million entries with an
> average of 4 entries per slot) are certainly able to boot with an override.
> 
> Anton, please resubmit with this 512 k limit, and my :
> 
> Acked-by: Eric Dumazet <dada1@cosmosbay.com>

Thanks Eric, I'll respin with the limit set at 512k.

Anton


* Re: [PATCH] Limit size of route cache hash table
  2009-04-27  6:47         ` Eric Dumazet
  2009-04-27 11:44           ` Anton Blanchard
@ 2009-04-27 11:50           ` Anton Blanchard
  2009-04-27 12:40             ` David Miller
  1 sibling, 1 reply; 11+ messages in thread
From: Anton Blanchard @ 2009-04-27 11:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev


Right now we have no upper limit on the size of the route cache hash table.
On a 128GB POWER6 box it ends up as 32MB:

    IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes)

It would be nice to cap this for memory consumption reasons, but a massive
hashtable also causes a significant spike when measuring OS jitter.

With a 32MB hashtable and 4 million entries, rt_worker_func is taking
5 ms to complete. On another system with more memory it's taking 14 ms.
Even though rt_worker_func does call cond_resched() to limit its impact,
in an HPC environment we want to keep all sources of OS jitter to a minimum.

With the patch applied we limit the number of entries to 512k, which
can still be overridden by using the rhash_entries boot option:

    IP route cache hash table entries: 524288 (order: 6, 4194304 bytes)

With this patch rt_worker_func now takes 0.460 ms on the same system.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
---

Index: linux-2.6/net/ipv4/route.c
===================================================================
--- linux-2.6.orig/net/ipv4/route.c	2009-04-27 12:48:18.000000000 +1000
+++ linux-2.6/net/ipv4/route.c	2009-04-27 17:05:46.000000000 +1000
@@ -3397,7 +3397,7 @@
 					0,
 					&rt_hash_log,
 					&rt_hash_mask,
-					0);
+					rhash_entries ? 0 : 512 * 1024);
 	memset(rt_hash_table, 0, (rt_hash_mask + 1) * sizeof(struct rt_hash_bucket));
 	rt_hash_lock_init();
 


* Re: [PATCH] Limit size of route cache hash table
  2009-04-27 11:50           ` Anton Blanchard
@ 2009-04-27 12:40             ` David Miller
  0 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2009-04-27 12:40 UTC (permalink / raw)
  To: anton; +Cc: dada1, netdev

From: Anton Blanchard <anton@samba.org>
Date: Mon, 27 Apr 2009 21:50:07 +1000

> Right now we have no upper limit on the size of the route cache hash table.
> On a 128GB POWER6 box it ends up as 32MB:
> 
>     IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes)
> 
> It would be nice to cap this for memory consumption reasons, but a massive
> hashtable also causes a significant spike when measuring OS jitter.
> 
> With a 32MB hashtable and 4 million entries, rt_worker_func is taking
> 5 ms to complete. On another system with more memory it's taking 14 ms.
> Even though rt_worker_func does call cond_resched() to limit its impact,
> in an HPC environment we want to keep all sources of OS jitter to a minimum.
> 
> With the patch applied we limit the number of entries to 512k, which
> can still be overridden by using the rhash_entries boot option:
> 
>     IP route cache hash table entries: 524288 (order: 6, 4194304 bytes)
> 
> With this patch rt_worker_func now takes 0.460 ms on the same system.
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>
> Acked-by: Eric Dumazet <dada1@cosmosbay.com>

Applied, thanks!


