All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 14/18] Netfilter:  Conntrack Hash Allocation using __get_free_pages
@ 2005-01-05  3:36 Rusty Russell
  2005-01-07 21:26 ` Martin Josefsson
  0 siblings, 1 reply; 6+ messages in thread
From: Rusty Russell @ 2005-01-05  3:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Harald Welte, Netfilter development mailing list

Name: Conntrack Hash Allocation using __get_free_pages
Author: Andi Kleen
Status: Tested under nfsim

Here is a patch for 2.4 that just makes it use get_free_pages to
test the TLB theory. Another obvious improvement would be to not
use list_heads for the hash table buckets - a single pointer would
likely suffice and it would cut the hash table in half, saving
cache, TLB and memory.

Index: linux-2.6.10-bk6-Netfilter/net/ipv4/netfilter/ip_conntrack_core.c
===================================================================
--- linux-2.6.10-bk6-Netfilter.orig/net/ipv4/netfilter/ip_conntrack_core.c	2005-01-04 22:11:53.614947576 +1100
+++ linux-2.6.10-bk6-Netfilter/net/ipv4/netfilter/ip_conntrack_core.c	2005-01-04 22:25:56.082872984 +1100
@@ -75,6 +75,7 @@
 struct ip_conntrack ip_conntrack_untracked;
 unsigned int ip_ct_log_invalid;
 static LIST_HEAD(unconfirmed);
+static int ip_conntrack_vmalloc;
 
 DEFINE_PER_CPU(struct ip_conntrack_stat, ip_conntrack_stat);
 
@@ -1309,6 +1310,16 @@
 	return 1;
 }
 
+static void free_conntrack_hash(void)
+{
+	if (ip_conntrack_vmalloc)
+		vfree(ip_conntrack_hash);
+	else
+		free_pages((unsigned long)ip_conntrack_hash, 
+			   get_order(sizeof(struct list_head)
+				     * ip_conntrack_htable_size));
+}
+
 /* Mishearing the voices in his head, our hero wonders how he's
    supposed to kill the mall. */
 void ip_conntrack_cleanup(void)
@@ -1328,7 +1339,7 @@
 
 	kmem_cache_destroy(ip_conntrack_cachep);
 	kmem_cache_destroy(ip_conntrack_expect_cachep);
-	vfree(ip_conntrack_hash);
+	free_conntrack_hash();
 	nf_unregister_sockopt(&so_getorigdst);
 }
 
@@ -1366,8 +1377,20 @@
 		return ret;
 	}
 
-	ip_conntrack_hash = vmalloc(sizeof(struct list_head)
-				    * ip_conntrack_htable_size);
+	/* AK: the hash table is twice as big than needed because it
+	   uses list_head.  it would be much nicer to caches to use a
+	   single pointer list head here. */
+	ip_conntrack_vmalloc = 0; 
+	ip_conntrack_hash 
+		=(void*)__get_free_pages(GFP_KERNEL, 
+					 get_order(sizeof(struct list_head)
+						   *ip_conntrack_htable_size));
+	if (!ip_conntrack_hash) { 
+		ip_conntrack_vmalloc = 1;
+		printk(KERN_WARNING "ip_conntrack: falling back to vmalloc.\n");
+		ip_conntrack_hash = vmalloc(sizeof(struct list_head)
+					    * ip_conntrack_htable_size);
+	}
 	if (!ip_conntrack_hash) {
 		printk(KERN_ERR "Unable to create ip_conntrack_hash\n");
 		goto err_unreg_sockopt;
@@ -1375,7 +1398,7 @@
 
 	ip_conntrack_cachep = kmem_cache_create("ip_conntrack",
 	                                        sizeof(struct ip_conntrack), 0,
-	                                        SLAB_HWCACHE_ALIGN, NULL, NULL);
+	                                        SLAB_HWCACHE_ALIGN, NULL,NULL);
 	if (!ip_conntrack_cachep) {
 		printk(KERN_ERR "Unable to create ip_conntrack slab cache\n");
 		goto err_free_hash;
@@ -1416,7 +1439,7 @@
 err_free_conntrack_slab:
 	kmem_cache_destroy(ip_conntrack_cachep);
 err_free_hash:
-	vfree(ip_conntrack_hash);
+	free_conntrack_hash();
 err_unreg_sockopt:
 	nf_unregister_sockopt(&so_getorigdst);
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 14/18] Netfilter:  Conntrack Hash Allocation using __get_free_pages
  2005-01-05  3:36 [PATCH 14/18] Netfilter: Conntrack Hash Allocation using __get_free_pages Rusty Russell
@ 2005-01-07 21:26 ` Martin Josefsson
  2005-01-08 17:57   ` Sven Schuster
  0 siblings, 1 reply; 6+ messages in thread
From: Martin Josefsson @ 2005-01-07 21:26 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 3489 bytes --]

On Wed, 2005-01-05 at 14:36 +1100, Rusty Russell wrote:

Hi Rusty

> Here is a patch for 2.4 that just makes it use get_free_pages to
> test the TLB theory. Another obvious improvement would be to not
> use list_heads for the hash table buckets - a single pointer would
> likely suffice and it would cut the hash table in half, saving
> cache, TLB and memory.

Here's a patch that recalculates the number of buckets after the table
has been allocated with get_free_pages. Otherwise we are wasting memory.
But it doesn't increase ip_conntrack_max under normal conditions (only
if it becomes smaller than the bucket count and that only happens on
machines with little memory), it should result in shorter collision
lists.

But the patch might increase cache pressure since we have more buckets.
No benchmarks have been performed, in fact only tested with nfsim.

--- linux-2.6.10-bk050106/net/ipv4/netfilter/ip_conntrack_core.c	2005-01-06 18:03:05.000000000 +0100
+++ linux-2.6.10-bk050106/net/ipv4/netfilter/ip_conntrack_core.c	2005-01-07 22:14:44.000000000 +0100
@@ -1349,7 +1349,7 @@
 int __init ip_conntrack_init(void)
 {
 	unsigned int i;
-	int ret;
+	int order, ret;
 
 	/* Idea from tcp.c: use 1/16384 of memory.  On i386: 32MB
 	 * machine has 256 buckets.  >= 1GB machines have 8192 buckets. */
@@ -1366,11 +1366,6 @@
 	}
 	ip_conntrack_max = 8 * ip_conntrack_htable_size;
 
-	printk("ip_conntrack version %s (%u buckets, %d max)"
-	       " - %Zd bytes per conntrack\n", IP_CONNTRACK_VERSION,
-	       ip_conntrack_htable_size, ip_conntrack_max,
-	       sizeof(struct ip_conntrack));
-
 	ret = nf_register_sockopt(&so_getorigdst);
 	if (ret != 0) {
 		printk(KERN_ERR "Unable to register netfilter socket option\n");
@@ -1381,19 +1376,26 @@
 	   uses list_head.  it would be much nicer to caches to use a
 	   single pointer list head here. */
 	ip_conntrack_vmalloc = 0; 
-	ip_conntrack_hash 
-		=(void*)__get_free_pages(GFP_KERNEL, 
-					 get_order(sizeof(struct list_head)
-						   *ip_conntrack_htable_size));
-	if (!ip_conntrack_hash) { 
+
+	order = get_order(sizeof(struct list_head) * ip_conntrack_htable_size);
+
+	ip_conntrack_hash = (void *)__get_free_pages(GFP_KERNEL, order);
+	if (ip_conntrack_hash) {
+		/* Recalculate hashtable size based on order. */
+		ip_conntrack_htable_size = ((1U << order) << PAGE_SHIFT)
+						/ sizeof(struct list_head);
+		if (ip_conntrack_max < ip_conntrack_htable_size)
+			ip_conntrack_max = ip_conntrack_htable_size;
+	} else { 
 		ip_conntrack_vmalloc = 1;
 		printk(KERN_WARNING "ip_conntrack: falling back to vmalloc.\n");
 		ip_conntrack_hash = vmalloc(sizeof(struct list_head)
 					    * ip_conntrack_htable_size);
-	}
-	if (!ip_conntrack_hash) {
-		printk(KERN_ERR "Unable to create ip_conntrack_hash\n");
-		goto err_unreg_sockopt;
+
+		if (!ip_conntrack_hash) {
+			printk(KERN_ERR "Unable to create ip_conntrack_hash\n");
+			goto err_unreg_sockopt;
+		}
 	}
 
 	ip_conntrack_cachep = kmem_cache_create("ip_conntrack",
@@ -1434,6 +1436,11 @@
 	/*  - and look it like as a confirmed connection */
 	set_bit(IPS_CONFIRMED_BIT, &ip_conntrack_untracked.status);
 
+	printk("ip_conntrack version %s (%u buckets, %d max)"
+	       " - %Zd bytes per conntrack\n", IP_CONNTRACK_VERSION,
+	       ip_conntrack_htable_size, ip_conntrack_max,
+	       sizeof(struct ip_conntrack));
+
 	return ret;
 
 err_free_conntrack_slab:

-- 
/Martin

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 14/18] Netfilter: Conntrack Hash Allocation using __get_free_pages
  2005-01-07 21:26 ` Martin Josefsson
@ 2005-01-08 17:57   ` Sven Schuster
  2005-01-08 18:17     ` Sven Schuster
  2005-01-08 18:40     ` Martin Josefsson
  0 siblings, 2 replies; 6+ messages in thread
From: Sven Schuster @ 2005-01-08 17:57 UTC (permalink / raw)
  To: Martin Josefsson; +Cc: Rusty Russell, Netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1448 bytes --]


Hi Martin,

On Fri, Jan 07, 2005 at 10:26:46PM +0100, Martin Josefsson told us:
> +++ linux-2.6.10-bk050106/net/ipv4/netfilter/ip_conntrack_core.c	2005-01-07 22:14:44.000000000 +0100
> @@ -1349,7 +1349,7 @@
...
> +	if (ip_conntrack_hash) {
> +		/* Recalculate hashtable size based on order. */
> +		ip_conntrack_htable_size = ((1U << order) << PAGE_SHIFT)
> +						/ sizeof(struct list_head);
> +		if (ip_conntrack_max < ip_conntrack_htable_size)
> +			ip_conntrack_max = ip_conntrack_htable_size;
> +	} else { 

shouldn't this be

+		if (ip_conntrack_max > ip_conntrack_htable_size)

because you only want ip_conntrack_max to decrease when htable_size has
become smaller than conntrack_max after the allocation??


Sven

>  		ip_conntrack_vmalloc = 1;
>  		printk(KERN_WARNING "ip_conntrack: falling back to vmalloc.\n");
>  		ip_conntrack_hash = vmalloc(sizeof(struct list_head)
>  					    * ip_conntrack_htable_size);
> -	}
> -	if (!ip_conntrack_hash) {
> -		printk(KERN_ERR "Unable to create ip_conntrack_hash\n");
> -		goto err_unreg_sockopt;
> +
> +		if (!ip_conntrack_hash) {
> +			printk(KERN_ERR "Unable to create ip_conntrack_hash\n");
> +			goto err_unreg_sockopt;
> +		}
>  	}
>  
>  	ip_conntrack_cachep = kmem_cache_create("ip_conntrack",


-- 
Linux zion 2.6.10-bk7 #0 Fri Jan 7 19:08:39 CET 2005 i686 athlon i386 GNU/Linux
 18:52:47 up 23:30,  1 user,  load average: 0.00, 0.02, 0.08

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 14/18] Netfilter: Conntrack Hash Allocation using __get_free_pages
  2005-01-08 17:57   ` Sven Schuster
@ 2005-01-08 18:17     ` Sven Schuster
  2005-01-08 18:41       ` Martin Josefsson
  2005-01-08 18:40     ` Martin Josefsson
  1 sibling, 1 reply; 6+ messages in thread
From: Sven Schuster @ 2005-01-08 18:17 UTC (permalink / raw)
  To: Martin Josefsson; +Cc: Rusty Russell, Netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 605 bytes --]


Hi,

please cat my previous posting to /dev/null, this was brain garbage
:-)


Sven (staying away from computer for the rest of the day)


On Sat, Jan 08, 2005 at 06:57:44PM +0100, Sven Schuster told us:
> shouldn't this be
> 
> +		if (ip_conntrack_max > ip_conntrack_htable_size)
> 
> because you only want ip_conntrack_max to decrease when htable_size has
> become smaller than conntrack_max after the allocation??
> 
> 
> Sven
> 

-- 
Linux zion 2.6.10-bk7 #0 Fri Jan 7 19:08:39 CET 2005 i686 athlon i386 GNU/Linux
 19:15:20 up 23:52,  1 user,  load average: 0.11, 0.05, 0.02

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 14/18] Netfilter:  Conntrack Hash Allocation using __get_free_pages
  2005-01-08 17:57   ` Sven Schuster
  2005-01-08 18:17     ` Sven Schuster
@ 2005-01-08 18:40     ` Martin Josefsson
  1 sibling, 0 replies; 6+ messages in thread
From: Martin Josefsson @ 2005-01-08 18:40 UTC (permalink / raw)
  To: Sven Schuster; +Cc: Rusty Russell, Netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 2413 bytes --]

On Sat, 2005-01-08 at 18:57 +0100, Sven Schuster wrote:
> Hi Martin,

Hi Sven

> On Fri, Jan 07, 2005 at 10:26:46PM +0100, Martin Josefsson told us:
> > +++ linux-2.6.10-bk050106/net/ipv4/netfilter/ip_conntrack_core.c	2005-01-07 22:14:44.000000000 +0100
> > @@ -1349,7 +1349,7 @@
> ...
> > +	if (ip_conntrack_hash) {
> > +		/* Recalculate hashtable size based on order. */
> > +		ip_conntrack_htable_size = ((1U << order) << PAGE_SHIFT)
> > +						/ sizeof(struct list_head);
> > +		if (ip_conntrack_max < ip_conntrack_htable_size)
> > +			ip_conntrack_max = ip_conntrack_htable_size;
> > +	} else { 
> 
> shouldn't this be
> 
> +		if (ip_conntrack_max > ip_conntrack_htable_size)
> 
> because you only want ip_conntrack_max to decrease when htable_size has
> become smaller than conntrack_max after the allocation??

No since ip_conntrack_htable_size can't become smaller after the
recalculation, it can either remain the same or become larger due to
that we allocate memory based on order.

Say you have 704MB ram like my workstation, this gives 5632 buckets
without the patch.
5632 buckets = 5632 * 8 (two pointers on 32bit machine) bytes = 45056
bytes

order 0 means 2^0 pages of memory, order 1 means 2^1 pages of
memory.....
and a page of memory is 4096byte on x86. So in order to allocate 45056
bytes of memory we need 45056 bytes / 4096 bytes/page = 11 pages.
And in order to allocate 11 pages we need an order 4 allocation. 2^4 =
16 pages (2^3 = 8 pages which is too small) = 65536 bytes.

So we've allocated 16 pages of memory but we are only using 11 of those
pages.
My patch simply recalculates ip_conntrack_htable_size in order to make
most use of the memory we've actually allocated. pagesize * 2^order is
always at least as big as the amount we requested. So
ip_conntrack_htable_size will never be smaller after this recalculation.

The reason I added the 'if (ip_conntrack_max <
ip_conntrack_htable_size)' was that there is a possibility that the
recalculation increases ip_conntrack_htable_size so it becomes larger
than ip_conntrack_max which doesn't really make sense.
This can happen on machines with very little memory, if I've done my
match correctly this happens on machines with less than 8MB ram. One can
ask the question of how sane a person is to run conntrack on an <8MB
machine... but it looks better in nfsim :)

-- 
/Martin

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 14/18] Netfilter: Conntrack Hash Allocation using __get_free_pages
  2005-01-08 18:17     ` Sven Schuster
@ 2005-01-08 18:41       ` Martin Josefsson
  0 siblings, 0 replies; 6+ messages in thread
From: Martin Josefsson @ 2005-01-08 18:41 UTC (permalink / raw)
  To: Sven Schuster; +Cc: Rusty Russell, Netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 352 bytes --]

On Sat, 2005-01-08 at 19:17 +0100, Sven Schuster wrote:
> Hi,

Hi

> please cat my previous posting to /dev/null, this was brain garbage
> :-)
> 
> 
> Sven (staying away from computer for the rest of the day)

Heh, I've already replied, you can see it as a more detailed description
for the rest of the list if you want :)

-- 
/Martin

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2005-01-08 18:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-01-05  3:36 [PATCH 14/18] Netfilter: Conntrack Hash Allocation using __get_free_pages Rusty Russell
2005-01-07 21:26 ` Martin Josefsson
2005-01-08 17:57   ` Sven Schuster
2005-01-08 18:17     ` Sven Schuster
2005-01-08 18:41       ` Martin Josefsson
2005-01-08 18:40     ` Martin Josefsson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.