linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] New jhash function
@ 2010-11-25 13:15 Jozsef Kadlecsik
  2010-11-25 13:15 ` [PATCH 1/2] Prepare the tree for un-inlined jhash Jozsef Kadlecsik
  0 siblings, 1 reply; 18+ messages in thread
From: Jozsef Kadlecsik @ 2010-11-25 13:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: netdev, netfilter-devel, Linus Torvalds, Rusty Russell, Jozsef Kadlecsik

Hi,

Please consider applying the following patch:

The current jhash.h implements the lookup2() hash function by Bob Jenkins. 
However, lookup2() is outdated as Bob wrote a new hash function called 
lookup3(). The new hash function

- mixes better than lookup2(): it passes the check that every input bit 
  changes every output bit 50% of the time, while lookup2() failed it.
- performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster
  than lookup2() depending on the key length.

You can read a longer comparison of the two and other hash functions at 
http://burtleburtle.net/bob/hash/doobs.html.

jhash is widely used in the kernel and because the functions
are inlined, the cost in size is significant. The new functions
are slightly larger than the previous ones so the new implementation
uses non-inlined fucntions. (See
http://lkml.indiana.edu/hypermail//linux/kernel/0902.2/01149.html).
Therefore the first patch replaces the calls to the internal macros
of jhash with the jhash function calls and the second patch contains
the implementation of the new jhash functions.

Jozsef Kadlecsik (2):
  Prepare the tree for un-inlined jhash.
  The new jhash implementation

 include/linux/jhash.h            |  136 ++-------------------------------
 lib/Makefile                     |    2 +-
 lib/jhash.c                      |  153 ++++++++++++++++++++++++++++++++++++++
 net/ipv6/inet6_connection_sock.c |   18 ++---
 net/ipv6/reassembly.c            |   36 ++++-----
 5 files changed, 186 insertions(+), 159 deletions(-)
 create mode 100644 lib/jhash.c


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] Prepare the tree for un-inlined jhash.
  2010-11-25 13:15 [PATCH 0/2] New jhash function Jozsef Kadlecsik
@ 2010-11-25 13:15 ` Jozsef Kadlecsik
  2010-11-25 13:15   ` [PATCH 2/2] The new jhash implementation Jozsef Kadlecsik
  2010-11-25 13:39   ` [PATCH 1/2] Prepare the tree for un-inlined jhash Jan Engelhardt
  0 siblings, 2 replies; 18+ messages in thread
From: Jozsef Kadlecsik @ 2010-11-25 13:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: netdev, netfilter-devel, Linus Torvalds, Rusty Russell, Jozsef Kadlecsik

jhash is widely used in the kernel and because the functions
are inlined, the cost in size is significant. Also, the new jhash
functions are slightly larger than the previous ones so better un-inline.
As a preparation step, the calls to the internal macros are replaced
with the plain jhash function calls.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
---
 net/ipv6/inet6_connection_sock.c |   18 ++++++++----------
 net/ipv6/reassembly.c            |   36 ++++++++++++++++--------------------
 2 files changed, 24 insertions(+), 30 deletions(-)

diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index 8a16280..861d252 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -60,18 +60,16 @@ EXPORT_SYMBOL_GPL(inet6_csk_bind_conflict);
 static u32 inet6_synq_hash(const struct in6_addr *raddr, const __be16 rport,
 			   const u32 rnd, const u16 synq_hsize)
 {
-	u32 a = (__force u32)raddr->s6_addr32[0];
-	u32 b = (__force u32)raddr->s6_addr32[1];
-	u32 c = (__force u32)raddr->s6_addr32[2];
-
-	a += JHASH_GOLDEN_RATIO;
-	b += JHASH_GOLDEN_RATIO;
-	c += rnd;
-	__jhash_mix(a, b, c);
-
-	a += (__force u32)raddr->s6_addr32[3];
-	b += (__force u32)rport;
-	__jhash_mix(a, b, c);
+	u32 c;
+
+	c = jhash_3words((__force u32)raddr->s6_addr32[0],
+			 (__force u32)raddr->s6_addr32[1],
+			 (__force u32)raddr->s6_addr32[2],
+			 rnd);
+
+	c = jhash_2words((__force u32)raddr->s6_addr32[3],
+			 (__force u32)rport,
+			 c);
 
 	return c & (synq_hsize - 1);
 }
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index c7ba314..5e57490 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -104,26 +104,22 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
 unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
 			     const struct in6_addr *daddr, u32 rnd)
 {
-	u32 a, b, c;
-
-	a = (__force u32)saddr->s6_addr32[0];
-	b = (__force u32)saddr->s6_addr32[1];
-	c = (__force u32)saddr->s6_addr32[2];
-
-	a += JHASH_GOLDEN_RATIO;
-	b += JHASH_GOLDEN_RATIO;
-	c += rnd;
-	__jhash_mix(a, b, c);
-
-	a += (__force u32)saddr->s6_addr32[3];
-	b += (__force u32)daddr->s6_addr32[0];
-	c += (__force u32)daddr->s6_addr32[1];
-	__jhash_mix(a, b, c);
-
-	a += (__force u32)daddr->s6_addr32[2];
-	b += (__force u32)daddr->s6_addr32[3];
-	c += (__force u32)id;
-	__jhash_mix(a, b, c);
+	u32 c;
+
+	c = jhash_3words((__force u32)saddr->s6_addr32[0],
+			 (__force u32)saddr->s6_addr32[1],
+			 (__force u32)saddr->s6_addr32[2],
+			 rnd);
+
+	c = jhash_3words((__force u32)saddr->s6_addr32[3],
+			 (__force u32)daddr->s6_addr32[0],
+			 (__force u32)daddr->s6_addr32[1],
+			 c);
+
+	c =  jhash_3words((__force u32)daddr->s6_addr32[2],
+			  (__force u32)daddr->s6_addr32[3],
+			  (__force u32)id,
+			  c);
 
 	return c & (INETFRAGS_HASHSZ - 1);
 }
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] The new jhash implementation
  2010-11-25 13:15 ` [PATCH 1/2] Prepare the tree for un-inlined jhash Jozsef Kadlecsik
@ 2010-11-25 13:15   ` Jozsef Kadlecsik
  2010-11-25 13:49     ` Eric Dumazet
  2010-11-27  3:31     ` Rusty Russell
  2010-11-25 13:39   ` [PATCH 1/2] Prepare the tree for un-inlined jhash Jan Engelhardt
  1 sibling, 2 replies; 18+ messages in thread
From: Jozsef Kadlecsik @ 2010-11-25 13:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: netdev, netfilter-devel, Linus Torvalds, Rusty Russell, Jozsef Kadlecsik

The current jhash.h implements the lookup2() hash function by Bob Jenkins.
However, lookup2() is outdated as Bob wrote a new hash function called
lookup3(). The new hash function

- mixes better than lookup2(): it passes the check that every input bit
  changes every output bit 50% of the time, while lookup2() failed it.
- performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster
  than lookup2() depending on the key length.

The patch replaces the lookup2() implementation of the 'jhash*'
functions with that of lookup3().

You can read a longer comparison of the two and other hash functions at
http://burtleburtle.net/bob/hash/doobs.html.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
---
 include/linux/jhash.h |  136 +++-----------------------------------------
 lib/Makefile          |    2 +-
 lib/jhash.c           |  153 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 162 insertions(+), 129 deletions(-)
 create mode 100644 lib/jhash.c

diff --git a/include/linux/jhash.h b/include/linux/jhash.h
index ced1159..ca69ac3 100644
--- a/include/linux/jhash.h
+++ b/include/linux/jhash.h
@@ -1,134 +1,14 @@
 #ifndef _LINUX_JHASH_H
 #define _LINUX_JHASH_H
 
-/* jhash.h: Jenkins hash support.
- *
- * Copyright (C) 1996 Bob Jenkins (bob_jenkins@burtleburtle.net)
- *
- * http://burtleburtle.net/bob/hash/
- *
- * These are the credits from Bob's sources:
- *
- * lookup2.c, by Bob Jenkins, December 1996, Public Domain.
- * hash(), hash2(), hash3, and mix() are externally useful functions.
- * Routines to test the hash are included if SELF_TEST is defined.
- * You can use this free for any purpose.  It has no warranty.
- *
- * Copyright (C) 2003 David S. Miller (davem@redhat.com)
- *
- * I've modified Bob's hash to be useful in the Linux kernel, and
- * any bugs present are surely my fault.  -DaveM
- */
-
-/* NOTE: Arguments are modified. */
-#define __jhash_mix(a, b, c) \
-{ \
-  a -= b; a -= c; a ^= (c>>13); \
-  b -= c; b -= a; b ^= (a<<8); \
-  c -= a; c -= b; c ^= (b>>13); \
-  a -= b; a -= c; a ^= (c>>12);  \
-  b -= c; b -= a; b ^= (a<<16); \
-  c -= a; c -= b; c ^= (b>>5); \
-  a -= b; a -= c; a ^= (c>>3);  \
-  b -= c; b -= a; b ^= (a<<10); \
-  c -= a; c -= b; c ^= (b>>15); \
-}
-
-/* The golden ration: an arbitrary value */
-#define JHASH_GOLDEN_RATIO	0x9e3779b9
-
-/* The most generic version, hashes an arbitrary sequence
- * of bytes.  No alignment or length assumptions are made about
- * the input key.
- */
-static inline u32 jhash(const void *key, u32 length, u32 initval)
-{
-	u32 a, b, c, len;
-	const u8 *k = key;
-
-	len = length;
-	a = b = JHASH_GOLDEN_RATIO;
-	c = initval;
-
-	while (len >= 12) {
-		a += (k[0] +((u32)k[1]<<8) +((u32)k[2]<<16) +((u32)k[3]<<24));
-		b += (k[4] +((u32)k[5]<<8) +((u32)k[6]<<16) +((u32)k[7]<<24));
-		c += (k[8] +((u32)k[9]<<8) +((u32)k[10]<<16)+((u32)k[11]<<24));
-
-		__jhash_mix(a,b,c);
-
-		k += 12;
-		len -= 12;
-	}
-
-	c += length;
-	switch (len) {
-	case 11: c += ((u32)k[10]<<24);
-	case 10: c += ((u32)k[9]<<16);
-	case 9 : c += ((u32)k[8]<<8);
-	case 8 : b += ((u32)k[7]<<24);
-	case 7 : b += ((u32)k[6]<<16);
-	case 6 : b += ((u32)k[5]<<8);
-	case 5 : b += k[4];
-	case 4 : a += ((u32)k[3]<<24);
-	case 3 : a += ((u32)k[2]<<16);
-	case 2 : a += ((u32)k[1]<<8);
-	case 1 : a += k[0];
-	};
-
-	__jhash_mix(a,b,c);
-
-	return c;
-}
-
-/* A special optimized version that handles 1 or more of u32s.
- * The length parameter here is the number of u32s in the key.
- */
-static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
-{
-	u32 a, b, c, len;
-
-	a = b = JHASH_GOLDEN_RATIO;
-	c = initval;
-	len = length;
-
-	while (len >= 3) {
-		a += k[0];
-		b += k[1];
-		c += k[2];
-		__jhash_mix(a, b, c);
-		k += 3; len -= 3;
-	}
-
-	c += length * 4;
-
-	switch (len) {
-	case 2 : b += k[1];
-	case 1 : a += k[0];
-	};
-
-	__jhash_mix(a,b,c);
-
-	return c;
-}
-
-
-/* A special ultra-optimized versions that knows they are hashing exactly
- * 3, 2 or 1 word(s).
- *
- * NOTE: In particular the "c += length; __jhash_mix(a,b,c);" normally
- *       done at the end is not done here.
- */
-static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
-{
-	a += JHASH_GOLDEN_RATIO;
-	b += JHASH_GOLDEN_RATIO;
-	c += initval;
-
-	__jhash_mix(a, b, c);
-
-	return c;
-}
+/* Best hash sizes are of power of two */
+#define jhash_size(n)   ((u32)1<<(n))
+/* Mask the hash value, i.e (value & jhash_mask(n)) instead of (value % n) */
+#define jhash_mask(n)   (jhash_size(n)-1)
+
+extern u32 jhash(const void *key, u32 length, u32 initval);
+extern u32 jhash2(const u32 *k, u32 length, u32 initval);
+extern u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval);
 
 static inline u32 jhash_2words(u32 a, u32 b, u32 initval)
 {
diff --git a/lib/Makefile b/lib/Makefile
index e6a3763..a1a4932 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -10,7 +10,7 @@ endif
 lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 rbtree.o radix-tree.o dump_stack.o \
 	 idr.o int_sqrt.o extable.o prio_tree.o \
-	 sha1.o irq_regs.o reciprocal_div.o argv_split.o \
+	 jhash.o sha1.o irq_regs.o reciprocal_div.o argv_split.o \
 	 proportions.o prio_heap.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o flex_array.o
 
diff --git a/lib/jhash.c b/lib/jhash.c
new file mode 100644
index 0000000..593cc77
--- /dev/null
+++ b/lib/jhash.c
@@ -0,0 +1,153 @@
+/* jhash.c: Jenkins hash support.
+ *
+ * Copyright (C) 2006. Bob Jenkins (bob_jenkins@burtleburtle.net)
+ *
+ * http://burtleburtle.net/bob/hash/
+ *
+ * These are the credits from Bob's sources:
+ *
+ * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
+ *
+ * These are functions for producing 32-bit hashes for hash table lookup.
+ * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
+ * are externally useful functions.  Routines to test the hash are included
+ * if SELF_TEST is defined.  You can use this free for any purpose.  It's in
+ * the public domain.  It has no warranty.
+ *
+ * Copyright (C) 2009-2010 Jozsef Kadlecsik (kadlec@blackhole.kfki.hu)
+ *
+ * I've modified Bob's hash to be useful in the Linux kernel, and
+ * any bugs present are my fault.
+ * Jozsef
+ */
+#include <linux/bitops.h>
+#include <linux/module.h>
+#include <linux/jhash.h>
+
+/* __jhash_mix -- mix 3 32-bit values reversibly. */
+#define __jhash_mix(a, b, c)			\
+{						\
+	a -= c;  a ^= rol32(c, 4);  c += b;	\
+	b -= a;  b ^= rol32(a, 6);  a += c;	\
+	c -= b;  c ^= rol32(b, 8);  b += a;	\
+	a -= c;  a ^= rol32(c, 16); c += b;	\
+	b -= a;  b ^= rol32(a, 19); a += c;	\
+	c -= b;  c ^= rol32(b, 4);  b += a;	\
+}
+
+/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
+#define __jhash_final(a, b, c)			\
+{						\
+	c ^= b; c -= rol32(b, 14);		\
+	a ^= c; a -= rol32(c, 11);		\
+	b ^= a; b -= rol32(a, 25);		\
+	c ^= b; c -= rol32(b, 16);		\
+	a ^= c; a -= rol32(c, 4);		\
+	b ^= a; b -= rol32(a, 14);		\
+	c ^= b; c -= rol32(b, 24);		\
+}
+
+/* An arbitrary initial parameter */
+#define JHASH_INITVAL		0xdeadbeef
+
+/* jhash - hash an arbitrary key
+ * @k: sequence of bytes as key
+ * @length: the length of the key
+ * @initval: the previous hash, or an arbitray value
+ *
+ * The generic version, hashes an arbitrary sequence of bytes.
+ * No alignment or length assumptions are made about the input key.
+ *
+ * Returns the hash value of the key. The result depends on endianness.
+ */
+u32 jhash(const void *key, u32 length, u32 initval)
+{
+	u32 a, b, c;
+	const u8 *k = key;
+
+	/* Set up the internal state */
+	a = b = c = JHASH_INITVAL + length + initval;
+
+	/* All but the last block: affect some 32 bits of (a,b,c) */
+	while (length > 12) {
+		a += k[0] + ((u32)k[1]<<8) + ((u32)k[2]<<16) + ((u32)k[3]<<24);
+		b += k[4] + ((u32)k[5]<<8) + ((u32)k[6]<<16) + ((u32)k[7]<<24);
+		c += k[8] + ((u32)k[9]<<8) + ((u32)k[10]<<16) + ((u32)k[11]<<24);
+		__jhash_mix(a, b, c);
+		length -= 12;
+		k += 12;
+	}
+	/* Last block: affect all 32 bits of (c) */
+	/* All the case statements fall through */
+	switch (length) {
+	case 12: c += (u32)k[11]<<24;
+	case 11: c += (u32)k[10]<<16;
+	case 10: c += (u32)k[9]<<8;
+	case 9:  c += k[8];
+	case 8:  b += (u32)k[7]<<24;
+	case 7:  b += (u32)k[6]<<16;
+	case 6:  b += (u32)k[5]<<8;
+	case 5:  b += k[4];
+	case 4:  a += (u32)k[3]<<24;
+	case 3:  a += (u32)k[2]<<16;
+	case 2:  a += (u32)k[1]<<8;
+	case 1:  a += k[0];
+		 __jhash_final(a, b, c);
+	case 0: /* Nothing left to add */
+		break;
+	}
+
+	return c;
+}
+EXPORT_SYMBOL(jhash);
+
+/* jhash2 - hash an array of u32's
+ * @k: the key which must be an array of u32's
+ * @length: the number of u32's in the key
+ * @initval: the previous hash, or an arbitray value
+ *
+ * Returns the hash value of the key.
+ */
+u32 jhash2(const u32 *k, u32 length, u32 initval)
+{
+	u32 a, b, c;
+
+	/* Set up the internal state */
+	a = b = c = JHASH_INITVAL + (length<<2) + initval;
+
+	/* Handle most of the key */
+	while (length > 3) {
+		a += k[0];
+		b += k[1];
+		c += k[2];
+		__jhash_mix(a, b, c);
+		length -= 3;
+		k += 3;
+	}
+
+	/* Handle the last 3 u32's: all the case statements fall through */
+	switch (length) {
+	case 3: c += k[2];
+	case 2: b += k[1];
+	case 1: a += k[0];
+		__jhash_final(a, b, c);
+	case 0:	/* Nothing left to add */
+		break;
+	}
+
+	return c;
+}
+EXPORT_SYMBOL(jhash2);
+
+/* jhash_3words - hash exactly 3, 2 or 1 word(s) */
+u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
+{
+	a += JHASH_INITVAL;
+	b += JHASH_INITVAL;
+	c += initval;
+
+	__jhash_mix(a, b, c);
+
+	return c;
+}
+EXPORT_SYMBOL(jhash_3words);
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] Prepare the tree for un-inlined jhash.
  2010-11-25 13:15 ` [PATCH 1/2] Prepare the tree for un-inlined jhash Jozsef Kadlecsik
  2010-11-25 13:15   ` [PATCH 2/2] The new jhash implementation Jozsef Kadlecsik
@ 2010-11-25 13:39   ` Jan Engelhardt
  2010-11-25 13:57     ` Jozsef Kadlecsik
  1 sibling, 1 reply; 18+ messages in thread
From: Jan Engelhardt @ 2010-11-25 13:39 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: linux-kernel, netdev, netfilter-devel, Linus Torvalds, Rusty Russell

On Thursday 2010-11-25 14:15, Jozsef Kadlecsik wrote:

>jhash is widely used in the kernel and because the functions
>are inlined, the cost in size is significant. Also, the new jhash
>functions are slightly larger than the previous ones so better un-inline.
>As a preparation step, the calls to the internal macros are replaced
>with the plain jhash function calls.

Do you have a non-normative allyesconfig/allmodconfig build whose 
size(1) you can run on, to show approximately just how much it differs?


thanks,
Jan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] The new jhash implementation
  2010-11-25 13:15   ` [PATCH 2/2] The new jhash implementation Jozsef Kadlecsik
@ 2010-11-25 13:49     ` Eric Dumazet
  2010-11-25 13:55       ` Changli Gao
  2010-11-25 14:41       ` Jozsef Kadlecsik
  2010-11-27  3:31     ` Rusty Russell
  1 sibling, 2 replies; 18+ messages in thread
From: Eric Dumazet @ 2010-11-25 13:49 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: linux-kernel, netdev, netfilter-devel, Linus Torvalds, Rusty Russell

Le jeudi 25 novembre 2010 à 14:15 +0100, Jozsef Kadlecsik a écrit :
> The current jhash.h implements the lookup2() hash function by Bob Jenkins.
> However, lookup2() is outdated as Bob wrote a new hash function called
> lookup3(). The new hash function
> 
> - mixes better than lookup2(): it passes the check that every input bit
>   changes every output bit 50% of the time, while lookup2() failed it.
> - performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster
>   than lookup2() depending on the key length.
> 
> The patch replaces the lookup2() implementation of the 'jhash*'
> functions with that of lookup3().
> 
> You can read a longer comparison of the two and other hash functions at
> http://burtleburtle.net/bob/hash/doobs.html.
> 
> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> ---
>  include/linux/jhash.h |  136 +++-----------------------------------------
>  lib/Makefile          |    2 +-
>  lib/jhash.c           |  153 +++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 162 insertions(+), 129 deletions(-)
>  create mode 100644 lib/jhash.c
> 
...

I agree jhash() should be not be inlined.

I am not sure for other variants.

> +u32 jhash(const void *key, u32 length, u32 initval)
> +{
> +	u32 a, b, c;
> +	const u8 *k = key;
> +
> +	/* Set up the internal state */
> +	a = b = c = JHASH_INITVAL + length + initval;
> +
> +	/* All but the last block: affect some 32 bits of (a,b,c) */
> +	while (length > 12) {
> +		a += k[0] + ((u32)k[1]<<8) + ((u32)k[2]<<16) + ((u32)k[3]<<24);

disassembly code on x86_32 for the previous line :

  26:	66 90                	xchg   %ax,%ax
  28:	0f b6 72 01          	movzbl 0x1(%edx),%esi
  2c:	0f b6 4a 02          	movzbl 0x2(%edx),%ecx
  30:	c1 e6 08             	shl    $0x8,%esi
  33:	c1 e1 10             	shl    $0x10,%ecx
  36:	8d 0c 0e             	lea    (%esi,%ecx,1),%ecx
  39:	0f b6 32             	movzbl (%edx),%esi
  3c:	8d 34 31             	lea    (%ecx,%esi,1),%esi
  3f:	0f b6 4a 03          	movzbl 0x3(%edx),%ecx
  43:	c1 e1 18             	shl    $0x18,%ecx
  46:	8d 0c 0e             	lea    (%esi,%ecx,1),%ecx

or (CONFIG_CC_OPTIMIZE_FOR_SIZE=y) :

  1b:	0f b6 7b 01          	movzbl 0x1(%ebx),%edi
  1f:	c1 e7 08             	shl    $0x8,%edi
  22:	89 7d f0             	mov    %edi,-0x10(%ebp)
  25:	0f b6 7b 02          	movzbl 0x2(%ebx),%edi
  29:	c1 e7 10             	shl    $0x10,%edi
  2c:	03 7d f0             	add    -0x10(%ebp),%edi
  2f:	89 7d f0             	mov    %edi,-0x10(%ebp)
  32:	0f b6 3b             	movzbl (%ebx),%edi
  35:	03 7d f0             	add    -0x10(%ebp),%edi
  38:	89 7d f0             	mov    %edi,-0x10(%ebp)
  3b:	0f b6 7b 03          	movzbl 0x3(%ebx),%edi
  3f:	c1 e7 18             	shl    $0x18,%edi
  42:	03 7d f0             	add    -0x10(%ebp),%edi


I suggest :

#include <linux/unaligned/packed_struct.h>
...
	a += __get_unaligned_cpu32(k);
	b += __get_unaligned_cpu32(k+4);
	c += __get_unaligned_cpu32(k+8);

Fits nicely in registers.
 



> +		b += k[4] + ((u32)k[5]<<8) + ((u32)k[6]<<16) + ((u32)k[7]<<24);
> +		c += k[8] + ((u32)k[9]<<8) + ((u32)k[10]<<16) + ((u32)k[11]<<24);
> +		__jhash_mix(a, b, c);
> +		length -= 12;
> +		k += 12;
> +	}
> +	/* Last block: affect all 32 bits of (c) */
> +	/* All the case statements fall through */
> +	switch (length) {
> +	case 12: c += (u32)k[11]<<24;
> +	case 11: c += (u32)k[10]<<16;
> +	case 10: c += (u32)k[9]<<8;
> +	case 9:  c += k[8];
> +	case 8:  b += (u32)k[7]<<24;
> +	case 7:  b += (u32)k[6]<<16;
> +	case 6:  b += (u32)k[5]<<8;
> +	case 5:  b += k[4];
> +	case 4:  a += (u32)k[3]<<24;
> +	case 3:  a += (u32)k[2]<<16;
> +	case 2:  a += (u32)k[1]<<8;
> +	case 1:  a += k[0];
> +		 __jhash_final(a, b, c);
> +	case 0: /* Nothing left to add */
> +		break;
> +	}
> +
> +	return c;
> +}
> +EXPORT_SYMBOL(jhash);




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] The new jhash implementation
  2010-11-25 13:49     ` Eric Dumazet
@ 2010-11-25 13:55       ` Changli Gao
  2010-11-25 14:05         ` Eric Dumazet
  2010-11-25 14:41       ` Jozsef Kadlecsik
  1 sibling, 1 reply; 18+ messages in thread
From: Changli Gao @ 2010-11-25 13:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jozsef Kadlecsik, linux-kernel, netdev, netfilter-devel,
	Linus Torvalds, Rusty Russell

On Thu, Nov 25, 2010 at 9:49 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 25 novembre 2010 à 14:15 +0100, Jozsef Kadlecsik a écrit :
>> The current jhash.h implements the lookup2() hash function by Bob Jenkins.
>> However, lookup2() is outdated as Bob wrote a new hash function called
>> lookup3(). The new hash function
>>
>> - mixes better than lookup2(): it passes the check that every input bit
>>   changes every output bit 50% of the time, while lookup2() failed it.
>> - performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster
>>   than lookup2() depending on the key length.
>>
>> The patch replaces the lookup2() implementation of the 'jhash*'
>> functions with that of lookup3().
>>
>> You can read a longer comparison of the two and other hash functions at
>> http://burtleburtle.net/bob/hash/doobs.html.
>>
>> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
>> ---
>>  include/linux/jhash.h |  136 +++-----------------------------------------
>>  lib/Makefile          |    2 +-
>>  lib/jhash.c           |  153 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 162 insertions(+), 129 deletions(-)
>>  create mode 100644 lib/jhash.c
>>
> ...
>
> I agree jhash() should be not be inlined.
>
> I am not sure for other variants.
>
>> +u32 jhash(const void *key, u32 length, u32 initval)
>> +{
>> +     u32 a, b, c;
>> +     const u8 *k = key;
>> +
>> +     /* Set up the internal state */
>> +     a = b = c = JHASH_INITVAL + length + initval;
>> +
>> +     /* All but the last block: affect some 32 bits of (a,b,c) */
>> +     while (length > 12) {
>> +             a += k[0] + ((u32)k[1]<<8) + ((u32)k[2]<<16) + ((u32)k[3]<<24);
>
> disassembly code on x86_32 for the previous line :
>
>  26:   66 90                   xchg   %ax,%ax
>  28:   0f b6 72 01             movzbl 0x1(%edx),%esi
>  2c:   0f b6 4a 02             movzbl 0x2(%edx),%ecx
>  30:   c1 e6 08                shl    $0x8,%esi
>  33:   c1 e1 10                shl    $0x10,%ecx
>  36:   8d 0c 0e                lea    (%esi,%ecx,1),%ecx
>  39:   0f b6 32                movzbl (%edx),%esi
>  3c:   8d 34 31                lea    (%ecx,%esi,1),%esi
>  3f:   0f b6 4a 03             movzbl 0x3(%edx),%ecx
>  43:   c1 e1 18                shl    $0x18,%ecx
>  46:   8d 0c 0e                lea    (%esi,%ecx,1),%ecx
>
> or (CONFIG_CC_OPTIMIZE_FOR_SIZE=y) :
>
>  1b:   0f b6 7b 01             movzbl 0x1(%ebx),%edi
>  1f:   c1 e7 08                shl    $0x8,%edi
>  22:   89 7d f0                mov    %edi,-0x10(%ebp)
>  25:   0f b6 7b 02             movzbl 0x2(%ebx),%edi
>  29:   c1 e7 10                shl    $0x10,%edi
>  2c:   03 7d f0                add    -0x10(%ebp),%edi
>  2f:   89 7d f0                mov    %edi,-0x10(%ebp)
>  32:   0f b6 3b                movzbl (%ebx),%edi
>  35:   03 7d f0                add    -0x10(%ebp),%edi
>  38:   89 7d f0                mov    %edi,-0x10(%ebp)
>  3b:   0f b6 7b 03             movzbl 0x3(%ebx),%edi
>  3f:   c1 e7 18                shl    $0x18,%edi
>  42:   03 7d f0                add    -0x10(%ebp),%edi
>
>
> I suggest :
>
> #include <linux/unaligned/packed_struct.h>
> ...
>        a += __get_unaligned_cpu32(k);
>        b += __get_unaligned_cpu32(k+4);
>        c += __get_unaligned_cpu32(k+8);
>
> Fits nicely in registers.
>

I think you mean get_unaligned_le32().

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] Prepare the tree for un-inlined jhash.
  2010-11-25 13:39   ` [PATCH 1/2] Prepare the tree for un-inlined jhash Jan Engelhardt
@ 2010-11-25 13:57     ` Jozsef Kadlecsik
  0 siblings, 0 replies; 18+ messages in thread
From: Jozsef Kadlecsik @ 2010-11-25 13:57 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: linux-kernel, netdev, netfilter-devel, Linus Torvalds, Rusty Russell

On Thu, 25 Nov 2010, Jan Engelhardt wrote:

> On Thursday 2010-11-25 14:15, Jozsef Kadlecsik wrote:
> 
> >jhash is widely used in the kernel and because the functions
> >are inlined, the cost in size is significant. Also, the new jhash
> >functions are slightly larger than the previous ones so better un-inline.
> >As a preparation step, the calls to the internal macros are replaced
> >with the plain jhash function calls.
> 
> Do you have a non-normative allyesconfig/allmodconfig build whose 
> size(1) you can run on, to show approximately just how much it differs?

In the cover mail I referred the link to the message from Ilpo Jarvinen: 
"I once looked into inlining cost and jhash functions were among the most 
wasteful (kernel-wide). Multiple jhash bodies were 100+ bytes, and the 
overall cost was 10k+."

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] The new jhash implementation
  2010-11-25 13:55       ` Changli Gao
@ 2010-11-25 14:05         ` Eric Dumazet
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2010-11-25 14:05 UTC (permalink / raw)
  To: Changli Gao
  Cc: Jozsef Kadlecsik, linux-kernel, netdev, netfilter-devel,
	Linus Torvalds, Rusty Russell

Le jeudi 25 novembre 2010 à 21:55 +0800, Changli Gao a écrit :

> > I suggest :
> >
> > #include <linux/unaligned/packed_struct.h>
> > ...
> >        a += __get_unaligned_cpu32(k);
> >        b += __get_unaligned_cpu32(k+4);
> >        c += __get_unaligned_cpu32(k+8);
> >
> > Fits nicely in registers.
> >
> 
> I think you mean get_unaligned_le32().
> 

No, I meant __get_unaligned_cpu32()

We do same thing in jhash2() :

	a += k[0]; 
	b += k[1];
	c += k[2];

We dont care of bit order of the 32bit quantity we are adding to a,b or
c , as long its consistent for the current machine ;)

get_unaligned_le32() would be slow on big endian arches.




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] The new jhash implementation
  2010-11-25 13:49     ` Eric Dumazet
  2010-11-25 13:55       ` Changli Gao
@ 2010-11-25 14:41       ` Jozsef Kadlecsik
  2010-11-25 17:21         ` Eric Dumazet
  1 sibling, 1 reply; 18+ messages in thread
From: Jozsef Kadlecsik @ 2010-11-25 14:41 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, netdev, netfilter-devel, Linus Torvalds, Rusty Russell

On Thu, 25 Nov 2010, Eric Dumazet wrote:

> I agree jhash() should be not be inlined.
> 
> I am not sure for other variants.

Yeah, I have got the same feelings. I decided to un-inline all of them 
because that way the internal macros are not exposed at all.
 
> > +u32 jhash(const void *key, u32 length, u32 initval)
> > +{
> > +	u32 a, b, c;
> > +	const u8 *k = key;
> > +
> > +	/* Set up the internal state */
> > +	a = b = c = JHASH_INITVAL + length + initval;
> > +
> > +	/* All but the last block: affect some 32 bits of (a,b,c) */
> > +	while (length > 12) {
> > +		a += k[0] + ((u32)k[1]<<8) + ((u32)k[2]<<16) + ((u32)k[3]<<24);
> 
> disassembly code on x86_32 for the previous line :
> 
>   26:	66 90                	xchg   %ax,%ax
>   28:	0f b6 72 01          	movzbl 0x1(%edx),%esi
>   2c:	0f b6 4a 02          	movzbl 0x2(%edx),%ecx
>   30:	c1 e6 08             	shl    $0x8,%esi
>   33:	c1 e1 10             	shl    $0x10,%ecx
>   36:	8d 0c 0e             	lea    (%esi,%ecx,1),%ecx
>   39:	0f b6 32             	movzbl (%edx),%esi
>   3c:	8d 34 31             	lea    (%ecx,%esi,1),%esi
>   3f:	0f b6 4a 03          	movzbl 0x3(%edx),%ecx
>   43:	c1 e1 18             	shl    $0x18,%ecx
>   46:	8d 0c 0e             	lea    (%esi,%ecx,1),%ecx
> 
> or (CONFIG_CC_OPTIMIZE_FOR_SIZE=y) :
> 
>   1b:	0f b6 7b 01          	movzbl 0x1(%ebx),%edi
>   1f:	c1 e7 08             	shl    $0x8,%edi
>   22:	89 7d f0             	mov    %edi,-0x10(%ebp)
>   25:	0f b6 7b 02          	movzbl 0x2(%ebx),%edi
>   29:	c1 e7 10             	shl    $0x10,%edi
>   2c:	03 7d f0             	add    -0x10(%ebp),%edi
>   2f:	89 7d f0             	mov    %edi,-0x10(%ebp)
>   32:	0f b6 3b             	movzbl (%ebx),%edi
>   35:	03 7d f0             	add    -0x10(%ebp),%edi
>   38:	89 7d f0             	mov    %edi,-0x10(%ebp)
>   3b:	0f b6 7b 03          	movzbl 0x3(%ebx),%edi
>   3f:	c1 e7 18             	shl    $0x18,%edi
>   42:	03 7d f0             	add    -0x10(%ebp),%edi
> 
> 
> I suggest :
> 
> #include <linux/unaligned/packed_struct.h>
> ...
> 	a += __get_unaligned_cpu32(k);
> 	b += __get_unaligned_cpu32(k+4);
> 	c += __get_unaligned_cpu32(k+8);
> 
> Fits nicely in registers.

Good idea, thanks!

Here follows the updated second patch:

The current jhash.h implements the lookup2() hash function by Bob Jenkins.
However, lookup2() is outdated as Bob wrote a new hash function called
lookup3(). The new hash function

- mixes better than lookup2(): it passes the check that every input bit
  changes every output bit 50% of the time, while lookup2() failed it.
- performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster
  than lookup2() depending on the key length.

The patch replaces the lookup2() implementation of the 'jhash*'
functions with that of lookup3().

You can read a longer comparison of the two and other hash functions at
http://burtleburtle.net/bob/hash/doobs.html.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
---
 include/linux/jhash.h |  136 +++----------------------------------------
 lib/Makefile          |    2 +-
 lib/jhash.c           |  154 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 163 insertions(+), 129 deletions(-)
 create mode 100644 lib/jhash.c

diff --git a/include/linux/jhash.h b/include/linux/jhash.h
index ced1159..ca69ac3 100644
--- a/include/linux/jhash.h
+++ b/include/linux/jhash.h
@@ -1,134 +1,14 @@
 #ifndef _LINUX_JHASH_H
 #define _LINUX_JHASH_H
 
-/* jhash.h: Jenkins hash support.
- *
- * Copyright (C) 1996 Bob Jenkins (bob_jenkins@burtleburtle.net)
- *
- * http://burtleburtle.net/bob/hash/
- *
- * These are the credits from Bob's sources:
- *
- * lookup2.c, by Bob Jenkins, December 1996, Public Domain.
- * hash(), hash2(), hash3, and mix() are externally useful functions.
- * Routines to test the hash are included if SELF_TEST is defined.
- * You can use this free for any purpose.  It has no warranty.
- *
- * Copyright (C) 2003 David S. Miller (davem@redhat.com)
- *
- * I've modified Bob's hash to be useful in the Linux kernel, and
- * any bugs present are surely my fault.  -DaveM
- */
-
-/* NOTE: Arguments are modified. */
-#define __jhash_mix(a, b, c) \
-{ \
-  a -= b; a -= c; a ^= (c>>13); \
-  b -= c; b -= a; b ^= (a<<8); \
-  c -= a; c -= b; c ^= (b>>13); \
-  a -= b; a -= c; a ^= (c>>12);  \
-  b -= c; b -= a; b ^= (a<<16); \
-  c -= a; c -= b; c ^= (b>>5); \
-  a -= b; a -= c; a ^= (c>>3);  \
-  b -= c; b -= a; b ^= (a<<10); \
-  c -= a; c -= b; c ^= (b>>15); \
-}
-
-/* The golden ration: an arbitrary value */
-#define JHASH_GOLDEN_RATIO	0x9e3779b9
-
-/* The most generic version, hashes an arbitrary sequence
- * of bytes.  No alignment or length assumptions are made about
- * the input key.
- */
-static inline u32 jhash(const void *key, u32 length, u32 initval)
-{
-	u32 a, b, c, len;
-	const u8 *k = key;
-
-	len = length;
-	a = b = JHASH_GOLDEN_RATIO;
-	c = initval;
-
-	while (len >= 12) {
-		a += (k[0] +((u32)k[1]<<8) +((u32)k[2]<<16) +((u32)k[3]<<24));
-		b += (k[4] +((u32)k[5]<<8) +((u32)k[6]<<16) +((u32)k[7]<<24));
-		c += (k[8] +((u32)k[9]<<8) +((u32)k[10]<<16)+((u32)k[11]<<24));
-
-		__jhash_mix(a,b,c);
-
-		k += 12;
-		len -= 12;
-	}
-
-	c += length;
-	switch (len) {
-	case 11: c += ((u32)k[10]<<24);
-	case 10: c += ((u32)k[9]<<16);
-	case 9 : c += ((u32)k[8]<<8);
-	case 8 : b += ((u32)k[7]<<24);
-	case 7 : b += ((u32)k[6]<<16);
-	case 6 : b += ((u32)k[5]<<8);
-	case 5 : b += k[4];
-	case 4 : a += ((u32)k[3]<<24);
-	case 3 : a += ((u32)k[2]<<16);
-	case 2 : a += ((u32)k[1]<<8);
-	case 1 : a += k[0];
-	};
-
-	__jhash_mix(a,b,c);
-
-	return c;
-}
-
-/* A special optimized version that handles 1 or more of u32s.
- * The length parameter here is the number of u32s in the key.
- */
-static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
-{
-	u32 a, b, c, len;
-
-	a = b = JHASH_GOLDEN_RATIO;
-	c = initval;
-	len = length;
-
-	while (len >= 3) {
-		a += k[0];
-		b += k[1];
-		c += k[2];
-		__jhash_mix(a, b, c);
-		k += 3; len -= 3;
-	}
-
-	c += length * 4;
-
-	switch (len) {
-	case 2 : b += k[1];
-	case 1 : a += k[0];
-	};
-
-	__jhash_mix(a,b,c);
-
-	return c;
-}
-
-
-/* A special ultra-optimized versions that knows they are hashing exactly
- * 3, 2 or 1 word(s).
- *
- * NOTE: In particular the "c += length; __jhash_mix(a,b,c);" normally
- *       done at the end is not done here.
- */
-static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
-{
-	a += JHASH_GOLDEN_RATIO;
-	b += JHASH_GOLDEN_RATIO;
-	c += initval;
-
-	__jhash_mix(a, b, c);
-
-	return c;
-}
+/* Best hash sizes are of power of two */
+#define jhash_size(n)   ((u32)1<<(n))
+/* Mask the hash value, i.e (value & jhash_mask(n)) instead of (value % n) */
+#define jhash_mask(n)   (jhash_size(n)-1)
+
+extern u32 jhash(const void *key, u32 length, u32 initval);
+extern u32 jhash2(const u32 *k, u32 length, u32 initval);
+extern u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval);
 
 static inline u32 jhash_2words(u32 a, u32 b, u32 initval)
 {
diff --git a/lib/Makefile b/lib/Makefile
index e6a3763..a1a4932 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -10,7 +10,7 @@ endif
 lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 rbtree.o radix-tree.o dump_stack.o \
 	 idr.o int_sqrt.o extable.o prio_tree.o \
-	 sha1.o irq_regs.o reciprocal_div.o argv_split.o \
+	 jhash.o sha1.o irq_regs.o reciprocal_div.o argv_split.o \
 	 proportions.o prio_heap.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o flex_array.o
 
diff --git a/lib/jhash.c b/lib/jhash.c
new file mode 100644
index 0000000..538277b
--- /dev/null
+++ b/lib/jhash.c
@@ -0,0 +1,154 @@
+/* jhash.c: Jenkins hash support.
+ *
+ * Copyright (C) 2006. Bob Jenkins (bob_jenkins@burtleburtle.net)
+ *
+ * http://burtleburtle.net/bob/hash/
+ *
+ * These are the credits from Bob's sources:
+ *
+ * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
+ *
+ * These are functions for producing 32-bit hashes for hash table lookup.
+ * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
+ * are externally useful functions.  Routines to test the hash are included
+ * if SELF_TEST is defined.  You can use this free for any purpose.  It's in
+ * the public domain.  It has no warranty.
+ *
+ * Copyright (C) 2009-2010 Jozsef Kadlecsik (kadlec@blackhole.kfki.hu)
+ *
+ * I've modified Bob's hash to be useful in the Linux kernel, and
+ * any bugs present are my fault.
+ * Jozsef
+ */
+#include <linux/bitops.h>
+#include <linux/module.h>
+#include <linux/unaligned/packed_struct.h>
+#include <linux/jhash.h>
+
+/* __jhash_mix -- mix 3 32-bit values reversibly. */
+#define __jhash_mix(a, b, c)			\
+{						\
+	a -= c;  a ^= rol32(c, 4);  c += b;	\
+	b -= a;  b ^= rol32(a, 6);  a += c;	\
+	c -= b;  c ^= rol32(b, 8);  b += a;	\
+	a -= c;  a ^= rol32(c, 16); c += b;	\
+	b -= a;  b ^= rol32(a, 19); a += c;	\
+	c -= b;  c ^= rol32(b, 4);  b += a;	\
+}
+
+/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
+#define __jhash_final(a, b, c)			\
+{						\
+	c ^= b; c -= rol32(b, 14);		\
+	a ^= c; a -= rol32(c, 11);		\
+	b ^= a; b -= rol32(a, 25);		\
+	c ^= b; c -= rol32(b, 16);		\
+	a ^= c; a -= rol32(c, 4);		\
+	b ^= a; b -= rol32(a, 14);		\
+	c ^= b; c -= rol32(b, 24);		\
+}
+
+/* An arbitrary initial parameter */
+#define JHASH_INITVAL		0xdeadbeef
+
+/* jhash - hash an arbitrary key
+ * @k: sequence of bytes as key
+ * @length: the length of the key
+ * @initval: the previous hash, or an arbitray value
+ *
+ * The generic version, hashes an arbitrary sequence of bytes.
+ * No alignment or length assumptions are made about the input key.
+ *
+ * Returns the hash value of the key. The result depends on endianness.
+ */
+u32 jhash(const void *key, u32 length, u32 initval)
+{
+	u32 a, b, c;
+	const u8 *k = key;
+
+	/* Set up the internal state */
+	a = b = c = JHASH_INITVAL + length + initval;
+
+	/* All but the last block: affect some 32 bits of (a,b,c) */
+	while (length > 12) {
+		a += __get_unaligned_cpu32(k);
+		b += __get_unaligned_cpu32(k + 4);
+		c += __get_unaligned_cpu32(k + 8);
+		__jhash_mix(a, b, c);
+		length -= 12;
+		k += 12;
+	}
+	/* Last block: affect all 32 bits of (c) */
+	/* All the case statements fall through */
+	switch (length) {
+	case 12: c += (u32)k[11]<<24;
+	case 11: c += (u32)k[10]<<16;
+	case 10: c += (u32)k[9]<<8;
+	case 9:  c += k[8];
+	case 8:  b += (u32)k[7]<<24;
+	case 7:  b += (u32)k[6]<<16;
+	case 6:  b += (u32)k[5]<<8;
+	case 5:  b += k[4];
+	case 4:  a += (u32)k[3]<<24;
+	case 3:  a += (u32)k[2]<<16;
+	case 2:  a += (u32)k[1]<<8;
+	case 1:  a += k[0];
+		 __jhash_final(a, b, c);
+	case 0: /* Nothing left to add */
+		break;
+	}
+
+	return c;
+}
+EXPORT_SYMBOL(jhash);
+
+/* jhash2 - hash an array of u32's
+ * @k: the key which must be an array of u32's
+ * @length: the number of u32's in the key
+ * @initval: the previous hash, or an arbitray value
+ *
+ * Returns the hash value of the key.
+ */
+u32 jhash2(const u32 *k, u32 length, u32 initval)
+{
+	u32 a, b, c;
+
+	/* Set up the internal state */
+	a = b = c = JHASH_INITVAL + (length<<2) + initval;
+
+	/* Handle most of the key */
+	while (length > 3) {
+		a += k[0];
+		b += k[1];
+		c += k[2];
+		__jhash_mix(a, b, c);
+		length -= 3;
+		k += 3;
+	}
+
+	/* Handle the last 3 u32's: all the case statements fall through */
+	switch (length) {
+	case 3: c += k[2];
+	case 2: b += k[1];
+	case 1: a += k[0];
+		__jhash_final(a, b, c);
+	case 0:	/* Nothing left to add */
+		break;
+	}
+
+	return c;
+}
+EXPORT_SYMBOL(jhash2);
+
+/* jhash_3words - hash exactly 3, 2 or 1 word(s) */
+u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
+{
+	a += JHASH_INITVAL;
+	b += JHASH_INITVAL;
+	c += initval;
+
+	__jhash_mix(a, b, c);
+
+	return c;
+}
+EXPORT_SYMBOL(jhash_3words);
-- 
1.7.0.4

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] The new jhash implementation
  2010-11-25 14:41       ` Jozsef Kadlecsik
@ 2010-11-25 17:21         ` Eric Dumazet
  2010-11-25 21:14           ` Jozsef Kadlecsik
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2010-11-25 17:21 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: linux-kernel, netdev, netfilter-devel, Linus Torvalds, Rusty Russell

Le jeudi 25 novembre 2010 à 15:41 +0100, Jozsef Kadlecsik a écrit :

...

> +/* __jhash_mix -- mix 3 32-bit values reversibly. */
> +#define __jhash_mix(a, b, c)			\
> +{						\
> +	a -= c;  a ^= rol32(c, 4);  c += b;	\
> +	b -= a;  b ^= rol32(a, 6);  a += c;	\
> +	c -= b;  c ^= rol32(b, 8);  b += a;	\
> +	a -= c;  a ^= rol32(c, 16); c += b;	\
> +	b -= a;  b ^= rol32(a, 19); a += c;	\
> +	c -= b;  c ^= rol32(b, 4);  b += a;	\
> +}
> +
> +/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
> +#define __jhash_final(a, b, c)			\
> +{						\
> +	c ^= b; c -= rol32(b, 14);		\
> +	a ^= c; a -= rol32(c, 11);		\
> +	b ^= a; b -= rol32(a, 25);		\
> +	c ^= b; c -= rol32(b, 16);		\
> +	a ^= c; a -= rol32(c, 4);		\
> +	b ^= a; b -= rol32(a, 14);		\
> +	c ^= b; c -= rol32(b, 24);		\
> +}
> +

So we now have a special __jhash_final(a, b, c) thing for the last
values.



> +/* jhash_3words - hash exactly 3, 2 or 1 word(s) */
> +u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
> +{
> +	a += JHASH_INITVAL;
> +	b += JHASH_INITVAL;
> +	c += initval;
> +
> +	__jhash_mix(a, b, c);
> +
> +	return c;
> +}
> +EXPORT_SYMBOL(jhash_3words);


But you dont use it in jhash_3words().


I do think jhash_3words() should stay inlined, unless maybe
CONFIG_CC_OPTIMIZE_FOR_SIZE=y

We hit it several time per packet in network stack in RX path.

Once in skb_get_rxhash() (unless device fills skb->rxhash)
Once at least in conntrack (if used).
Once in UDP or TCP stack




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] The new jhash implementation
  2010-11-25 17:21         ` Eric Dumazet
@ 2010-11-25 21:14           ` Jozsef Kadlecsik
  0 siblings, 0 replies; 18+ messages in thread
From: Jozsef Kadlecsik @ 2010-11-25 21:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, netdev, netfilter-devel, Linus Torvalds, Rusty Russell

On Thu, 25 Nov 2010, Eric Dumazet wrote:

> Le jeudi 25 novembre 2010 ? 15:41 +0100, Jozsef Kadlecsik a ?crit :
> 
> > +/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
> > +#define __jhash_final(a, b, c)			\
> > +{						\
> > +	c ^= b; c -= rol32(b, 14);		\
> > +	a ^= c; a -= rol32(c, 11);		\
> > +	b ^= a; b -= rol32(a, 25);		\
> > +	c ^= b; c -= rol32(b, 16);		\
> > +	a ^= c; a -= rol32(c, 4);		\
> > +	b ^= a; b -= rol32(a, 14);		\
> > +	c ^= b; c -= rol32(b, 24);		\
> > +}
> > +
> 
> So we now have a special __jhash_final(a, b, c) thing for the last
> values.

Yes, and...
 
> > +/* jhash_3words - hash exactly 3, 2 or 1 word(s) */
> > +u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
> > +{
> > +	a += JHASH_INITVAL;
> > +	b += JHASH_INITVAL;
> > +	c += initval;
> > +
> > +	__jhash_mix(a, b, c);
> > +
> > +	return c;
> > +}
> > +EXPORT_SYMBOL(jhash_3words);
>  
> But you dont use it in jhash_3words().

... excellent spotting! That should be __jhash_final in jhash_3words. 

> I do think jhash_3words() should stay inlined, unless maybe
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> 
> We hit it several time per packet in network stack in RX path.
> 
> Once in skb_get_rxhash() (unless device fills skb->rxhash)
> Once at least in conntrack (if used).
> Once in UDP or TCP stack

There follows the new version of the second patch: jhash_3words is moved 
back to inlined form and the bug in it fixed. Thanks for your thorough 
reviewing indeed!

The current jhash.h implements the lookup2() hash function by Bob Jenkins.
However, lookup2() is outdated as Bob wrote a new hash function called
lookup3(). The new hash function

- mixes better than lookup2(): it passes the check that every input bit
  changes every output bit 50% of the time, while lookup2() failed it.
- performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster
  than lookup2() depending on the key length.

The patch replaces the lookup2() implementation of the 'jhash*'
functions with that of lookup3().

You can read a longer comparison of the two and other hash functions at
http://burtleburtle.net/bob/hash/doobs.html.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
---
 include/linux/jhash.h |  140 ++++++++-----------------------------------------
 lib/Makefile          |    2 +-
 lib/jhash.c           |  127 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 151 insertions(+), 118 deletions(-)
 create mode 100644 lib/jhash.c

diff --git a/include/linux/jhash.h b/include/linux/jhash.h
index ced1159..963abad 100644
--- a/include/linux/jhash.h
+++ b/include/linux/jhash.h
@@ -1,131 +1,37 @@
 #ifndef _LINUX_JHASH_H
 #define _LINUX_JHASH_H
 
-/* jhash.h: Jenkins hash support.
- *
- * Copyright (C) 1996 Bob Jenkins (bob_jenkins@burtleburtle.net)
- *
- * http://burtleburtle.net/bob/hash/
- *
- * These are the credits from Bob's sources:
- *
- * lookup2.c, by Bob Jenkins, December 1996, Public Domain.
- * hash(), hash2(), hash3, and mix() are externally useful functions.
- * Routines to test the hash are included if SELF_TEST is defined.
- * You can use this free for any purpose.  It has no warranty.
- *
- * Copyright (C) 2003 David S. Miller (davem@redhat.com)
- *
- * I've modified Bob's hash to be useful in the Linux kernel, and
- * any bugs present are surely my fault.  -DaveM
- */
-
-/* NOTE: Arguments are modified. */
-#define __jhash_mix(a, b, c) \
-{ \
-  a -= b; a -= c; a ^= (c>>13); \
-  b -= c; b -= a; b ^= (a<<8); \
-  c -= a; c -= b; c ^= (b>>13); \
-  a -= b; a -= c; a ^= (c>>12);  \
-  b -= c; b -= a; b ^= (a<<16); \
-  c -= a; c -= b; c ^= (b>>5); \
-  a -= b; a -= c; a ^= (c>>3);  \
-  b -= c; b -= a; b ^= (a<<10); \
-  c -= a; c -= b; c ^= (b>>15); \
-}
-
-/* The golden ration: an arbitrary value */
-#define JHASH_GOLDEN_RATIO	0x9e3779b9
-
-/* The most generic version, hashes an arbitrary sequence
- * of bytes.  No alignment or length assumptions are made about
- * the input key.
- */
-static inline u32 jhash(const void *key, u32 length, u32 initval)
-{
-	u32 a, b, c, len;
-	const u8 *k = key;
-
-	len = length;
-	a = b = JHASH_GOLDEN_RATIO;
-	c = initval;
-
-	while (len >= 12) {
-		a += (k[0] +((u32)k[1]<<8) +((u32)k[2]<<16) +((u32)k[3]<<24));
-		b += (k[4] +((u32)k[5]<<8) +((u32)k[6]<<16) +((u32)k[7]<<24));
-		c += (k[8] +((u32)k[9]<<8) +((u32)k[10]<<16)+((u32)k[11]<<24));
-
-		__jhash_mix(a,b,c);
-
-		k += 12;
-		len -= 12;
-	}
-
-	c += length;
-	switch (len) {
-	case 11: c += ((u32)k[10]<<24);
-	case 10: c += ((u32)k[9]<<16);
-	case 9 : c += ((u32)k[8]<<8);
-	case 8 : b += ((u32)k[7]<<24);
-	case 7 : b += ((u32)k[6]<<16);
-	case 6 : b += ((u32)k[5]<<8);
-	case 5 : b += k[4];
-	case 4 : a += ((u32)k[3]<<24);
-	case 3 : a += ((u32)k[2]<<16);
-	case 2 : a += ((u32)k[1]<<8);
-	case 1 : a += k[0];
-	};
-
-	__jhash_mix(a,b,c);
-
-	return c;
+/* Best hash sizes are of power of two */
+#define jhash_size(n)   ((u32)1<<(n))
+/* Mask the hash value, i.e (value & jhash_mask(n)) instead of (value % n) */
+#define jhash_mask(n)   (jhash_size(n)-1)
+
+/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
+#define __jhash_final(a, b, c)			\
+{						\
+	c ^= b; c -= rol32(b, 14);		\
+	a ^= c; a -= rol32(c, 11);		\
+	b ^= a; b -= rol32(a, 25);		\
+	c ^= b; c -= rol32(b, 16);		\
+	a ^= c; a -= rol32(c, 4);		\
+	b ^= a; b -= rol32(a, 14);		\
+	c ^= b; c -= rol32(b, 24);		\
 }
 
-/* A special optimized version that handles 1 or more of u32s.
- * The length parameter here is the number of u32s in the key.
- */
-static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
-{
-	u32 a, b, c, len;
-
-	a = b = JHASH_GOLDEN_RATIO;
-	c = initval;
-	len = length;
-
-	while (len >= 3) {
-		a += k[0];
-		b += k[1];
-		c += k[2];
-		__jhash_mix(a, b, c);
-		k += 3; len -= 3;
-	}
-
-	c += length * 4;
-
-	switch (len) {
-	case 2 : b += k[1];
-	case 1 : a += k[0];
-	};
-
-	__jhash_mix(a,b,c);
-
-	return c;
-}
+/* An arbitrary initial parameter */
+#define JHASH_INITVAL		0xdeadbeef
 
+extern u32 jhash(const void *key, u32 length, u32 initval);
+extern u32 jhash2(const u32 *k, u32 length, u32 initval);
 
-/* A special ultra-optimized versions that knows they are hashing exactly
- * 3, 2 or 1 word(s).
- *
- * NOTE: In particular the "c += length; __jhash_mix(a,b,c);" normally
- *       done at the end is not done here.
- */
+/* jhash_3words - hash exactly 3, 2 or 1 word(s) */
 static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
 {
-	a += JHASH_GOLDEN_RATIO;
-	b += JHASH_GOLDEN_RATIO;
+	a += JHASH_INITVAL;
+	b += JHASH_INITVAL;
 	c += initval;
 
-	__jhash_mix(a, b, c);
+	__jhash_final(a, b, c);
 
 	return c;
 }
diff --git a/lib/Makefile b/lib/Makefile
index e6a3763..a1a4932 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -10,7 +10,7 @@ endif
 lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 rbtree.o radix-tree.o dump_stack.o \
 	 idr.o int_sqrt.o extable.o prio_tree.o \
-	 sha1.o irq_regs.o reciprocal_div.o argv_split.o \
+	 jhash.o sha1.o irq_regs.o reciprocal_div.o argv_split.o \
 	 proportions.o prio_heap.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o flex_array.o
 
diff --git a/lib/jhash.c b/lib/jhash.c
new file mode 100644
index 0000000..e0c8d11
--- /dev/null
+++ b/lib/jhash.c
@@ -0,0 +1,127 @@
+/* jhash.c: Jenkins hash support.
+ *
+ * Copyright (C) 2006. Bob Jenkins (bob_jenkins@burtleburtle.net)
+ *
+ * http://burtleburtle.net/bob/hash/
+ *
+ * These are the credits from Bob's sources:
+ *
+ * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
+ *
+ * These are functions for producing 32-bit hashes for hash table lookup.
+ * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
+ * are externally useful functions.  Routines to test the hash are included
+ * if SELF_TEST is defined.  You can use this free for any purpose.  It's in
+ * the public domain.  It has no warranty.
+ *
+ * Copyright (C) 2009-2010 Jozsef Kadlecsik (kadlec@blackhole.kfki.hu)
+ *
+ * I've modified Bob's hash to be useful in the Linux kernel, and
+ * any bugs present are my fault.
+ * Jozsef
+ */
+#include <linux/bitops.h>
+#include <linux/module.h>
+#include <linux/unaligned/packed_struct.h>
+#include <linux/jhash.h>
+
+/* __jhash_mix -- mix 3 32-bit values reversibly. */
+#define __jhash_mix(a, b, c)			\
+{						\
+	a -= c;  a ^= rol32(c, 4);  c += b;	\
+	b -= a;  b ^= rol32(a, 6);  a += c;	\
+	c -= b;  c ^= rol32(b, 8);  b += a;	\
+	a -= c;  a ^= rol32(c, 16); c += b;	\
+	b -= a;  b ^= rol32(a, 19); a += c;	\
+	c -= b;  c ^= rol32(b, 4);  b += a;	\
+}
+
+/* jhash - hash an arbitrary key
+ * @k: sequence of bytes as key
+ * @length: the length of the key
+ * @initval: the previous hash, or an arbitray value
+ *
+ * The generic version, hashes an arbitrary sequence of bytes.
+ * No alignment or length assumptions are made about the input key.
+ *
+ * Returns the hash value of the key. The result depends on endianness.
+ */
+u32 jhash(const void *key, u32 length, u32 initval)
+{
+	u32 a, b, c;
+	const u8 *k = key;
+
+	/* Set up the internal state */
+	a = b = c = JHASH_INITVAL + length + initval;
+
+	/* All but the last block: affect some 32 bits of (a,b,c) */
+	while (length > 12) {
+		a += __get_unaligned_cpu32(k);
+		b += __get_unaligned_cpu32(k + 4);
+		c += __get_unaligned_cpu32(k + 8);
+		__jhash_mix(a, b, c);
+		length -= 12;
+		k += 12;
+	}
+	/* Last block: affect all 32 bits of (c) */
+	/* All the case statements fall through */
+	switch (length) {
+	case 12: c += (u32)k[11]<<24;
+	case 11: c += (u32)k[10]<<16;
+	case 10: c += (u32)k[9]<<8;
+	case 9:  c += k[8];
+	case 8:  b += (u32)k[7]<<24;
+	case 7:  b += (u32)k[6]<<16;
+	case 6:  b += (u32)k[5]<<8;
+	case 5:  b += k[4];
+	case 4:  a += (u32)k[3]<<24;
+	case 3:  a += (u32)k[2]<<16;
+	case 2:  a += (u32)k[1]<<8;
+	case 1:  a += k[0];
+		 __jhash_final(a, b, c);
+	case 0: /* Nothing left to add */
+		break;
+	}
+
+	return c;
+}
+EXPORT_SYMBOL(jhash);
+
+/* jhash2 - hash an array of u32's
+ * @k: the key which must be an array of u32's
+ * @length: the number of u32's in the key
+ * @initval: the previous hash, or an arbitray value
+ *
+ * Returns the hash value of the key.
+ */
+u32 jhash2(const u32 *k, u32 length, u32 initval)
+{
+	u32 a, b, c;
+
+	/* Set up the internal state */
+	a = b = c = JHASH_INITVAL + (length<<2) + initval;
+
+	/* Handle most of the key */
+	while (length > 3) {
+		a += k[0];
+		b += k[1];
+		c += k[2];
+		__jhash_mix(a, b, c);
+		length -= 3;
+		k += 3;
+	}
+
+	/* Handle the last 3 u32's: all the case statements fall through */
+	switch (length) {
+	case 3: c += k[2];
+	case 2: b += k[1];
+	case 1: a += k[0];
+		__jhash_final(a, b, c);
+	case 0:	/* Nothing left to add */
+		break;
+	}
+
+	return c;
+}
+EXPORT_SYMBOL(jhash2);
+
-- 
1.7.0.4

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] The new jhash implementation
  2010-11-25 13:15   ` [PATCH 2/2] The new jhash implementation Jozsef Kadlecsik
  2010-11-25 13:49     ` Eric Dumazet
@ 2010-11-27  3:31     ` Rusty Russell
  2010-12-03 12:38       ` [PATCH 0/2] New jhash function Jozsef Kadlecsik
  1 sibling, 1 reply; 18+ messages in thread
From: Rusty Russell @ 2010-11-27  3:31 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: linux-kernel, netdev, netfilter-devel, Linus Torvalds

On Thu, 25 Nov 2010 11:45:08 pm Jozsef Kadlecsik wrote:
> The current jhash.h implements the lookup2() hash function by Bob Jenkins.
> However, lookup2() is outdated as Bob wrote a new hash function called
> lookup3(). The new hash function

Oops, we discussed this ages ago, looks like it fell through the cracks.

Thanks Jozsef!

> +extern u32 jhash(const void *key, u32 length, u32 initval);
> +extern u32 jhash2(const u32 *k, u32 length, u32 initval);
> +extern u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval);

This last one can be marked __attribute__((const)) for a little
extra compiler help, too.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 0/2] New jhash function
  2010-11-27  3:31     ` Rusty Russell
@ 2010-12-03 12:38       ` Jozsef Kadlecsik
  2010-12-03 12:39         ` [PATCH 1/2] Remove calls to jhash internals Jozsef Kadlecsik
  2010-12-08 17:09         ` [PATCH 0/2] New jhash function David Miller
  0 siblings, 2 replies; 18+ messages in thread
From: Jozsef Kadlecsik @ 2010-12-03 12:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: netdev, netfilter-devel, Linus Torvalds, Rusty Russell, Jozsef Kadlecsik

Hi,

The current jhash.h implements the lookup2() hash function by Bob Jenkins. 
However, lookup2() is outdated as Bob wrote a new hash function called 
lookup3(). There is a longer comparison of those two and other hash
functions at http://burtleburtle.net/bob/hash/doobs.html.

Please consider applying the following patches.

Speed

I wrote a small benchmark program to compare jhash2 and jhash3 (you can
download it from http://www.kfki.hu/~kadlec/sw/netfilter/jhash23.tgz).
On a machine with Core2 Duo and compiled with -O2, the ratio of the execution
time for the byte variants of the hash functions were (in parens the different
key sizes):

jhash2/jhash3 (4 bytes): 1.587518
jhash2/jhash3 (8 bytes): 1.282824
jhash2/jhash3 (12 bytes): 2.349628
jhash2/jhash3 (16 bytes): 1.466988
jhash2/jhash3 (32 bytes): 1.501063
jhash2/jhash3 (64 bytes): 1.587527
jhash2/jhash3 (128 bytes): 1.628037
jhash2/jhash3 (256 bytes): 1.648318

Similarly, for the word variants

jhash2/jhash3 (1 words): 1.300904
jhash2/jhash3 (2 words): 1.316108
jhash2/jhash3 (3 words): 2.249708
jhash2/jhash3 (4 words): 1.460192
jhash2/jhash3 (8 words): 1.501438
jhash2/jhash3 (16 words): 1.558039
jhash2/jhash3 (32 words): 1.520082
jhash2/jhash3 (64 words): 1.528347

Sizes

When compiling just the byte variants the sizes are

   text    data     bss     dec     hex filename
    661       0       0     661     295 jhashb2.o
    441       0       0     441     1b9 jhashb3.o

and for the word variants

   text    data     bss     dec     hex filename
    354       0       0     354     162 jhashw2.o
    248       0       0     248      f8 jhashw3.o

I compiled the kernel with "allyesconfig", in three variants: jhash2, jhash3 and
jhash3 un-inlined

   text    data     bss     dec     hex filename
69297477        11282083        35904032        116483592       6f16608 vmlinux.jhash2
69293829        11282083        35903728        116479640       6f15698 vmlinux.jhash3
69288578        11282083        35902336        116472997       6f13ca5 vmlinux.jhash3-uninlined

With jhash3 we can gain 3.6k and un-inlining shrinks the code with an additional
5.2k. In the patch I left jhash(3) inlined.

Uniformity

According to Bob Jenkins, lookup3() mixes better than lookup2(): it passes
the check that every input bit changes every output bit 50% of the time, while
lookup2() failed it. In order to verify it I added jhash3 to the cttest program,
which tests hash functions with (artifical, worst case) netfilter conntrack data
and calculates the statistics (chi-square, probability of longest bucket, etc).
You can find the program and the results here:
http://www.kfki.hu/~kadlec/sw/netfilter/ct3/ - to sum up, lookup3() produces
uniform key values and no weakness could be spotted.

Many thanks to Eric Dumazet for his thorough reviewing.

Dave applied the first patch to net-next-2.6.

Jozsef Kadlecsik (2):
  Remove calls to jhash internals
  The new jhash implementation

 include/linux/jhash.h            |  183 ++++++++++++++++++++++----------------
 net/ipv6/inet6_connection_sock.c |   18 ++--
 net/ipv6/reassembly.c            |   36 ++++----
 3 files changed, 129 insertions(+), 108 deletions(-)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] Remove calls to jhash internals
  2010-12-03 12:38       ` [PATCH 0/2] New jhash function Jozsef Kadlecsik
@ 2010-12-03 12:39         ` Jozsef Kadlecsik
  2010-12-03 12:39           ` [PATCH 2/2] The new jhash implementation Jozsef Kadlecsik
  2010-12-08 17:09         ` [PATCH 0/2] New jhash function David Miller
  1 sibling, 1 reply; 18+ messages in thread
From: Jozsef Kadlecsik @ 2010-12-03 12:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: netdev, netfilter-devel, Linus Torvalds, Rusty Russell, Jozsef Kadlecsik

Because the jhash implementation changes, replace the calls to the jhash
internal macros with calls to the jhash functions.
---
 net/ipv6/inet6_connection_sock.c |   18 ++++++++----------
 net/ipv6/reassembly.c            |   36 ++++++++++++++++--------------------
 2 files changed, 24 insertions(+), 30 deletions(-)

diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index 8a16280..861d252 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -60,18 +60,16 @@ EXPORT_SYMBOL_GPL(inet6_csk_bind_conflict);
 static u32 inet6_synq_hash(const struct in6_addr *raddr, const __be16 rport,
 			   const u32 rnd, const u16 synq_hsize)
 {
-	u32 a = (__force u32)raddr->s6_addr32[0];
-	u32 b = (__force u32)raddr->s6_addr32[1];
-	u32 c = (__force u32)raddr->s6_addr32[2];
-
-	a += JHASH_GOLDEN_RATIO;
-	b += JHASH_GOLDEN_RATIO;
-	c += rnd;
-	__jhash_mix(a, b, c);
-
-	a += (__force u32)raddr->s6_addr32[3];
-	b += (__force u32)rport;
-	__jhash_mix(a, b, c);
+	u32 c;
+
+	c = jhash_3words((__force u32)raddr->s6_addr32[0],
+			 (__force u32)raddr->s6_addr32[1],
+			 (__force u32)raddr->s6_addr32[2],
+			 rnd);
+
+	c = jhash_2words((__force u32)raddr->s6_addr32[3],
+			 (__force u32)rport,
+			 c);
 
 	return c & (synq_hsize - 1);
 }
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index c7ba314..5e57490 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -104,26 +104,22 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
 unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
 			     const struct in6_addr *daddr, u32 rnd)
 {
-	u32 a, b, c;
-
-	a = (__force u32)saddr->s6_addr32[0];
-	b = (__force u32)saddr->s6_addr32[1];
-	c = (__force u32)saddr->s6_addr32[2];
-
-	a += JHASH_GOLDEN_RATIO;
-	b += JHASH_GOLDEN_RATIO;
-	c += rnd;
-	__jhash_mix(a, b, c);
-
-	a += (__force u32)saddr->s6_addr32[3];
-	b += (__force u32)daddr->s6_addr32[0];
-	c += (__force u32)daddr->s6_addr32[1];
-	__jhash_mix(a, b, c);
-
-	a += (__force u32)daddr->s6_addr32[2];
-	b += (__force u32)daddr->s6_addr32[3];
-	c += (__force u32)id;
-	__jhash_mix(a, b, c);
+	u32 c;
+
+	c = jhash_3words((__force u32)saddr->s6_addr32[0],
+			 (__force u32)saddr->s6_addr32[1],
+			 (__force u32)saddr->s6_addr32[2],
+			 rnd);
+
+	c = jhash_3words((__force u32)saddr->s6_addr32[3],
+			 (__force u32)daddr->s6_addr32[0],
+			 (__force u32)daddr->s6_addr32[1],
+			 c);
+
+	c =  jhash_3words((__force u32)daddr->s6_addr32[2],
+			  (__force u32)daddr->s6_addr32[3],
+			  (__force u32)id,
+			  c);
 
 	return c & (INETFRAGS_HASHSZ - 1);
 }
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] The new jhash implementation
  2010-12-03 12:39         ` [PATCH 1/2] Remove calls to jhash internals Jozsef Kadlecsik
@ 2010-12-03 12:39           ` Jozsef Kadlecsik
  0 siblings, 0 replies; 18+ messages in thread
From: Jozsef Kadlecsik @ 2010-12-03 12:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: netdev, netfilter-devel, Linus Torvalds, Rusty Russell, Jozsef Kadlecsik

The current jhash.h implements the lookup2() hash function by Bob Jenkins.
However, lookup2() is outdated as Bob wrote a new hash function called
lookup3(). The patch replaces the lookup2() implementation of the 'jhash*'
functions with that of lookup3().

You can read a longer comparison of the two and other hash functions at
http://burtleburtle.net/bob/hash/doobs.html.
---
 include/linux/jhash.h |  183 ++++++++++++++++++++++++++++---------------------
 1 files changed, 105 insertions(+), 78 deletions(-)

diff --git a/include/linux/jhash.h b/include/linux/jhash.h
index ced1159..47cb09e 100644
--- a/include/linux/jhash.h
+++ b/include/linux/jhash.h
@@ -3,129 +3,156 @@
 
 /* jhash.h: Jenkins hash support.
  *
- * Copyright (C) 1996 Bob Jenkins (bob_jenkins@burtleburtle.net)
+ * Copyright (C) 2006. Bob Jenkins (bob_jenkins@burtleburtle.net)
  *
  * http://burtleburtle.net/bob/hash/
  *
  * These are the credits from Bob's sources:
  *
- * lookup2.c, by Bob Jenkins, December 1996, Public Domain.
- * hash(), hash2(), hash3, and mix() are externally useful functions.
- * Routines to test the hash are included if SELF_TEST is defined.
- * You can use this free for any purpose.  It has no warranty.
+ * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
  *
- * Copyright (C) 2003 David S. Miller (davem@redhat.com)
+ * These are functions for producing 32-bit hashes for hash table lookup.
+ * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
+ * are externally useful functions.  Routines to test the hash are included
+ * if SELF_TEST is defined.  You can use this free for any purpose.  It's in
+ * the public domain.  It has no warranty.
+ *
+ * Copyright (C) 2009-2010 Jozsef Kadlecsik (kadlec@blackhole.kfki.hu)
  *
  * I've modified Bob's hash to be useful in the Linux kernel, and
- * any bugs present are surely my fault.  -DaveM
+ * any bugs present are my fault.
+ * Jozsef
  */
+#include <linux/bitops.h>
+#include <linux/unaligned/packed_struct.h>
+
+/* Best hash sizes are of power of two */
+#define jhash_size(n)   ((u32)1<<(n))
+/* Mask the hash value, i.e (value & jhash_mask(n)) instead of (value % n) */
+#define jhash_mask(n)   (jhash_size(n)-1)
+
+/* __jhash_mix -- mix 3 32-bit values reversibly. */
+#define __jhash_mix(a, b, c)			\
+{						\
+	a -= c;  a ^= rol32(c, 4);  c += b;	\
+	b -= a;  b ^= rol32(a, 6);  a += c;	\
+	c -= b;  c ^= rol32(b, 8);  b += a;	\
+	a -= c;  a ^= rol32(c, 16); c += b;	\
+	b -= a;  b ^= rol32(a, 19); a += c;	\
+	c -= b;  c ^= rol32(b, 4);  b += a;	\
+}
 
-/* NOTE: Arguments are modified. */
-#define __jhash_mix(a, b, c) \
-{ \
-  a -= b; a -= c; a ^= (c>>13); \
-  b -= c; b -= a; b ^= (a<<8); \
-  c -= a; c -= b; c ^= (b>>13); \
-  a -= b; a -= c; a ^= (c>>12);  \
-  b -= c; b -= a; b ^= (a<<16); \
-  c -= a; c -= b; c ^= (b>>5); \
-  a -= b; a -= c; a ^= (c>>3);  \
-  b -= c; b -= a; b ^= (a<<10); \
-  c -= a; c -= b; c ^= (b>>15); \
+/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
+#define __jhash_final(a, b, c)			\
+{						\
+	c ^= b; c -= rol32(b, 14);		\
+	a ^= c; a -= rol32(c, 11);		\
+	b ^= a; b -= rol32(a, 25);		\
+	c ^= b; c -= rol32(b, 16);		\
+	a ^= c; a -= rol32(c, 4);		\
+	b ^= a; b -= rol32(a, 14);		\
+	c ^= b; c -= rol32(b, 24);		\
 }
 
-/* The golden ration: an arbitrary value */
-#define JHASH_GOLDEN_RATIO	0x9e3779b9
+/* An arbitrary initial parameter */
+#define JHASH_INITVAL		0xdeadbeef
 
-/* The most generic version, hashes an arbitrary sequence
- * of bytes.  No alignment or length assumptions are made about
- * the input key.
+/* jhash - hash an arbitrary key
+ * @k: sequence of bytes as key
+ * @length: the length of the key
+ * @initval: the previous hash, or an arbitray value
+ *
+ * The generic version, hashes an arbitrary sequence of bytes.
+ * No alignment or length assumptions are made about the input key.
+ *
+ * Returns the hash value of the key. The result depends on endianness.
  */
 static inline u32 jhash(const void *key, u32 length, u32 initval)
 {
-	u32 a, b, c, len;
+	u32 a, b, c;
 	const u8 *k = key;
 
-	len = length;
-	a = b = JHASH_GOLDEN_RATIO;
-	c = initval;
-
-	while (len >= 12) {
-		a += (k[0] +((u32)k[1]<<8) +((u32)k[2]<<16) +((u32)k[3]<<24));
-		b += (k[4] +((u32)k[5]<<8) +((u32)k[6]<<16) +((u32)k[7]<<24));
-		c += (k[8] +((u32)k[9]<<8) +((u32)k[10]<<16)+((u32)k[11]<<24));
-
-		__jhash_mix(a,b,c);
+	/* Set up the internal state */
+	a = b = c = JHASH_INITVAL + length + initval;
 
+	/* All but the last block: affect some 32 bits of (a,b,c) */
+	while (length > 12) {
+		a += __get_unaligned_cpu32(k);
+		b += __get_unaligned_cpu32(k + 4);
+		c += __get_unaligned_cpu32(k + 8);
+		__jhash_mix(a, b, c);
+		length -= 12;
 		k += 12;
-		len -= 12;
 	}
-
-	c += length;
-	switch (len) {
-	case 11: c += ((u32)k[10]<<24);
-	case 10: c += ((u32)k[9]<<16);
-	case 9 : c += ((u32)k[8]<<8);
-	case 8 : b += ((u32)k[7]<<24);
-	case 7 : b += ((u32)k[6]<<16);
-	case 6 : b += ((u32)k[5]<<8);
-	case 5 : b += k[4];
-	case 4 : a += ((u32)k[3]<<24);
-	case 3 : a += ((u32)k[2]<<16);
-	case 2 : a += ((u32)k[1]<<8);
-	case 1 : a += k[0];
-	};
-
-	__jhash_mix(a,b,c);
+	/* Last block: affect all 32 bits of (c) */
+	/* All the case statements fall through */
+	switch (length) {
+	case 12: c += (u32)k[11]<<24;
+	case 11: c += (u32)k[10]<<16;
+	case 10: c += (u32)k[9]<<8;
+	case 9:  c += k[8];
+	case 8:  b += (u32)k[7]<<24;
+	case 7:  b += (u32)k[6]<<16;
+	case 6:  b += (u32)k[5]<<8;
+	case 5:  b += k[4];
+	case 4:  a += (u32)k[3]<<24;
+	case 3:  a += (u32)k[2]<<16;
+	case 2:  a += (u32)k[1]<<8;
+	case 1:  a += k[0];
+		 __jhash_final(a, b, c);
+	case 0: /* Nothing left to add */
+		break;
+	}
 
 	return c;
 }
 
-/* A special optimized version that handles 1 or more of u32s.
- * The length parameter here is the number of u32s in the key.
+/* jhash2 - hash an array of u32's
+ * @k: the key which must be an array of u32's
+ * @length: the number of u32's in the key
+ * @initval: the previous hash, or an arbitray value
+ *
+ * Returns the hash value of the key.
  */
 static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
 {
-	u32 a, b, c, len;
+	u32 a, b, c;
 
-	a = b = JHASH_GOLDEN_RATIO;
-	c = initval;
-	len = length;
+	/* Set up the internal state */
+	a = b = c = JHASH_INITVAL + (length<<2) + initval;
 
-	while (len >= 3) {
+	/* Handle most of the key */
+	while (length > 3) {
 		a += k[0];
 		b += k[1];
 		c += k[2];
 		__jhash_mix(a, b, c);
-		k += 3; len -= 3;
+		length -= 3;
+		k += 3;
 	}
 
-	c += length * 4;
-
-	switch (len) {
-	case 2 : b += k[1];
-	case 1 : a += k[0];
-	};
-
-	__jhash_mix(a,b,c);
+	/* Handle the last 3 u32's: all the case statements fall through */
+	switch (length) {
+	case 3: c += k[2];
+	case 2: b += k[1];
+	case 1: a += k[0];
+		__jhash_final(a, b, c);
+	case 0:	/* Nothing left to add */
+		break;
+	}
 
 	return c;
 }
 
 
-/* A special ultra-optimized versions that knows they are hashing exactly
- * 3, 2 or 1 word(s).
- *
- * NOTE: In particular the "c += length; __jhash_mix(a,b,c);" normally
- *       done at the end is not done here.
- */
+/* jhash_3words - hash exactly 3, 2 or 1 word(s) */
 static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
 {
-	a += JHASH_GOLDEN_RATIO;
-	b += JHASH_GOLDEN_RATIO;
+	a += JHASH_INITVAL;
+	b += JHASH_INITVAL;
 	c += initval;
 
-	__jhash_mix(a, b, c);
+	__jhash_final(a, b, c);
 
 	return c;
 }
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] New jhash function
  2010-12-03 12:38       ` [PATCH 0/2] New jhash function Jozsef Kadlecsik
  2010-12-03 12:39         ` [PATCH 1/2] Remove calls to jhash internals Jozsef Kadlecsik
@ 2010-12-08 17:09         ` David Miller
  2010-12-08 21:23           ` Rusty Russell
  1 sibling, 1 reply; 18+ messages in thread
From: David Miller @ 2010-12-08 17:09 UTC (permalink / raw)
  To: kadlec; +Cc: linux-kernel, netdev, netfilter-devel, torvalds, rusty

From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Date: Fri,  3 Dec 2010 13:38:59 +0100

> The current jhash.h implements the lookup2() hash function by Bob Jenkins. 
> However, lookup2() is outdated as Bob wrote a new hash function called 
> lookup3(). There is a longer comparison of those two and other hash
> functions at http://burtleburtle.net/bob/hash/doobs.html.
> 
> Please consider applying the following patches.

Patch #1 is already in the net-next-2.6 tree, and as long as there are
no major objections to the general crowd (including Rusty et al.) I am
happy to put patch #2 into my tree as well.

Rusty, does the current version of patch #2 look good to you?

Thanks!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] New jhash function
  2010-12-08 17:09         ` [PATCH 0/2] New jhash function David Miller
@ 2010-12-08 21:23           ` Rusty Russell
  2010-12-10  4:19             ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Rusty Russell @ 2010-12-08 21:23 UTC (permalink / raw)
  To: David Miller; +Cc: kadlec, linux-kernel, netdev, netfilter-devel, torvalds

On Thu, 9 Dec 2010 03:39:54 am David Miller wrote:
> From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> Date: Fri,  3 Dec 2010 13:38:59 +0100
> 
> > The current jhash.h implements the lookup2() hash function by Bob Jenkins. 
> > However, lookup2() is outdated as Bob wrote a new hash function called 
> > lookup3(). There is a longer comparison of those two and other hash
> > functions at http://burtleburtle.net/bob/hash/doobs.html.
> > 
> > Please consider applying the following patches.
> 
> Patch #1 is already in the net-next-2.6 tree, and as long as there are
> no major objections to the general crowd (including Rusty et al.) I am
> happy to put patch #2 into my tree as well.
> 
> Rusty, does the current version of patch #2 look good to you?

Yes, 2/2 good.  Thanks Jozsef!

Acked-by: Rusty Russell <rusty@rustcorp.com.au>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] New jhash function
  2010-12-08 21:23           ` Rusty Russell
@ 2010-12-10  4:19             ` David Miller
  0 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2010-12-10  4:19 UTC (permalink / raw)
  To: rusty; +Cc: kadlec, linux-kernel, netdev, netfilter-devel, torvalds

From: Rusty Russell <rusty@rustcorp.com.au>
Date: Thu, 9 Dec 2010 07:53:45 +1030

> On Thu, 9 Dec 2010 03:39:54 am David Miller wrote:
>> From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
>> Date: Fri,  3 Dec 2010 13:38:59 +0100
>> 
>> > The current jhash.h implements the lookup2() hash function by Bob Jenkins. 
>> > However, lookup2() is outdated as Bob wrote a new hash function called 
>> > lookup3(). There is a longer comparison of those two and other hash
>> > functions at http://burtleburtle.net/bob/hash/doobs.html.
>> > 
>> > Please consider applying the following patches.
>> 
>> Patch #1 is already in the net-next-2.6 tree, and as long as there are
>> no major objections to the general crowd (including Rusty et al.) I am
>> happy to put patch #2 into my tree as well.
>> 
>> Rusty, does the current version of patch #2 look good to you?
> 
> Yes, 2/2 good.  Thanks Jozsef!
> 
> Acked-by: Rusty Russell <rusty@rustcorp.com.au>

Applied, thanks everyone.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2010-12-10  4:19 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-25 13:15 [PATCH 0/2] New jhash function Jozsef Kadlecsik
2010-11-25 13:15 ` [PATCH 1/2] Prepare the tree for un-inlined jhash Jozsef Kadlecsik
2010-11-25 13:15   ` [PATCH 2/2] The new jhash implementation Jozsef Kadlecsik
2010-11-25 13:49     ` Eric Dumazet
2010-11-25 13:55       ` Changli Gao
2010-11-25 14:05         ` Eric Dumazet
2010-11-25 14:41       ` Jozsef Kadlecsik
2010-11-25 17:21         ` Eric Dumazet
2010-11-25 21:14           ` Jozsef Kadlecsik
2010-11-27  3:31     ` Rusty Russell
2010-12-03 12:38       ` [PATCH 0/2] New jhash function Jozsef Kadlecsik
2010-12-03 12:39         ` [PATCH 1/2] Remove calls to jhash internals Jozsef Kadlecsik
2010-12-03 12:39           ` [PATCH 2/2] The new jhash implementation Jozsef Kadlecsik
2010-12-08 17:09         ` [PATCH 0/2] New jhash function David Miller
2010-12-08 21:23           ` Rusty Russell
2010-12-10  4:19             ` David Miller
2010-11-25 13:39   ` [PATCH 1/2] Prepare the tree for un-inlined jhash Jan Engelhardt
2010-11-25 13:57     ` Jozsef Kadlecsik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).