linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: George Spelvin <linux@sciencehorizons.net>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	lkml <linux-kernel@vger.kernel.org>
Cc: "J . Bruce Fields" <bfields@redhat.com>,
	George Spelvin <linux@sciencehorizons.net>,
	Antti Palosaari <crope@iki.fi>,
	Mauro Carvalho Chehab <m.chehab@samsung.com>
Subject: [PATCH v3 05/10] Eliminate bad hash multipliers from hash_32() and  hash_64()
Date: Sat, 28 May 2016 15:57:18 -0400	[thread overview]
Message-ID: <1464465443-25305-6-git-send-email-linux@sciencehorizons.net> (raw)
In-Reply-To: <1464465443-25305-1-git-send-email-linux@sciencehorizons.net>

The "simplified" prime multipliers made very bad hash functions, so get rid
of them.  This completes the work of 689de1d6ca.

To avoid the inefficiency which was the motivation for the "simplified"
multipliers, hash_64() on 32-bit systems is changed to use a different
algorithm.  It makes two calls to hash_32() instead.

drivers/media/usb/dvb-usb-v2/af9015.c uses the old GOLDEN_RATIO_PRIME_32
for some horrible reason, so it inherits a copy of the old definition.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
---
 drivers/media/usb/dvb-usb-v2/af9015.c |  2 +
 include/linux/hash.h                  | 87 ++++++++++++++---------------------
 2 files changed, 36 insertions(+), 53 deletions(-)

diff --git a/drivers/media/usb/dvb-usb-v2/af9015.c b/drivers/media/usb/dvb-usb-v2/af9015.c
index 95a7388e..09e0f58f 100644
--- a/drivers/media/usb/dvb-usb-v2/af9015.c
+++ b/drivers/media/usb/dvb-usb-v2/af9015.c
@@ -398,6 +398,8 @@ error:
 }
 
 #define AF9015_EEPROM_SIZE 256
+/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
+#define GOLDEN_RATIO_PRIME_32 0x9e370001UL
 
 /* hash (and dump) eeprom */
 static int af9015_eeprom_hash(struct dvb_usb_device *d)
diff --git a/include/linux/hash.h b/include/linux/hash.h
index f967dedb..613cfde3 100644
--- a/include/linux/hash.h
+++ b/include/linux/hash.h
@@ -3,85 +3,65 @@
 /* Fast hashing routine for ints,  longs and pointers.
    (C) 2002 Nadia Yvette Chambers, IBM */
 
-/*
- * Knuth recommends primes in approximately golden ratio to the maximum
- * integer representable by a machine word for multiplicative hashing.
- * Chuck Lever verified the effectiveness of this technique:
- * http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf
- *
- * These primes are chosen to be bit-sparse, that is operations on
- * them can use shifts and additions instead of multiplications for
- * machines where multiplications are slow.
- */
-
 #include <asm/types.h>
 #include <linux/compiler.h>
 
-/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
-#define GOLDEN_RATIO_PRIME_32 0x9e370001UL
-/*  2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1 */
-#define GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001UL
-
+/*
+ * The "GOLDEN_RATIO_PRIME" is used in ifs/btrfs/brtfs_inode.h and
+ * fs/inode.c.  It's not actually prime any more (the previous primes
+ * were actively bad for hashing), but the name remains.
+ */
 #if BITS_PER_LONG == 32
-#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_PRIME_32
+#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_32
 #define hash_long(val, bits) hash_32(val, bits)
 #elif BITS_PER_LONG == 64
 #define hash_long(val, bits) hash_64(val, bits)
-#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_PRIME_64
+#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_64
 #else
 #error Wordsize not 32 or 64
 #endif
 
 /*
- * The above primes are actively bad for hashing, since they are
- * too sparse. The 32-bit one is mostly ok, the 64-bit one causes
- * real problems. Besides, the "prime" part is pointless for the
- * multiplicative hash.
+ * This hash multiplies the input by a large odd number and takes the
+ * high bits.  Since multiplication propagates changes to the most
+ * significant end only, it is essential that the high bits of the
+ * product be used for the hash value.
+ *
+ * Chuck Lever verified the effectiveness of this technique:
+ * http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf
  *
  * Although a random odd number will do, it turns out that the golden
  * ratio phi = (sqrt(5)-1)/2, or its negative, has particularly nice
- * properties.
+ * properties.  (See Knuth vol 3, section 6.4, exercise 9.)
  *
- * These are the negative, (1 - phi) = (phi^2) = (3 - sqrt(5))/2.
- * (See Knuth vol 3, section 6.4, exercise 9.)
+ * These are the negative, (1 - phi) = phi**2 = (3 - sqrt(5))/2,
+ * which is very slightly easier to multiply by and makes no
+ * difference to the hash distribution.
  */
 #define GOLDEN_RATIO_32 0x61C88647
 #define GOLDEN_RATIO_64 0x61C8864680B583EBull
 
-static __always_inline u32 hash_64(u64 val, unsigned int bits)
+
+static inline u32 __hash_32(u32 val)
 {
-	u64 hash = val;
-
-#if BITS_PER_LONG == 64
-	hash = hash * GOLDEN_RATIO_64;
-#else
-	/*  Sigh, gcc can't optimise this alone like it does for 32 bits. */
-	u64 n = hash;
-	n <<= 18;
-	hash -= n;
-	n <<= 33;
-	hash -= n;
-	n <<= 3;
-	hash += n;
-	n <<= 3;
-	hash -= n;
-	n <<= 4;
-	hash += n;
-	n <<= 2;
-	hash += n;
-#endif
-
-	/* High bits are more random, so use them. */
-	return (u32)(hash >> (64 - bits));
+	return val * GOLDEN_RATIO_32;
 }
 
 static inline u32 hash_32(u32 val, unsigned int bits)
 {
-	/* On some cpus multiply is faster, on others gcc will do shifts */
-	u32 hash = val * GOLDEN_RATIO_PRIME_32;
-
 	/* High bits are more random, so use them. */
-	return hash >> (32 - bits);
+	return __hash_32(val) >> (32 - bits);
+}
+
+static __always_inline u32 hash_64(u64 val, unsigned int bits)
+{
+#if BITS_PER_LONG == 64
+	/* 64x64-bit multiply is efficient on all 64-bit processors */
+	return val * GOLDEN_RATIO_64 >> (64 - bits);
+#else
+	/* Hash 64 bits using only 32x32-bit multiply. */
+	return hash_32((u32)val ^ __hash_32(val >> 32), bits);
+#endif
 }
 
 static inline u32 hash_ptr(const void *ptr, unsigned int bits)
@@ -89,6 +69,7 @@ static inline u32 hash_ptr(const void *ptr, unsigned int bits)
 	return hash_long((unsigned long)ptr, bits);
 }
 
+/* This really should be called fold32_ptr; it does no hashing to speak of. */
 static inline u32 hash32_ptr(const void *ptr)
 {
 	unsigned long val = (unsigned long)ptr;
-- 
2.8.1

  parent reply	other threads:[~2016-05-28 19:58 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CA+55aFxPSW+84KfQ1N_WmND-wtvgj2zQm8nFPkRcc+gyU=uing@mail.gmail.com>
2016-05-25  7:20 ` [PATCH 00/10] String hash improvements George Spelvin
2016-05-25  8:00   ` Geert Uytterhoeven
2016-05-25  8:11     ` George Spelvin
2016-05-25  8:50       ` Geert Uytterhoeven
2016-05-25  9:07         ` George Spelvin
2016-05-25 16:08   ` Linus Torvalds
2016-05-28 19:57     ` [PATCH v3 " George Spelvin
2016-05-28 19:57       ` [PATCH v3 01/10] Pull out string hash to <linux/stringhash.h> George Spelvin
2016-05-28 19:57       ` [PATCH v3 02/10] fs/namei.c: Add hashlen_string() function George Spelvin
2016-05-28 19:57       ` [PATCH v3 03/10] <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string() George Spelvin
2016-05-28 19:57       ` [PATCH v3 04/10] Change hash_64() return value to 32 bits George Spelvin
2016-05-28 19:57       ` George Spelvin [this message]
2016-05-28 19:57       ` [PATCH v3 06/10] fs/namei.c: Improve dcache hash function George Spelvin
2016-05-30 15:11         ` Peter Zijlstra
2016-05-30 16:06           ` George Spelvin
2016-05-30 16:27             ` Peter Zijlstra
2016-05-30 18:10               ` George Spelvin
2016-06-02  1:18                 ` Linus Torvalds
2016-06-02  2:31                   ` George Spelvin
2016-06-02 16:35                     ` Linus Torvalds
2016-06-02 18:23                       ` George Spelvin
2016-05-28 19:57       ` [PATCH v3 07/10] <linux/hash.h>: Add support for architecture-specific functions George Spelvin
2016-05-29  7:57         ` Geert Uytterhoeven
2016-05-28 19:57       ` [PATCH v3 08/10] m68k: Add <asm/hash.h> George Spelvin
2016-05-28 19:57       ` [PATCH v3 09/10] microblaze: " George Spelvin
2016-05-28 19:57       ` [PATCH v3 10/10] h8300: " George Spelvin
2016-05-28 20:47       ` [PATCH v3 00/10] String hash improvements Linus Torvalds
2016-05-28 20:54         ` George Spelvin
2016-06-02 22:59     ` [PATCH " Fubo Chen
2016-05-26 17:09   ` [PATCH v2 " George Spelvin
2016-05-25  7:21 ` [PATCH 01/10] Pull out string hash to <linux/stringhash.h> George Spelvin
2016-05-25  7:22 ` [PATCH 02/10] fs/namei.c: Add hash_string() function George Spelvin
2016-05-25  7:26 ` [PATCH 03/10] <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hash_string() George Spelvin
2016-05-25  7:28 ` [PATCH 04/10] Change hash_64() return value to 32 bits George Spelvin
2016-05-25  7:29 ` [PATCH 05/10] Eliminate bad hash multipliers from hash_32() and hash_64() George Spelvin
2016-05-25  7:31 ` [PATCH 06/10] fs/namei.c: Improve dcache hash function George Spelvin
2016-05-25  7:33 ` [PATCH 07/10] <linux/hash.h>: Add support for architecture-specific functions George Spelvin
2016-05-26 17:16   ` [PATCH v2 " George Spelvin
2016-05-25  7:34 ` [PATCH 08/10] m68k: Add <asm/archhash.h> George Spelvin
2016-05-25  7:34 ` George Spelvin
2016-05-25  8:07   ` Geert Uytterhoeven
2016-05-25  8:19     ` George Spelvin
2016-05-25  8:24     ` [PATCH 08v2/10] " George Spelvin
2016-05-25  8:48       ` Geert Uytterhoeven
2016-05-25  8:56   ` [PATCH 08/10] " Philippe De Muyter
2016-05-25  9:14     ` George Spelvin
2016-05-25  9:31       ` Andreas Schwab
2016-05-25  9:51       ` Philippe De Muyter
2016-05-25 13:24   ` Philippe De Muyter
2016-05-25 13:42     ` George Spelvin
2016-05-26 17:19   ` [PATCH v2 08/10] m68k: Add <asm/hash.h> George Spelvin
2016-05-25  7:37 ` [PATCH 09/10] microblaze: Add <asm/archhash.h> George Spelvin
2016-05-26 17:21   ` [PATCH v2 09/10] microblaze: Add <asm/hash.h> George Spelvin
2016-05-25  7:38 ` [PATCH 10/10] h8300: Add <asm/archhash.h> George Spelvin
2016-05-26 17:23   ` [PATCH v2 10/10] h8300: Add <asm/hash.h> George Spelvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1464465443-25305-6-git-send-email-linux@sciencehorizons.net \
    --to=linux@sciencehorizons.net \
    --cc=bfields@redhat.com \
    --cc=crope@iki.fi \
    --cc=linux-kernel@vger.kernel.org \
    --cc=m.chehab@samsung.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).