linux-kernel.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@linux-foundation.org>
To: George Spelvin <linux@horizon.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Bruce Fields <bfields@fieldses.org>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Jeff Layton <jlayton@poochiereds.net>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH 2/2] <linux/hash.h>: Fix hash_64()'s horrible collision problem
Date: Mon, 2 May 2016 13:08:03 -0700
Message-ID: <CA+55aFyPMrAWqMaA013ZQA0TULtdTXbUh4rgUurMLu40TastuA@mail.gmail.com>
In-Reply-To: <20160502102248.18755.qmail@ns.horizon.com>

[-- Attachment #1: Type: text/plain, Size: 1166 bytes --]

On Mon, May 2, 2016 at 3:22 AM, George Spelvin <linux@horizon.com> wrote:
> hash_64() was using a low-bit-weight multiplier, which resulted in
> very bad mixing of the high bits of the input.  In particular,
> page-aligned pointers (low 12 bits not used) were a disaster,

So I did just a minimal fix for 4.6 (and back-porting), which took
just the constants and made _only_ the 64-bit architecture case use
this improved constant for hash_64.

In other words, people who use "hash_long()" or use "hash_64()" on
64-bit architectures will get the improvements, but if you use
hash_64() on a 32-bit architecture you'll continue to see the old
behavior.

Quite frankly, looking at some of the explicit hash_64() users, they
seem to be a bit dubious anyway. And it won't make things *worse* for
them.

So that simple "just use multiplication unconditionally on 64-bit, and
use the better constant" should fix the actual _practical_ problems
that we've seen. And it shouldn't have any negative consequences,
since as you say, 64-bit architectures universally do have a
multiplier.
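With the constant change, the 64-bit path is just one multiply followed by keeping the top "bits" bits of the product. The following is a minimal user-space sketch of that shape, assuming uint64_t in place of the kernel's u64; the name hash_64_sketch is hypothetical and this is not the kernel source.

```c
#include <stdint.h>

/* Constant from the patch, approximately 2^64 * (3 - sqrt(5)) / 2. */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/*
 * Hypothetical user-space stand-in for hash_64() on a 64-bit machine:
 * multiply (unsigned, so it wraps modulo 2^64), then keep the top
 * `bits` bits of the product. Assumes 1 <= bits <= 64, since a shift
 * by 64 would be undefined in C.
 */
static inline uint64_t hash_64_sketch(uint64_t val, unsigned int bits)
{
	return (val * GOLDEN_RATIO_64) >> (64 - bits);
}
```

The result is taken from the high end of the product on purpose: the low k bits of a product depend only on the low k bits of the input, while the high bits can be influenced by every input bit, provided the multiplier is not too sparse.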

The bigger changes will have to be for 4.7 by now, I think.

                     Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 2805 bytes --]

From 689de1d6ca95b3b5bd8ee446863bf81a4883ea25 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 2 May 2016 12:46:42 -0700
Subject: [PATCH] Minimal fix-up of bad hashing behavior of hash_64()

This is a fairly minimal fixup to the horribly bad behavior of hash_64()
with certain input patterns.

In particular, because the multiplicative value used for the 64-bit hash
was intentionally bit-sparse (so that the multiply could be done with
shifts and adds on architectures without hardware multipliers), some
bits did not get spread out very much.  Notably, certain fairly common
bit ranges in the input (roughly bits 12-20, which often carry the most
information when you hash things like byte offsets in files or memory,
because block sizes mean that the low bits are often zero) would not
necessarily show up much in the result.
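The effect on block-aligned inputs can be measured directly. This sketch hashes 1000 values whose low 12 bits are zero (as for 4 KiB-aligned pointers) into 1024 buckets, once with the old multiplier 0x9e37fffffffc0001 (the pre-patch GOLDEN_RATIO_PRIME_64) and once with the new GOLDEN_RATIO_64, and counts the distinct buckets used. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old, bit-sparse multiplier (GOLDEN_RATIO_PRIME_64 before this patch). */
#define OLD_MULT 0x9e37fffffffc0001ull
/* New multiplier introduced by the patch (GOLDEN_RATIO_64). */
#define NEW_MULT 0x61C8864680B583EBull

#define HASH_BITS   10  /* 1 << 10 = 1024 buckets */
#define ALIGN_SHIFT 12  /* assumed 4 KiB alignment: low 12 bits are zero */

/* Multiplicative hash: top HASH_BITS bits of val * mult (mod 2^64). */
static uint64_t mul_hash(uint64_t val, uint64_t mult)
{
	return (val * mult) >> (64 - HASH_BITS);
}

/*
 * Hash the n aligned values i << ALIGN_SHIFT (i = 0 .. n-1) and return
 * how many distinct buckets out of 1 << HASH_BITS they land in.
 */
static unsigned int distinct_buckets(uint64_t mult, unsigned int n)
{
	bool used[1u << HASH_BITS] = { false };
	unsigned int count = 0;

	for (unsigned int i = 0; i < n; i++) {
		uint64_t b = mul_hash((uint64_t)i << ALIGN_SHIFT, mult);

		if (!used[b]) {
			used[b] = true;
			count++;
		}
	}
	return count;
}
```

For the old constant the answer can be worked out by hand: with x = i << 12, the product is 2^63 * (i & 1) - i * (2^30 - 2^12) mod 2^64, so for 0 < i < 1000 the top ten bits are all ones except possibly bit 63. All 1000 inputs therefore fall into just buckets 0, 511 and 1023, while the new constant spreads them over many more.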

There's a bigger patch-series brewing to fix up things more completely,
but this is the fairly minimal fix for the 64-bit hashing problem.  It
simply picks a much better constant multiplier, spreading the bits out a
lot better.

NOTE! For 32-bit architectures, the bad old hash_64() remains the same
for now, since 64-bit multiplies are expensive.  The bigger hashing
cleanup will replace the 32-bit case with something better.
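The "bad old" path that 32-bit architectures keep replaces the 64-bit multiply with shifts and adds. That works because the old constant can be written as 0x9e37fffffffc0001 = 2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1. The sketch below checks that decomposition against a plain multiply; the function name is hypothetical, and the shift sequence is derived from the constant rather than copied from the kernel.

```c
#include <stdint.h>

/* Pre-patch 64-bit constant (GOLDEN_RATIO_PRIME_64). */
#define OLD_GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001ull

/*
 * Hypothetical shift-and-add equivalent of val * OLD_GOLDEN_RATIO_PRIME_64,
 * using 0x9e37fffffffc0001 = 2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1.
 * Each shift multiplies n by a power of two; all arithmetic wraps modulo
 * 2^64, exactly like the multiply it replaces.
 */
static uint64_t old_mult_shift_add(uint64_t val)
{
	uint64_t hash = val;    /* + 1 * val */
	uint64_t n = val;

	n <<= 18;  hash -= n;   /* - 2^18 * val */
	n <<= 33;  hash -= n;   /* - 2^51 * val */
	n <<= 3;   hash += n;   /* + 2^54 * val */
	n <<= 3;   hash -= n;   /* - 2^57 * val */
	n <<= 4;   hash += n;   /* + 2^61 * val */
	n <<= 2;   hash += n;   /* + 2^63 * val */
	return hash;
}
```

Written this way, the sparseness is visible: bits 18 through 50 of the old multiplier are a solid run of 33 ones, which collapses into just the two terms -2^18 and -2^51 instead of contributing independent mixing at each bit position.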

The new constants were picked by George Spelvin who wrote that bigger
cleanup series.  I just picked out the constants and part of the comment
from that series.

Cc: stable@vger.kernel.org
Cc: George Spelvin <linux@horizon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/include/linux/hash.h b/include/linux/hash.h
index 1afde47e1528..79c52fa81cac 100644
--- a/include/linux/hash.h
+++ b/include/linux/hash.h
@@ -32,12 +32,28 @@
 #error Wordsize not 32 or 64
 #endif
 
+/*
+ * The above primes are actively bad for hashing, since they are
+ * too sparse. The 32-bit one is mostly ok, the 64-bit one causes
+ * real problems. Besides, the "prime" part is pointless for the
+ * multiplicative hash.
+ *
+ * Although a random odd number will do, it turns out that the golden
+ * ratio phi = (sqrt(5)-1)/2, or its negative, has particularly nice
+ * properties.
+ *
+ * These are the negative, (1 - phi) = (phi^2) = (3 - sqrt(5))/2.
+ * (See Knuth vol 3, section 6.4, exercise 9.)
+ */
+#define GOLDEN_RATIO_32 0x61C88647
+#define GOLDEN_RATIO_64 0x61C8864680B583EBull
+
 static __always_inline u64 hash_64(u64 val, unsigned int bits)
 {
 	u64 hash = val;
 
-#if defined(CONFIG_ARCH_HAS_FAST_MULTIPLIER) && BITS_PER_LONG == 64
-	hash = hash * GOLDEN_RATIO_PRIME_64;
+#if BITS_PER_LONG == 64
+	hash = hash * GOLDEN_RATIO_64;
 #else
 	/*  Sigh, gcc can't optimise this alone like it does for 32 bits. */
 	u64 n = hash;
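The comment in the patch describes the new constants as 2^32 and 2^64 times (3 - sqrt(5))/2. The following sketch recomputes the 32-bit value in double precision and checks it against both constants: 2^32 * (3 - sqrt(5))/2 is about 1640531526.503, which rounds to 0x61C88647, and the 64-bit constant's top 32 bits round to the same value. It uses a small Newton iteration instead of sqrt() only so that no libm linkage is needed; the helper names are hypothetical.

```c
#include <stdint.h>

#define GOLDEN_RATIO_32 0x61C88647
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Square root of x > 0 by Newton's method (hypothetical helper, avoids libm). */
static double newton_sqrt(double x)
{
	double r = x;  /* start above the root; iterates decrease toward it */

	for (int i = 0; i < 20; i++)
		r = 0.5 * (r + x / r);
	return r;
}

/* Nearest integer to 2^32 * (3 - sqrt(5)) / 2, i.e. 2^32 * (1 - phi). */
static uint32_t golden_ratio_32_from_phi(void)
{
	double frac = (3.0 - newton_sqrt(5.0)) / 2.0;  /* about 0.381966 */

	return (uint32_t)(frac * 4294967296.0 + 0.5);  /* round to nearest */
}

/* Top 32 bits of GOLDEN_RATIO_64, rounded to nearest rather than truncated. */
static uint32_t golden_ratio_64_top32_rounded(void)
{
	return (uint32_t)((GOLDEN_RATIO_64 + 0x80000000ull) >> 32);
}
```

Both constants are odd, which is the property a multiplicative hash actually needs, and 0 - GOLDEN_RATIO_64 (mod 2^64) equals 0x9E3779B97F4A7C15, roughly 2^64 * phi: that is the "negative" the comment refers to.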


Thread overview: 21+ messages
     [not found] <CA+55aFxBWfAHQNAdBbdVr+z8ror4GVteyce3D3=vwDWxhu5KqQ@mail.gmail.com>
2016-04-30 20:52 ` [patch 2/7] lib/hashmod: Add modulo based hash mechanism George Spelvin
2016-05-01  8:35   ` Thomas Gleixner
2016-05-01  9:43     ` George Spelvin
2016-05-01 16:51       ` Linus Torvalds
2016-05-14  3:54         ` George Spelvin
2016-05-14 18:35           ` Linus Torvalds
2016-05-02  7:11       ` Thomas Gleixner
2016-05-02 10:20         ` [PATCH 1/2] <linux/hash.h>: Make hash_64(), hash_ptr() return 32 bits George Spelvin
2016-05-02 10:22           ` [PATCH 2/2] <linux/hash.h>: Fix hash_64()'s horrible collision problem George Spelvin
2016-05-02 20:08             ` Linus Torvalds [this message]
2016-05-02 10:27           ` [RFC PATCH 3/2] (Rant) Fix various hash abuses George Spelvin
2016-05-02 10:31           ` [RFC PATCH 4/2] namei: Improve hash mixing if CONFIG_DCACHE_WORD_ACCESS George Spelvin
2016-05-16 18:51             ` Linus Torvalds
2016-05-02 13:28           ` [PATCH 1/2] <linux/hash.h>: Make hash_64(), hash_ptr() return 32 bits Peter Zijlstra
2016-05-02 19:08             ` George Spelvin
2016-05-02 16:24           ` Linus Torvalds
2016-05-02 20:26             ` George Spelvin
2016-05-02 21:19               ` Linus Torvalds
2016-05-02 21:41                 ` Linus Torvalds
2016-05-03  1:59                 ` George Spelvin
2016-05-03  3:01                   ` Linus Torvalds
