linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "George Spelvin" <linux@horizon.com>
To: linux@horizon.com, torvalds@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de
Subject: Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism
Date: 29 Apr 2016 00:12:36 -0400	[thread overview]
Message-ID: <20160429041236.6211.qmail@ns.horizon.com> (raw)
In-Reply-To: <CA+55aFxS9zZ8Gpyg9cn=xgkamx-G6dDZ3DTR2B=iQzXYcAhGoQ@mail.gmail.com>

Linus wrote:
> Having looked around at other hashes, I suspect we should look at the
> ones that do five or six shifts, and a mix of add/sub and xor. And
> because they shift the bits around more freely you don't have the
> final shift (that ends up being dependent on the size of the target
> set).

I'm not sure that final shift is a problem.  You need to mask the result
to the desired final size somehow, and a shift is no more cycles than
an AND.

> It really would be lovely to hear that we can just replace
> hash_int/long() with a better hash. And I wouldn't get too hung up on
> the multiplication trick. I suspect it's not worth it.

My main concern is that the scope of the search grows enormously
if we include such things.  I don't want to discourage someone
from looking, but I volunteered to find a better multiplication
constant with an efficient add/subtract chain, not start a thesis
project on more general hash functions.

Two places one could look for ideas, though:
http://www.burtleburtle.net/bob/hash/integer.html
https://gist.github.com/badboy/6267743

Here's Thomas Wang's 64-bit hash, which is reputedly quite
good, in case it helps:

uint64_t hash(uint64_t key)
{
	key  = ~key + (key << 21);	// key = (key << 21) - key - 1;
	key ^= key >> 24;
	key += (key << 3)) + (key << 8); // key *= 265
	key ^= key >> 14;
	key += (key << 2)) + (key << 4); // key *= 21
	key ^= key >> 28;
	key += key << 31;
	return key;
}

And his slightly shorter 64-to-32-bit function:
unsigned hash(uint64_t key)
{
  key  = ~key + (key << 18); // key = (key << 18) - key - 1;
  key ^= key >> 31;
  key *= 21;	 // key += (key << 2)) + (key << 4);
  key ^= key >> 11;
  key += key << 6;
  key ^= key >> 22;
  return (uint32_t)key;
}


Sticking to multiplication, using the heuristics in the
current comments (prime near golden ratio = 9e3779b9 = 2654435769,)
I can come up with this for multiplying by 2654435599 = 0x9e37790f:

// -----------------------------------------------------------------------------
// This code was generated by Spiral Multiplier Block Generator, www.spiral.net
// Copyright (c) 2006, Carnegie Mellon University
// All rights reserved.
// The generated code is distributed under a BSD style license
// (see http://www.opensource.org/licenses/bsd-license.php)
// -----------------------------------------------
// Cost: 6 adds/subtracts 6 shifts 0 negations
// Depth: 5
// Input:
//   int t0
// Outputs:
//   int t1 = 2654435599 * t0
// -----------------------------------------------
t3 = shl(t0, 11);   /* 2048*/
t2 = sub(t3, t0);   /* 2047*/
t5 = shl(t2, 8);   /* 524032*/
t4 = sub(t5, t2);   /* 521985*/
t7 = shl(t0, 25);   /* 33554432*/
t6 = add(t4, t7);   /* 34076417*/
t9 = shl(t0, 9);   /* 512*/
t8 = sub(t9, t0);   /* 511*/
t11 = shl(t6, 4);   /* 545222672*/
t10 = sub(t11, t6);   /* 511146255*/
t12 = shl(t8, 22);   /* 2143289344*/
t1 = add(t10, t12);   /* 2654435599*/

Which translates into C as

uint32_t multiply(uint32_t x)
{
        unsigned y = (x << 11) - x;

        y -= y << 8;
        y -= x << 25;
        x -= x << 9;
        y -= y << 4;
        y -= x << 22;
        return y;
}

Unfortunately, that utility bogs like hell on 64-bit constants.

  reply	other threads:[~2016-04-29  4:12 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-29  2:57 [patch 2/7] lib/hashmod: Add modulo based hash mechanism George Spelvin
2016-04-29  3:16 ` Linus Torvalds
2016-04-29  4:12   ` George Spelvin [this message]
2016-04-29 23:31   ` George Spelvin
2016-04-30  0:05     ` Linus Torvalds
2016-04-30  0:32       ` George Spelvin
2016-04-30  1:12         ` Linus Torvalds
2016-04-30  3:04           ` George Spelvin
     [not found] <CA+55aFxBWfAHQNAdBbdVr+z8ror4GVteyce3D3=vwDWxhu5KqQ@mail.gmail.com>
2016-04-30 20:52 ` George Spelvin
2016-05-01  8:35   ` Thomas Gleixner
2016-05-01  9:43     ` George Spelvin
2016-05-01 16:51       ` Linus Torvalds
2016-05-14  3:54         ` George Spelvin
2016-05-14 18:35           ` Linus Torvalds
2016-05-02  7:11       ` Thomas Gleixner
  -- strict thread matches above, loose matches on Subject: below --
2016-04-28 16:42 [patch 0/7] futex: Add support for process private hashing Thomas Gleixner
2016-04-28 16:42 ` [patch 2/7] lib/hashmod: Add modulo based hash mechanism Thomas Gleixner
2016-04-28 18:32   ` Linus Torvalds
2016-04-28 23:26     ` Thomas Gleixner
2016-04-29  2:25       ` Linus Torvalds
2016-04-30 13:02         ` Thomas Gleixner
2016-04-30 16:45           ` Eric Dumazet
2016-04-30 17:12             ` Linus Torvalds
2016-04-30 17:37               ` Eric Dumazet
2016-06-12 12:18         ` Sandy Harris
2016-04-29 21:10     ` Linus Torvalds
2016-04-29 23:51       ` Linus Torvalds
2016-04-30  1:34         ` Rik van Riel
2016-05-02  9:39         ` Torvald Riegel
2016-04-30 15:22       ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160429041236.6211.qmail@ns.horizon.com \
    --to=linux@horizon.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).