linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "George Spelvin" <linux@horizon.com>
To: tglx@linutronix.de
Cc: eric.dumazet@gmail.com, linux@horizon.com,
	linux-kernel@vger.kernel.org, riel@redhat.com,
	torvalds@linux-foundation.org
Subject: Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism
Date: 30 Apr 2016 16:52:35 -0400	[thread overview]
Message-ID: <20160430205235.24232.qmail@ns.horizon.com> (raw)
In-Reply-To: <CA+55aFxBWfAHQNAdBbdVr+z8ror4GVteyce3D3=vwDWxhu5KqQ@mail.gmail.com>

Thomas Gleixner wrote:
> I'll send a patch to replace hash_64 and hash_32.

Before you do that, could we look for a way to tweak the constants
in the existing hash?

It seems the basic "take the high bits of x * K" algorithm is actually
a decent hash function if K is chosen properly, and has a significant
speed advantage on machines with half-decent multipliers.

I'm researching how to do the multiply with fewer shifts and adds on
machines that need it.  (Or we could use a totally different function
in that case.)

You say that
> hash64 is slightly faster as the modulo prime as it does not have the
> multiplication.

Um... are you sure you benchmarked that right?  The hash_64 code you
used (Thomas Wang's 64->32-bit hash) has a critical path consisting of 6
shifts and 7 adds.  I can't believe that's faster than a single multiply.

For 1,000,000 iterations on an Ivy Bridge, the multiply is 4x
faster (5x if out of line) for me!

The constants I recommend are
#define GOLDEN_RATIO_64 0x61C8864680B583EBull
#define GOLDEN_RATIO_32 0x61C88647


rdtsc times for 1,000,000 iterations of each of the two.
(The sum of all hashes is printed to prevent dead code elimination.)

  hash_64  (sum)      * PHI   (sum)      hash_64  (sum)      * PHI   (sum)
  17552154 a52752df   3431821 2ce5398c   17485381 a52752df   3375535 2ce5398c
  17522273 a52752df   3487206 2ce5398c   17551217 a52752df   3374221 2ce5398c
  17546242 a52752df   3377355 2ce5398c   17494306 a52752df   3374202 2ce5398c
  17495702 a52752df   3409768 2ce5398c   17505839 a52752df   3398205 2ce5398c
  17501114 a52752df   3375435 2ce5398c   17539388 a52752df   3374202 2ce5398c
And with hash_64 forced inline:
  13596945 a52752df   3374931 2ce5398c   13585916 a52752df   3411107 2ce5398c
  13564616 a52752df   3374928 2ce5398c   13573465 a52752df   3425160 2ce5398c
  13569712 a52752df   3374915 2ce5398c   13580461 a52752df   3397773 2ce5398c
  13577481 a52752df   3374912 2ce5398c   13558708 a52752df   3417456 2ce5398c
  13569044 a52752df   3374909 2ce5398c   13557193 a52752df   3407912 2ce5398c

That's 3.5 cycles vs. 13.5.

(I actually have two copies of the inlined code, to show code alignment
issues.)

On a Phenom, it's worse, 4 cycles vs. 35.
  35083119 a52752df   4020754 2ce5398c   35068116 a52752df   4015659 2ce5398c
  35074377 a52752df   4000819 2ce5398c   35068735 a52752df   4016943 2ce5398c
  35067596 a52752df   4025397 2ce5398c   35074365 a52752df   4000108 2ce5398c
  35071050 a52752df   4016190 2ce5398c   35058775 a52752df   4017988 2ce5398c
  35055091 a52752df   4000066 2ce5398c   35201158 a52752df   4000094 2ce5398c




My simple test code appended for anyone who cares...

#include <stdint.h>
#include <stdio.h>

/*  Phi = 0x0.9E3779B97F4A7C15F... */
/* -Phi = 0x0.61C8864680B583EA1... */
#define K 0x61C8864680B583EBull

static inline uint32_t hash1(uint64_t key)
{
       key  = ~key + (key << 18);
       key ^= key >> 31;
       key += (key << 2) + (key << 4);
       key ^= key >> 11;
       key += key << 6;
       key ^= key >> 22;
       return (uint32_t)key;
}

static inline uint32_t hash2(uint64_t key)
{
	return (uint32_t)(key * K >> 32);
}

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
	return (uint64_t)hi << 32 | lo;
}

int
main(void)
{
	int i, j;
	uint32_t sum, sums[20];
	uint64_t start, times[20];

	for (i = 0; i < 20; i += 4) {
		sum = 0;
		start = rdtsc();
		for (j = 0; j < 1000000; j++)
			sum += hash1(j+0xdeadbeef);
		times[i] = rdtsc() - start;
		sums[i] = sum;

		sum = 0;
		start = rdtsc();
		for (j = 0; j < 1000000; j++)
			sum += hash2(j+0xdeadbeef);
		times[i+1] = rdtsc() - start;
		sums[i+1] = sum;

		sum = 0;
		start = rdtsc();
		for (j = 0; j < 1000000; j++)
			sum += hash1(j+0xdeadbeef);
		times[i+2] = rdtsc() - start;
		sums[i+2] = sum;

		sum = 0;
		start = rdtsc();
		for (j = 0; j < 1000000; j++)
			sum += hash2(j+0xdeadbeef);
		times[i+3] = rdtsc() - start;
		sums[i+3] = sum;
	}
	for (i = 0; i < 20; i++)
		printf("  %llu %08x%c",
			times[i], sums[i], (~i & 3) ? ' ' : '\n');
	return 0;
}

       reply	other threads:[~2016-04-30 20:52 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CA+55aFxBWfAHQNAdBbdVr+z8ror4GVteyce3D3=vwDWxhu5KqQ@mail.gmail.com>
2016-04-30 20:52 ` George Spelvin [this message]
2016-05-01  8:35   ` [patch 2/7] lib/hashmod: Add modulo based hash mechanism Thomas Gleixner
2016-05-01  9:43     ` George Spelvin
2016-05-01 16:51       ` Linus Torvalds
2016-05-14  3:54         ` George Spelvin
2016-05-14 18:35           ` Linus Torvalds
2016-05-02  7:11       ` Thomas Gleixner
2016-05-02 10:20         ` [PATCH 1/2] <linux/hash.h>: Make hash_64(), hash_ptr() return 32 bits George Spelvin
2016-05-02 10:22           ` [PATCH 2/2] <linux/hash.h>: Fix hash_64()'s horrible collision problem George Spelvin
2016-05-02 20:08             ` Linus Torvalds
2016-05-02 10:27           ` [RFC PATCH 3/2] (Rant) Fix various hash abuses George Spelvin
2016-05-02 10:31           ` [RFC PATCH 4/2] namei: Improve hash mixing if CONFIG_DCACHE_WORD_ACCESS George Spelvin
2016-05-16 18:51             ` Linus Torvalds
2016-05-02 13:28           ` [PATCH 1/2] <linux/hash.h>: Make hash_64(), hash_ptr() return 32 bits Peter Zijlstra
2016-05-02 19:08             ` George Spelvin
2016-05-02 16:24           ` Linus Torvalds
2016-05-02 20:26             ` George Spelvin
2016-05-02 21:19               ` Linus Torvalds
2016-05-02 21:41                 ` Linus Torvalds
2016-05-03  1:59                 ` George Spelvin
2016-05-03  3:01                   ` Linus Torvalds
2016-04-29  2:57 [patch 2/7] lib/hashmod: Add modulo based hash mechanism George Spelvin
2016-04-29  3:16 ` Linus Torvalds
2016-04-29  4:12   ` George Spelvin
2016-04-29 23:31   ` George Spelvin
2016-04-30  0:05     ` Linus Torvalds
2016-04-30  0:32       ` George Spelvin
2016-04-30  1:12         ` Linus Torvalds
2016-04-30  3:04           ` George Spelvin
  -- strict thread matches above, loose matches on Subject: below --
2016-04-28 16:42 [patch 0/7] futex: Add support for process private hashing Thomas Gleixner
2016-04-28 16:42 ` [patch 2/7] lib/hashmod: Add modulo based hash mechanism Thomas Gleixner
2016-04-28 18:32   ` Linus Torvalds
2016-04-28 23:26     ` Thomas Gleixner
2016-04-29  2:25       ` Linus Torvalds
2016-04-30 13:02         ` Thomas Gleixner
2016-04-30 16:45           ` Eric Dumazet
2016-04-30 17:12             ` Linus Torvalds
2016-04-30 17:37               ` Eric Dumazet
2016-06-12 12:18         ` Sandy Harris
2016-04-29 21:10     ` Linus Torvalds
2016-04-29 23:51       ` Linus Torvalds
2016-04-30  1:34         ` Rik van Riel
2016-05-02  9:39         ` Torvald Riegel
2016-04-30 15:22       ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160430205235.24232.qmail@ns.horizon.com \
    --to=linux@horizon.com \
    --cc=eric.dumazet@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).