All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism
@ 2016-04-29  2:57 George Spelvin
  2016-04-29  3:16 ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: George Spelvin @ 2016-04-29  2:57 UTC (permalink / raw)
  To: tglx; +Cc: linux, linux-kernel, torvalds

Thomas Gleixner wrote:
> I'm not a hashing wizard and I completely failed to understand why
> hash_long/ptr are so horrible for the various test cases I ran.

It's very simple: the constants chosen are bit-sparse, *particularly*
in the least significant bits, and only 32/64 bits of the product are
kept.  Using the high-word of a double-width multiply is even better,
but some machines (*cough* SPARCv9 *cough*) don't have hardware
support for that.

So what you get is:

  (0x9e370001 * (x << 12)) & 0xffffffff
= (0x9e370001 * x & 0xfffff) << 12
= (0x70001 * x & 0xfffff) << 12

*Now* does it make sense?

64 bits is just as bad...  0x9e37fffffffc0001 becomes
0x7fffffffc0001, which is 2^51 - 2^18 + 1.


The challenge is the !CONFIG_ARCH_HAS_FAST_MULTIPLIER case,
when it has to be done with shifts and adds/subtracts.

Now, what's odd is that it's only relevant for 64-bit platforms, and
currently only x86 and POWER7+ have it.

SPARCv9, MIPS64, ARM64, SH64, PPC64, and IA64 all have it turned off.

Is this a bug that should be fixed?

In fact, do *any* 64-bit platforms need multiply emulation?

How many 32-bit platforms nead a multiplier that's easy for GCC to
evaluate via shifts and adds?

Generlly, by the time you've got a machine grunty enough to
need 64 bits, a multiplier is quite affordable.


Anyway, assuming there exists at least one platform that needs the
shift-and-add sequence, it's quite easy to get a higher hamming weight,
you just have to use a few more registers to save some intermediate
results.

E.g.

	u64 x = val, t = val, u;
	x <<= 2;
	u = x += t;	/* val * 5 */
	x <<= 4;	/* val * 80 */
	x -= u;		/* val * 75 = 0b1001011 */

Shall I try to come up with something?


Footnote: useful web pages on shift-and-add/subtract mutliplciation
http://www.vinc17.org/research/mulbyconst/index.en.html
http://www.spiral.net/hardware/multless.html

^ permalink raw reply	[flat|nested] 29+ messages in thread
[parent not found: <CA+55aFxBWfAHQNAdBbdVr+z8ror4GVteyce3D3=vwDWxhu5KqQ@mail.gmail.com>]
* [patch 0/7] futex: Add support for process private hashing
@ 2016-04-28 16:42 Thomas Gleixner
  2016-04-28 16:42 ` [patch 2/7] lib/hashmod: Add modulo based hash mechanism Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2016-04-28 16:42 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Linus Torvalds, Andrew Morton,
	Sebastian Andrzej Siewior, Darren Hart, Michael Kerrisk,
	Davidlohr Bueso, Chris Mason, Carlos O'Donell,
	Torvald Riegel, Eric Dumazet

The standard futex mechanism in the Linux kernel uses a global hash to store
transient state. Collisions on that hash can lead to performance degradation
and on real-time enabled kernels to unbound priority inversions.

This new attempt to solve the issue does not require user space changes and
operates transparently. On the first futex operation of a process the kernel
allocates a hash private to the process. All process private futexes are
hashed in this hash. Process shared futexes still use the global hash.

For RT applications and pathological use cases a new futex op is provided
which allows the application to preallocate and thereby size the process
private hash.

The series comes with a new 'stupid' hash function based on the good old
modulu prime. That function provides way better hash results than
hash_ptr/hash_long() for small hash sizes.

The last two patches add support to the perf futex-hash benchmark so test can
be run on nodes and the preallocation sizing can be tested.

The last patch contains a first update for the futex man page.

Results from our testing in nice colored charts are available here:

perf bench futex-hash run parallel on 4 nodes with global hash and various
sized private hashes and various numbers of futexes per thread

 https://tglx.de/~tglx/f-ops.png

perf bench futex-hash run parallel on 4 nodes with global hash and various
sized private hashes using the new hash_mod() and various numbers of futexes
per thread

 https://tglx.de/~tglx/f-ops.png

perf bench futex-hash run parallel on 4 nodes with global hash and various
sized private hashes using hash_long() and various numbers of futexes per
thread

 https://tglx.de/~tglx/f-ops-hlong.png

perf bench futex-hash run parallel on 2 nodes with global hash and various
sized private hashes and various numbers of futexes per thread

 https://tglx.de/~tglx/f-ops-2.png

perf bench futex-hash run parallel on 4 nodes with global hash and various
sized private hashes using hash_mod(). 1 futex per thread and various thread
numbers.

 https://tglx.de/~tglx/f-ops-mod-t.png

perf bench futex-hash run parallel on 4 nodes with global hash and various
sized private hashes using hash_long(). 1 futex per thread and various thread
numbers.

 https://tglx.de/~tglx/f-ops-hlong-t.png

Thanks,

	tglx

----
 Documentation/sysctl/kernel.txt |   17 +++
 b/include/linux/futex_types.h   |   14 ++
 b/lib/hashmod.c                 |   44 ++++++++
 include/linux/futex.h           |   39 +++++--
 include/linux/hash.h            |   28 +++++
 include/linux/mm_types.h        |    4 
 include/uapi/linux/futex.h      |    1 
 init/Kconfig                    |    5 
 kernel/fork.c                   |    3 
 kernel/futex.c                  |  219 +++++++++++++++++++++++++++++++++++++++-
 kernel/sysctl.c                 |   21 +++
 lib/Kconfig                     |    3 
 lib/Makefile                    |    1 
 tools/perf/bench/Build          |    4 
 tools/perf/bench/futex-hash.c   |  101 ++++++++++++++++--
 tools/perf/bench/futex.h        |    5 
 16 files changed, 486 insertions(+), 23 deletions(-)

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-06-12 12:18 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-29  2:57 [patch 2/7] lib/hashmod: Add modulo based hash mechanism George Spelvin
2016-04-29  3:16 ` Linus Torvalds
2016-04-29  4:12   ` George Spelvin
2016-04-29 23:31   ` George Spelvin
2016-04-30  0:05     ` Linus Torvalds
2016-04-30  0:32       ` George Spelvin
2016-04-30  1:12         ` Linus Torvalds
2016-04-30  3:04           ` George Spelvin
     [not found] <CA+55aFxBWfAHQNAdBbdVr+z8ror4GVteyce3D3=vwDWxhu5KqQ@mail.gmail.com>
2016-04-30 20:52 ` George Spelvin
2016-05-01  8:35   ` Thomas Gleixner
2016-05-01  9:43     ` George Spelvin
2016-05-01 16:51       ` Linus Torvalds
2016-05-14  3:54         ` George Spelvin
2016-05-14 18:35           ` Linus Torvalds
2016-05-02  7:11       ` Thomas Gleixner
  -- strict thread matches above, loose matches on Subject: below --
2016-04-28 16:42 [patch 0/7] futex: Add support for process private hashing Thomas Gleixner
2016-04-28 16:42 ` [patch 2/7] lib/hashmod: Add modulo based hash mechanism Thomas Gleixner
2016-04-28 18:32   ` Linus Torvalds
2016-04-28 23:26     ` Thomas Gleixner
2016-04-29  2:25       ` Linus Torvalds
2016-04-30 13:02         ` Thomas Gleixner
2016-04-30 16:45           ` Eric Dumazet
2016-04-30 17:12             ` Linus Torvalds
2016-04-30 17:37               ` Eric Dumazet
2016-06-12 12:18         ` Sandy Harris
2016-04-29 21:10     ` Linus Torvalds
2016-04-29 23:51       ` Linus Torvalds
2016-04-30  1:34         ` Rik van Riel
2016-05-02  9:39         ` Torvald Riegel
2016-04-30 15:22       ` Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.