* [PATCH 00/10] String hash improvements
From: George Spelvin @ 2016-05-25  7:20 UTC (permalink / raw)
  To: linux-kernel, torvalds
  Cc: alistair.francis, bfields, geert, gerg, jlayton, linux-m68k,
	linux-nfs, linux, michal.simek, tglx, uclinux-h8-devel, ysato

On Tue, 17 May 2016 at 09:32, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Tue, May 17, 2016 at 6:41 AM, George Spelvin <linux@horizon.com> wrote:
>> I had assumed that since they weren't fully baked when the window opened,
>> they weren't eligible, but I'll try.

> Hey, if they aren't ready, they aren't.

Well, they're close, and I can and did *get* them ready.

> How about just the minimal set of patches that you're happy with as-is?

The things are a bit interdependent.  I can't fix hash_64() on 32-bit systems until
I get rid of hash_string()'s need for it to return 64 bits, which requires work
on the dcache hashes to make them suitable replacements...

The real fun has come from TPTB deciding to sell the horizon.com domain,
and it turns out that updating rDNS takes the ISP a whole freaking week,
during which time outgoing mail trips everyone's spam filters.

That finally got fixed, just in time for me to put my dominant hand through
a piece of glass.  It's been a week. :-(


Anyway, the patches...

This series does several related things:
1) Gets rid of the string hashes in <linux/sunrpc/svcauth.h>,
   and uses the dcache hash (fs/namei.c) instead.
2) Avoids 64-bit multiplies in hash_64() on 32-bit platforms.
   Two 32-bit multiplies will do well enough.
3) Rids the world of the bad hash multipliers in hash_32.
   This finishes the job started in 689de1d6ca.
   The vast majority of Linux architectures have hardware support
   for 32x32-bit multiply and so derive no benefit from "simplified"
   multipliers.
   The few processors that do not (68000, h8/300 and some models of
   Microblaze) have arch-specific implementations added.  Those patches
   are last in the series so they can go through the relevant arch
   maintainers.
4) Overhauls the dcache hash mixing.
   The patch in 2bf0b16954 was an off-the-cuff suggestion.  It is replaced
   with a much more careful design that's simultaneously faster and better.
   (My own invention, as there was nothing suitable in the literature I
   could find.  Comments welcome!)

Things I thought about but can wait for now:
5) Modify the hash_name() loop to skip the initial HASH_MIX().
   That would let us salt the hash if we ever wanted to.
6) Modify bytemask_from_count to handle inputs of 1..sizeof(long)
   rather than 0..sizeof(long)-1.  This would simplify all its users
   including full_name_hash.
7) Sort out partial_name_hash().
   The hash function is declared as using a long state, even though
   it's truncated to 32 bits at the end and the extra internal state
   contributes nothing to the result (see the sketch after this list).
   And some callers do odd things:
   * fs/hfs/string.c only allocates 32 bits of state
   * fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes
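
Sketch of that dead state (illustrative only, not a patch; the constants
are made up): carries in partial_name_hash() only propagate upward, so on
a 64-bit machine the top half of the state can never reach the 32 bits
that end_name_hash() keeps.

	unsigned long h1 = 0x00000000deadbeefUL;
	unsigned long h2 = 0xffffffffdeadbeefUL;	/* differs only in bits 32..63 */

	/* Same character in, same low 32 bits of state, same final hash */
	BUG_ON(end_name_hash(partial_name_hash('x', h1)) !=
	       end_name_hash(partial_name_hash('x', h2)));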

I'm not particularly fond of the names of the header files I created,
but if anyone has a better idea please talk fast!

George Spelvin (10):
  Pull out string hash to <linux/stringhash.h>
  fs/namei.c: Add hash_string() function.
  <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hash_string()
  Change hash_64() return value to 32 bits.
  Eliminate bad hash multipliers from hash_32() and hash_64().
  fs/namei.c: Improve dcache hash function
  <linux/hash.h>: Add support for architecture-specific functions
  m68k: Add <asm/archhash.h>
  microblaze: Add <asm/archhash.h>
  h8300: Add <asm/archhash.h>

 arch/Kconfig                           |   8 ++
 arch/h8300/Kconfig                     |   1 +
 arch/h8300/include/asm/archhash.h      |  52 ++++++++++++
 arch/m68k/Kconfig                      |   1 +
 arch/m68k/include/asm/archhash.h       |  67 +++++++++++++++
 arch/microblaze/Kconfig                |   1 +
 arch/microblaze/include/asm/archhash.h |  80 ++++++++++++++++++
 fs/namei.c                             | 149 +++++++++++++++++++++++++--------
 include/linux/dcache.h                 |  27 +-----
 include/linux/hash.h                   | 111 ++++++++++++------------
 include/linux/stringhash.h             |  76 +++++++++++++++++
 include/linux/sunrpc/svcauth.h         |  36 ++------
 12 files changed, 464 insertions(+), 145 deletions(-)
 create mode 100644 arch/h8300/include/asm/archhash.h
 create mode 100644 arch/m68k/include/asm/archhash.h
 create mode 100644 arch/microblaze/include/asm/archhash.h
 create mode 100644 include/linux/stringhash.h

-- 
2.8.1

* [PATCH 01/10] Pull out string hash to <linux/stringhash.h>
From: George Spelvin @ 2016-05-25  7:21 UTC (permalink / raw)
  To: linux-kernel, torvalds; +Cc: linux, tglx

... so they can be used without the rest of <linux/dcache.h>

The hashlen_* macros will make sense in the next patch.
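
As a quick sketch (illustrative only, not part of the patch) of what they
pack and unpack, given some name/len pair:

	u32 hash = full_name_hash(name, len);
	u64 hash_len = hashlen_create(hash, len);	/* len in bits 32..63, hash in bits 0..31 */

	/* ...and back again */
	BUG_ON(hashlen_hash(hash_len) != hash);
	BUG_ON(hashlen_len(hash_len) != len);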

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
---
 include/linux/dcache.h     | 27 +----------------
 include/linux/stringhash.h | 72 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+), 26 deletions(-)
 create mode 100644 include/linux/stringhash.h

diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 7e9422cb..0f9a977c 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -10,6 +10,7 @@
 #include <linux/cache.h>
 #include <linux/rcupdate.h>
 #include <linux/lockref.h>
+#include <linux/stringhash.h>
 
 struct path;
 struct vfsmount;
@@ -52,9 +53,6 @@ struct qstr {
 };
 
 #define QSTR_INIT(n,l) { { { .len = l } }, .name = n }
-#define hashlen_hash(hashlen) ((u32) (hashlen))
-#define hashlen_len(hashlen)  ((u32)((hashlen) >> 32))
-#define hashlen_create(hash,len) (((u64)(len)<<32)|(u32)(hash))
 
 struct dentry_stat_t {
 	long nr_dentry;
@@ -65,29 +63,6 @@ struct dentry_stat_t {
 };
 extern struct dentry_stat_t dentry_stat;
 
-/* Name hashing routines. Initial hash value */
-/* Hash courtesy of the R5 hash in reiserfs modulo sign bits */
-#define init_name_hash()		0
-
-/* partial hash update function. Assume roughly 4 bits per character */
-static inline unsigned long
-partial_name_hash(unsigned long c, unsigned long prevhash)
-{
-	return (prevhash + (c << 4) + (c >> 4)) * 11;
-}
-
-/*
- * Finally: cut down the number of bits to a int value (and try to avoid
- * losing bits)
- */
-static inline unsigned long end_name_hash(unsigned long hash)
-{
-	return (unsigned int) hash;
-}
-
-/* Compute the hash for a name string. */
-extern unsigned int full_name_hash(const unsigned char *, unsigned int);
-
 /*
  * Try to keep struct dentry aligned on 64 byte cachelines (this will
  * give reasonable cacheline footprint with larger lines without the
diff --git a/include/linux/stringhash.h b/include/linux/stringhash.h
new file mode 100644
index 00000000..144d8c0f
--- /dev/null
+++ b/include/linux/stringhash.h
@@ -0,0 +1,72 @@
+#ifndef __LINUX_STRINGHASH_H
+#define __LINUX_STRINGHASH_H
+
+#include <linux/types.h>
+
+/*
+ * Routines for hashing strings of bytes to a 32-bit hash value.
+ *
+ * These hash functions are NOT GUARANTEED STABLE between kernel
+ * versions, architectures, or even repeated boots of the same kernel.
+ * (E.g. they may depend on boot-time hardware detection or be
+ * deliberately randomized.)
+ *
+ * They are also not intended to be secure against collisions caused by
+ * malicious inputs; much slower hash functions are required for that.
+ *
+ * They are optimized for pathname components, meaning short strings.
+ * Even if a majority of files have longer names, the dynamic profile of
+ * pathname components skews short due to short directory names.
+ * (E.g. /usr/lib/libsesquipedalianism.so.3.141.)
+ */
+
+/*
+ * Version 1: one byte at a time.  Example of use:
+ *
+ * unsigned long hash = init_name_hash;
+ * while (*p)
+ *	hash = partial_name_hash(tolower(*p++), hash);
+ * hash = end_name_hash(hash);
+ *
+ * Although this is designed for bytes, fs/hfsplus/unicode.c
+ * abuses it to hash 16-bit values.
+ */
+
+/* Hash courtesy of the R5 hash in reiserfs modulo sign bits */
+#define init_name_hash()		0
+
+/* partial hash update function. Assume roughly 4 bits per character */
+static inline unsigned long
+partial_name_hash(unsigned long c, unsigned long prevhash)
+{
+	return (prevhash + (c << 4) + (c >> 4)) * 11;
+}
+
+/*
+ * Finally: cut down the number of bits to a int value (and try to avoid
+ * losing bits)
+ */
+static inline unsigned long end_name_hash(unsigned long hash)
+{
+	return (unsigned int)hash;
+}
+
+/*
+ * Version 2: One word (32 or 64 bits) at a time.
+ * If CONFIG_DCACHE_WORD_ACCESS is defined (meaning <asm/word-at-a-time.h>
+ * exists, which describes major Linux platforms like x86 and ARM), then
+ * this computes a different hash function much faster.
+ *
+ * If not set, this falls back to a wrapper around the preceding.
+ */
+extern unsigned int full_name_hash(const unsigned char *, unsigned int);
+
+/*
+ * A hash_len is a u64 with the hash of a string in the low
+ * half and the length in the high half.
+ */
+#define hashlen_hash(hashlen) ((u32)(hashlen))
+#define hashlen_len(hashlen)  ((u32)((hashlen) >> 32))
+#define hashlen_create(hash, len) ((u64)(len)<<32 | (u32)(hash))
+
+#endif	/* __LINUX_STRINGHASH_H */
-- 
2.8.1

* [PATCH 02/10] fs/namei.c: Add hash_string() function
From: George Spelvin @ 2016-05-25  7:22 UTC (permalink / raw)
  To: linux-kernel, torvalds; +Cc: linux, tglx

It's not actually used in that file, but goes with hash_name() and
full_name_hash(), so it's simplest to keep it there.

Also simplify the prototype of full_name_hash() to take
a "char *" rather than an "unsigned char *".  That's consistent
with hash_name().

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
---
 fs/namei.c                 | 47 +++++++++++++++++++++++++++++++++++++++++++---
 include/linux/stringhash.h |  8 ++++++--
 2 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 42f8ca03..ce640d65 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1822,7 +1822,8 @@ static inline unsigned long mix_hash(unsigned long hash)
 
 #endif
 
-unsigned int full_name_hash(const unsigned char *name, unsigned int len)
+/* Return the hash of a string of known length */
+unsigned int full_name_hash(const char *name, unsigned int len)
 {
 	unsigned long a, hash = 0;
 
@@ -1842,6 +1843,29 @@ done:
 }
 EXPORT_SYMBOL(full_name_hash);
 
+/* Return the "hash_len" (hash and length) of a null-terminated string */
+u64 hash_string(const char *name)
+{
+	unsigned long a, adata, mask, hash, len;
+	const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
+
+	hash = a = 0;
+	len = -sizeof(unsigned long);
+	do {
+		hash = mix_hash(hash + a);
+		len += sizeof(unsigned long);
+		a = load_unaligned_zeropad(name+len);
+	} while (!has_zero(a, &adata, &constants));
+
+	adata = prep_zero_mask(a, adata, &constants);
+	mask = create_zero_mask(adata);
+	hash += a & zero_bytemask(mask);
+	len += find_zero(mask);
+
+	return hashlen_create(fold_hash(hash), len);
+}
+EXPORT_SYMBOL(hash_string);
+
 /*
  * Calculate the length and hash of the path component, and
  * return the "hash_len" as the result.
@@ -1872,15 +1896,32 @@ static inline u64 hash_name(const char *name)
 
 #else
 
-unsigned int full_name_hash(const unsigned char *name, unsigned int len)
+/* Return the hash of a string of known length */
+unsigned int full_name_hash(const char *name, unsigned int len)
 {
 	unsigned long hash = init_name_hash();
 	while (len--)
-		hash = partial_name_hash(*name++, hash);
+		hash = partial_name_hash((unsigned char)*name++, hash);
 	return end_name_hash(hash);
 }
 EXPORT_SYMBOL(full_name_hash);
 
+/* Return the "hash_len" (hash and length) of a null-terminated string */
+u64 hash_string(const char *name)
+{
+	unsigned long hash = init_name_hash();
+	unsigned long len = 0, c;
+
+	c = (unsigned char)*name;
+	do {
+		len++;
+		hash = partial_name_hash(c, hash);
+		c = (unsigned char)name[len];
+	} while (c);
+	return hashlen_create(end_name_hash(hash), len);
+}
+EXPORT_SYMBOL(hash_string);
+
 /*
  * We know there's a real path component here of at least
  * one character.
diff --git a/include/linux/stringhash.h b/include/linux/stringhash.h
index 144d8c0f..d1c80ba2 100644
--- a/include/linux/stringhash.h
+++ b/include/linux/stringhash.h
@@ -1,7 +1,8 @@
 #ifndef __LINUX_STRINGHASH_H
 #define __LINUX_STRINGHASH_H
 
-#include <linux/types.h>
+#include <linux/compiler.h>	/* For __pure */
+#include <linux/types.h>	/* For u32, u64 */
 
 /*
  * Routines for hashing strings of bytes to a 32-bit hash value.
@@ -59,7 +60,7 @@ static inline unsigned long end_name_hash(unsigned long hash)
  *
  * If not set, this falls back to a wrapper around the preceding.
  */
-extern unsigned int full_name_hash(const unsigned char *, unsigned int);
+extern unsigned int __pure full_name_hash(const char *, unsigned int);
 
 /*
  * A hash_len is a u64 with the hash of a string in the low
@@ -69,4 +70,7 @@ extern unsigned int full_name_hash(const unsigned char *, unsigned int);
 #define hashlen_len(hashlen)  ((u32)((hashlen) >> 32))
 #define hashlen_create(hash,len) ((u64)(len)<<32 | (u32)(hash))
 
+/* Return the "hash_len" (hash and length) of a null-terminated string */
+extern u64 __pure hash_string(const char *name);
+
 #endif	/* __LINUX_STRINGHASH_H */
-- 
2.8.1

* [PATCH 03/10] <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hash_string()
From: George Spelvin @ 2016-05-25  7:26 UTC (permalink / raw)
  To: bfields, jlayton, linux-kernel, linux-nfs, torvalds; +Cc: linux, tglx

Finally, the first use of the previous two patches: eliminate the
separate ad-hoc string hash functions in the sunrpc code.

This also eliminates the only caller of hash_long which asks for
more than 32 bits of output.

sunrpc guys: Is it okay if I send this to Linus directly?

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: linux-nfs@vger.kernel.org
---
 include/linux/sunrpc/svcauth.h | 36 +++++-------------------------------
 1 file changed, 5 insertions(+), 31 deletions(-)

diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h
index c00f53a4..ef2b2552 100644
--- a/include/linux/sunrpc/svcauth.h
+++ b/include/linux/sunrpc/svcauth.h
@@ -16,6 +16,7 @@
 #include <linux/sunrpc/cache.h>
 #include <linux/sunrpc/gss_api.h>
 #include <linux/hash.h>
+#include <linux/stringhash.h>
 #include <linux/cred.h>
 
 struct svc_cred {
@@ -165,41 +166,14 @@ extern int svcauth_unix_set_client(struct svc_rqst *rqstp);
 extern int unix_gid_cache_create(struct net *net);
 extern void unix_gid_cache_destroy(struct net *net);
 
-static inline unsigned long hash_str(char *name, int bits)
+static inline unsigned long hash_str(char const *name, int bits)
 {
-	unsigned long hash = 0;
-	unsigned long l = 0;
-	int len = 0;
-	unsigned char c;
-	do {
-		if (unlikely(!(c = *name++))) {
-			c = (char)len; len = -1;
-		}
-		l = (l << 8) | c;
-		len++;
-		if ((len & (BITS_PER_LONG/8-1))==0)
-			hash = hash_long(hash^l, BITS_PER_LONG);
-	} while (len);
-	return hash >> (BITS_PER_LONG - bits);
+	return hash_32(hashlen_hash(hash_string(name)), bits);
 }
 
-static inline unsigned long hash_mem(char *buf, int length, int bits)
+static inline unsigned long hash_mem(char const *buf, int length, int bits)
 {
-	unsigned long hash = 0;
-	unsigned long l = 0;
-	int len = 0;
-	unsigned char c;
-	do {
-		if (len == length) {
-			c = (char)len; len = -1;
-		} else
-			c = *buf++;
-		l = (l << 8) | c;
-		len++;
-		if ((len & (BITS_PER_LONG/8-1))==0)
-			hash = hash_long(hash^l, BITS_PER_LONG);
-	} while (len);
-	return hash >> (BITS_PER_LONG - bits);
+	return hash_32(full_name_hash(buf, length), bits);
 }
 
 #endif /* __KERNEL__ */
-- 
2.8.1

* [PATCH 04/10] Change hash_64() return value to 32 bits
From: George Spelvin @ 2016-05-25  7:28 UTC (permalink / raw)
  To: linux-kernel, torvalds; +Cc: linux, tglx

That's all that's ever asked for, and it makes the return
type of hash_long() consistent.

It also allows (upcoming patch) an optimized implementation
of hash_64 on 32-bit machines.

There's a WARN_ON in there in case I missed anything.  Most callers pass
a compile-time constant bits and will have no run-time overhead.
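
For illustration (not part of the patch; "nbits" is a made-up variable):

	u32 a = hash_64(val, 10);	/* constant bits: checked by BUILD_BUG_ON(), no run-time cost */
	u32 b = hash_64(val, nbits);	/* bits known only at run time: falls back to the WARN_ON() */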

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
---
 include/linux/hash.h | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/linux/hash.h b/include/linux/hash.h
index 79c52fa8..b9201c33 100644
--- a/include/linux/hash.h
+++ b/include/linux/hash.h
@@ -48,7 +48,7 @@
 #define GOLDEN_RATIO_32 0x61C88647
 #define GOLDEN_RATIO_64 0x61C8864680B583EBull
 
-static __always_inline u64 hash_64(u64 val, unsigned int bits)
+static __always_inline u32 hash_64(u64 val, unsigned int bits)
 {
 	u64 hash = val;
 
@@ -71,8 +71,14 @@ static __always_inline u64 hash_64(u64 val, unsigned int bits)
 	hash += n;
 #endif
 
+	if (__builtin_constant_p(bits > 32 || bits == 0)) {
+		BUILD_BUG_ON(bits > 32 || bits == 0);
+	} else {
+		WARN_ON(bits > 32 || bits == 0);
+	}
+
 	/* High bits are more random, so use them. */
-	return hash >> (64 - bits);
+	return (u32)(hash >> (64 - bits));
 }
 
 static inline u32 hash_32(u32 val, unsigned int bits)
@@ -84,7 +90,7 @@ static inline u32 hash_32(u32 val, unsigned int bits)
 	return hash >> (32 - bits);
 }
 
-static inline unsigned long hash_ptr(const void *ptr, unsigned int bits)
+static inline u32 hash_ptr(const void *ptr, unsigned int bits)
 {
 	return hash_long((unsigned long)ptr, bits);
 }
-- 
2.8.1

* [PATCH 05/10] Eliminate bad hash multipliers from hash_32() and hash_64()
From: George Spelvin @ 2016-05-25  7:29 UTC (permalink / raw)
  To: linux-kernel, torvalds; +Cc: linux, tglx

To avoid inefficiency, hash_64() on 32-bit systems is changed
to use a different algorithm.  It makes two calls to hash_32()
instead.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
---
 include/linux/hash.h | 100 ++++++++++++++++++++++-----------------------------
 1 file changed, 43 insertions(+), 57 deletions(-)

diff --git a/include/linux/hash.h b/include/linux/hash.h
index b9201c33..8926f369 100644
--- a/include/linux/hash.h
+++ b/include/linux/hash.h
@@ -3,91 +3,76 @@
 /* Fast hashing routine for ints,  longs and pointers.
    (C) 2002 Nadia Yvette Chambers, IBM */
 
-/*
- * Knuth recommends primes in approximately golden ratio to the maximum
- * integer representable by a machine word for multiplicative hashing.
- * Chuck Lever verified the effectiveness of this technique:
- * http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf
- *
- * These primes are chosen to be bit-sparse, that is operations on
- * them can use shifts and additions instead of multiplications for
- * machines where multiplications are slow.
- */
-
 #include <asm/types.h>
 #include <linux/compiler.h>
 
-/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
-#define GOLDEN_RATIO_PRIME_32 0x9e370001UL
-/*  2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1 */
-#define GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001UL
-
+/*
+ * The "GOLDEN_RATIO_PRIME" is used in fs/btrfs/btrfs_inode.h and
+ * fs/inode.c.  It's not actually prime any more (the previous primes
+ * were actively bad for hashing), but the name remains.
+ */
 #if BITS_PER_LONG == 32
-#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_PRIME_32
+#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_32
 #define hash_long(val, bits) hash_32(val, bits)
 #elif BITS_PER_LONG == 64
 #define hash_long(val, bits) hash_64(val, bits)
-#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_PRIME_64
+#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_64
 #else
 #error Wordsize not 32 or 64
 #endif
 
 /*
- * The above primes are actively bad for hashing, since they are
- * too sparse. The 32-bit one is mostly ok, the 64-bit one causes
- * real problems. Besides, the "prime" part is pointless for the
- * multiplicative hash.
+ * This hash multiplies the input by a large odd number and takes the
+ * high bits.  Since multiplication propagates changes to the most
+ * significant end only, it is essential that the high bits of the
+ * product be used for the hash value.
+ *
+ * Chuck Lever verified the effectiveness of this technique:
+ * http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf
  *
  * Although a random odd number will do, it turns out that the golden
  * ratio phi = (sqrt(5)-1)/2, or its negative, has particularly nice
- * properties.
+ * properties.  (See Knuth vol 3, section 6.4, exercise 9.)
  *
- * These are the negative, (1 - phi) = (phi^2) = (3 - sqrt(5))/2.
- * (See Knuth vol 3, section 6.4, exercise 9.)
+ * These are the negative, (1 - phi) = phi**2 = (3 - sqrt(5))/2,
+ * which is very slightly easier to multiply by and makes no
+ * difference to the hash distribution.
  */
 #define GOLDEN_RATIO_32 0x61C88647
 #define GOLDEN_RATIO_64 0x61C8864680B583EBull
 
+static inline u32 __hash_32(u32 val)
+{
+	return val * GOLDEN_RATIO_32;
+}
+
+static inline u32 hash_32(u32 val, unsigned int bits)
+{
+	/* High bits are more random, so use them. */
+	return __hash_32(val) >> (32 - bits);
+}
+
 static __always_inline u32 hash_64(u64 val, unsigned int bits)
 {
-	u64 hash = val;
-
-#if BITS_PER_LONG == 64
-	hash = hash * GOLDEN_RATIO_64;
-#else
-	/*  Sigh, gcc can't optimise this alone like it does for 32 bits. */
-	u64 n = hash;
-	n <<= 18;
-	hash -= n;
-	n <<= 33;
-	hash -= n;
-	n <<= 3;
-	hash += n;
-	n <<= 3;
-	hash -= n;
-	n <<= 4;
-	hash += n;
-	n <<= 2;
-	hash += n;
-#endif
-
 	if (__builtin_constant_p(bits > 32 || bits == 0)) {
 		BUILD_BUG_ON(bits > 32 || bits == 0);
 	} else {
 		WARN_ON(bits > 32 || bits == 0);
 	}
 
-	/* High bits are more random, so use them. */
-	return (unsigned)(hash >> (64 - bits));
-}
-
-static inline u32 hash_32(u32 val, unsigned int bits)
-{
-	/* On some cpus multiply is faster, on others gcc will do shifts */
-	u32 hash = val * GOLDEN_RATIO_PRIME_32;
-
-	/* High bits are more random, so use them. */
-	return hash >> (32 - bits);
+#if BITS_PER_LONG == 64
+	/* 64x64-bit multiply is efficient on all 64-bit processors */
+	return val * GOLDEN_RATIO_64 >> (64 - bits);
+#else
+	/*
+	 * Hash 64 bits using only 32x32-bit multiply.	GOLDEN_RATIO is
+	 * phi**2 = 1-phi = 0.38196601.  The square of that is phi**4 =
+	 * 0.14589803 = 1/6.85, which is starting to have the low bits of
+	 * (val >> 32) not affect the high bits of the hash.  By subtracting,
+	 * we end up with phi**3 = 0.23606798, which is a bit better.
+	 */
+	return hash_32((u32)val - __hash_32(val >> 32), bits);
+#endif
 }
 
 static inline u32 hash_ptr(const void *ptr, unsigned int bits)
@@ -95,6 +80,7 @@ static inline u32 hash_ptr(const void *ptr, unsigned int bits)
 	return hash_long((unsigned long)ptr, bits);
 }
 
+/* This really should be called fold32_ptr; it does no hashing to speak of. */
 static inline u32 hash32_ptr(const void *ptr)
 {
 	unsigned long val = (unsigned long)ptr;
-- 
2.8.1

* [PATCH 06/10] fs/namei.c: Improve dcache hash function
From: George Spelvin @ 2016-05-25  7:31 UTC (permalink / raw)
  To: linux-kernel, torvalds; +Cc: linux, tglx

Patch 0fed3ac866 improved the hash mixing, but the function is slower
than necessary; there's a 7-instruction dependency chain (10 on x86)
each loop iteration.

Word-at-a-time access is a very tight loop (which is good, because
link_path_walk() is one of the hottest code paths in the entire kernel),
and the hash mixing function must not have a longer latency to avoid
slowing it down.

There do not appear to be any published fast hash functions that:
1) Operate on the input a word at a time, and
2) Don't need to know the length of the input beforehand, and
3) Have a single iterated mixing function, not needing conditional
   branches or unrolling to distinguish different loop iterations.

One of the algorithms which comes closest is Yann Collet's xxHash, but
that's two dependent multiplies per word, which is too much.

The key insights in this design are:

1) Except for multiplies, to diffuse one input bit across 64 bits of hash
   state takes at least log2(64) = 6 sequentially dependent instructions.
   That is more cycles than we'd like.
2) An operation like "hash ^= hash << 13" requires a second temporary
   register anyway, and on a 2-operand machine like x86, it's three
   instructions.
3) A better use of a second register is to hold a two-word hash state.
   With careful design, no temporaries are needed at all, so it doesn't
   increase register pressure.  And this gets rid of register copying
   on 2-operand machines, so the code is smaller and faster.
4) Using two words of state weakens the requirement for one-round mixing;
   we now have two rounds of mixing before cancellation is possible.
5) A two-word hash state also allows operations on both halves to be
   done in parallel, so on a superscalar processor we get more mixing
   in fewer cycles.

I ended up using a mixing function inspired by the ChaCha and Speck
round functions.  It is 6 simple instructions and 3 cycles per iteration
(assuming multiply by 9 can be done by an "lea" instruction):

		x ^= *input++;
	y ^= x;	x = ROL(x, K1);
	x += y;	y = ROL(y, K2);
	y *= 9;

Not only is this reversible, two consecutive rounds are reversible:
if you are given the initial and final states, but not the intermediate
state, it is possible to compute both input words.  This means that at
least 3 words of input are required to create a collision.

(It also has the property, used by hash_name() to avoid a branch, that
it hashes all-zero to all-zero.)
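
To make the single-round reversibility concrete, here is a sketch (mine,
not part of the patch) of undoing one 64-bit round given its input word.
It relies on 0x8E38E38E38E38E39 being the multiplicative inverse of 9
mod 2^64, and on ror64() from <linux/bitops.h>:

	static inline void hash_mix_inverse(u64 *x, u64 *y, u64 a)
	{
		u64 y1, x1;

		y1 = ror64(*y * 0x8E38E38E38E38E39ull, 45);	/* undo y *= 9, then the rotate */
		x1 = ror64(*x - y1, 12);			/* undo x += y, then the rotate */
		*y = y1 ^ x1;					/* undo y ^= x */
		*x = x1 ^ a;					/* undo x ^= a (the input word) */
	}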

The rotate constants K1 and K2 were found by experiment.  The search took
a sample of random initial states (I used 1023) and considered the effect
of flipping each of the 64 input bits on each of the 128 output bits two
rounds later.  Each of the 8192 pairs can be considered a biased coin, and
adding up the Shannon entropy of all of them produces a score.
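
A stand-alone sketch of that scoring procedure (my reconstruction from
the description above, not the actual search code; the PRNG and other
details are arbitrary):

	#include <stdint.h>
	#include <stdio.h>
	#include <math.h>

	#define ROL64(v, s)	(((v) << (s)) | ((v) >> (64 - (s))))
	#define HASH_MIX(x, y, a) \
		(x ^= (a), y ^= x, x = ROL64(x, 12), \
		 x += y,   y = ROL64(y, 45), y *= 9)

	static uint64_t rng = 1;
	static uint64_t rand64(void)		/* any decent 64-bit PRNG will do */
	{
		rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
		return rng;
	}

	int main(void)
	{
		enum { SAMPLES = 1023 };
		static unsigned int flips[64][128];	/* toggle counts per (input, output) bit pair */
		double score = 0;
		int i, j, s;

		for (s = 0; s < SAMPLES; s++) {
			uint64_t x0 = rand64(), y0 = rand64();
			uint64_t a1 = rand64(), a2 = rand64();
			uint64_t xr = x0, yr = y0;

			HASH_MIX(xr, yr, a1);		/* reference: two rounds, unmodified input */
			HASH_MIX(xr, yr, a2);
			for (i = 0; i < 64; i++) {
				uint64_t x = x0, y = y0, dx, dy;

				HASH_MIX(x, y, a1 ^ (1ull << i));	/* flip input bit i */
				HASH_MIX(x, y, a2);
				dx = x ^ xr;
				dy = y ^ yr;
				for (j = 0; j < 64; j++) {
					flips[i][j]      += (dx >> j) & 1;
					flips[i][64 + j] += (dy >> j) & 1;
				}
			}
		}
		for (i = 0; i < 64; i++)
			for (j = 0; j < 128; j++) {
				double p = (double)flips[i][j] / SAMPLES;

				if (p > 0 && p < 1)
					score -= p*log2(p) + (1-p)*log2(1-p);
			}
		printf("entropy score: %.1f / 8192\n", score);
		return 0;
	}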

The best-scoring shifts also did well in other tests (flipping bits in y,
trying 3 or 4 rounds of mixing, flipping all 64*63/2 pairs of input bits),
so the choice was made with the additional constraint that the sum of the
shifts is odd and not too close to the word size.

The final state is then folded into a 32-bit hash value by a less carefully
optimized multiply-based scheme.  This also has to be fast, as pathname
components tend to be short (the most common case is one iteration!), but
there's some room for latency, as there is a fair bit of intervening logic
before the hash value is used for anything.

(Performance verified with "bonnie++ -s 0 -n 1536:-2" on tmpfs.  I need
a better benchmark; the numbers seem to show a slight dip in performance
between 4.6.0 and this patch, but they're too noisy to quote.)

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
---
 fs/namei.c | 110 ++++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 73 insertions(+), 37 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ce640d65..2b8d0650 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -35,6 +35,7 @@
 #include <linux/fs_struct.h>
 #include <linux/posix_acl.h>
 #include <linux/hash.h>
+#include <linux/bitops.h>
 #include <asm/uaccess.h>
 
 #include "internal.h"
@@ -1788,36 +1789,75 @@ static int walk_component(struct nameidata *nd, int flags)
 #include <asm/word-at-a-time.h>
 
 #ifdef CONFIG_64BIT
-
-static inline unsigned int fold_hash(unsigned long hash)
-{
-	return hash_64(hash, 32);
-}
+/*
+ * Register pressure in the mixing function is an issue, particularly
+ * on 32-bit x86, but almost any function requires one state value and
+ * one temporary.  Instead, use a function designed for two state values
+ * and no temporaries.
+ *
+ * This function cannot create a collision in only two iterations, so
+ * we have two iterations to achieve avalanche.  In those two iterations,
+ * we have six layers of mixing, which is enough to spread one bit's
+ * influence out to 2^6 = 64 state bits.
+ *
+ * Rotate constants are scored by considering either 64 one-bit input
+ * deltas or 64*63/2 = 2016 two-bit input deltas, and finding the
+ * probability of that delta causing a change to each of the 128 output
+ * bits, using a sample of random initial states.
+ *
+ * The Shannon entropy of the computed probabilities is then summed
+ * to produce a score.  Ideally, any input change has a 50% chance of
+ * toggling any given output bit.
+ *
+ * Mixing scores (in bits) for (12,45):
+ * Input delta: 1-bit      2-bit
+ * 1 round:     713.3    42542.6
+ * 2 rounds:   2753.7   140389.8
+ * 3 rounds:   5954.1   233458.2
+ * 4 rounds:   7862.6   256672.2
+ * Perfect:    8192     258048
+ *            (64*128) (64*63/2 * 128)
+ */
+#define HASH_MIX(x, y, a)	\
+	(	x ^= (a),	\
+	y ^= x,	x = rol64(x,12),\
+	x += y,	y = rol64(y,45),\
+	y *= 9			)
 
 /*
- * This is George Marsaglia's XORSHIFT generator.
- * It implements a maximum-period LFSR in only a few
- * instructions.  It also has the property (required
- * by hash_name()) that mix_hash(0) = 0.
+ * Fold two longs into one 32-bit hash value.  This must be fast, but
+ * latency isn't quite as critical, as there is a fair bit of additional
+ * work done before the hash value is used.
  */
-static inline unsigned long mix_hash(unsigned long hash)
+static inline unsigned int fold_hash(unsigned long x, unsigned long y)
 {
-	hash ^= hash << 13;
-	hash ^= hash >> 7;
-	hash ^= hash << 17;
-	return hash;
+	y ^= x * GOLDEN_RATIO_64;
+	y *= GOLDEN_RATIO_64;
+	return y >> 32;
 }
 
 #else	/* 32-bit case */
 
-#define fold_hash(x) (x)
+/*
+ * Mixing scores (in bits) for (7,20):
+ * Input delta: 1-bit      2-bit
+ * 1 round:     330.3     9201.6
+ * 2 rounds:   1246.4    25475.4
+ * 3 rounds:   1907.1    31295.1
+ * 4 rounds:   2042.3    31718.6
+ * Perfect:    2048      31744
+ *            (32*64)   (32*31/2 * 64)
+ */
+#define HASH_MIX(x, y, a)	\
+	(	x ^= (a),	\
+	y ^= x,	x = rol32(x, 7),\
+	x += y,	y = rol32(y,20),\
+	y *= 9			)
 
-static inline unsigned long mix_hash(unsigned long hash)
+static inline unsigned int fold_hash(unsigned long x, unsigned long y)
 {
-	hash ^= hash << 13;
-	hash ^= hash >> 17;
-	hash ^= hash << 5;
-	return hash;
+	/* Use arch-optimized multiply if one exists */
+	return __hash_32(y ^ __hash_32(x));
 }
 
 #endif
@@ -1825,44 +1865,42 @@ static inline unsigned long mix_hash(unsigned long hash)
 /* Return the hash of a string of known length */
 unsigned int full_name_hash(const char *name, unsigned int len)
 {
-	unsigned long a, hash = 0;
+	unsigned long a, x = 0, y = 0;
 
 	for (;;) {
 		a = load_unaligned_zeropad(name);
 		if (len < sizeof(unsigned long))
 			break;
-		hash = mix_hash(hash + a);
+		HASH_MIX(x, y, a);
 		name += sizeof(unsigned long);
 		len -= sizeof(unsigned long);
 		if (!len)
 			goto done;
 	}
-	hash += a & bytemask_from_count(len);
+	HASH_MIX(x, y, a & bytemask_from_count(len));
 done:
-	return fold_hash(hash);
+	return fold_hash(x, y);
 }
 EXPORT_SYMBOL(full_name_hash);
 
 /* Return the "hash_len" (hash and length) of a null-terminated string */
 u64 hash_string(const char *name)
 {
-	unsigned long a, adata, mask, hash, len;
+	unsigned long a = 0, x = 0, y = 0, adata, mask, len;
 	const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
 
-	hash = a = 0;
 	len = -sizeof(unsigned long);
 	do {
-		hash = mix_hash(hash + a);
+		HASH_MIX(x, y, a);
 		len += sizeof(unsigned long);
 		a = load_unaligned_zeropad(name+len);
 	} while (!has_zero(a, &adata, &constants));
 
 	adata = prep_zero_mask(a, adata, &constants);
 	mask = create_zero_mask(adata);
-	hash += a & zero_bytemask(mask);
-	len += find_zero(mask);
+	HASH_MIX(x, y, a & zero_bytemask(mask));
 
-	return hashlen_create(fold_hash(hash), len);
+	return hashlen_create(fold_hash(x, y), len + find_zero(mask));
 }
 EXPORT_SYMBOL(hash_string);
 
@@ -1872,13 +1910,12 @@ EXPORT_SYMBOL(hash_string);
  */
 static inline u64 hash_name(const char *name)
 {
-	unsigned long a, b, adata, bdata, mask, hash, len;
+	unsigned long a = 0, b, x = 0, y = 0, adata, bdata, mask, len;
 	const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
 
-	hash = a = 0;
 	len = -sizeof(unsigned long);
 	do {
-		hash = mix_hash(hash + a);
+		HASH_MIX(x, y, a);
 		len += sizeof(unsigned long);
 		a = load_unaligned_zeropad(name+len);
 		b = a ^ REPEAT_BYTE('/');
@@ -1889,12 +1926,11 @@ static inline u64 hash_name(const char *name)
 
 	mask = create_zero_mask(adata | bdata);
 
-	hash += a & zero_bytemask(mask);
-	len += find_zero(mask);
-	return hashlen_create(fold_hash(hash), len);
+	HASH_MIX(x, y, a & zero_bytemask(mask));
+	return hashlen_create(fold_hash(x, y), len + find_zero(mask));
 }
 
-#else
+#else	/* !CONFIG_DCACHE_WORD_ACCESS: Slow, byte-at-a-time version */
 
 /* Return the hash of a string of known length */
 unsigned int full_name_hash(const char *name, unsigned int len)
-- 
2.8.1

* [PATCH 07/10] <linux/hash.h>: Add support for architecture-specific functions
From: George Spelvin @ 2016-05-25  7:33 UTC (permalink / raw)
  To: linux-kernel, torvalds
  Cc: alistair.francis, geert, gerg, linux-m68k, linux, michal.simek,
	tglx, uclinux-h8-devel, ysato

This is just the infrastructure; there are no users yet.

This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
the existence of <asm/archhash.h>.

That file may define its own versions of various functions, and define
HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.
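
A minimal sketch of what such a header might look like (hypothetical
architecture "foo", illustrative only; it may rely on the GOLDEN_RATIO_xx
constants because <linux/hash.h> defines them before including it):

	/* arch/foo/include/asm/archhash.h */
	#ifndef _ASM_FOO_ARCHHASH_H
	#define _ASM_FOO_ARCHHASH_H

	#define HAVE_ARCH__HASH_32 1	/* suppress the generic __hash_32() */

	static inline u32 __hash_32(u32 val)
	{
		/* an arch-optimized multiply by GOLDEN_RATIO_32 would go here */
		return val * GOLDEN_RATIO_32;
	}

	#endif /* _ASM_FOO_ARCHHASH_H */

The architecture also does "select HAVE_ARCH_HASH" in its Kconfig entry,
as the following patches do for m68k, microblaze and h8300.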

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: linux-m68k@lists.linux-m68k.org
Cc: Alistair Francis <alistai@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: uclinux-h8-devel@lists.sourceforge.jp
---
 arch/Kconfig         |  8 ++++++++
 fs/namei.c           |  6 +++++-
 include/linux/hash.h | 11 +++++++++++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 81869a5e..33e8d7b1 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -589,6 +589,14 @@ config HAVE_STACK_VALIDATION
 	  Architecture supports the 'objtool check' host tool command, which
 	  performs compile-time stack metadata validation.
 
+config HAVE_ARCH_HASH
+	bool
+	default n
+	help
+	  If this is set, the architecture provides an <asm/archhash.h>
+	  file which provides platform-specific implementations of some
+	  functions in <linux/hash.h> or fs/namei.c.
+
 #
 # ABI hall of shame
 #
diff --git a/fs/namei.c b/fs/namei.c
index 2b8d0650..380e8057 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1788,7 +1788,11 @@ static int walk_component(struct nameidata *nd, int flags)
 
 #include <asm/word-at-a-time.h>
 
-#ifdef CONFIG_64BIT
+#ifdef HASH_MIX
+
+/* Architecture provides HASH_MIX and fold_hash() in <linux/hash.h> */
+
+#elif defined(CONFIG_64BIT)
 /*
  * Register pressure in the mixing function is an issue, particularly
  * on 32-bit x86, but almost any function requires one state value and
diff --git a/include/linux/hash.h b/include/linux/hash.h
index 8926f369..838bc84b 100644
--- a/include/linux/hash.h
+++ b/include/linux/hash.h
@@ -41,17 +41,27 @@
 #define GOLDEN_RATIO_32 0x61C88647
 #define GOLDEN_RATIO_64 0x61C8864680B583EBull
 
+#ifdef CONFIG_HAVE_ARCH_HASH
+/* This header may use the GOLDEN_RATIO_xx constants */
+#include <asm/archhash.h>
+#endif
+
+#ifndef HAVE_ARCH__HASH_32
 static inline u32 __hash_32(u32 val)
 {
 	return val * GOLDEN_RATIO_32;
 }
+#endif
 
+#ifndef HAVE_ARCH_HASH_32
 static inline u32 hash_32(u32 val, unsigned int bits)
 {
 	/* High bits are more random, so use them. */
 	return __hash_32(val) >> (32 - bits);
 }
+#endif
 
+#ifndef HAVE_ARCH_HASH_64
 static __always_inline u32 hash_64(u64 val, unsigned int bits)
 {
 	if (__builtin_constant_p(bits > 32 || bits == 0)) {
@@ -74,6 +84,7 @@ static __always_inline u32 hash_64(u64 val, unsigned int bits)
 	return hash_32((u32)val - __hash_32(val >> 32), bits);
 #endif
 }
+#endif
 
 static inline u32 hash_ptr(const void *ptr, unsigned int bits)
 {
-- 
2.8.1


* [PATCH 08/10] m68k: Add <asm/archhash.h>
From: George Spelvin @ 2016-05-25  7:34 UTC (permalink / raw)
  To: geert, gerg, linux-kernel, linux-m68k, torvalds; +Cc: linux, tglx

This provides a multiply by constant GOLDEN_RATIO_32 = 0x61C88647
for the original mc68000, which lacks a 32x32-bit multiply instruction.

Yes, the amount of optimization effort put in is excessive. :-)

Addition chains found by Yevgen Voronenko's Hcub algorithm at
http://spiral.ece.cmu.edu/mcm/gen.html
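
The identity the code relies on (a sketch of mine, not part of the patch)
is simply 0x61C88647 = (0x61C8 << 16) | 0x8647, so modulo 2^32 the product
splits into a 16x16-bit multiply for the high half and a shift-and-add
chain for the low half:

	/* Quick userspace sanity check of the decomposition */
	#include <assert.h>
	#include <stdint.h>

	static uint32_t split_mul(uint32_t x)
	{
		uint32_t lo = x * 0x8647;			/* the shift-and-add part */
		uint32_t hi = ((x * 0x61C8) & 0xFFFFu) << 16;	/* the mulu.w part, low 16 bits only */

		return hi + lo;
	}

	int main(void)
	{
		uint32_t i;

		for (i = 0; i < 1000000; i++) {
			uint32_t x = i * 2654435761u;	/* spread samples over the 32-bit space */
			assert(split_mul(x) == x * 0x61C88647u);
		}
		return 0;
	}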

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: linux-m68k@lists.linux-m68k.org
---
 arch/m68k/Kconfig                |  1 +
 arch/m68k/include/asm/archhash.h | 67 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)
 create mode 100644 arch/m68k/include/asm/archhash.h

diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 498b567f..95197d5e 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -23,6 +23,7 @@ config M68K
 	select MODULES_USE_ELF_RELA
 	select OLD_SIGSUSPEND3
 	select OLD_SIGACTION
+	select HAVE_ARCH_HASH
 
 config RWSEM_GENERIC_SPINLOCK
 	bool
diff --git a/arch/m68k/include/asm/archhash.h b/arch/m68k/include/asm/archhash.h
new file mode 100644
index 00000000..c2bb2fc5
--- /dev/null
+++ b/arch/m68k/include/asm/archhash.h
@@ -0,0 +1,67 @@
+#ifndef _ASM_ARCHHASH_H
+#define _ASM_ARCHHASH_H
+
+/*
+ * The only 68k processors that lack MULU.L and so need this workaround
+ * are the original 68000 and 68010.
+ *
+ * Annoyingly, GCC defines __mc68000 for all processors in the family;
+ * the only way to identify an mc68000 is by the *absence* of other
+ * symbols; __mcpu32, __mcoldfire__, __mc68020, etc.
+ */
+#if ! (defined(__mc68020) || \
+	defined(__mc68030) || \
+	defined(__mc68040) || \
+	defined(__mc68060) || \
+	defined(__mcpu32)  || \
+	defined(__mcoldfire))
+
+#define HAVE_ARCH__HASH_32 1
+/*
+ * While it would be legal to substitute a different hash operation
+ * entirely, let's keep it simple and just use an optimized multiply
+ * by GOLDEN_RATIO_32 = 0x61C88647.
+ *
+ * The best way to do that appears to be to multiply by 0x8647 with
+ * shifts and adds, and use mulu.w to multiply the high half by 0x61C8.
+ *
+ * Because the 68000 has multi-cycle shifts, this addition chain is
+ * chosen to minimise the shift distances.
+ *
+ * Despite every attempt to spoon-feed GCC simple operations, GCC 6.1.1
+ * doggedly insists on doing annoying things like converting "lsl.l #2,<reg>"
+ * (12 cycles) to two adds (8+8 cycles).
+ *
+ * It also likes to notice two shifts in a row, like "a = x << 2" and
+ * "a <<= 7", and convert that to "a = x << 9".  But shifts longer than
+ * 8 bits are extra-slow on m68k, so that's a lose.
+ *
+ * Since the 68000 is a very simple in-order processor with no instruction
+ * scheduling effects on execution time, we can safely take it out of GCC's
+ * hands and write one big asm() block.
+ *
+ * Without calling overhead, this operation is 30 bytes (14 instructions
+ * plus one immediate constant) and 166 cycles.
+ */
+static inline u32 __attribute_const__ __hash_32(u32 x)
+{
+	u32 a, b;
+
+	asm(   "move.l %2,%0"	/* 0x0001 */
+	"\n	lsl.l #2,%0"	/* 0x0004 */
+	"\n	move.l %0,%1"
+	"\n	lsl.l #7,%0"	/* 0x0200 */
+	"\n	add.l %2,%0"	/* 0x0201 */
+	"\n	add.l %0,%1"	/* 0x0205 */
+	"\n	add.l %0,%0"	/* 0x0402 */
+	"\n	add.l %0,%1"	/* 0x0607 */
+	"\n	lsl.l #5,%0"	/* 0x8040 */
+				/* 0x8647 */
+	: "=&d" (a), "=&r" (b)
+	: "g" (x));
+
+	return ((u16)(x*0x61c8) << 16) + a + b;
+}
+#endif	/* HAVE_ARCH__HASH_32 */
+
+#endif	/* _ASM_ARCHHASH_H */
-- 
2.8.1

* [PATCH 09/10] microblaze: Add <asm/archhash.h>
From: George Spelvin @ 2016-05-25  7:37 UTC (permalink / raw)
  To: alistair.francis, linux-kernel, michal.simek, torvalds; +Cc: linux, tglx

Microblaze is an FPGA soft core that can be configured in various ways.

If it is configured without a multiplier, the standard __hash_32()
will require a call to __mulsi3, which is a slow software loop.

Instead, use a shift-and-add sequence for the constant multiply.
GCC knows how to do this, but it's not as clever as some.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Alistair Francis <alistai@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
---
 arch/microblaze/Kconfig                |  1 +
 arch/microblaze/include/asm/archhash.h | 80 ++++++++++++++++++++++++++++++++++
 2 files changed, 81 insertions(+)
 create mode 100644 arch/microblaze/include/asm/archhash.h

diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index 3d793b55..ce3e5125 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -16,6 +16,7 @@ config MICROBLAZE
 	select GENERIC_IRQ_SHOW
 	select GENERIC_PCI_IOMAP
 	select GENERIC_SCHED_CLOCK
+	select HAVE_ARCH_HASH
 	select HAVE_ARCH_KGDB
 	select HAVE_DEBUG_KMEMLEAK
 	select HAVE_DMA_API_DEBUG
diff --git a/arch/microblaze/include/asm/archhash.h b/arch/microblaze/include/asm/archhash.h
new file mode 100644
index 00000000..d63cd19e
--- /dev/null
+++ b/arch/microblaze/include/asm/archhash.h
@@ -0,0 +1,80 @@
+#ifndef _ASM_ARCHHASH_H
+#define _ASM_ARCHHASH_H
+
+/*
+ * Fortunately, most people who want to run Linux on Microblaze enable
+ * both multiplier and barrel shifter, but omitting them is technically
+ * a supported configuration.
+ *
+ * With just a barrel shifter, we can implement an efficient constant
+ * multiply using shifts and adds.  GCC can find a 9-step solution, but
+ * this 6-step solution was found by Yevgen Voronenko's implementation
+ * of the Hcub algorithm at http://spiral.ece.cmu.edu/mcm/gen.html.
+ *
+ * That software is really not designed for a single multiplier this large,
+ * but if you run it enough times with different seeds, it'll find several
+ * 6-shift, 6-add sequences for computing x * 0x61C88647.  They are all
+ *	c = (x << 19) + x;
+ *	a = (x <<  9) + c;
+ *	b = (x << 23) + a;
+ *	return (a<<11) + (b<<6) + (c<<3) - b;
+ * with variations on the order of the final add.
+ *
+ * Without even a shifter, it's hopeless; any hash function will suck.
+ */
+
+#if CONFIG_XILINX_MICROBLAZE0_USE_HW_MUL == 0
+
+#define HAVE_ARCH__HASH_32 1
+
+/* Multiply by GOLDEN_RATIO_32 = 0x61C88647 */
+static inline u32 __attribute_const__ __hash_32(u32 a)
+{
+#if CONFIG_XILINX_MICROBLAZE0_USE_BARREL
+	unsigned b, c;
+
+	/* Phase 1: Compute three intermediate values */
+	b =  a << 23;
+	c = (a << 19) + a;
+	a = (a <<  9) + c;
+	b += a;
+
+	/* Phase 2: Compute (a << 11) + (b << 6) + (c << 3) - b */
+	a <<= 5;
+	a += b;		/* (a << 5) + b */
+	a <<= 3;
+	a += c;		/* (a << 8) + (b << 3) + c */
+	a <<= 3;
+	return a - b;	/* (a << 11) + (b << 6) + (c << 3) - b */
+#else
+	/*
+	 * "This is really going to hurt."
+	 *
+	 * Without a barrel shifter, left shifts are implemented as
+	 * repeated additions, and the best we can do is an optimal
+	 * addition-subtraction chain.  This one is not known to be
+	 * optimal, but at 37 steps, it's decent for a 31-bit multiplier.
+	 *
+	 * Question: given its size (37*4 = 148 bytes per instance),
+	 * and slowness, is this worth having inline?
+	 */
+	unsigned b, c, d;
+	b = a << 4;	/* 4    */
+	c = b << 1;	/* 1  5 */
+	b += a;		/* 1  6 */
+	c += b;		/* 1  7 */
+	c <<= 3;	/* 3 10 */
+	c -= a;		/* 1 11 */
+	d = c << 7;	/* 7 18 */
+	d += b;		/* 1 19 */
+	d <<= 8;	/* 8 27 */
+	d += a;		/* 1 28 */
+	d <<= 1;	/* 1 29 */
+	d += b;		/* 1 30 */
+	d <<= 6;	/* 6 36 */
+	return d + c;	/* 1 37 total instructions*/
+#endif
+}
+
+#endif /* !CONFIG_XILINX_MICROBLAZE0_USE_HW_MUL */
+#endif /* _ASM_ARCHHASH_H */
-- 
2.8.1

* [PATCH 10/10] h8300: Add <asm/archhash.h>
From: George Spelvin @ 2016-05-25  7:38 UTC (permalink / raw)
  To: linux-kernel, torvalds, uclinux-h8-devel, ysato; +Cc: linux, tglx

This will improve the performance of hash_32() and hash_64(), but due
to the complete lack of multi-bit shift instructions on the H8, performance
will still be bad in the surrounding code.

Designing H8-specific hash algorithms to work around that is a separate
project.  (But if the maintainers would like to get in touch...)

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: uclinux-h8-devel@lists.sourceforge.jp
---
 arch/h8300/Kconfig                |  1 +
 arch/h8300/include/asm/archhash.h | 52 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)
 create mode 100644 arch/h8300/include/asm/archhash.h

diff --git a/arch/h8300/Kconfig b/arch/h8300/Kconfig
index 986ea84c..6c583dbb 100644
--- a/arch/h8300/Kconfig
+++ b/arch/h8300/Kconfig
@@ -20,6 +20,7 @@ config H8300
 	select HAVE_KERNEL_GZIP
 	select HAVE_KERNEL_LZO
 	select HAVE_ARCH_KGDB
+	select HAVE_ARCH_HASH
 
 config RWSEM_GENERIC_SPINLOCK
 	def_bool y
diff --git a/arch/h8300/include/asm/archhash.h b/arch/h8300/include/asm/archhash.h
new file mode 100644
index 00000000..018ed96a
--- /dev/null
+++ b/arch/h8300/include/asm/archhash.h
@@ -0,0 +1,52 @@
+#ifndef _ASM_ARCHHASH_H
+#define _ASM_ARCHHASH_H
+
+/*
+ * The later H8SX models have a 32x32-bit multiply, but the H8/300H
+ * and H8S have only 16x16->32.  Since it's tolerably compact, this
+ * is basically an inlined version of the __mulsi3 code.  It's also
+ * simplified by skipping the early-out checks.
+ *
+ * (Since neither CPU has any multi-bit shift instructions, a
+ * shift-and-add version is a non-starter.)
+ *
+ * TODO: come up with an arch-specific version of the hashing in fs/namei.c,
+ * since that is heavily dependent on rotates.  Which, as mentioned, suck
+ * horribly on H8.
+ */
+
+#if defined(CONFIG_CPU_H8300H) || defined(CONFIG_CPU_H8S)
+
+#define HAVE_ARCH__HASH_32 1
+
+/*
+ * Multiply by k = 0x61C88647.  Fitting this into three registers requires
+ * one extra instruction, but reducing register pressure will probably
+ * make that back and then some.
+ *
+ * GCC asm note: %e1 is the high half of operand %1, while %f1 is the
+ * low half.  So if %1 is er4, then %e1 is e4 and %f1 is r4.
+ *
+ * This has been designed to modify x in place, since that's the most
+ * common usage, but preserve k, since hash_64() makes two calls
+ * in quick succession.
+ */
+static inline u32 __attribute_const__ __hash_32(u32 x)
+{
+	u32 temp;
+
+	asm(   "mov.w	%e1,%f0"
+	"\n	mulxu.w	%f2,%0"		/* klow * xhigh */
+	"\n	mov.w	%f0,%e1"	/* The extra instruction */
+	"\n	mov.w	%f1,%f0"
+	"\n	mulxu.w	%e2,%0"		/* khigh * xlow */
+	"\n	add.w	%e1,%f0"
+	"\n	mulxu.w	%f2,%1"		/* klow * xlow */
+	"\n	add.w	%f0,%e1"
+	: "=&r" (temp), "=r" (x)
+	: "%r" (GOLDEN_RATIO_32), "1" (x));
+	return x;
+}
+
+#endif /* CONFIG_CPU_H8300H || CONFIG_CPU_H8S */
+#endif /* _ASM_ARCHHASH_H */
-- 
2.8.1

* Re: [PATCH 00/10] String hash improvements
From: Geert Uytterhoeven @ 2016-05-25  8:00 UTC (permalink / raw)
  To: George Spelvin
  Cc: linux-kernel, Linus Torvalds, alistair.francis, Bruce Fields,
	Greg Ungerer, Jeff Layton, linux-m68k, open list:NFS, SUNRPC,
	AND...,
	Michal Simek, Thomas Gleixner, uclinux-h8-devel, Yoshinori Sato

Hi George,

On Wed, May 25, 2016 at 9:20 AM, George Spelvin
<linux@sciencehorizons.net> wrote:
> I'm not particularly fond of the names of the header files I created,
> but if anyone has a better idea please talk fast!

Usually this is handled through include/asm-generic/.
Put the generic default implementation in include/asm-generic/hash.h.

Architectures that need to override provide their own version, e.g.
arch/m68k/include/asm/hash.h. They may #include <asm-generic/hash.h>
if they still want to reuse parts of the generic implementation.

Other architectures add "generic-y += hash.h" to their
arch/<ARCH>/include/asm/Kbuild.

<linux/hash.h> then includes <asm/hash.h>.

>  arch/h8300/include/asm/archhash.h      |  52 ++++++++++++
>  arch/m68k/include/asm/archhash.h       |  67 +++++++++++++++
>  arch/microblaze/include/asm/archhash.h |  80 ++++++++++++++++++
>  include/linux/hash.h                   | 111 ++++++++++++------------
>  include/linux/stringhash.h             |  76 +++++++++++++++++

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

* Re: [PATCH 08/10] m68k: Add <asm/archhash.h>
From: Geert Uytterhoeven @ 2016-05-25  8:07 UTC (permalink / raw)
  To: George Spelvin
  Cc: Greg Ungerer, linux-kernel, linux-m68k, Linus Torvalds, Thomas Gleixner

Hi George,

On Wed, May 25, 2016 at 9:34 AM, George Spelvin
<linux@sciencehorizons.net> wrote:
> This provides a multiply by constant GOLDEN_RATIO_32 = 0x61C88647
> for the original mc68000, which lacks a 32x32-bit multiply instruction.
>
> Yes, the amount of optimization effort put in is excessive. :-)
>
> Addition chains found by Yevgen Voronenko's Hcub algorithm at
> http://spiral.ece.cmu.edu/mcm/gen.html
>
> Signed-off-by: George Spelvin <linux@sciencehorizons.net>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Greg Ungerer <gerg@linux-m68k.org>
> Cc: linux-m68k@lists.linux-m68k.org
> ---
>  arch/m68k/Kconfig                |  1 +
>  arch/m68k/include/asm/archhash.h | 67 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 68 insertions(+)
>  create mode 100644 arch/m68k/include/asm/archhash.h
>
> diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
> index 498b567f..95197d5e 100644
> --- a/arch/m68k/Kconfig
> +++ b/arch/m68k/Kconfig
> @@ -23,6 +23,7 @@ config M68K
>         select MODULES_USE_ELF_RELA
>         select OLD_SIGSUSPEND3
>         select OLD_SIGACTION
> +       select HAVE_ARCH_HASH

"select HAVE_ARCH_HASH if M68000"?

Or better, move the select to the M68000 section in arch/m68k/Kconfig.cpu.

> --- /dev/null
> +++ b/arch/m68k/include/asm/archhash.h
> @@ -0,0 +1,67 @@
> +#ifndef _ASM_ARCHHASH_H
> +#define _ASM_ARCHHASH_H
> +
> +/*
> + * The only 68k processors that lack MULU.L and so need this workaround
> + * are the original 68000 and 68010.
> + *
> + * Annoyingly, GCC defines __mc68000 for all processors in the family;
> + * the only way to identify an mc68000 is by the *absence* of other
> + * symbols; __mcpu32, __mcoldfire__, __mc68020, etc.
> + */
> +#if ! (defined(__mc68020) || \
> +       defined(__mc68030) || \
> +       defined(__mc68040) || \
> +       defined(__mc68060) || \
> +       defined(__mcpu32)  || \
> +       defined(__mcoldfire))

With my comment above, you wouldn't need this, but I'm gonna comment anyway.

We don't use special GCCs to target specific CPU variants. Hence inside the
kernel, you should check the config symbols, to see if support for 68000 or
68010 (which isn't supported by the kernel yet) is enabled.

Hence the check should be:

    #if defined(CONFIG_M68000) || defined(CONFIG_M68010)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/10] String hash improvements
  2016-05-25  8:00   ` Geert Uytterhoeven
@ 2016-05-25  8:11     ` George Spelvin
  2016-05-25  8:50       ` Geert Uytterhoeven
  0 siblings, 1 reply; 55+ messages in thread
From: George Spelvin @ 2016-05-25  8:11 UTC (permalink / raw)
  To: geert, linux
  Cc: alistair.francis, bfields, gerg, jlayton, linux-kernel,
	linux-m68k, linux-nfs, michal.simek, tglx, torvalds,
	uclinux-h8-devel, ysato

Geert Uytterhoeven wrote:
> Usually this is handled through include/asm-generic/.
> Put the generic default implementation in include/asm-generic/hash.h.
>
> Architectures that need to override provide their own version, e.g.
> arch/m68k/include/asm/hash.h. They may #include <asm-generic/hash.h>
> if they still want to reuse parts of the generic implementation.
>
> Other architectures add "generic-y += hash.h" to their
> arch/<ARCH>/include/asm/Kbuild.

I thought about that, but then I'd have to edit *every* architecture,
and might need acks from all the maintainers.

I was looking for something that was a total no-op on most architectures.

But if this is preferred, it's not technically difficult at all.


If asm-generic were in the <asm/*.h> search path, it would magically
Just Work, but leftover files from a broken checkout would be a big
potential problem.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 08/10] m68k: Add <asm/archhash.h>
  2016-05-25  8:07   ` Geert Uytterhoeven
@ 2016-05-25  8:19     ` George Spelvin
  2016-05-25  8:24     ` [PATCH 08v2/10] " George Spelvin
  1 sibling, 0 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-25  8:19 UTC (permalink / raw)
  To: geert, linux; +Cc: gerg, linux-kernel, linux-m68k, tglx, torvalds

> With my comment above, you wouldn't need this, but I'm gonna comment anyway.
> 
> We don't use special GCCs to target specific CPU variants. Hence inside the
> kernel, you should check the config symbols, to see if support for 68000 or
> 68010 (which isn't supported by the kernel yet) is enabled.

Do you remember some earlier discussion about the m68k Makefile and old
GCC versions?  In particular, lines like:

cpuflags-$(CONFIG_M525x)        := $(call cc-option,-mcpu=5253,-m5200)
cpuflags-$(CONFIG_M5249)        := $(call cc-option,-mcpu=5249,-m5200)
cpuflags-$(CONFIG_M520x)        := $(call cc-option,-mcpu=5208,-m5200)
cpuflags-$(CONFIG_M5206e)       := $(call cc-option,-mcpu=5206e,-m5200)
cpuflags-$(CONFIG_M5206)        := $(call cc-option,-mcpu=5206,-m5200)

The problem is that whether MULU.L exists depends on the targeted
architecture, and *that* depends on this Makefile trickery, not
just CONFIG symbols...

Oh, f*** me.


I misremembered.  That problem exists, but only for DIVU.L.  As I said in
the comments (which I wrote *after* deciding I needed this approach), all
ColdFire have MULU.L.  It's DIVU.L that's missing from some early ones.

You're absolutely right.  MULU.L support can be done perfectly from
CONFIG_ options.


Improved patch coming in a few minutes.  My sincere apologies.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 08v2/10] m68k: Add <asm/archhash.h>
  2016-05-25  8:07   ` Geert Uytterhoeven
  2016-05-25  8:19     ` George Spelvin
@ 2016-05-25  8:24     ` George Spelvin
  2016-05-25  8:48       ` Geert Uytterhoeven
  1 sibling, 1 reply; 55+ messages in thread
From: George Spelvin @ 2016-05-25  8:24 UTC (permalink / raw)
  To: geert, linux; +Cc: gerg, linux-kernel, linux-m68k, tglx, torvalds

This provides a multiply by constant GOLDEN_RATIO_32 = 0x61C88647
for the original mc68000, which lacks a 32x32-bit multiply instruction.

Yes, the amount of optimization effort put in is excessive. :-)

Addition chains found by Yevgen Voronenko's Hcub algorithm at
http://spiral.ece.cmu.edu/mcm/gen.html

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: linux-m68k@lists.linux-m68k.org
---
 arch/m68k/Kconfig.cpu            |  1 +
 arch/m68k/include/asm/archhash.h | 58 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)
 create mode 100644 arch/m68k/include/asm/archhash.h

diff --git a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu
index 0dfcf128..bf3de464 100644
--- a/arch/m68k/Kconfig.cpu
+++ b/arch/m68k/Kconfig.cpu
@@ -40,6 +40,7 @@ config M68000
 	select CPU_HAS_NO_MULDIV64
 	select CPU_HAS_NO_UNALIGNED
 	select GENERIC_CSUM
+	select HAVE_ARCH_HASH
 	help
 	  The Freescale (was Motorola) 68000 CPU is the first generation of
 	  the well known M68K family of processors. The CPU core as well as
diff --git a/arch/m68k/include/asm/archhash.h b/arch/m68k/include/asm/archhash.h
new file mode 100644
index 00000000..2532cf92
--- /dev/null
+++ b/arch/m68k/include/asm/archhash.h
@@ -0,0 +1,58 @@
+#ifndef _ASM_ARCHHASH_H
+#define _ASM_ARCHHASH_H
+
+/*
+ * The only 68k processors that lack MULU.L and so need this workaround
+ * are the original 68000 and 68010.
+ */
+#if defined(CONFIG_M68000) || defined(CONFIG_M68010)
+
+#define HAVE_ARCH__HASH_32 1
+/*
+ * While it would be legal to substitute a different hash operation
+ * entirely, let's keep it simple and just use an optimized multiply
+ * by GOLDEN_RATIO_32 = 0x61C88647.
+ *
+ * The best way to do that appears to be to multiply by 0x8647 with
+ * shifts and adds, and use mulu.w to multiply the high half by 0x61C8.
+ *
+ * Because the 68000 has multi-cycle shifts, this addition chain is
+ * chosen to minimise the shift distances.
+ *
+ * Despite every attempt to spoon-feed GCC simple operations, GCC 6.1.1
+ * doggedly insists on doing annoying things like converting "lsl.l #2,<reg>"
+ * (12 cycles) to two adds (8+8 cycles).
+ *
+ * It also likes to notice two shifts in a row, like "a = x << 2" and
+ * "a <<= 7", and convert that to "a = x << 9".  But shifts longer than
+ * 8 bits are extra-slow on m68k, so that's a lose.
+ *
+ * Since the 68000 is a very simple in-order processor with no instruction
+ * scheduling effects on execution time, we can safely take it out of GCC's
+ * hands and write one big asm() block.
+ *
+ * Without calling overhead, this operation is 30 bytes (14 instructions
+ * plus one immediate constant) and 166 cycles.
+ */
+static inline u32 __attribute_const__ __hash_32(u32 x)
+{
+	u32 a, b;
+
+	asm(   "move.l %2,%0"	/* 0x0001 */
+	"\n	lsl.l #2,%0"	/* 0x0004 */
+	"\n	move.l %0,%1"
+	"\n	lsl.l #7,%0"	/* 0x0200 */
+	"\n	add.l %2,%0"	/* 0x0201 */
+	"\n	add.l %0,%1"	/* 0x0205 */
+	"\n	add.l %0,%0"	/* 0x0402 */
+	"\n	add.l %0,%1"	/* 0x0607 */
+	"\n	lsl.l #5,%0"	/* 0x8040 */
+				/* 0x8647 */
+	: "=&d" (a), "=&r" (b)
+	: "g" (x));
+
+	return ((u16)(x*0x61c8) << 16) + a + b;
+}
+#endif	/* HAVE_ARCH__HASH_32 */
+
+#endif	/* _ASM_ARCHHASH_H */
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 08v2/10] m68k: Add <asm/archhash.h>
  2016-05-25  8:24     ` [PATCH 08v2/10] " George Spelvin
@ 2016-05-25  8:48       ` Geert Uytterhoeven
  0 siblings, 0 replies; 55+ messages in thread
From: Geert Uytterhoeven @ 2016-05-25  8:48 UTC (permalink / raw)
  To: George Spelvin
  Cc: Greg Ungerer, linux-kernel, linux-m68k, Thomas Gleixner, Linus Torvalds

On Wed, May 25, 2016 at 10:24 AM, George Spelvin
<linux@sciencehorizons.net> wrote:
> --- a/arch/m68k/Kconfig.cpu
> +++ b/arch/m68k/Kconfig.cpu
> @@ -40,6 +40,7 @@ config M68000
>         select CPU_HAS_NO_MULDIV64
>         select CPU_HAS_NO_UNALIGNED
>         select GENERIC_CSUM
> +       select HAVE_ARCH_HASH
>         help
>           The Freescale (was Motorola) 68000 CPU is the first generation of
>           the well known M68K family of processors. The CPU core as well as
> diff --git a/arch/m68k/include/asm/archhash.h b/arch/m68k/include/asm/archhash.h
> new file mode 100644
> index 00000000..2532cf92
> --- /dev/null
> +++ b/arch/m68k/include/asm/archhash.h
> @@ -0,0 +1,58 @@
> +#ifndef _ASM_ARCHHASH_H
> +#define _ASM_ARCHHASH_H
> +
> +/*
> + * The only 68k processors that lack MULU.L and so need this workaround
> + * are the original 68000 and 68010.
> + */
> +#if defined(CONFIG_M68000) || defined(CONFIG_M68010)

As I said before, I don't think you need this check, given HAVE_ARCH_HASH is
selected by M68000, and CONFIG_M68010 doesn't exist.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/10] String hash improvements
  2016-05-25  8:11     ` George Spelvin
@ 2016-05-25  8:50       ` Geert Uytterhoeven
  2016-05-25  9:07         ` George Spelvin
  0 siblings, 1 reply; 55+ messages in thread
From: Geert Uytterhoeven @ 2016-05-25  8:50 UTC (permalink / raw)
  To: George Spelvin
  Cc: alistair.francis, Bruce Fields, Greg Ungerer, Jeff Layton,
	linux-kernel, linux-m68k, open list:NFS, SUNRPC, AND...,
	Michal Simek, Thomas Gleixner, Linus Torvalds, uclinux-h8-devel,
	Yoshinori Sato

On Wed, May 25, 2016 at 10:11 AM, George Spelvin
<linux@sciencehorizons.net> wrote:
> Geert Uytterhoeven wrote:
>> Usually this is handled through include/asm-generic/.
>> Put the generic default implementation in include/asm-generic/hash.h.
>>
>> Architectures that need to override provide their own version, e.g.
>> arch/m68k/include/asm/hash.h. They may #include <asm-generic/hash.h>
>> if they still want to reuse parts of the generic implementation.
>>
>> Other architectures add "generic-y += hash.h" to their
>> arch/<ARCH>/include/asm/Kbuild.
>
> I thought about that, but then I'd have to edit *every* architecture,
> and might need acks from all the maintainers.
>
> I was looking for something that was a total no-op on most architectures.
>
> But if this is preferred, it's not technically difficult at all.

As you only include <asm/archhash.h> if CONFIG_HAVE_ARCH_HASH
is defined, you can also just call the arch-specific one <asm/hash.h>.
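
I.e. keep the conditional include, just under the simpler name; roughly
this shape (a sketch only):

	#ifdef CONFIG_HAVE_ARCH_HASH
	#include <asm/hash.h>	/* may define HAVE_ARCH__HASH_32 etc. */
	#endif

	#ifndef HAVE_ARCH__HASH_32
	#define __hash_32 __hash_32_generic
	#endif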

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 08/10] m68k: Add <asm/archhash.h>
  2016-05-25  7:34 ` George Spelvin
  2016-05-25  8:07   ` Geert Uytterhoeven
@ 2016-05-25  8:56   ` Philippe De Muyter
  2016-05-25  9:14     ` George Spelvin
  2016-05-25 13:24   ` Philippe De Muyter
  2016-05-26 17:19   ` [PATCH v2 08/10] m68k: Add <asm/hash.h> George Spelvin
  3 siblings, 1 reply; 55+ messages in thread
From: Philippe De Muyter @ 2016-05-25  8:56 UTC (permalink / raw)
  To: George Spelvin; +Cc: geert, gerg, linux-kernel, linux-m68k, torvalds, tglx

On Wed, May 25, 2016 at 03:34:55AM -0400, George Spelvin wrote:
> +static inline u32 __attribute_const__ __hash_32(u32 x)
> +{
> +	u32 a, b;
> +
> +	asm(   "move.l %2,%0"	/* 0x0001 */
> +	"\n	lsl.l #2,%0"	/* 0x0004 */
> +	"\n	move.l %0,%1"
> +	"\n	lsl.l #7,%0"	/* 0x0200 */
> +	"\n	add.l %2,%0"	/* 0x0201 */
> +	"\n	add.l %0,%1"	/* 0x0205 */
> +	"\n	add.l %0,%0"	/* 0x0402 */
> +	"\n	add.l %0,%1"	/* 0x0607 */
> +	"\n	lsl.l #5,%0"	/* 0x8040 */
> +				/* 0x8647 */

There is no standard way to write asm in the kernel, but I prefer
a simple semicolon after each insn

	asm("move.l	%2,%0;"	/* 0x0001 */
	    "lsl.l	#2,%0;"	/* 0x0004 */
	    "move.l	%0,%1;"
	    "lsl.l	#7,%0;"	/* 0x0200 */
	    "add.l	%2,%0;"	/* 0x0201 */
	    "add.l	%0,%1;"	/* 0x0205 */
	    "add.l	%0,%0;"	/* 0x0402 */
	    "add.l	%0,%1;"	/* 0x0607 */
	    "lsl.l	#5,%0"	/* 0x8040 */
				/* 0x8647 */

Also, it took me some time to understand the hexadecimal constants
in the comments (and the last one predicts a future event :)).

> +	: "=&d" (a), "=&r" (b)
> +	: "g" (x));
> +
> +	return ((u16)(x*0x61c8) << 16) + a + b;
> +}

Just my two cents

Philippe

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/10] String hash improvements
  2016-05-25  8:50       ` Geert Uytterhoeven
@ 2016-05-25  9:07         ` George Spelvin
  0 siblings, 0 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-25  9:07 UTC (permalink / raw)
  To: geert, linux
  Cc: alistair.francis, bfields, gerg, jlayton, linux-kernel,
	linux-m68k, linux-nfs, michal.simek, tglx, torvalds,
	uclinux-h8-devel, ysato

>> +#if defined(CONFIG_M68000) || defined(CONFIG_M68010)

> As I said before, I don't think you need this check, given HAVE_ARCH_HASH is
> selected by M68000, and M68010 doesn't exist.

I was going belt & suspenders on general principles, but yes, I'm happy
to leave it out.

I noticed that CONFIG_M68010 doesn't exist in Linus' tree, but you
recommended it, so I thought you might know something I don't.

> As you only include <asm/archhash.h> if CONFIG_HAVE_ARCH_HASH
> is defined, you can also just call the arch-specific one <asm/hash.h>.

Yes, that's a possibility, too.  But weren't we still discussing whether
I should use conditional #inclusion based on a symbol, or asm-generic?

If neither of us has a killer argument that convinces the other, style
issues like this are amenable to voting, so I was going to wait a little
bit for others to chime in.


Thank you very much for the comments!

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 08/10] m68k: Add <asm/archhash.h>
  2016-05-25  8:56   ` [PATCH 08/10] " Philippe De Muyter
@ 2016-05-25  9:14     ` George Spelvin
  2016-05-25  9:31       ` Andreas Schwab
  2016-05-25  9:51       ` Philippe De Muyter
  0 siblings, 2 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-25  9:14 UTC (permalink / raw)
  To: linux, phdm; +Cc: geert, gerg, linux-kernel, linux-m68k, tglx, torvalds

> On Wed, May 25, 2016 at 03:34:55AM -0400, George Spelvin wrote:
>> +static inline u32 __attribute_const__ __hash_32(u32 x)
>> +{
>> +	u32 a, b;
>> +
>> +	asm(   "move.l %2,%0"	/* 0x0001 */
>> +	"\n	lsl.l #2,%0"	/* 0x0004 */
>> +	"\n	move.l %0,%1"
>> +	"\n	lsl.l #7,%0"	/* 0x0200 */
>> +	"\n	add.l %2,%0"	/* 0x0201 */
>> +	"\n	add.l %0,%1"	/* 0x0205 */
>> +	"\n	add.l %0,%0"	/* 0x0402 */
>> +	"\n	add.l %0,%1"	/* 0x0607 */
>> +	"\n	lsl.l #5,%0"	/* 0x8040 */
>> +				/* 0x8647 */

> There is no standard way to write asm in the kernel, but I prefer
> a simple semicolon after each insn

I did it the way I did above because it makes the gcc -S output very
legible.  Just like I put a space before the operands on m68k but a tab
on h8300: that's what GCC does on those platforms.

I started with the "\n\t" suffixes on each line like so much other
kernel code, but then figured out the format above which is legible
both in C source and compiler output.

>>	asm("move.l	%2,%0;"	/* 0x0001 */
>>	    "lsl.l	#2,%0;"	/* 0x0004 */
>>	    "move.l	%0,%1;"
>>	    "lsl.l	#7,%0;"	/* 0x0200 */
>>	    "add.l	%2,%0;"	/* 0x0201 */
>>	    "add.l	%0,%1;"	/* 0x0205 */
>>	    "add.l	%0,%0;"	/* 0x0402 */
>>	    "add.l	%0,%1;"	/* 0x0607 */
>>	    "lsl.l	#5,%0"	/* 0x8040 */
>>				/* 0x8647 */

> Also, it took me some time to understand the hexadecimal constants
> in the comments (and the last one predicts a future event :)).


Can you recommend a better way to comment this?  My nose is so deep
in the code it's hard for me to judge.

> Just my two cents

And thank you very much for them!

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 08/10] m68k: Add <asm/archhash.h>
  2016-05-25  9:14     ` George Spelvin
@ 2016-05-25  9:31       ` Andreas Schwab
  2016-05-25  9:51       ` Philippe De Muyter
  1 sibling, 0 replies; 55+ messages in thread
From: Andreas Schwab @ 2016-05-25  9:31 UTC (permalink / raw)
  To: George Spelvin
  Cc: phdm, geert, gerg, linux-kernel, linux-m68k, tglx, torvalds

"George Spelvin" <linux@sciencehorizons.net> writes:

> Can you recommend a better way to comment this?  My nose is so deep
> in the code it's hard for me to judge.

It's probably best to express the effect of the insns in plain C.
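
For instance, something along these lines (an untested sketch with a
made-up name, following the same shift/add chain as your asm; it is
illustration only, since the point of the asm is precisely to stop GCC
from rewriting the sequence):

	static inline u32 __hash_32_reference(u32 x)
	{
		u32 a, b;

		a = x << 2;		/* a = x * 0x0004 */
		b = a;			/* b = x * 0x0004 */
		a <<= 7;		/* a = x * 0x0200 */
		a += x;			/* a = x * 0x0201 */
		b += a;		/* b = x * 0x0205 */
		a += a;		/* a = x * 0x0402 */
		b += a;		/* b = x * 0x0607 */
		a <<= 5;		/* a = x * 0x8040 */

		/* 0x8040 + 0x0607 = 0x8647; mulu.w supplies the 0x61C8 half */
		return ((u16)(x * 0x61c8) << 16) + a + b;
	}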

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 08/10] m68k: Add <asm/archhash.h>
  2016-05-25  9:14     ` George Spelvin
  2016-05-25  9:31       ` Andreas Schwab
@ 2016-05-25  9:51       ` Philippe De Muyter
  1 sibling, 0 replies; 55+ messages in thread
From: Philippe De Muyter @ 2016-05-25  9:51 UTC (permalink / raw)
  To: George Spelvin; +Cc: geert, gerg, linux-kernel, linux-m68k, tglx, torvalds

On Wed, May 25, 2016 at 05:14:35AM -0400, George Spelvin wrote:
> 
> I did it the way I did above because it makes the gcc -S output very
> legible.  Just like I put a space before the perands on m68k but a tab
> on h8300: that's what GCC does on those platforms.
> 
> I started with the "\n\t" suffixes on each line like so much other
> kernel code, but then figured out the format above which is legible
> both in C source and compiler output.

OK thanks.

> 
> >>	asm("move.l	%2,%0;"	/* 0x0001 */
> >>	    "lsl.l	#2,%0;"	/* 0x0004 */
> >>	    "move.l	%0,%1;"
> >>	    "lsl.l	#7,%0;"	/* 0x0200 */
> >>	    "add.l	%2,%0;"	/* 0x0201 */
> >>	    "add.l	%0,%1;"	/* 0x0205 */
> >>	    "add.l	%0,%0;"	/* 0x0402 */
> >>	    "add.l	%0,%1;"	/* 0x0607 */
> >>	    "lsl.l	#5,%0"	/* 0x8040 */
> >>				/* 0x8647 */
> 
> > Also, it took me some time to understand the hexadecimal constants
> > in the comments (and the last one predicts a future event :)).
> 
> 
> Can you recommend a better way to comment this?  My nose is so deep
> in the code it's hard for me to judge.

I second Andreas' suggestion.

Philippe
-- 
Philippe De Muyter +32 2 6101532 Macq SA rue de l'Aeronef 2 B-1140 Bruxelles

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 08/10] m68k: Add <asm/archhash.h>
  2016-05-25  7:34 ` George Spelvin
  2016-05-25  8:07   ` Geert Uytterhoeven
  2016-05-25  8:56   ` [PATCH 08/10] " Philippe De Muyter
@ 2016-05-25 13:24   ` Philippe De Muyter
  2016-05-25 13:42     ` George Spelvin
  2016-05-26 17:19   ` [PATCH v2 08/10] m68k: Add <asm/hash.h> George Spelvin
  3 siblings, 1 reply; 55+ messages in thread
From: Philippe De Muyter @ 2016-05-25 13:24 UTC (permalink / raw)
  To: George Spelvin; +Cc: geert, gerg, linux-kernel, linux-m68k, torvalds, tglx

On Wed, May 25, 2016 at 03:34:55AM -0400, George Spelvin wrote:
> This provides a multiply by constant GOLDEN_RATIO_32 = 0x61C88647
> for the original mc68000, which lacks a 32x32-bit multiply instruction.
> 
> Addition chains found by Yevgen Voronenko's Hcub algorithm at
> http://spiral.ece.cmu.edu/mcm/gen.html

Shouldn't you put that reference in the comments of your archhash.h file ?

Philippe

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 08/10] m68k: Add <asm/archhash.h>
  2016-05-25 13:24   ` Philippe De Muyter
@ 2016-05-25 13:42     ` George Spelvin
  0 siblings, 0 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-25 13:42 UTC (permalink / raw)
  To: linux, phdm; +Cc: geert, gerg, linux-kernel, linux-m68k, tglx, torvalds

Philippe De Muyter wrote:
> On Wed, May 25, 2016 at 03:34:55AM -0400, George Spelvin wrote:
>> Addition chains found by Yevgen Voronenko's Hcub algorithm at
>> http://spiral.ece.cmu.edu/mcm/gen.html

> Shouldn't you put that reference in the comments of your archhash.h file ?

I don't really care either way, but generally comments show what the
code does and commit messages talk about how it was created and by whom.
That reference seemed to fall into the latter category.

Rationales (*why* it does what it does) can go in both places, with the
commit message providing more room.


I have a revised set of arch/ patches including all of the suggestions
made so far, currently awaiting the requested self-test.

(I found a clean way to do it using the *value* of the HAVE_FOO define
to indicate whether the function is meant to be equivalent to the
generic one.  If it's 1, the self-test will compare the arch-specific
and generic implementations.)
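
Roughly like this (illustrative names only; it assumes the <linux/hash.h>
changes in this series, and the real code is in the v2 patches that follow):

	/* In an arch's <asm/hash.h>: 1 = same result as the generic code,
	 * 0 = deliberately different (then only range checks apply). */
	#define HAVE_ARCH__HASH_32 1

	/* In the self-test: */
	static bool check_hash_32(u32 x)
	{
	#if HAVE_ARCH__HASH_32 == 1
		return __hash_32(x) == __hash_32_generic(x);
	#else
		return true;	/* nothing to compare against */
	#endif
	}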

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/10] String hash improvements
  2016-05-25  7:20 ` [PATCH 00/10] String hash improvements George Spelvin
  2016-05-25  8:00   ` Geert Uytterhoeven
@ 2016-05-25 16:08   ` Linus Torvalds
  2016-05-28 19:57     ` [PATCH v3 " George Spelvin
  2016-06-02 22:59     ` [PATCH " Fubo Chen
  2016-05-26 17:09   ` [PATCH v2 " George Spelvin
  2 siblings, 2 replies; 55+ messages in thread
From: Linus Torvalds @ 2016-05-25 16:08 UTC (permalink / raw)
  To: George Spelvin
  Cc: lkml, alistair.francis, Bruce Fields, geert, gerg, jlayton,
	linux-m68k, linux-nfs, michal.simek, Thomas Gleixner,
	uclinux-h8-devel, ysato

On Wed, May 25, 2016 at 12:20 AM, George Spelvin
<linux@sciencehorizons.net> wrote:
>
> Well, they're close, and I can and did *get* them ready.

Ok, thanks. For some odd reason all your emails in this series got
marked as spam. Every single one, including the cover letter (but not
your replies to the replies to this).

Stupidly, I unmarked them before thinking to ask google *why* they got
marked as spam.

Did you do something different to send this series than you usually do?

Would you mind sending it again (maybe you have changes due to some of
the comments, but even if not, I'd like to see what the spam filter
says)? You could do it just to me to avoid filling up the mailing list
with duplicates.

                    Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 00/10] String hash improvements
  2016-05-25  7:20 ` [PATCH 00/10] String hash improvements George Spelvin
  2016-05-25  8:00   ` Geert Uytterhoeven
  2016-05-25 16:08   ` Linus Torvalds
@ 2016-05-26 17:09   ` George Spelvin
  2 siblings, 0 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-26 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: alistair.francis, bfields, geert, gerg, linux-m68k, michal.simek,
	phdm, schwab, uclinux-h8-devel, ysato

This is just the arch-specific part, updated as per requests.

* Fix the stupidly overcomplex m68k conditionals (per Geert Uytterhoeven)
* Renamed the arch file to <asm/hash.h> (per Geert Uytterhoeven)
* Improved the comments on the progress of shift-and-add (Philippe De Muyter)
* Added a self-test (per Michal Simek)

Things I did *not* do, because I thought the concerns were withdrawn
after discussion:  (If I misinterpreted silence, please correct me!)

* Use unconditional inclusion with an <asm-generic/hash.h> fallback.
* Change the multi-line asm() formatting (Philippe De Muyter)
* Move reference about algorithm credit (Philippe De Muyter)

The big change is the addition of a self-test in lib/test_hash.c.
That had to be written and the various tests validated by introducing
deliberate bugs into <asm/hash.h>.

Handling the fact that architectures are allowed to change the computed
function (even though none of the ones written so far have used that
freedom) added some complexity.  I ended up using the value of the
HAVE_ARCH_* macro.  If it's 1, that means it's expected to be a clone
of the generic versions, and it's compared against them.  If it's 0,
it's its own magic thing.

The big question is, should these go through the arch trees, or may I
just include them in the series to Linus?  The latter is simpler for me,
but obviously not okay without permission.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 07/10] <linux/hash.h>: Add support for architecture-specific functions
  2016-05-25  7:33 ` [PATCH 07/10] <linux/hash.h>: Add support for architecture-specific functions George Spelvin
@ 2016-05-26 17:16   ` George Spelvin
  0 siblings, 0 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-26 17:16 UTC (permalink / raw)
  To: linux-kernel, linux
  Cc: alistair.francis, bfields, geert, gerg, linux-m68k, michal.simek,
	phdm, schwab, uclinux-h8-devel, ysato

This is just the infrastructure; there are no users yet.

This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
the existence of <asm/hash.h>.

That file may define its own versions of various functions, and define
HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.

Included is a self-test (in lib/test_hash.c) that verifies the basics.
It is NOT in general required that the arch-specific functions compute
the same thing as the generic, but if a HAVE_* symbol is defined with
the value 1, then equality is tested.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Philippe De Muyter <phdm@macq.eu>
Cc: linux-m68k@lists.linux-m68k.org
Cc: Alistair Francis <alistai@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: uclinux-h8-devel@lists.sourceforge.jp
---
 arch/Kconfig         |   8 ++
 fs/namei.c           |   6 +-
 include/linux/hash.h |  24 ++++-
 lib/Kconfig.debug    |  11 +++
 lib/Makefile         |   1 +
 lib/test_hash.c      | 250 +++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 296 insertions(+), 4 deletions(-)
 create mode 100644 lib/test_hash.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 81869a5e..96406e4d 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -589,6 +589,14 @@ config HAVE_STACK_VALIDATION
 	  Architecture supports the 'objtool check' host tool command, which
 	  performs compile-time stack metadata validation.
 
+config HAVE_ARCH_HASH
+	bool
+	default n
+	help
+	  If this is set, the architecture provides an <asm/hash.h>
+	  file which provides platform-specific implementations of some
+	  functions in <linux/hash.h> or fs/namei.c.
+
 #
 # ABI hall of shame
 #
diff --git a/fs/namei.c b/fs/namei.c
index 2b8d0650..cb438b84 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1788,7 +1788,11 @@ static int walk_component(struct nameidata *nd, int flags)
 
 #include <asm/word-at-a-time.h>
 
-#ifdef CONFIG_64BIT
+#ifdef HASH_MIX
+
+/* Architecture provides HASH_MIX and fold_hash() in <asm/hash.h> */
+
+#elif defined(CONFIG_64BIT)
 /*
  * Register pressure in the mixing function is an issue, particularly
  * on 32-bit x86, but almost any function requires one state value and
diff --git a/include/linux/hash.h b/include/linux/hash.h
index 8926f369..ab62a9a4 100644
--- a/include/linux/hash.h
+++ b/include/linux/hash.h
@@ -41,18 +41,36 @@
 #define GOLDEN_RATIO_32 0x61C88647
 #define GOLDEN_RATIO_64 0x61C8864680B583EBull
 
-static inline u32 __hash_32(u32 val)
+#ifdef CONFIG_HAVE_ARCH_HASH
+/* This header may use the GOLDEN_RATIO_xx constants */
+#include <asm/hash.h>
+#endif
+
+/*
+ * The _generic versions exist only so lib/test_hash.c can
+ * compare the arch-optimized versions with the generic.
+ */
+#ifndef HAVE_ARCH__HASH_32
+#define __hash_32 __hash_32_generic
+#endif
+static inline u32 __hash_32_generic(u32 val)
 {
 	return val * GOLDEN_RATIO_32;
 }
 
-static inline u32 hash_32(u32 val, unsigned int bits)
+#ifndef HAVE_ARCH_HASH_32
+#define hash_32 hash_32_generic
+#endif
+static inline u32 hash_32_generic(u32 val, unsigned int bits)
 {
 	/* High bits are more random, so use them. */
 	return __hash_32(val) >> (32 - bits);
 }
 
-static __always_inline u32 hash_64(u64 val, unsigned int bits)
+#ifndef HAVE_ARCH_HASH_64
+#define hash_64 hash_64_generic
+#endif
+static __always_inline u32 hash_64_generic(u64 val, unsigned int bits)
 {
 	if (__builtin_constant_p(bits > 32 || bits == 0)) {
 		BUILD_BUG_ON(bits > 32 || bits == 0);
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1e9a6075..18ec69ba 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1815,6 +1815,17 @@ config TEST_RHASHTABLE
 
 	  If unsure, say N.
 
+config TEST_HASH
+	tristate "Perform selftest on hash functions"
+	default n
+	help
+	  Enable this option to test the kernel's integer (<linux/hash.h>)
+	  and string (<linux/stringhash.h>) hash functions on boot
+	  (or module load).
+
+	  This is intended to help people writing architecture-specific
+	  optimized versions.  If unsure, say N.
+
 endmenu # runtime tests
 
 config PROVIDE_OHCI1394_DMA_INIT
diff --git a/lib/Makefile b/lib/Makefile
index 9d9804e5..16c3bda5 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
 obj-y += kstrtox.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
 obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
+obj-$(CONFIG_TEST_HASH) += test_hash.o
 obj-$(CONFIG_TEST_KASAN) += test_kasan.o
 obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o
 obj-$(CONFIG_TEST_LKM) += test_module.o
diff --git a/lib/test_hash.c b/lib/test_hash.c
new file mode 100644
index 00000000..bb43dac1
--- /dev/null
+++ b/lib/test_hash.c
@@ -0,0 +1,250 @@
+/*
+ * Test cases for <linux/hash.h> and <linux/stringhash.h>
+ * This just verifies that various ways of computing a hash
+ * produce the same thing and, for cases where a k-bit hash
+ * value is requested, is of the requested size.
+ *
+ * We fill a buffer with a 255-byte null-terminated string,
+ * and use both full_name_hash() and hash_string() to hash the
+ * substrings from i to j, where 0 <= i < j < 256.
+ *
+ * The returned values are used to check that __hash_32() and
+ * __hash_32_generic() compute the same thing.  Likewise hash_32()
+ * and hash_64().
+ * */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt "\n"
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/hash.h>
+#include <linux/stringhash.h>
+#include <linux/printk.h>
+
+/* 32-bit XORSHIFT generator.  Seed must not be zero. */
+static u32 __init __attribute_const__
+xorshift(u32 seed)
+{
+	seed ^= seed << 13;
+	seed ^= seed >> 17;
+	seed ^= seed << 5;
+	return seed;
+}
+
+/* Given a non-zero x, returns a non-zero byte. */
+static u8 __init __attribute_const__
+mod255(u32 x)
+{
+	x = (x & 0xffff) + (x >> 16);	/* 1 <= x <= 0x1fffe */
+	x = (x & 0xff) + (x >> 8);	/* 1 <= x <= 0x2fd */
+	x = (x & 0xff) + (x >> 8);	/* 1 <= x <= 0x100 */
+	x = (x & 0xff) + (x >> 8);	/* 1 <= x <= 0xff */
+	return x;
+}
+
+/* Fill the buffer with non-zero bytes. */
+static void __init
+fill_buf(char *buf, size_t len, u32 seed)
+{
+	size_t i;
+
+	for (i = 0; i < len; i++) {
+		seed = xorshift(seed);
+		buf[i] = mod255(seed);
+	}
+}
+
+/*
+ * Test the various integer hash functions.  h64 (or its low-order bits)
+ * is the integer to hash.  hash_or accumulates the OR of the hash values,
+ * which are later checked to see that they cover all the requested bits.
+ *
+ * Because these functions (as opposed to the string hashes) are all
+ * inline, the code being tested is actually in the module, and you can
+ * recompile and re-test the module without rebooting.
+ */
+static bool __init
+test_int_hash(unsigned long long h64, u32 hash_or[2][33])
+{
+	int k;
+	u32 h0 = (u32)h64, h1, h2;
+
+	/* Test __hash32 */
+	hash_or[0][0] |= h1 = __hash_32(h0);
+#ifdef HAVE_ARCH__HASH_32
+	hash_or[1][0] |= h2 = __hash_32_generic(h0);
+#if HAVE_ARCH__HASH_32 == 1
+	if (h1 != h2) {
+		pr_err("__hash_32(%#x) = %#x != __hash_32_generic() = %#x",
+			h0, h1, h2);
+		return false;
+	}
+#endif
+#endif
+
+	/* Test k = 1..32 bits */
+	for (k = 1; k <= 32; k++) {
+		u32 const m = ((u32)2 << (k-1)) - 1;	/* Low k bits set */
+
+		/* Test hash_32 */
+		hash_or[0][k] |= h1 = hash_32(h0, k);
+		if (h1 > m) {
+			pr_err("hash_32(%#x, %d) = %#x > %#x", h0, k, h1, m);
+			return false;
+		}
+#ifdef HAVE_ARCH_HASH_32
+		h2 = hash_32_generic(h0, k);
+#if HAVE_ARCH_HASH_32 == 1
+		if (h1 != h2) {
+			pr_err("hash_32(%#x, %d) = %#x != hash_32_generic() "
+				" = %#x", h0, k, h1, h2);
+			return false;
+		}
+#else
+		if (h2 > m) {
+			pr_err("hash_32_generic(%#x, %d) = %#x > %#x",
+				h0, k, h1, m);
+			return false;
+		}
+#endif
+#endif
+		/* Test hash_64 */
+		hash_or[1][k] |= h1 = hash_64(h64, k);
+		if (h1 > m) {
+			pr_err("hash_64(%#llx, %d) = %#x > %#x", h64, k, h1, m);
+			return false;
+		}
+#ifdef HAVE_ARCH_HASH_64
+		h2 = hash_64_generic(h64, k);
+#if HAVE_ARCH_HASH_64 == 1
+		if (h1 != h2) {
+			pr_err("hash_64(%#llx, %d) = %#x != hash_64_generic() "
+				"= %#x", h64, k, h1, h2);
+			return false;
+		}
+#else
+		if (h2 > m) {
+			pr_err("hash_64_generic(%#llx, %d) = %#x > %#x",
+				h64, k, h1, m);
+			return false;
+		}
+#endif
+#endif
+	}
+
+	(void)h2;	/* Suppress unused variable warning */
+	return true;
+}
+
+#define SIZE 256	/* Run time is cubic in SIZE */
+
+static int __init
+test_hash_init(void)
+{
+	char buf[SIZE];
+	u32 string_or = 0, hash_or[2][33] = { 0 };
+	unsigned tests = 0;
+	unsigned long long h64 = 0;
+	int i, j;
+
+	fill_buf(buf, SIZE-1, 1);
+
+	/* Test every possible non-empty substring in the buffer. */
+	for (j = SIZE; j > 0; --j) {
+		buf[j-1] = '\0';
+
+		for (i = 0; i < j; i++) {
+			u32 h0 = full_name_hash(buf+i, j-i);
+			u64 hashlen = hash_string(buf+i);
+
+			/* Check that hash_string gets the length right */
+			if (hashlen_len(hashlen) != j-i-1) {
+				pr_err("hash_string(%d..%d) returned length "
+					"%u, expected %d",
+					i, j, hashlen_len(hashlen), j-i-1);
+				return -EINVAL;
+			}
+			/* Check that the hashes match */
+			if (hashlen_hash(hashlen) != h0) {
+				pr_err("hash_string(%d..%d) = %08x != "
+					"full_name_hash() = %08x",
+					i, j, hashlen_hash(hashlen), h0);
+				return -EINVAL;
+			}
+
+			string_or |= h0;
+			h64 = h64 << 32 | h0;	/* For use with hash_64 */
+			if (!test_int_hash(h64, hash_or))
+				return -EINVAL;
+			tests++;
+		} /* i */
+	} /* j */
+
+	/* The OR of all the hash values should cover all the bits */
+	if (~string_or) {
+		pr_err("OR of all string hash results = %#x != %#x",
+			string_or, -1u);
+		return -EINVAL;
+	}
+	if (~hash_or[0][0]) {
+		pr_err("OR of all __hash_32 results = %#x != %#x",
+			hash_or[0][0], -1u);
+		return -EINVAL;
+	}
+#ifdef HAVE_ARCH__HASH_32
+#if HAVE_ARCH__HASH_32 != 1	/* Test is pointless if results match */
+	if (~hash_or[1][0]) {
+		pr_err("OR of all __hash_32_generic results = %#x != %#x",
+			hash_or[1][0], -1u);
+		return -EINVAL;
+	}
+#endif
+#endif
+
+	/* Likewise for all the i-bit hash values */
+	for (i = 1; i <= 32; i++) {
+		u32 const m = ((u32)2 << (i-1)) - 1;	/* Low i bits set */
+
+		if (hash_or[0][i] != m) {
+			pr_err("OR of all hash_32(%d) results = %#x "
+				"(%#x expected)", i, hash_or[0][i], m);
+			return -EINVAL;
+		}
+		if (hash_or[1][i] != m) {
+			pr_err("OR of all hash_64(%d) results = %#x "
+				"(%#x expected)", i, hash_or[1][i], m);
+			return -EINVAL;
+		}
+	}
+
+	/* Issue notices about skipped tests. */
+#ifndef HAVE_ARCH__HASH_32
+	pr_info("__hash_32() has no arch implementation to test.");
+#elif HAVE_ARCH__HASH_32 != 1
+	pr_info("__hash_32() is arch-specific; not compared to generic.");
+#endif
+#ifndef HAVE_ARCH_HASH_32
+	pr_info("hash_32() has no arch implementation to test.");
+#elif HAVE_ARCH_HASH_32 != 1
+	pr_info("hash_32() is arch-specific; not compared to generic.");
+#endif
+#ifndef HAVE_ARCH_HASH_64
+	pr_info("hash_64() has no arch implementation to test.");
+#elif HAVE_ARCH_HASH_64 != 1
+	pr_info("hash_64() is arch-specific; not compared to generic.");
+#endif
+
+	pr_notice("%u tests passed.", tests);
+
+	return 0;
+}
+
+static void __exit test_hash_exit(void)
+{
+}
+
+module_init(test_hash_init);	/* Does everything */
+module_exit(test_hash_exit);	/* Does nothing */
+
+MODULE_LICENSE("GPL");
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 08/10] m68k: Add <asm/hash.h>
  2016-05-25  7:34 ` George Spelvin
                     ` (2 preceding siblings ...)
  2016-05-25 13:24   ` Philippe De Muyter
@ 2016-05-26 17:19   ` George Spelvin
  3 siblings, 0 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-26 17:19 UTC (permalink / raw)
  To: geert, gerg, linux-kernel, linux-m68k, linux
  Cc: alistair.francis, michal.simek, phdm, schwab, uclinux-h8-devel, ysato

This provides a multiply by constant GOLDEN_RATIO_32 = 0x61C88647
for the original mc68000, which lacks a 32x32-bit multiply instruction.

Yes, the amount of optimization effort put in is excessive. :-)

Shift-add chain found by Yevgen Voronenko's Hcub algorithm at
http://spiral.ece.cmu.edu/mcm/gen.html

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Philippe De Muyter <phdm@macq.eu>
Cc: linux-m68k@lists.linux-m68k.org
---
 arch/m68k/Kconfig.cpu        |  1 +
 arch/m68k/include/asm/hash.h | 59 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)
 create mode 100644 arch/m68k/include/asm/hash.h

diff --git a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu
index 0dfcf128..bf3de464 100644
--- a/arch/m68k/Kconfig.cpu
+++ b/arch/m68k/Kconfig.cpu
@@ -40,6 +40,7 @@ config M68000
 	select CPU_HAS_NO_MULDIV64
 	select CPU_HAS_NO_UNALIGNED
 	select GENERIC_CSUM
+	select HAVE_ARCH_HASH
 	help
 	  The Freescale (was Motorola) 68000 CPU is the first generation of
 	  the well known M68K family of processors. The CPU core as well as
diff --git a/arch/m68k/include/asm/hash.h b/arch/m68k/include/asm/hash.h
new file mode 100644
index 00000000..6407af84
--- /dev/null
+++ b/arch/m68k/include/asm/hash.h
@@ -0,0 +1,59 @@
+#ifndef _ASM_HASH_H
+#define _ASM_HASH_H
+
+/*
+ * If CONFIG_M68000=y (original mc68000/010), this file is #included
+ * to work around the lack of a MULU.L instruction.
+ */
+
+#define HAVE_ARCH__HASH_32 1
+/*
+ * While it would be legal to substitute a different hash operation
+ * entirely, let's keep it simple and just use an optimized multiply
+ * by GOLDEN_RATIO_32 = 0x61C88647.
+ *
+ * The best way to do that appears to be to multiply by 0x8647 with
+ * shifts and adds, and use mulu.w to multiply the high half by 0x61C8.
+ *
+ * Because the 68000 has multi-cycle shifts, this addition chain is
+ * chosen to minimise the shift distances.
+ *
+ * Despite every attempt to spoon-feed it simple operations, GCC
+ * 6.1.1 doggedly insists on doing annoying things like converting
+ * "lsl.l #2,<reg>" (12 cycles) to two adds (8+8 cycles).
+ *
+ * It also likes to notice two shifts in a row, like "a = x << 2" and
+ * "a <<= 7", and convert that to "a = x << 9".  But shifts longer
+ * than 8 bits are extra-slow on m68k, so that's a lose.
+ *
+ * Since the 68000 is a very simple in-order processor with no
+ * instruction scheduling effects on execution time, we can safely
+ * take it out of GCC's hands and write one big asm() block.
+ *
+ * Without calling overhead, this operation is 30 bytes (14 instructions
+ * plus one immediate constant) and 166 cycles.
+ *
+ * (Because %2 is fetched twice, it can't be postincrement, and thus it
+ * can't be a fully general "g" or "m".  Register is preferred, but
+ * offsettable memory or immediate will work.)
+ */
+static inline u32 __attribute_const__ __hash_32(u32 x)
+{
+	u32 a, b;
+
+	asm(   "move.l %2,%0"	/* a = x * 0x0001 */
+	"\n	lsl.l #2,%0"	/* a = x * 0x0004 */
+	"\n	move.l %0,%1"
+	"\n	lsl.l #7,%0"	/* a = x * 0x0200 */
+	"\n	add.l %2,%0"	/* a = x * 0x0201 */
+	"\n	add.l %0,%1"	/* b = x * 0x0205 */
+	"\n	add.l %0,%0"	/* a = x * 0x0402 */
+	"\n	add.l %0,%1"	/* b = x * 0x0607 */
+	"\n	lsl.l #5,%0"	/* a = x * 0x8040 */
+	: "=&d,d" (a), "=&r,r" (b)
+	: "r,roi?" (x));	/* a+b = x*0x8647 */
+
+	return ((u16)(x*0x61c8) << 16) + a + b;
+}
+
+#endif	/* _ASM_HASH_H */
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 09/10] microblaze: Add <asm/hash.h>
  2016-05-25  7:37 ` [PATCH 09/10] microblaze: Add <asm/archhash.h> George Spelvin
@ 2016-05-26 17:21   ` George Spelvin
  0 siblings, 0 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-26 17:21 UTC (permalink / raw)
  To: alistair.francis, linux-kernel, linux, michal.simek
  Cc: geert, gerg, linux-m68k, phdm, schwab, uclinux-h8-devel, ysato

Microblaze is an FPGA soft core that can be configured various ways.

If it is configured without a multiplier, the standard __hash_32()
will require a call to __mulsi3, which is a slow software loop.

Instead, use a shift-and-add sequence for the constant multiply.
GCC knows how to do this, but it's not as clever as some.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
---
 arch/microblaze/Kconfig            |  1 +
 arch/microblaze/include/asm/hash.h | 81 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)
 create mode 100644 arch/microblaze/include/asm/hash.h

diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index 3d793b55..ce3e5125 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -16,6 +16,7 @@ config MICROBLAZE
 	select GENERIC_IRQ_SHOW
 	select GENERIC_PCI_IOMAP
 	select GENERIC_SCHED_CLOCK
+	select HAVE_ARCH_HASH
 	select HAVE_ARCH_KGDB
 	select HAVE_DEBUG_KMEMLEAK
 	select HAVE_DMA_API_DEBUG
diff --git a/arch/microblaze/include/asm/hash.h b/arch/microblaze/include/asm/hash.h
new file mode 100644
index 00000000..2f0c2a69
--- /dev/null
+++ b/arch/microblaze/include/asm/hash.h
@@ -0,0 +1,81 @@
+#ifndef _ASM_HASH_H
+#define _ASM_HASH_H
+
+/*
+ * Fortunately, most people who want to run Linux on Microblaze enable
+ * both multiplier and barrel shifter, but omitting them is technically
+ * a supported configuration.
+ *
+ * With just a barrel shifter, we can implement an efficient constant
+ * multiply using shifts and adds.  GCC can find a 9-step solution, but
+ * this 6-step solution was found by Yevgen Voronenko's implementation
+ * of the Hcub algorithm at http://spiral.ece.cmu.edu/mcm/gen.html.
+ *
+ * That software is really not designed for a single multiplier this large,
+ * but if you run it enough times with different seeds, it'll find several
+ * 6-shift, 6-add sequences for computing x * 0x61C88647.  They are all
+ *	c = (x << 19) + x;
+ *	a = (x <<  9) + c;
+ *	b = (x << 23) + a;
+ *	return (a<<11) + (b<<6) + (c<<3) - b;
+ * with variations on the order of the final add.
+ *
+ * Without even a shifter, it's hopeless; any hash function will suck.
+ */
+
+#if CONFIG_XILINX_MICROBLAZE0_USE_HW_MUL == 0
+
+#define HAVE_ARCH__HASH_32 1
+
+/* Multiply by GOLDEN_RATIO_32 = 0x61C88647 */
+static inline u32 __attribute_const__ __hash_32(u32 a)
+{
+#if CONFIG_XILINX_MICROBLAZE0_USE_BARREL
+	unsigned b, c;
+
+	/* Phase 1: Compute three intermediate values */
+	b =  a << 23;
+	c = (a << 19) + a;
+	a = (a <<  9) + c;
+	b += a;
+
+	/* Phase 2: Compute (a << 11) + (b << 6) + (c << 3) - b */
+	a <<= 5;
+	a += b;		/* (a << 5) + b */
+	a <<= 3;
+	a += c;		/* (a << 8) + (b << 3) + c */
+	a <<= 3;
+	return a - b;	/* (a << 11) + (b << 6) + (c << 3) - b */
+#else
+	/*
+	 * "This is really going to hurt."
+	 *
+	 * Without a barrel shifter, left shifts are implemented as
+	 * repeated additions, and the best we can do is an optimal
+	 * addition-subtraction chain.  This one is not known to be
+	 * optimal, but at 37 steps, it's decent for a 31-bit multiplier.
+	 *
+	 * Question: given its size (37*4 = 148 bytes per instance),
+	 * and slowness, is this worth having inline?
+	 */
+	unsigned b, c, d;
+
+	b = a << 4;	/* 4    */
+	c = b << 1;	/* 1  5 */
+	b += a;		/* 1  6 */
+	c += b;		/* 1  7 */
+	c <<= 3;	/* 3 10 */
+	c -= a;		/* 1 11 */
+	d = c << 7;	/* 7 18 */
+	d += b;		/* 1 19 */
+	d <<= 8;	/* 8 27 */
+	d += a;		/* 1 28 */
+	d <<= 1;	/* 1 29 */
+	d += b;		/* 1 30 */
+	d <<= 6;	/* 6 36 */
+	return d + c;	/* 1 37 total instructions*/
+#endif
+}
+
+#endif /* !CONFIG_XILINX_MICROBLAZE0_USE_HW_MUL */
+#endif /* _ASM_HASH_H */
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 10/10] h8300: Add <asm/hash.h>
  2016-05-25  7:38 ` [PATCH 10/10] h8300: Add <asm/archhash.h> George Spelvin
@ 2016-05-26 17:23   ` George Spelvin
  0 siblings, 0 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-26 17:23 UTC (permalink / raw)
  To: linux-kernel, uclinux-h8-devel, ysato
  Cc: alistair.francis, bfields, geert, gerg, linux-m68k, linux,
	michal.simek, phdm, schwab

This will improve the performance of hash_32() and hash_64(), but due
to complete lack of multi-bit shift instructions on H8, performance will
still be bad in surrounding code.

Designing H8-specific hash algorithms to work around that is a separate
project.  (But if the maintainers would like to get in touch...)

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: uclinux-h8-devel@lists.sourceforge.jp
---
 arch/h8300/Kconfig            |  1 +
 arch/h8300/include/asm/hash.h | 53 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)
 create mode 100644 arch/h8300/include/asm/hash.h

diff --git a/arch/h8300/Kconfig b/arch/h8300/Kconfig
index 986ea84c..6c583dbb 100644
--- a/arch/h8300/Kconfig
+++ b/arch/h8300/Kconfig
@@ -20,6 +20,7 @@ config H8300
 	select HAVE_KERNEL_GZIP
 	select HAVE_KERNEL_LZO
 	select HAVE_ARCH_KGDB
+	select HAVE_ARCH_HASH
 
 config RWSEM_GENERIC_SPINLOCK
 	def_bool y
diff --git a/arch/h8300/include/asm/hash.h b/arch/h8300/include/asm/hash.h
new file mode 100644
index 00000000..04cfbd2b
--- /dev/null
+++ b/arch/h8300/include/asm/hash.h
@@ -0,0 +1,53 @@
+#ifndef _ASM_HASH_H
+#define _ASM_HASH_H
+
+/*
+ * The later H8SX models have a 32x32-bit multiply, but the H8/300H
+ * and H8S have only 16x16->32.  Since it's tolerably compact, this is
+ * basically an inlined version of the __mulsi3 code.  Since the inputs
+ * are not expected to be small, it's also simplified by skipping the
+ * early-out checks.
+ *
+ * (Since neither CPU has any multi-bit shift instructions, a
+ * shift-and-add version is a non-starter.)
+ *
+ * TODO: come up with an arch-specific version of the hashing in fs/namei.c,
+ * since that is heavily dependent on rotates.  Which, as mentioned, suck
+ * horribly on H8.
+ */
+
+#if defined(CONFIG_CPU_H300H) || defined(CONFIG_CPU_H8S)
+
+#define HAVE_ARCH__HASH_32 1
+
+/*
+ * Multiply by k = 0x61C88647.  Fitting this into three registers requires
+ * one extra instruction, but reducing register pressure will probably
+ * make that back and then some.
+ *
+ * GCC asm note: %e1 is the high half of operand %1, while %f1 is the
+ * low half.  So if %1 is er4, then %e1 is e4 and %f1 is r4.
+ *
+ * This has been designed to modify x in place, since that's the most
+ * common usage, but preserve k, since hash_64() makes two calls in
+ * quick succession.
+ */
+static inline u32 __attribute_const__ __hash_32(u32 x)
+{
+	u32 temp;
+
+	asm(   "mov.w	%e1,%f0"
+	"\n	mulxu.w	%f2,%0"		/* klow * xhigh */
+	"\n	mov.w	%f0,%e1"	/* The extra instruction */
+	"\n	mov.w	%f1,%f0"
+	"\n	mulxu.w	%e2,%0"		/* khigh * xlow */
+	"\n	add.w	%e1,%f0"
+	"\n	mulxu.w	%f2,%1"		/* klow * xlow */
+	"\n	add.w	%f0,%e1"
+	: "=&r" (temp), "=r" (x)
+	: "%r" (GOLDEN_RATIO_32), "1" (x));
+	return x;
+}
+
+#endif
+#endif /* _ASM_HASH_H */
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 00/10] String hash improvements
  2016-05-25 16:08   ` Linus Torvalds
@ 2016-05-28 19:57     ` George Spelvin
  2016-05-28 19:57       ` [PATCH v3 01/10] Pull out string hash to <linux/stringhash.h> George Spelvin
                         ` (10 more replies)
  2016-06-02 22:59     ` [PATCH " Fubo Chen
  1 sibling, 11 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-28 19:57 UTC (permalink / raw)
  To: Linus Torvalds, lkml; +Cc: J . Bruce Fields, George Spelvin

Okay, Linus, it's still warm from the oven, but I think it's fully baked.

Special thanks to J. Bruce Fields for testing and finding bugs in v1.
(I learned some humbling lessons about "obviously correct" code.)

On the arch-specific front, the m68k assembly has been tested in
a standalone test harness, I've been in contact with the Microblaze
maintainers who mostly don't care, as the hardware multiplier is never
omitted in real-world applications, and I haven't heard anything from
the H8/300 world.

You can either drop them in and we can debug them during -rc, or
I can send them through the arch trees for 4.8.

This series does several related things:
1) Makes the dcache hash (fs/namei.c) useful for general kernel use.
   (Thanks to Bruce for noticing the zero-length corner case.)
2) Converts the string hashes in <linux/sunrpc/svcauth.h> to use the above.
3) Avoids 64-bit multiplies in hash_64() on 32-bit platforms.
   Two 32-bit multiplies will do well enough.
4) Rids the world of the bad hash multipliers in hash_32.
   This finishes the job started in 689de1d6ca.
   The vast majority of Linux architectures have hardware support
   for 32x32-bit multiply and so derive no benefit from "simplified"
   multipliers.
   The few processors that do not (68000, h8/300 and some models of
   Microblaze) have arch-specific implementations added.  Those patches
   are last in the series so they can go through the relevant arch
   maintainers.
5) Overhauls the dcache hash mixing.
   The patch in 2bf0b16954 was an off-the-cuff suggestion.  Replaced with
   a much more careful design that's simultaneously faster and better.
   (My own invention, as there was nothing suitable in the literature I
   could find.  Comments welcome!)

Things I thought about but can wait for now:
6) Modify the hash_name() loop to skip the initial HASH_MIX().
   That would let us salt the hash if we ever wanted to.
7) Sort out partial_name_hash().
   The hash function is declared as using a long state, even though
   it's truncated to 32 bits at the end and the extra internal state
   contributes nothing to the result.  And some callers do odd things:
   * fs/hfs/string.c only allocates 32 bits of state
   * fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes
8) Modify bytemask_from_count to handle inputs of 1..sizeof(long)
   rather than 0..sizeof(long)-1.  This would simplify users other
   than full_name_hash.

George Spelvin (10):
  Pull out string hash to <linux/stringhash.h>
  fs/namei.c: Add hashlen_string() function
  <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string()
  Change hash_64() return value to 32 bits
  Eliminate bad hash multipliers from hash_32() and  hash_64()
  fs/namei.c: Improve dcache hash function
  <linux/hash.h>: Add support for architecture-specific functions
  m68k: Add <asm/hash.h>
  microblaze: Add <asm/hash.h>
  h8300: Add <asm/hash.h>

 arch/Kconfig                          |   8 ++
 arch/h8300/Kconfig                    |   1 +
 arch/h8300/include/asm/hash.h         |  53 +++++++
 arch/m68k/Kconfig.cpu                 |   1 +
 arch/m68k/include/asm/hash.h          |  59 ++++++++
 arch/microblaze/Kconfig               |   1 +
 arch/microblaze/include/asm/hash.h    |  81 +++++++++++
 drivers/media/usb/dvb-usb-v2/af9015.c |   2 +
 fs/dcache.c                           |   3 +-
 fs/namei.c                            | 160 +++++++++++++++++-----
 include/linux/dcache.h                |  27 +---
 include/linux/hash.h                  | 112 +++++++--------
 include/linux/stringhash.h            |  76 +++++++++++
 include/linux/sunrpc/svcauth.h        |  40 ++----
 lib/Kconfig.debug                     |  11 ++
 lib/Makefile                          |   1 +
 lib/test_hash.c                       | 250 ++++++++++++++++++++++++++++++++++
 17 files changed, 735 insertions(+), 151 deletions(-)
 create mode 100644 arch/h8300/include/asm/hash.h
 create mode 100644 arch/m68k/include/asm/hash.h
 create mode 100644 arch/microblaze/include/asm/hash.h
 create mode 100644 include/linux/stringhash.h
 create mode 100644 lib/test_hash.c

-- 
2.8.1

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v3 01/10] Pull out string hash to <linux/stringhash.h>
  2016-05-28 19:57     ` [PATCH v3 " George Spelvin
@ 2016-05-28 19:57       ` George Spelvin
  2016-05-28 19:57       ` [PATCH v3 02/10] fs/namei.c: Add hashlen_string() function George Spelvin
                         ` (9 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-28 19:57 UTC (permalink / raw)
  To: Linus Torvalds, lkml; +Cc: J . Bruce Fields, George Spelvin

... so they can be used without the rest of <linux/dcache.h>

The hashlen_* macros will make sense next patch.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
---
 include/linux/dcache.h     | 27 +----------------
 include/linux/stringhash.h | 72 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+), 26 deletions(-)
 create mode 100644 include/linux/stringhash.h

diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 7e9422cb..0f9a977c 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -10,6 +10,7 @@
 #include <linux/cache.h>
 #include <linux/rcupdate.h>
 #include <linux/lockref.h>
+#include <linux/stringhash.h>
 
 struct path;
 struct vfsmount;
@@ -52,9 +53,6 @@ struct qstr {
 };
 
 #define QSTR_INIT(n,l) { { { .len = l } }, .name = n }
-#define hashlen_hash(hashlen) ((u32) (hashlen))
-#define hashlen_len(hashlen)  ((u32)((hashlen) >> 32))
-#define hashlen_create(hash,len) (((u64)(len)<<32)|(u32)(hash))
 
 struct dentry_stat_t {
 	long nr_dentry;
@@ -65,29 +63,6 @@ struct dentry_stat_t {
 };
 extern struct dentry_stat_t dentry_stat;
 
-/* Name hashing routines. Initial hash value */
-/* Hash courtesy of the R5 hash in reiserfs modulo sign bits */
-#define init_name_hash()		0
-
-/* partial hash update function. Assume roughly 4 bits per character */
-static inline unsigned long
-partial_name_hash(unsigned long c, unsigned long prevhash)
-{
-	return (prevhash + (c << 4) + (c >> 4)) * 11;
-}
-
-/*
- * Finally: cut down the number of bits to a int value (and try to avoid
- * losing bits)
- */
-static inline unsigned long end_name_hash(unsigned long hash)
-{
-	return (unsigned int) hash;
-}
-
-/* Compute the hash for a name string. */
-extern unsigned int full_name_hash(const unsigned char *, unsigned int);
-
 /*
  * Try to keep struct dentry aligned on 64 byte cachelines (this will
  * give reasonable cacheline footprint with larger lines without the
diff --git a/include/linux/stringhash.h b/include/linux/stringhash.h
new file mode 100644
index 00000000..2eaaaf6d
--- /dev/null
+++ b/include/linux/stringhash.h
@@ -0,0 +1,72 @@
+#ifndef __LINUX_STRINGHASH_H
+#define __LINUX_STRINGHASH_H
+
+#include <linux/types.h>
+
+/*
+ * Routines for hashing strings of bytes to a 32-bit hash value.
+ *
+ * These hash functions are NOT GUARANTEED STABLE between kernel
+ * versions, architectures, or even repeated boots of the same kernel.
+ * (E.g. they may depend on boot-time hardware detection or be
+ * deliberately randomized.)
+ *
+ * They are also not intended to be secure against collisions caused by
+ * malicious inputs; much slower hash functions are required for that.
+ *
+ * They are optimized for pathname components, meaning short strings.
+ * Even if a majority of files have longer names, the dynamic profile of
+ * pathname components skews short due to short directory names.
+ * (E.g. /usr/lib/libsesquipedalianism.so.3.141.)
+ */
+
+/*
+ * Version 1: one byte at a time.  Example of use:
+ *
+ * unsigned long hash = init_name_hash;
+ * while (*p)
+ *	hash = partial_name_hash(tolower(*p++), hash);
+ * hash = end_name_hash(hash);
+ *
+ * Although this is designed for bytes, fs/hfsplus/unicode.c
+ * abuses it to hash 16-bit values.
+ */
+
+/* Hash courtesy of the R5 hash in reiserfs modulo sign bits */
+#define init_name_hash()		0
+
+/* partial hash update function. Assume roughly 4 bits per character */
+static inline unsigned long
+partial_name_hash(unsigned long c, unsigned long prevhash)
+{
+	return (prevhash + (c << 4) + (c >> 4)) * 11;
+}
+
+/*
+ * Finally: cut down the number of bits to an int value (and try to avoid
+ * losing bits)
+ */
+static inline unsigned long end_name_hash(unsigned long hash)
+{
+	return (unsigned int)hash;
+}
+
+/*
+ * Version 2: One word (32 or 64 bits) at a time.
+ * If CONFIG_DCACHE_WORD_ACCESS is defined (meaning <asm/word-at-a-time.h>
+ * exists, which describes major Linux platforms like x86 and ARM), then
+ * this computes a different hash function much faster.
+ *
+ * If not set, this falls back to a wrapper around the preceding.
+ */
+extern unsigned int full_name_hash(const unsigned char *, unsigned int);
+
+/*
+ * A hash_len is a u64 with the hash of a string in the low
+ * half and the length in the high half.
+ */
+#define hashlen_hash(hashlen) ((u32)(hashlen))
+#define hashlen_len(hashlen)  ((u32)((hashlen) >> 32))
+#define hashlen_create(hash, len) ((u64)(len)<<32 | (u32)(hash))
+
+#endif	/* __LINUX_STRINGHASH_H */
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 02/10] fs/namei.c: Add hashlen_string() function
  2016-05-28 19:57     ` [PATCH v3 " George Spelvin
  2016-05-28 19:57       ` [PATCH v3 01/10] Pull out string hash to <linux/stringhash.h> George Spelvin
@ 2016-05-28 19:57       ` George Spelvin
  2016-05-28 19:57       ` [PATCH v3 03/10] <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string() George Spelvin
                         ` (8 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-28 19:57 UTC (permalink / raw)
  To: Linus Torvalds, lkml; +Cc: J . Bruce Fields, George Spelvin

We'd like to make more use of the highly-optimized dcache hash functions
throughout the kernel, rather than have every subsystem create its own,
and a function that hashes basic null-terminated strings is required
for that.

(The name is to emphasize that it returns both hash and length.)

It's actually useful in the dcache itself, specifically d_alloc_name().
Other uses in the next patch.

full_name_hash() is also tweaked to make it more generally useful:
1) Take a "char *" rather than "unsigned char *" argument, to
   be consistent with hash_name().
2) Handle zero-length inputs.  If we want more callers, we don't want
   to make them worry about corner cases.
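
As a quick sketch of the intended calling convention (hypothetical caller,
not part of the patch):

	u64 hl = hashlen_string("example");
	unsigned int hash = hashlen_hash(hl);	/* 32-bit hash value */
	unsigned int len  = hashlen_len(hl);	/* 7 == strlen("example") */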

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
---
 fs/dcache.c                |  3 +--
 fs/namei.c                 | 51 +++++++++++++++++++++++++++++++++++++++++-----
 include/linux/stringhash.h |  8 ++++++--
 3 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index d5ecc6e4..19b75180 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1653,8 +1653,7 @@ struct dentry *d_alloc_name(struct dentry *parent, const char *name)
 	struct qstr q;
 
 	q.name = name;
-	q.len = strlen(name);
-	q.hash = full_name_hash(q.name, q.len);
+	q.hash_len = hashlen_string(name);
 	return d_alloc(parent, &q);
 }
 EXPORT_SYMBOL(d_alloc_name);
diff --git a/fs/namei.c b/fs/namei.c
index 42f8ca03..dd98d43a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1822,19 +1822,20 @@ static inline unsigned long mix_hash(unsigned long hash)
 
 #endif
 
-unsigned int full_name_hash(const unsigned char *name, unsigned int len)
+/* Return the hash of a string of known length */
+unsigned int full_name_hash(const char *name, unsigned int len)
 {
 	unsigned long a, hash = 0;
 
 	for (;;) {
+		if (!len)
+			goto done;
 		a = load_unaligned_zeropad(name);
 		if (len < sizeof(unsigned long))
 			break;
 		hash = mix_hash(hash + a);
 		name += sizeof(unsigned long);
 		len -= sizeof(unsigned long);
-		if (!len)
-			goto done;
 	}
 	hash += a & bytemask_from_count(len);
 done:
@@ -1842,6 +1843,29 @@ done:
 }
 EXPORT_SYMBOL(full_name_hash);
 
+/* Return the "hash_len" (hash and length) of a null-terminated string */
+u64 hashlen_string(const char *name)
+{
+	unsigned long a, adata, mask, hash, len;
+	const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
+
+	hash = a = 0;
+	len = -sizeof(unsigned long);
+	do {
+		hash = mix_hash(hash + a);
+		len += sizeof(unsigned long);
+		a = load_unaligned_zeropad(name+len);
+	} while (!has_zero(a, &adata, &constants));
+
+	adata = prep_zero_mask(a, adata, &constants);
+	mask = create_zero_mask(adata);
+	hash += a & zero_bytemask(mask);
+	len += find_zero(mask);
+
+	return hashlen_create(fold_hash(hash), len);
+}
+EXPORT_SYMBOL(hashlen_string);
+
 /*
  * Calculate the length and hash of the path component, and
  * return the "hash_len" as the result.
@@ -1872,15 +1896,32 @@ static inline u64 hash_name(const char *name)
 
 #else
 
-unsigned int full_name_hash(const unsigned char *name, unsigned int len)
+/* Return the hash of a string of known length */
+unsigned int full_name_hash(const char *name, unsigned int len)
 {
 	unsigned long hash = init_name_hash();
 	while (len--)
-		hash = partial_name_hash(*name++, hash);
+		hash = partial_name_hash((unsigned char)*name++, hash);
 	return end_name_hash(hash);
 }
 EXPORT_SYMBOL(full_name_hash);
 
+/* Return the "hash_len" (hash and length) of a null-terminated string */
+u64 hashlen_string(const char *name)
+{
+	unsigned long hash = init_name_hash();
+	unsigned long len = 0, c;
+
+	c = (unsigned char)*name;
+	do {
+		len++;
+		hash = partial_name_hash(c, hash);
+		c = (unsigned char)name[len];
+	} while (c);
+	return hashlen_create(end_name_hash(hash), len);
+}
+EXPORT_SYMBOL(hash_string);
+
 /*
  * We know there's a real path component here of at least
  * one character.
diff --git a/include/linux/stringhash.h b/include/linux/stringhash.h
index 2eaaaf6d..451771d9 100644
--- a/include/linux/stringhash.h
+++ b/include/linux/stringhash.h
@@ -1,7 +1,8 @@
 #ifndef __LINUX_STRINGHASH_H
 #define __LINUX_STRINGHASH_H
 
-#include <linux/types.h>
+#include <linux/compiler.h>	/* For __pure */
+#include <linux/types.h>	/* For u32, u64 */
 
 /*
  * Routines for hashing strings of bytes to a 32-bit hash value.
@@ -59,7 +60,7 @@ static inline unsigned long end_name_hash(unsigned long hash)
  *
  * If not set, this falls back to a wrapper around the preceding.
  */
-extern unsigned int full_name_hash(const unsigned char *, unsigned int);
+extern unsigned int __pure full_name_hash(const char *, unsigned int);
 
 /*
  * A hash_len is a u64 with the hash of a string in the low
@@ -69,4 +70,7 @@ extern unsigned int full_name_hash(const unsigned char *, unsigned int);
 #define hashlen_len(hashlen)  ((u32)((hashlen) >> 32))
 #define hashlen_create(hash, len) ((u64)(len)<<32 | (u32)(hash))
 
+/* Return the "hash_len" (hash and length) of a null-terminated string */
+extern u64 __pure hashlen_string(const char *name);
+
 #endif	/* __LINUX_STRINGHASH_H */
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 03/10] <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string()
  2016-05-28 19:57     ` [PATCH v3 " George Spelvin
  2016-05-28 19:57       ` [PATCH v3 01/10] Pull out string hash to <linux/stringhash.h> George Spelvin
  2016-05-28 19:57       ` [PATCH v3 02/10] fs/namei.c: Add hashlen_string() function George Spelvin
@ 2016-05-28 19:57       ` George Spelvin
  2016-05-28 19:57       ` [PATCH v3 04/10] Change hash_64() return value to 32 bits George Spelvin
                         ` (7 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-28 19:57 UTC (permalink / raw)
  To: Linus Torvalds, lkml
  Cc: J . Bruce Fields, George Spelvin, Jeff Layton, linux-nfs

Finally, the first use of previous two patches: eliminate the
separate ad-hoc string hash functions in the sunrpc code.

Now hash_str() is a wrapper around hashlen_string(), and hash_mem() is
likewise a wrapper around full_name_hash().

Note that sunrpc code *does* call hash_mem() with a zero length, which
is why the previous patch needed to handle that in full_name_hash().
(Thanks, Bruce, for finding that!)

This also eliminates the only caller of hash_long which asks for
more than 32 bits of output.

The comment about the quality of hashlen_string() and full_name_hash()
is jumping the gun by a few patches; they aren't very impressive now,
but will be improved greatly later in the series.
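
Expanded into two steps, the new hash_str() wrapper is equivalent to
(illustrative sketch only):

	u32 h = hashlen_hash(hashlen_string(name));	/* full 32-bit hash */

	return h >> (32 - bits);	/* keep only the top "bits" bits */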

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: linux-nfs@vger.kernel.org
---
 include/linux/sunrpc/svcauth.h | 40 +++++++++-------------------------------
 1 file changed, 9 insertions(+), 31 deletions(-)

diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h
index c00f53a4..91d5a5d6 100644
--- a/include/linux/sunrpc/svcauth.h
+++ b/include/linux/sunrpc/svcauth.h
@@ -16,6 +16,7 @@
 #include <linux/sunrpc/cache.h>
 #include <linux/sunrpc/gss_api.h>
 #include <linux/hash.h>
+#include <linux/stringhash.h>
 #include <linux/cred.h>
 
 struct svc_cred {
@@ -165,41 +166,18 @@ extern int svcauth_unix_set_client(struct svc_rqst *rqstp);
 extern int unix_gid_cache_create(struct net *net);
 extern void unix_gid_cache_destroy(struct net *net);
 
-static inline unsigned long hash_str(char *name, int bits)
+/*
+ * The <stringhash.h> functions are good enough that we don't need to
+ * use hash_32() on them; just extracting the high bits is enough.
+ */
+static inline unsigned long hash_str(char const *name, int bits)
 {
-	unsigned long hash = 0;
-	unsigned long l = 0;
-	int len = 0;
-	unsigned char c;
-	do {
-		if (unlikely(!(c = *name++))) {
-			c = (char)len; len = -1;
-		}
-		l = (l << 8) | c;
-		len++;
-		if ((len & (BITS_PER_LONG/8-1))==0)
-			hash = hash_long(hash^l, BITS_PER_LONG);
-	} while (len);
-	return hash >> (BITS_PER_LONG - bits);
+	return hashlen_hash(hashlen_string(name)) >> (32 - bits);
 }
 
-static inline unsigned long hash_mem(char *buf, int length, int bits)
+static inline unsigned long hash_mem(char const *buf, int length, int bits)
 {
-	unsigned long hash = 0;
-	unsigned long l = 0;
-	int len = 0;
-	unsigned char c;
-	do {
-		if (len == length) {
-			c = (char)len; len = -1;
-		} else
-			c = *buf++;
-		l = (l << 8) | c;
-		len++;
-		if ((len & (BITS_PER_LONG/8-1))==0)
-			hash = hash_long(hash^l, BITS_PER_LONG);
-	} while (len);
-	return hash >> (BITS_PER_LONG - bits);
+	return full_name_hash(buf, length) >> (32 - bits);
 }
 
 #endif /* __KERNEL__ */
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 04/10] Change hash_64() return value to 32 bits
  2016-05-28 19:57     ` [PATCH v3 " George Spelvin
                         ` (2 preceding siblings ...)
  2016-05-28 19:57       ` [PATCH v3 03/10] <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string() George Spelvin
@ 2016-05-28 19:57       ` George Spelvin
  2016-05-28 19:57       ` [PATCH v3 05/10] Eliminate bad hash multipliers from hash_32() and hash_64() George Spelvin
                         ` (6 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-28 19:57 UTC (permalink / raw)
  To: Linus Torvalds, lkml; +Cc: J . Bruce Fields, George Spelvin

That's all that's ever asked for, and it makes the return
type of hash_long() consistent.

It also allows (upcoming patch) an optimized implementation
of hash_64 on 32-bit machines.

I tried adding a BUILD_BUG_ON to ensure the number of bits requested
was never more than 32 (most callers use a compile-time constant), but
adding <linux/bug.h> to <linux/hash.h> breaks the tools/perf compiler
unless tools/perf/MANIFEST is updated, and understanding that code base
well enough to update it is too much trouble.  I did the rest of an
allyesconfig build with such a check, and nothing tripped.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
---
 include/linux/hash.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/hash.h b/include/linux/hash.h
index 79c52fa8..f967dedb 100644
--- a/include/linux/hash.h
+++ b/include/linux/hash.h
@@ -48,7 +48,7 @@
 #define GOLDEN_RATIO_32 0x61C88647
 #define GOLDEN_RATIO_64 0x61C8864680B583EBull
 
-static __always_inline u64 hash_64(u64 val, unsigned int bits)
+static __always_inline u32 hash_64(u64 val, unsigned int bits)
 {
 	u64 hash = val;
 
@@ -72,7 +72,7 @@ static __always_inline u64 hash_64(u64 val, unsigned int bits)
 #endif
 
 	/* High bits are more random, so use them. */
-	return hash >> (64 - bits);
+	return (u32)(hash >> (64 - bits));
 }
 
 static inline u32 hash_32(u32 val, unsigned int bits)
@@ -84,7 +84,7 @@ static inline u32 hash_32(u32 val, unsigned int bits)
 	return hash >> (32 - bits);
 }
 
-static inline unsigned long hash_ptr(const void *ptr, unsigned int bits)
+static inline u32 hash_ptr(const void *ptr, unsigned int bits)
 {
 	return hash_long((unsigned long)ptr, bits);
 }
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 05/10] Eliminate bad hash multipliers from hash_32() and  hash_64()
  2016-05-28 19:57     ` [PATCH v3 " George Spelvin
                         ` (3 preceding siblings ...)
  2016-05-28 19:57       ` [PATCH v3 04/10] Change hash_64() return value to 32 bits George Spelvin
@ 2016-05-28 19:57       ` George Spelvin
  2016-05-28 19:57       ` [PATCH v3 06/10] fs/namei.c: Improve dcache hash function George Spelvin
                         ` (5 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-28 19:57 UTC (permalink / raw)
  To: Linus Torvalds, lkml
  Cc: J . Bruce Fields, George Spelvin, Antti Palosaari, Mauro Carvalho Chehab

The "simplified" prime multipliers made very bad hash functions, so get rid
of them.  This completes the work of 689de1d6ca.

To avoid the inefficiency which was the motivation for the "simplified"
multipliers, hash_64() on 32-bit systems is changed to use a different
algorithm.  It makes two calls to hash_32() instead.
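
On a 32-bit kernel, the new hash_64() therefore boils down to roughly
the following (sketch, equivalent to the helpers in the diff below):

	u32 fold = (u32)val ^ ((u32)(val >> 32) * GOLDEN_RATIO_32);

	return (fold * GOLDEN_RATIO_32) >> (32 - bits);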

drivers/media/usb/dvb-usb-v2/af9015.c uses the old GOLDEN_RATIO_PRIME_32
for some horrible reason, so it inherits a copy of the old definition.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
---
 drivers/media/usb/dvb-usb-v2/af9015.c |  2 +
 include/linux/hash.h                  | 87 ++++++++++++++---------------------
 2 files changed, 36 insertions(+), 53 deletions(-)

diff --git a/drivers/media/usb/dvb-usb-v2/af9015.c b/drivers/media/usb/dvb-usb-v2/af9015.c
index 95a7388e..09e0f58f 100644
--- a/drivers/media/usb/dvb-usb-v2/af9015.c
+++ b/drivers/media/usb/dvb-usb-v2/af9015.c
@@ -398,6 +398,8 @@ error:
 }
 
 #define AF9015_EEPROM_SIZE 256
+/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
+#define GOLDEN_RATIO_PRIME_32 0x9e370001UL
 
 /* hash (and dump) eeprom */
 static int af9015_eeprom_hash(struct dvb_usb_device *d)
diff --git a/include/linux/hash.h b/include/linux/hash.h
index f967dedb..613cfde3 100644
--- a/include/linux/hash.h
+++ b/include/linux/hash.h
@@ -3,85 +3,65 @@
 /* Fast hashing routine for ints,  longs and pointers.
    (C) 2002 Nadia Yvette Chambers, IBM */
 
-/*
- * Knuth recommends primes in approximately golden ratio to the maximum
- * integer representable by a machine word for multiplicative hashing.
- * Chuck Lever verified the effectiveness of this technique:
- * http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf
- *
- * These primes are chosen to be bit-sparse, that is operations on
- * them can use shifts and additions instead of multiplications for
- * machines where multiplications are slow.
- */
-
 #include <asm/types.h>
 #include <linux/compiler.h>
 
-/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
-#define GOLDEN_RATIO_PRIME_32 0x9e370001UL
-/*  2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1 */
-#define GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001UL
-
+/*
+ * The "GOLDEN_RATIO_PRIME" is used in fs/btrfs/btrfs_inode.h and
+ * fs/inode.c.  It's not actually prime any more (the previous primes
+ * were actively bad for hashing), but the name remains.
+ */
 #if BITS_PER_LONG == 32
-#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_PRIME_32
+#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_32
 #define hash_long(val, bits) hash_32(val, bits)
 #elif BITS_PER_LONG == 64
 #define hash_long(val, bits) hash_64(val, bits)
-#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_PRIME_64
+#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_64
 #else
 #error Wordsize not 32 or 64
 #endif
 
 /*
- * The above primes are actively bad for hashing, since they are
- * too sparse. The 32-bit one is mostly ok, the 64-bit one causes
- * real problems. Besides, the "prime" part is pointless for the
- * multiplicative hash.
+ * This hash multiplies the input by a large odd number and takes the
+ * high bits.  Since multiplication propagates changes to the most
+ * significant end only, it is essential that the high bits of the
+ * product be used for the hash value.
+ *
+ * Chuck Lever verified the effectiveness of this technique:
+ * http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf
  *
  * Although a random odd number will do, it turns out that the golden
  * ratio phi = (sqrt(5)-1)/2, or its negative, has particularly nice
- * properties.
+ * properties.  (See Knuth vol 3, section 6.4, exercise 9.)
  *
- * These are the negative, (1 - phi) = (phi^2) = (3 - sqrt(5))/2.
- * (See Knuth vol 3, section 6.4, exercise 9.)
+ * These are the negative, (1 - phi) = phi**2 = (3 - sqrt(5))/2,
+ * which is very slightly easier to multiply by and makes no
+ * difference to the hash distribution.
  */
 #define GOLDEN_RATIO_32 0x61C88647
 #define GOLDEN_RATIO_64 0x61C8864680B583EBull
 
-static __always_inline u32 hash_64(u64 val, unsigned int bits)
+
+static inline u32 __hash_32(u32 val)
 {
-	u64 hash = val;
-
-#if BITS_PER_LONG == 64
-	hash = hash * GOLDEN_RATIO_64;
-#else
-	/*  Sigh, gcc can't optimise this alone like it does for 32 bits. */
-	u64 n = hash;
-	n <<= 18;
-	hash -= n;
-	n <<= 33;
-	hash -= n;
-	n <<= 3;
-	hash += n;
-	n <<= 3;
-	hash -= n;
-	n <<= 4;
-	hash += n;
-	n <<= 2;
-	hash += n;
-#endif
-
-	/* High bits are more random, so use them. */
-	return (u32)(hash >> (64 - bits));
+	return val * GOLDEN_RATIO_32;
 }
 
 static inline u32 hash_32(u32 val, unsigned int bits)
 {
-	/* On some cpus multiply is faster, on others gcc will do shifts */
-	u32 hash = val * GOLDEN_RATIO_PRIME_32;
-
 	/* High bits are more random, so use them. */
-	return hash >> (32 - bits);
+	return __hash_32(val) >> (32 - bits);
+}
+
+static __always_inline u32 hash_64(u64 val, unsigned int bits)
+{
+#if BITS_PER_LONG == 64
+	/* 64x64-bit multiply is efficient on all 64-bit processors */
+	return val * GOLDEN_RATIO_64 >> (64 - bits);
+#else
+	/* Hash 64 bits using only 32x32-bit multiply. */
+	return hash_32((u32)val ^ __hash_32(val >> 32), bits);
+#endif
 }
 
 static inline u32 hash_ptr(const void *ptr, unsigned int bits)
@@ -89,6 +69,7 @@ static inline u32 hash_ptr(const void *ptr, unsigned int bits)
 	return hash_long((unsigned long)ptr, bits);
 }
 
+/* This really should be called fold32_ptr; it does no hashing to speak of. */
 static inline u32 hash32_ptr(const void *ptr)
 {
 	unsigned long val = (unsigned long)ptr;
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 06/10] fs/namei.c: Improve dcache hash function
  2016-05-28 19:57     ` [PATCH v3 " George Spelvin
                         ` (4 preceding siblings ...)
  2016-05-28 19:57       ` [PATCH v3 05/10] Eliminate bad hash multipliers from hash_32() and hash_64() George Spelvin
@ 2016-05-28 19:57       ` George Spelvin
  2016-05-30 15:11         ` Peter Zijlstra
  2016-05-28 19:57       ` [PATCH v3 07/10] <linux/hash.h>: Add support for architecture-specific functions George Spelvin
                         ` (4 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: George Spelvin @ 2016-05-28 19:57 UTC (permalink / raw)
  To: Linus Torvalds, lkml; +Cc: J . Bruce Fields, George Spelvin

Patch 0fed3ac866 improved the hash mixing, but the function is slower
than necessary; there's a 7-instruction dependency chain (10 on x86)
each loop iteration.

Word-at-a-time access is a very tight loop (which is good, because
link_path_walk() is one of the hottest code paths in the entire kernel),
and the hash mixing function must not have a longer latency to avoid
slowing it down.

There do not appear to be any published fast hash functions that:
1) Operate on the input a word at a time, and
2) Don't need to know the length of the input beforehand, and
3) Have a single iterated mixing function, not needing conditional
   branches or unrolling to distinguish different loop iterations.

One of the algorithms which comes closest is Yann Collet's xxHash, but
that's two dependent multiplies per word, which is too much.

The key insights in this design are:

1) Barring expensive ops like multiplies, to diffuse one input bit
   across 64 bits of hash state takes at least log2(64) = 6 sequentially
   dependent instructions.  That is more cycles than we'd like.
2) An operation like "hash ^= hash << 13" requires a second temporary
   register anyway, and on a 2-operand machine like x86, it's three
   instructions.
3) A better use of a second register is to hold a two-word hash state.
   With careful design, no temporaries are needed at all, so it doesn't
   increase register pressure.  And this gets rid of register copying
   on 2-operand machines, so the code is smaller and faster.
4) Using two words of state weakens the requirement for one-round mixing;
   we now have two rounds of mixing before cancellation is possible.
5) A two-word hash state also allows operations on both halves to be
   done in parallel, so on a superscalar processor we get more mixing
   in fewer cycles.

I ended up using a mixing function inspired by the ChaCha and Speck
round functions.  It is 6 simple instructions and 3 cycles per iteration
(assuming multiply by 9 can be done by an "lea" instruction):

		x ^= *input++;
	y ^= x;	x = ROL(x, K1);
	x += y;	y = ROL(y, K2);
	y *= 9;

Not only is this reversible, two consecutive rounds are reversible:
if you are given the initial and final states, but not the intermediate
state, it is possible to compute both input words.  This means that at
least 3 words of input are required to create a collision.

(It also has the property, used by hash_name() to avoid a branch, that
it hashes all-zero to all-zero.)

The rotate constants K1 and K2 were found by experiment.  The search took
a sample of random initial states (I used 1023) and considered the effect
of flipping each of the 64 input bits on each of the 128 output bits two
rounds later.  Each of the 8192 pairs can be considered a biased coin, and
adding up the Shannon entropy of all of them produces a score.

The best-scoring shifts also did well in other tests (flipping bits in y,
trying 3 or 4 rounds of mixing, flipping all 64*63/2 pairs of input bits),
so the choice was made with the additional constraint that the sum of the
shifts is odd and not too close to the word size.
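
For illustration, the scoring harness can be reconstructed as a small
user-space program along these lines (a sketch based on the description
above, not the actual search tool; build with something like
"cc -O2 score.c -lm"):

	#include <stdint.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <math.h>

	#define K1 12
	#define K2 45
	#define SAMPLES 1023
	#define ROL64(v, r) (((v) << (r)) | ((v) >> (64 - (r))))

	/* One round of the mixing function described above */
	static void mix(uint64_t *x, uint64_t *y, uint64_t a)
	{
		*x ^= a;
		*y ^= *x; *x = ROL64(*x, K1);
		*x += *y; *y = ROL64(*y, K2);
		*y *= 9;
	}

	static uint64_t rand64(void)
	{
		return (uint64_t)rand() << 40 ^ (uint64_t)rand() << 20 ^ rand();
	}

	int main(void)
	{
		/* count[i][j]: how often flipping input bit i toggles output bit j */
		static unsigned count[64][128];
		double score = 0;
		int i, j, s;

		for (s = 0; s < SAMPLES; s++) {
			uint64_t x0 = rand64(), y0 = rand64(), a = rand64();
			uint64_t xr = x0, yr = y0, xd, yd;

			/* Reference: two rounds with the unmodified input word */
			mix(&xr, &yr, a);
			mix(&xr, &yr, 0);

			for (i = 0; i < 64; i++) {
				/* Same two rounds with input bit i flipped */
				xd = x0; yd = y0;
				mix(&xd, &yd, a ^ (uint64_t)1 << i);
				mix(&xd, &yd, 0);

				for (j = 0; j < 64; j++) {
					count[i][j]      += (xd ^ xr) >> j & 1;
					count[i][j + 64] += (yd ^ yr) >> j & 1;
				}
			}
		}

		/* Sum the Shannon entropy of each of the 64*128 biased coins */
		for (i = 0; i < 64; i++)
			for (j = 0; j < 128; j++) {
				double p = (double)count[i][j] / SAMPLES;

				if (p > 0 && p < 1)
					score -= p*log2(p) + (1-p)*log2(1-p);
			}

		printf("2-round score for (%d,%d): %.1f of %d bits\n",
		       K1, K2, score, 64 * 128);
		return 0;
	}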

The final state is then folded into a 32-bit hash value by a less carefully
optimized multiply-based scheme.  This also has to be fast, as pathname
components tend to be short (the most common case is one iteration!), but
there's some room for latency, as there is a fair bit of intervening logic
before the hash value is used for anything.

(Performance verified with "bonnie++ -s 0 -n 1536:-2" on tmpfs.  I need
a better benchmark; the numbers seem to show a slight dip in performance
between 4.6.0 and this patch, but they're too noisy to quote.)

Special thanks to Bruce Fields for diligent testing which uncovered a
nasty fencepost error in an earlier version of this patch.

[checkpatch.pl formatting complaints noted and respectfully disagreed with.]

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Tested-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/namei.c | 121 +++++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 81 insertions(+), 40 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index dd98d43a..a49cbd7e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -35,6 +35,7 @@
 #include <linux/fs_struct.h>
 #include <linux/posix_acl.h>
 #include <linux/hash.h>
+#include <linux/bitops.h>
 #include <asm/uaccess.h>
 
 #include "internal.h"
@@ -1788,44 +1789,89 @@ static int walk_component(struct nameidata *nd, int flags)
 #include <asm/word-at-a-time.h>
 
 #ifdef CONFIG_64BIT
-
-static inline unsigned int fold_hash(unsigned long hash)
-{
-	return hash_64(hash, 32);
-}
+/*
+ * Register pressure in the mixing function is an issue, particularly
+ * on 32-bit x86, but almost any function requires one state value and
+ * one temporary.  Instead, use a function designed for two state values
+ * and no temporaries.
+ *
+ * This function cannot create a collision in only two iterations, so
+ * we have two iterations to achieve avalanche.  In those two iterations,
+ * we have six layers of mixing, which is enough to spread one bit's
+ * influence out to 2^6 = 64 state bits.
+ *
+ * Rotate constants are scored by considering either 64 one-bit input
+ * deltas or 64*63/2 = 2016 two-bit input deltas, and finding the
+ * probability of that delta causing a change to each of the 128 output
+ * bits, using a sample of random initial states.
+ *
+ * The Shannon entropy of the computed probabilities is then summed
+ * to produce a score.  Ideally, any input change has a 50% chance of
+ * toggling any given output bit.
+ *
+ * Mixing scores (in bits) for (12,45):
+ * Input delta: 1-bit      2-bit
+ * 1 round:     713.3    42542.6
+ * 2 rounds:   2753.7   140389.8
+ * 3 rounds:   5954.1   233458.2
+ * 4 rounds:   7862.6   256672.2
+ * Perfect:    8192     258048
+ *            (64*128) (64*63/2 * 128)
+ */
+#define HASH_MIX(x, y, a)	\
+	(	x ^= (a),	\
+	y ^= x,	x = rol64(x,12),\
+	x += y,	y = rol64(y,45),\
+	y *= 9			)
 
 /*
- * This is George Marsaglia's XORSHIFT generator.
- * It implements a maximum-period LFSR in only a few
- * instructions.  It also has the property (required
- * by hash_name()) that mix_hash(0) = 0.
+ * Fold two longs into one 32-bit hash value.  This must be fast, but
+ * latency isn't quite as critical, as there is a fair bit of additional
+ * work done before the hash value is used.
  */
-static inline unsigned long mix_hash(unsigned long hash)
+static inline unsigned int fold_hash(unsigned long x, unsigned long y)
 {
-	hash ^= hash << 13;
-	hash ^= hash >> 7;
-	hash ^= hash << 17;
-	return hash;
+	y ^= x * GOLDEN_RATIO_64;
+	y *= GOLDEN_RATIO_64;
+	return y >> 32;
 }
 
 #else	/* 32-bit case */
 
-#define fold_hash(x) (x)
+/*
+ * Mixing scores (in bits) for (7,20):
+ * Input delta: 1-bit      2-bit
+ * 1 round:     330.3     9201.6
+ * 2 rounds:   1246.4    25475.4
+ * 3 rounds:   1907.1    31295.1
+ * 4 rounds:   2042.3    31718.6
+ * Perfect:    2048      31744
+ *            (32*64)   (32*31/2 * 64)
+ */
+#define HASH_MIX(x, y, a)	\
+	(	x ^= (a),	\
+	y ^= x,	x = rol32(x, 7),\
+	x += y,	y = rol32(y,20),\
+	y *= 9			)
 
-static inline unsigned long mix_hash(unsigned long hash)
+static inline unsigned int fold_hash(unsigned long x, unsigned long y)
 {
-	hash ^= hash << 13;
-	hash ^= hash >> 17;
-	hash ^= hash << 5;
-	return hash;
+	/* Use arch-optimized multiply if one exists */
+	return __hash_32(y ^ __hash_32(x));
 }
 
 #endif
 
-/* Return the hash of a string of known length */
+/*
+ * Return the hash of a string of known length.  This is carefully
+ * designed to match hash_name(), which is the more critical function.
+ * In particular, we must end by hashing a final word containing 0..7
+ * payload bytes, to match the way that hash_name() iterates until it
+ * finds the delimiter after the name.
+ */
 unsigned int full_name_hash(const char *name, unsigned int len)
 {
-	unsigned long a, hash = 0;
+	unsigned long a, x = 0, y = 0;
 
 	for (;;) {
 		if (!len)
@@ -1833,36 +1879,34 @@ unsigned int full_name_hash(const char *name, unsigned int len)
 		a = load_unaligned_zeropad(name);
 		if (len < sizeof(unsigned long))
 			break;
-		hash = mix_hash(hash + a);
+		HASH_MIX(x, y, a);
 		name += sizeof(unsigned long);
 		len -= sizeof(unsigned long);
 	}
-	hash += a & bytemask_from_count(len);
+	x ^= a & bytemask_from_count(len);
 done:
-	return fold_hash(hash);
+	return fold_hash(x, y);
 }
 EXPORT_SYMBOL(full_name_hash);
 
 /* Return the "hash_len" (hash and length) of a null-terminated string */
 u64 hashlen_string(const char *name)
 {
-	unsigned long a, adata, mask, hash, len;
+	unsigned long a = 0, x = 0, y = 0, adata, mask, len;
 	const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
 
-	hash = a = 0;
 	len = -sizeof(unsigned long);
 	do {
-		hash = mix_hash(hash + a);
+		HASH_MIX(x, y, a);
 		len += sizeof(unsigned long);
 		a = load_unaligned_zeropad(name+len);
 	} while (!has_zero(a, &adata, &constants));
 
 	adata = prep_zero_mask(a, adata, &constants);
 	mask = create_zero_mask(adata);
-	hash += a & zero_bytemask(mask);
-	len += find_zero(mask);
+	x ^= a & zero_bytemask(mask);
 
-	return hashlen_create(fold_hash(hash), len);
+	return hashlen_create(fold_hash(x, y), len + find_zero(mask));
 }
 EXPORT_SYMBOL(hashlen_string);
 
@@ -1872,13 +1916,12 @@ EXPORT_SYMBOL(hashlen_string);
  */
 static inline u64 hash_name(const char *name)
 {
-	unsigned long a, b, adata, bdata, mask, hash, len;
+	unsigned long a = 0, b, x = 0, y = 0, adata, bdata, mask, len;
 	const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
 
-	hash = a = 0;
 	len = -sizeof(unsigned long);
 	do {
-		hash = mix_hash(hash + a);
+		HASH_MIX(x, y, a);
 		len += sizeof(unsigned long);
 		a = load_unaligned_zeropad(name+len);
 		b = a ^ REPEAT_BYTE('/');
@@ -1886,15 +1929,13 @@ static inline u64 hash_name(const char *name)
 
 	adata = prep_zero_mask(a, adata, &constants);
 	bdata = prep_zero_mask(b, bdata, &constants);
-
 	mask = create_zero_mask(adata | bdata);
+	x ^= a & zero_bytemask(mask);
 
-	hash += a & zero_bytemask(mask);
-	len += find_zero(mask);
-	return hashlen_create(fold_hash(hash), len);
+	return hashlen_create(fold_hash(x, y), len + find_zero(mask));
 }
 
-#else
+#else	/* !CONFIG_DCACHE_WORD_ACCESS: Slow, byte-at-a-time version */
 
 /* Return the hash of a string of known length */
 unsigned int full_name_hash(const char *name, unsigned int len)
@@ -1965,7 +2006,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 		int type;
 
 		err = may_lookup(nd);
- 		if (err)
+		if (err)
 			return err;
 
 		hash_len = hash_name(name);
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 07/10] <linux/hash.h>: Add support for architecture-specific functions
  2016-05-28 19:57     ` [PATCH v3 " George Spelvin
                         ` (5 preceding siblings ...)
  2016-05-28 19:57       ` [PATCH v3 06/10] fs/namei.c: Improve dcache hash function George Spelvin
@ 2016-05-28 19:57       ` George Spelvin
  2016-05-29  7:57         ` Geert Uytterhoeven
  2016-05-28 19:57       ` [PATCH v3 08/10] m68k: Add <asm/hash.h> George Spelvin
                         ` (3 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: George Spelvin @ 2016-05-28 19:57 UTC (permalink / raw)
  To: Linus Torvalds, lkml
  Cc: J . Bruce Fields, George Spelvin, Geert Uytterhoeven,
	Greg Ungerer, Andreas Schwab, Philippe De Muyter, linux-m68k,
	Alistair Francis, Michal Simek, Yoshinori Sato, uclinux-h8-devel

This is just the infrastructure; there are no users yet.

This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
the existence of <asm/hash.h>.

That file may define its own versions of various functions, and define
HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.

Included is a self-test (in lib/test_hash.c) that verifies the basics.
It is NOT in general required that the arch-specific functions compute
the same thing as the generic, but if a HAVE_* symbol is defined with
the value 1, then equality is tested.
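
For illustration, a minimal <asm/hash.h> following this convention might
look like the following (hypothetical sketch; the real examples are in
the m68k, microblaze and h8300 patches later in the series):

	#ifndef _ASM_HASH_H
	#define _ASM_HASH_H

	/* Value 1 promises the same result as the generic version,
	 * so the self-test compares the two. */
	#define HAVE_ARCH__HASH_32 1

	static inline u32 __attribute_const__ __hash_32(u32 x)
	{
		/* arch-optimized multiply by GOLDEN_RATIO_32 goes here */
		return x * GOLDEN_RATIO_32;
	}

	#endif /* _ASM_HASH_H */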

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Philippe De Muyter <phdm@macq.eu>
Cc: linux-m68k@lists.linux-m68k.org
Cc: Alistair Francis <alistai@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: uclinux-h8-devel@lists.sourceforge.jp
---
 arch/Kconfig         |   8 ++
 fs/namei.c           |   6 +-
 include/linux/hash.h |  27 +++++-
 lib/Kconfig.debug    |  11 +++
 lib/Makefile         |   1 +
 lib/test_hash.c      | 250 +++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 299 insertions(+), 4 deletions(-)
 create mode 100644 lib/test_hash.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 81869a5e..96406e4d 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -589,6 +589,14 @@ config HAVE_STACK_VALIDATION
 	  Architecture supports the 'objtool check' host tool command, which
 	  performs compile-time stack metadata validation.
 
+config HAVE_ARCH_HASH
+	bool
+	default n
+	help
+	  If this is set, the architecture provides an <asm/hash.h>
+	  file which provides platform-specific implementations of some
+	  functions in <linux/hash.h> or fs/namei.c.
+
 #
 # ABI hall of shame
 #
diff --git a/fs/namei.c b/fs/namei.c
index a49cbd7e..968dae02 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1788,7 +1788,11 @@ static int walk_component(struct nameidata *nd, int flags)
 
 #include <asm/word-at-a-time.h>
 
-#ifdef CONFIG_64BIT
+#ifdef HASH_MIX
+
+/* Architecture provides HASH_MIX and fold_hash() in <asm/hash.h> */
+
+#elif defined(CONFIG_64BIT)
 /*
  * Register pressure in the mixing function is an issue, particularly
  * on 32-bit x86, but almost any function requires one state value and
diff --git a/include/linux/hash.h b/include/linux/hash.h
index 613cfde3..ad6fa21d 100644
--- a/include/linux/hash.h
+++ b/include/linux/hash.h
@@ -41,19 +41,40 @@
 #define GOLDEN_RATIO_32 0x61C88647
 #define GOLDEN_RATIO_64 0x61C8864680B583EBull
 
+#ifdef CONFIG_HAVE_ARCH_HASH
+/* This header may use the GOLDEN_RATIO_xx constants */
+#include <asm/hash.h>
+#endif
 
-static inline u32 __hash_32(u32 val)
+/*
+ * The _generic versions exist only so lib/test_hash.c can compare
+ * the arch-optimized versions with the generic.
+ *
+ * Note that if you change these, any <asm/hash.h> that aren't updated
+ * to match need to have their HAVE_ARCH_* define values updated so the
+ * self-test will not false-positive.
+ */
+#ifndef HAVE_ARCH__HASH_32
+#define __hash_32 __hash_32_generic
+#endif
+static inline u32 __hash_32_generic(u32 val)
 {
 	return val * GOLDEN_RATIO_32;
 }
 
-static inline u32 hash_32(u32 val, unsigned int bits)
+#ifndef HAVE_ARCH_HASH_32
+#define hash_32 hash_32_generic
+#endif
+static inline u32 hash_32_generic(u32 val, unsigned int bits)
 {
 	/* High bits are more random, so use them. */
 	return __hash_32(val) >> (32 - bits);
 }
 
-static __always_inline u32 hash_64(u64 val, unsigned int bits)
+#ifndef HAVE_ARCH_HASH_64
+#define hash_64 hash_64_generic
+#endif
+static __always_inline u32 hash_64_generic(u64 val, unsigned int bits)
 {
 #if BITS_PER_LONG == 64
 	/* 64x64-bit multiply is efficient on all 64-bit processors */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1e9a6075..18ec69ba 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1815,6 +1815,17 @@ config TEST_RHASHTABLE
 
 	  If unsure, say N.
 
+config TEST_HASH
+	tristate "Perform selftest on hash functions"
+	default n
+	help
	  Enable this option to test the kernel's integer (<linux/hash.h>)
+	  and string (<linux/stringhash.h>) hash functions on boot
+	  (or module load).
+
+	  This is intended to help people writing architecture-specific
+	  optimized versions.  If unsure, say N.
+
 endmenu # runtime tests
 
 config PROVIDE_OHCI1394_DMA_INIT
diff --git a/lib/Makefile b/lib/Makefile
index 7bd6fd43..f80b1a1b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
 obj-y += kstrtox.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
 obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
+obj-$(CONFIG_TEST_HASH) += test_hash.o
 obj-$(CONFIG_TEST_KASAN) += test_kasan.o
 obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o
 obj-$(CONFIG_TEST_LKM) += test_module.o
diff --git a/lib/test_hash.c b/lib/test_hash.c
new file mode 100644
index 00000000..c9549c8b
--- /dev/null
+++ b/lib/test_hash.c
@@ -0,0 +1,250 @@
+/*
+ * Test cases for <linux/hash.h> and <linux/stringhash.h>
+ * This just verifies that various ways of computing a hash
+ * produce the same thing and, for cases where a k-bit hash
+ * value is requested, is of the requested size.
+ *
+ * We fill a buffer with a 255-byte null-terminated string,
+ * and use both full_name_hash() and hashlen_string() to hash the
+ * substrings from i to j, where 0 <= i < j < 256.
+ *
+ * The returned values are used to check that __hash_32() and
+ * __hash_32_generic() compute the same thing.  Likewise hash_32()
+ * and hash_64().
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt "\n"
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/hash.h>
+#include <linux/stringhash.h>
+#include <linux/printk.h>
+
+/* 32-bit XORSHIFT generator.  Seed must not be zero. */
+static u32 __init __attribute_const__
+xorshift(u32 seed)
+{
+	seed ^= seed << 13;
+	seed ^= seed >> 17;
+	seed ^= seed << 5;
+	return seed;
+}
+
+/* Given a non-zero x, returns a non-zero byte. */
+static u8 __init __attribute_const__
+mod255(u32 x)
+{
+	x = (x & 0xffff) + (x >> 16);	/* 1 <= x <= 0x1fffe */
+	x = (x & 0xff) + (x >> 8);	/* 1 <= x <= 0x2fd */
+	x = (x & 0xff) + (x >> 8);	/* 1 <= x <= 0x100 */
+	x = (x & 0xff) + (x >> 8);	/* 1 <= x <= 0xff */
+	return x;
+}
+
+/* Fill the buffer with non-zero bytes. */
+static void __init
+fill_buf(char *buf, size_t len, u32 seed)
+{
+	size_t i;
+
+	for (i = 0; i < len; i++) {
+		seed = xorshift(seed);
+		buf[i] = mod255(seed);
+	}
+}
+
+/*
+ * Test the various integer hash functions.  h64 (or its low-order bits)
+ * is the integer to hash.  hash_or accumulates the OR of the hash values,
+ * which are later checked to see that they cover all the requested bits.
+ *
+ * Because these functions (as opposed to the string hashes) are all
+ * inline, the code being tested is actually in the module, and you can
+ * recompile and re-test the module without rebooting.
+ */
+static bool __init
+test_int_hash(unsigned long long h64, u32 hash_or[2][33])
+{
+	int k;
+	u32 h0 = (u32)h64, h1, h2;
+
+	/* Test __hash32 */
+	hash_or[0][0] |= h1 = __hash_32(h0);
+#ifdef HAVE_ARCH__HASH_32
+	hash_or[1][0] |= h2 = __hash_32_generic(h0);
+#if HAVE_ARCH__HASH_32 == 1
+	if (h1 != h2) {
+		pr_err("__hash_32(%#x) = %#x != __hash_32_generic() = %#x",
+			h0, h1, h2);
+		return false;
+	}
+#endif
+#endif
+
+	/* Test k = 1..32 bits */
+	for (k = 1; k <= 32; k++) {
+		u32 const m = ((u32)2 << (k-1)) - 1;	/* Low k bits set */
+
+		/* Test hash_32 */
+		hash_or[0][k] |= h1 = hash_32(h0, k);
+		if (h1 > m) {
+			pr_err("hash_32(%#x, %d) = %#x > %#x", h0, k, h1, m);
+			return false;
+		}
+#ifdef HAVE_ARCH_HASH_32
+		h2 = hash_32_generic(h0, k);
+#if HAVE_ARCH_HASH_32 == 1
+		if (h1 != h2) {
+			pr_err("hash_32(%#x, %d) = %#x != hash_32_generic() "
+				" = %#x", h0, k, h1, h2);
+			return false;
+		}
+#else
+		if (h2 > m) {
+			pr_err("hash_32_generic(%#x, %d) = %#x > %#x",
+				h0, k, h1, m);
+			return false;
+		}
+#endif
+#endif
+		/* Test hash_64 */
+		hash_or[1][k] |= h1 = hash_64(h64, k);
+		if (h1 > m) {
+			pr_err("hash_64(%#llx, %d) = %#x > %#x", h64, k, h1, m);
+			return false;
+		}
+#ifdef HAVE_ARCH_HASH_64
+		h2 = hash_64_generic(h64, k);
+#if HAVE_ARCH_HASH_64 == 1
+		if (h1 != h2) {
+			pr_err("hash_64(%#llx, %d) = %#x != hash_64_generic() "
+				"= %#x", h64, k, h1, h2);
+			return false;
+		}
+#else
+		if (h2 > m) {
+			pr_err("hash_64_generic(%#llx, %d) = %#x > %#x",
+				h64, k, h1, m);
+			return false;
+		}
+#endif
+#endif
+	}
+
+	(void)h2;	/* Suppress unused variable warning */
+	return true;
+}
+
+#define SIZE 256	/* Run time is cubic in SIZE */
+
+static int __init
+test_hash_init(void)
+{
+	char buf[SIZE+1];
+	u32 string_or = 0, hash_or[2][33] = { 0 };
+	unsigned tests = 0;
+	unsigned long long h64 = 0;
+	int i, j;
+
+	fill_buf(buf, SIZE, 1);
+
+	/* Test every possible non-empty substring in the buffer. */
+	for (j = SIZE; j > 0; --j) {
+		buf[j] = '\0';
+
+		for (i = 0; i <= j; i++) {
+			u64 hashlen = hashlen_string(buf+i);
+			u32 h0 = full_name_hash(buf+i, j-i);
+
+			/* Check that hashlen_string gets the length right */
+			if (hashlen_len(hashlen) != j-i) {
+				pr_err("hashlen_string(%d..%d) returned length"
+					" %u, expected %d",
+					i, j, hashlen_len(hashlen), j-i);
+				return -EINVAL;
+			}
+			/* Check that the hashes match */
+			if (hashlen_hash(hashlen) != h0) {
+				pr_err("hashlen_string(%d..%d) = %08x != "
+					"full_name_hash() = %08x",
+					i, j, hashlen_hash(hashlen), h0);
+				return -EINVAL;
+			}
+
+			string_or |= h0;
+			h64 = h64 << 32 | h0;	/* For use with hash_64 */
+			if (!test_int_hash(h64, hash_or))
+				return -EINVAL;
+			tests++;
+		} /* i */
+	} /* j */
+
+	/* The OR of all the hash values should cover all the bits */
+	if (~string_or) {
+		pr_err("OR of all string hash results = %#x != %#x",
+			string_or, -1u);
+		return -EINVAL;
+	}
+	if (~hash_or[0][0]) {
+		pr_err("OR of all __hash_32 results = %#x != %#x",
+			hash_or[0][0], -1u);
+		return -EINVAL;
+	}
+#ifdef HAVE_ARCH__HASH_32
+#if HAVE_ARCH__HASH_32 != 1	/* Test is pointless if results match */
+	if (~hash_or[1][0]) {
+		pr_err("OR of all __hash_32_generic results = %#x != %#x",
+			hash_or[1][0], -1u);
+		return -EINVAL;
+	}
+#endif
+#endif
+
+	/* Likewise for all the i-bit hash values */
+	for (i = 1; i <= 32; i++) {
+		u32 const m = ((u32)2 << (i-1)) - 1;	/* Low i bits set */
+
+		if (hash_or[0][i] != m) {
+			pr_err("OR of all hash_32(%d) results = %#x "
+				"(%#x expected)", i, hash_or[0][i], m);
+			return -EINVAL;
+		}
+		if (hash_or[1][i] != m) {
+			pr_err("OR of all hash_64(%d) results = %#x "
+				"(%#x expected)", i, hash_or[1][i], m);
+			return -EINVAL;
+		}
+	}
+
+	/* Issue notices about skipped tests. */
+#ifndef HAVE_ARCH__HASH_32
+	pr_info("__hash_32() has no arch implementation to test.");
+#elif HAVE_ARCH__HASH_32 != 1
+	pr_info("__hash_32() is arch-specific; not compared to generic.");
+#endif
+#ifndef HAVE_ARCH_HASH_32
+	pr_info("hash_32() has no arch implementation to test.");
+#elif HAVE_ARCH_HASH_32 != 1
+	pr_info("hash_32() is arch-specific; not compared to generic.");
+#endif
+#ifndef HAVE_ARCH_HASH_64
+	pr_info("hash_64() has no arch implementation to test.");
+#elif HAVE_ARCH_HASH_64 != 1
+	pr_info("hash_64() is arch-specific; not compared to generic.");
+#endif
+
+	pr_notice("%u tests passed.", tests);
+
+	return 0;
+}
+
+static void __exit test_hash_exit(void)
+{
+}
+
+module_init(test_hash_init);	/* Does everything */
+module_exit(test_hash_exit);	/* Does nothing */
+
+MODULE_LICENSE("GPL");
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 08/10] m68k: Add <asm/hash.h>
  2016-05-28 19:57     ` [PATCH v3 " George Spelvin
                         ` (6 preceding siblings ...)
  2016-05-28 19:57       ` [PATCH v3 07/10] <linux/hash.h>: Add support for architecture-specific functions George Spelvin
@ 2016-05-28 19:57       ` George Spelvin
  2016-05-28 19:57       ` [PATCH v3 09/10] microblaze: " George Spelvin
                         ` (2 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-28 19:57 UTC (permalink / raw)
  To: Linus Torvalds, lkml
  Cc: J . Bruce Fields, George Spelvin, Geert Uytterhoeven,
	Greg Ungerer, Andreas Schwab, Philippe De Muyter, linux-m68k

This provides a multiply by constant GOLDEN_RATIO_32 = 0x61C88647
for the original mc68000, which lacks a 32x32-bit multiply instruction.

Yes, the amount of optimization effort put in is excessive. :-)

Shift-add chain found by Yevgen Voronenko's Hcub algorithm at
http://spiral.ece.cmu.edu/mcm/gen.html
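
In plain C, the chain in the asm block below computes the following
(equivalent sketch, mirroring the per-instruction comments):

	u32 a, b;

	a = x << 2;		/* a = x * 0x0004 */
	b = a;			/* b = x * 0x0004 */
	a <<= 7;		/* a = x * 0x0200 */
	a += x;			/* a = x * 0x0201 */
	b += a;			/* b = x * 0x0205 */
	a += a;			/* a = x * 0x0402 */
	b += a;			/* b = x * 0x0607 */
	a <<= 5;		/* a = x * 0x8040; a + b = x * 0x8647 */
	return ((u16)(x * 0x61c8) << 16) + a + b;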

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Philippe De Muyter <phdm@macq.eu>
Cc: linux-m68k@lists.linux-m68k.org
---
 arch/m68k/Kconfig.cpu        |  1 +
 arch/m68k/include/asm/hash.h | 59 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)
 create mode 100644 arch/m68k/include/asm/hash.h

diff --git a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu
index 0dfcf128..bf3de464 100644
--- a/arch/m68k/Kconfig.cpu
+++ b/arch/m68k/Kconfig.cpu
@@ -40,6 +40,7 @@ config M68000
 	select CPU_HAS_NO_MULDIV64
 	select CPU_HAS_NO_UNALIGNED
 	select GENERIC_CSUM
+	select HAVE_ARCH_HASH
 	help
 	  The Freescale (was Motorola) 68000 CPU is the first generation of
 	  the well known M68K family of processors. The CPU core as well as
diff --git a/arch/m68k/include/asm/hash.h b/arch/m68k/include/asm/hash.h
new file mode 100644
index 00000000..6407af84
--- /dev/null
+++ b/arch/m68k/include/asm/hash.h
@@ -0,0 +1,59 @@
+#ifndef _ASM_HASH_H
+#define _ASM_HASH_H
+
+/*
+ * If CONFIG_M68000=y (original mc68000/010), this file is #included
+ * to work around the lack of a MULU.L instruction.
+ */
+
+#define HAVE_ARCH__HASH_32 1
+/*
+ * While it would be legal to substitute a different hash operation
+ * entirely, let's keep it simple and just use an optimized multiply
+ * by GOLDEN_RATIO_32 = 0x61C88647.
+ *
+ * The best way to do that appears to be to multiply by 0x8647 with
+ * shifts and adds, and use mulu.w to multiply the high half by 0x61C8.
+ *
+ * Because the 68000 has multi-cycle shifts, this addition chain is
+ * chosen to minimise the shift distances.
+ *
+ * Despite every attempt to spoon-feed it simple operations, GCC
+ * 6.1.1 doggedly insists on doing annoying things like converting
+ * "lsl.l #2,<reg>" (12 cycles) to two adds (8+8 cycles).
+ *
+ * It also likes to notice two shifts in a row, like "a = x << 2" and
+ * "a <<= 7", and convert that to "a = x << 9".  But shifts longer
+ * than 8 bits are extra-slow on m68k, so that's a lose.
+ *
+ * Since the 68000 is a very simple in-order processor with no
+ * instruction scheduling effects on execution time, we can safely
+ * take it out of GCC's hands and write one big asm() block.
+ *
+ * Without calling overhead, this operation is 30 bytes (14 instructions
+ * plus one immediate constant) and 166 cycles.
+ *
+ * (Because %2 is fetched twice, it can't be postincrement, and thus it
+ * can't be a fully general "g" or "m".  Register is preferred, but
+ * offsettable memory or immediate will work.)
+ */
+static inline u32 __attribute_const__ __hash_32(u32 x)
+{
+	u32 a, b;
+
+	asm(   "move.l %2,%0"	/* a = x * 0x0001 */
+	"\n	lsl.l #2,%0"	/* a = x * 0x0004 */
+	"\n	move.l %0,%1"
+	"\n	lsl.l #7,%0"	/* a = x * 0x0200 */
+	"\n	add.l %2,%0"	/* a = x * 0x0201 */
+	"\n	add.l %0,%1"	/* b = x * 0x0205 */
+	"\n	add.l %0,%0"	/* a = x * 0x0402 */
+	"\n	add.l %0,%1"	/* b = x * 0x0607 */
+	"\n	lsl.l #5,%0"	/* a = x * 0x8040 */
+	: "=&d,d" (a), "=&r,r" (b)
+	: "r,roi?" (x));	/* a+b = x*0x8647 */
+
+	return ((u16)(x*0x61c8) << 16) + a + b;
+}
+
+#endif	/* _ASM_HASH_H */
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 09/10] microblaze: Add <asm/hash.h>
  2016-05-28 19:57     ` [PATCH v3 " George Spelvin
                         ` (7 preceding siblings ...)
  2016-05-28 19:57       ` [PATCH v3 08/10] m68k: Add <asm/hash.h> George Spelvin
@ 2016-05-28 19:57       ` George Spelvin
  2016-05-28 19:57       ` [PATCH v3 10/10] h8300: " George Spelvin
  2016-05-28 20:47       ` [PATCH v3 00/10] String hash improvements Linus Torvalds
  10 siblings, 0 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-28 19:57 UTC (permalink / raw)
  To: Linus Torvalds, lkml
  Cc: J . Bruce Fields, George Spelvin, Alistair Francis, Michal Simek

Microblaze is an FPGA soft core that can be configured various ways.

If it is configured without a multiplier, the standard __hash_32()
will require a call to __mulsi3, which is a slow software loop.

Instead, use a shift-and-add sequence for the constant multiply.
GCC knows how to do this, but it's not as clever as some.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
---
 arch/microblaze/Kconfig            |  1 +
 arch/microblaze/include/asm/hash.h | 81 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)
 create mode 100644 arch/microblaze/include/asm/hash.h

diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index 3d793b55..ce3e5125 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -16,6 +16,7 @@ config MICROBLAZE
 	select GENERIC_IRQ_SHOW
 	select GENERIC_PCI_IOMAP
 	select GENERIC_SCHED_CLOCK
+	select HAVE_ARCH_HASH
 	select HAVE_ARCH_KGDB
 	select HAVE_DEBUG_KMEMLEAK
 	select HAVE_DMA_API_DEBUG
diff --git a/arch/microblaze/include/asm/hash.h b/arch/microblaze/include/asm/hash.h
new file mode 100644
index 00000000..753513ae
--- /dev/null
+++ b/arch/microblaze/include/asm/hash.h
@@ -0,0 +1,81 @@
+#ifndef _ASM_HASH_H
+#define _ASM_HASH_H
+
+/*
+ * Fortunately, most people who want to run Linux on Microblaze enable
+ * both multiplier and barrel shifter, but omitting them is technically
+ * a supported configuration.
+ *
+ * With just a barrel shifter, we can implement an efficient constant
+ * multiply using shifts and adds.  GCC can find a 9-step solution, but
+ * this 6-step solution was found by Yevgen Voronenko's implementation
+ * of the Hcub algorithm at http://spiral.ece.cmu.edu/mcm/gen.html.
+ *
+ * That software is really not designed for a single multiplier this large,
+ * but if you run it enough times with different seeds, it'll find several
+ * 6-shift, 6-add sequences for computing x * 0x61C88647.  They are all
+ *	c = (x << 19) + x;
+ *	a = (x <<  9) + c;
+ *	b = (x << 23) + a;
+ *	return (a<<11) + (b<<6) + (c<<3) - b;
+ * with variations on the order of the final add.
+ *
+ * Without even a shifter, it's hopeless; any hash function will suck.
+ */
+
+#if CONFIG_XILINX_MICROBLAZE0_USE_HW_MUL == 0
+
+#define HAVE_ARCH__HASH_32 1
+
+/* Multiply by GOLDEN_RATIO_32 = 0x61C88647 */
+static inline u32 __attribute_const__ __hash_32(u32 a)
+{
+#if CONFIG_XILINX_MICROBLAZE0_USE_BARREL
+	unsigned int b, c;
+
+	/* Phase 1: Compute three intermediate values */
+	b =  a << 23;
+	c = (a << 19) + a;
+	a = (a <<  9) + c;
+	b += a;
+
+	/* Phase 2: Compute (a << 11) + (b << 6) + (c << 3) - b */
+	a <<= 5;
+	a += b;		/* (a << 5) + b */
+	a <<= 3;
+	a += c;		/* (a << 8) + (b << 3) + c */
+	a <<= 3;
+	return a - b;	/* (a << 11) + (b << 6) + (c << 3) - b */
+#else
+	/*
+	 * "This is really going to hurt."
+	 *
+	 * Without a barrel shifter, left shifts are implemented as
+	 * repeated additions, and the best we can do is an optimal
+	 * addition-subtraction chain.  This one is not known to be
+	 * optimal, but at 37 steps, it's decent for a 31-bit multiplier.
+	 *
+	 * Question: given its size (37*4 = 148 bytes per instance),
+	 * and slowness, is this worth having inline?
+	 */
+	unsigned int b, c, d;
+
+	b = a << 4;	/* 4    */
+	c = b << 1;	/* 1  5 */
+	b += a;		/* 1  6 */
+	c += b;		/* 1  7 */
+	c <<= 3;	/* 3 10 */
+	c -= a;		/* 1 11 */
+	d = c << 7;	/* 7 18 */
+	d += b;		/* 1 19 */
+	d <<= 8;	/* 8 27 */
+	d += a;		/* 1 28 */
+	d <<= 1;	/* 1 29 */
+	d += b;		/* 1 30 */
+	d <<= 6;	/* 6 36 */
+	return d + c;	/* 1 37 total instructions*/
+#endif
+}
+
+#endif /* !CONFIG_XILINX_MICROBLAZE0_USE_HW_MUL */
+#endif /* _ASM_HASH_H */
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 10/10] h8300: Add <asm/hash.h>
  2016-05-28 19:57     ` [PATCH v3 " George Spelvin
                         ` (8 preceding siblings ...)
  2016-05-28 19:57       ` [PATCH v3 09/10] microblaze: " George Spelvin
@ 2016-05-28 19:57       ` George Spelvin
  2016-05-28 20:47       ` [PATCH v3 00/10] String hash improvements Linus Torvalds
  10 siblings, 0 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-28 19:57 UTC (permalink / raw)
  To: Linus Torvalds, lkml
  Cc: J . Bruce Fields, George Spelvin, Yoshinori Sato, uclinux-h8-devel

This will improve the performance of hash_32() and hash_64(), but due
to complete lack of multi-bit shift instructions on H8, performance will
still be bad in surrounding code.

Designing H8-specific hash algorithms to work around that is a separate
project.  (But if the maintainers would like to get in touch...)

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: uclinux-h8-devel@lists.sourceforge.jp
---
 arch/h8300/Kconfig            |  1 +
 arch/h8300/include/asm/hash.h | 53 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)
 create mode 100644 arch/h8300/include/asm/hash.h

diff --git a/arch/h8300/Kconfig b/arch/h8300/Kconfig
index 986ea84c..6c583dbb 100644
--- a/arch/h8300/Kconfig
+++ b/arch/h8300/Kconfig
@@ -20,6 +20,7 @@ config H8300
 	select HAVE_KERNEL_GZIP
 	select HAVE_KERNEL_LZO
 	select HAVE_ARCH_KGDB
+	select HAVE_ARCH_HASH
 
 config RWSEM_GENERIC_SPINLOCK
 	def_bool y
diff --git a/arch/h8300/include/asm/hash.h b/arch/h8300/include/asm/hash.h
new file mode 100644
index 00000000..04cfbd2b
--- /dev/null
+++ b/arch/h8300/include/asm/hash.h
@@ -0,0 +1,53 @@
+#ifndef _ASM_HASH_H
+#define _ASM_HASH_H
+
+/*
+ * The later H8SX models have a 32x32-bit multiply, but the H8/300H
+ * and H8S have only 16x16->32.  Since it's tolerably compact, this is
+ * basically an inlined version of the __mulsi3 code.  Since the inputs
+ * are not expected to be small, it's also simplified by skipping the
+ * early-out checks.
+ *
+ * (Since neither CPU has any multi-bit shift instructions, a
+ * shift-and-add version is a non-starter.)
+ *
+ * TODO: come up with an arch-specific version of the hashing in fs/namei.c,
+ * since that is heavily dependent on rotates.  Which, as mentioned, suck
+ * horribly on H8.
+ */
+
+#if defined(CONFIG_CPU_H300H) || defined(CONFIG_CPU_H8S)
+
+#define HAVE_ARCH__HASH_32 1
+
+/*
+ * Multiply by k = 0x61C88647.  Fitting this into three registers requires
+ * one extra instruction, but reducing register pressure will probably
+ * make that back and then some.
+ *
+ * GCC asm note: %e1 is the high half of operand %1, while %f1 is the
+ * low half.  So if %1 is er4, then %e1 is e4 and %f1 is r4.
+ *
+ * This has been designed to modify x in place, since that's the most
+ * common usage, but preserve k, since hash_64() makes two calls in
+ * quick succession.
+ */
+static inline u32 __attribute_const__ __hash_32(u32 x)
+{
+	u32 temp;
+
+	asm(   "mov.w	%e1,%f0"
+	"\n	mulxu.w	%f2,%0"		/* klow * xhigh */
+	"\n	mov.w	%f0,%e1"	/* The extra instruction */
+	"\n	mov.w	%f1,%f0"
+	"\n	mulxu.w	%e2,%0"		/* khigh * xlow */
+	"\n	add.w	%e1,%f0"
+	"\n	mulxu.w	%f2,%1"		/* klow * xlow */
+	"\n	add.w	%f0,%e1"
+	: "=&r" (temp), "=r" (x)
+	: "%r" (GOLDEN_RATIO_32), "1" (x));
+	return x;
+}
+
+#endif
+#endif /* _ASM_HASH_H */
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 00/10] String hash improvements
  2016-05-28 19:57     ` [PATCH v3 " George Spelvin
                         ` (9 preceding siblings ...)
  2016-05-28 19:57       ` [PATCH v3 10/10] h8300: " George Spelvin
@ 2016-05-28 20:47       ` Linus Torvalds
  2016-05-28 20:54         ` George Spelvin
  10 siblings, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2016-05-28 20:47 UTC (permalink / raw)
  To: George Spelvin; +Cc: lkml, J . Bruce Fields

On Sat, May 28, 2016 at 12:57 PM, George Spelvin
<linux@sciencehorizons.net> wrote:
> Okay, Linus, it's still warm from the oven, but I think it's fully baked.

Hmm. Which commit did you use as a base for this? It's not plain 4.6,
but it's not today's git either.

I did find a point where this applies, but it would be even better if
you just told me what your base was, and I'll start the branch from
there - that gets the git history "correct" too.

Thanks,

               Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 00/10] String hash improvements
  2016-05-28 20:47       ` [PATCH v3 00/10] String hash improvements Linus Torvalds
@ 2016-05-28 20:54         ` George Spelvin
  0 siblings, 0 replies; 55+ messages in thread
From: George Spelvin @ 2016-05-28 20:54 UTC (permalink / raw)
  To: linux, torvalds; +Cc: bfields, linux-kernel

Oh!  I based it on 0fed3ac866eabf01924457921ee3684c8e4c9005, which
I hope makes sense.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 07/10] <linux/hash.h>: Add support for architecture-specific functions
  2016-05-28 19:57       ` [PATCH v3 07/10] <linux/hash.h>: Add support for architecture-specific functions George Spelvin
@ 2016-05-29  7:57         ` Geert Uytterhoeven
  0 siblings, 0 replies; 55+ messages in thread
From: Geert Uytterhoeven @ 2016-05-29  7:57 UTC (permalink / raw)
  To: George Spelvin
  Cc: Linus Torvalds, lkml, J . Bruce Fields, Greg Ungerer,
	Andreas Schwab, Philippe De Muyter, linux-m68k, Alistair Francis,
	Michal Simek, Yoshinori Sato, uclinux-h8-devel

Hi George,

I see this has been applied in the mean time, but improvements
never hurt...

On Sat, May 28, 2016 at 9:57 PM, George Spelvin
<linux@sciencehorizons.net> wrote:
> --- /dev/null
> +++ b/lib/test_hash.c
> @@ -0,0 +1,250 @@

> +/*
> + * Test the various integer hash functions.  h64 (or its low-order bits)
> + * is the integer to hash.  hash_or accumulates the OR of the hash values,
> + * which are later checked to see that they cover all the requested bits.
> + *
> + * Because these functions (as opposed to the string hashes) are all
> + * inline, the code being tested is actually in the module, and you can
> + * recompile and re-test the module without rebooting.
> + */
> +static bool __init
> +test_int_hash(unsigned long long h64, u32 hash_or[2][33])
> +{
> +       int k;
> +       u32 h0 = (u32)h64, h1, h2;
> +
> +       /* Test __hash32 */
> +       hash_or[0][0] |= h1 = __hash_32(h0);
> +#ifdef HAVE_ARCH__HASH_32
> +       hash_or[1][0] |= h2 = __hash_32_generic(h0);

__hash_32_generic() always exists, right?
So you can reduce #ifdefery and increase compile coverage by dropping the
#ifdef ...

> +#if HAVE_ARCH__HASH_32 == 1
> +       if (h1 != h2) {

... and replacing this test by

        if (identical_hashes && (h1 != h2))

with identical_hashes a static const bool initialized depending on
HAVE_ARCH__HASH_32.

> +               pr_err("__hash_32(%#x) = %#x != __hash_32_generic() = %#x",
> +                       h0, h1, h2);
> +               return false;
> +       }
> +#endif
> +#endif

The same comment applies to the other hunks.
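
A sketch of the shape being suggested (identical_hashes is the proposed
name; the rest is illustrative only, not the code that was merged):

#ifdef HAVE_ARCH__HASH_32
static const bool identical_hashes = (HAVE_ARCH__HASH_32 == 1);
#else
static const bool identical_hashes = true;	/* __hash_32 is the generic */
#endif

	/* ... and then, unconditionally, in test_int_hash(): */
	hash_or[0][0] |= h1 = __hash_32(h0);
	hash_or[1][0] |= h2 = __hash_32_generic(h0);
	if (identical_hashes && h1 != h2) {
		pr_err("__hash_32(%#x) = %#x != __hash_32_generic() = %#x",
			h0, h1, h2);
		return false;
	}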

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 06/10] fs/namei.c: Improve dcache hash function
  2016-05-28 19:57       ` [PATCH v3 06/10] fs/namei.c: Improve dcache hash function George Spelvin
@ 2016-05-30 15:11         ` Peter Zijlstra
  2016-05-30 16:06           ` George Spelvin
  0 siblings, 1 reply; 55+ messages in thread
From: Peter Zijlstra @ 2016-05-30 15:11 UTC (permalink / raw)
  To: George Spelvin; +Cc: Linus Torvalds, lkml, J . Bruce Fields

On Sat, May 28, 2016 at 03:57:19PM -0400, George Spelvin wrote:

> +static inline unsigned int fold_hash(unsigned long x, unsigned long y)
>  {
> +	y ^= x * GOLDEN_RATIO_64;
> +	y *= GOLDEN_RATIO_64;
> +	return y >> 32;
>  }

So does it make sense to use that pattern here too?

This code doesn't much care about performance, but wants a decent hash
from the stack of class keys.

---
 kernel/locking/lockdep.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 81f1a7107c0e..c8498efcd5d9 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -309,10 +309,12 @@ static struct hlist_head chainhash_table[CHAINHASH_SIZE];
  * It's a 64-bit hash, because it's important for the keys to be
  * unique.
  */
-#define iterate_chain_key(key1, key2) \
-	(((key1) << MAX_LOCKDEP_KEYS_BITS) ^ \
-	((key1) >> (64-MAX_LOCKDEP_KEYS_BITS)) ^ \
-	(key2))
+static inline u64 iterate_chain_key(u64 x, u64 y)
+{
+	y ^= x * GOLDEN_RATIO_64;
+	y *= GOLDEN_RATIO_64;
+	return y;
+}
 
 void lockdep_off(void)
 {

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 06/10] fs/namei.c: Improve dcache hash function
  2016-05-30 15:11         ` Peter Zijlstra
@ 2016-05-30 16:06           ` George Spelvin
  2016-05-30 16:27             ` Peter Zijlstra
  0 siblings, 1 reply; 55+ messages in thread
From: George Spelvin @ 2016-05-30 16:06 UTC (permalink / raw)
  To: linux, peterz; +Cc: bfields, linux-kernel, torvalds

Peter Zijlstra <peterz@infradead.org> wrote:
> On Sat, May 28, 2016 at 03:57:19PM -0400, George Spelvin wrote:
>> +static inline unsigned int fold_hash(unsigned long x, unsigned long y)
>>  {
>> +	y ^= x * GOLDEN_RATIO_64;
>> +	y *= GOLDEN_RATIO_64;
>> +	return y >> 32;
>>  }

> So does it make sense to use that pattern here too?
>
> This code doesn't much care about performance, but wants a decent hash
> from the stack of class keys.
>
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 81f1a7107c0e..c8498efcd5d9 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -309,10 +309,12 @@ static struct hlist_head chainhash_table[CHAINHASH_SIZE];
>   * It's a 64-bit hash, because it's important for the keys to be
>   * unique.
>   */
> -#define iterate_chain_key(key1, key2) \
> -	(((key1) << MAX_LOCKDEP_KEYS_BITS) ^ \
> -	((key1) >> (64-MAX_LOCKDEP_KEYS_BITS)) ^ \
> -	(key2))
> +static inline u64 iterate_chain_key(u64 x, u64 y)
> +{
> +	y ^= x * GOLDEN_RATIO_64;
> +	y *= GOLDEN_RATIO_64;
> +	return y;
> +}
>  
>  void lockdep_off(void)
>  {

Not quite.  The fold_hash() you quote is used only on 64-bit systems,
which can be assumed to have a reasonable 64-bit multiply.  On 32-bit
platforms, I avoid using GOLDEN_RATIO_64 at all, since 64x64-bit
multiplies are so expensive.
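
(For reference, this is roughly what the series makes hash_64() do; a
sketch, not a verbatim quote of the patch:)

static inline u32 hash_64_generic(u64 val, unsigned int bits)
{
#if BITS_PER_LONG == 64
	/* 64x64-bit multiply is cheap on all 64-bit processors */
	return val * GOLDEN_RATIO_64 >> (64 - bits);
#else
	/* Hash 64 bits using only 32x32-bit multiplies */
	return hash_32((u32)val ^ __hash_32(val >> 32), bits);
#endif
}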

You actually have only 96 bits of input.  The correct prototype is:

static inline u64 iterate_chain_key(u64 key, u32 idx)

If performance mattered, I'd be inclined to use one or two iterations
of the 32-bit HASH_MIX() function, which is specifically designed
to add 32 bits to a 64-bit hash value.

A more thorough mixing would be achieved by __jhash_mix().  Basically:

static inline u64 iterate_chain_key(u64 key, u32 idx)
{
	u32 k0 = key, k1 = key >> 32;

	__jhash_mix(idx, k0, k1)	/* Macro that modifies arguments! */

	return k0 | (u64)k1 << 32;
}

(The order of arguments is chosen to preserve the two "most-hashed" values.)


Also, I just had contact from the hppa folks who have brought to my
attention that it's an example of an out-of-order superscalar CPU that
*doesn't* have a good integer multiplier.  For general multiplies,
you have to move values to the FPU and the code is a pain.

Instead, it has shift-and-add instructions designed to help the compiler
generate multiplies by constants, but large ones like GOLDEN_RATIO_64
is still a pain.

Here's code to take x (in %r26) and multiply it by GOLDEN_RATIO_64,
with the result in %r28:

        shladd,l %r26,1,%r26,%r19
        depd,z %r19,46,47,%r31
        sub %r31,%r19,%r31
        shladd,l %r31,2,%r26,%r31
        shladd,l %r31,2,%r26,%r31
        shladd,l %r31,2,%r31,%r19
        depd,z %r19,60,61,%r31
        sub %r31,%r19,%r31
        depd,z %r31,54,55,%r28
        add,l %r31,%r28,%r31
        shladd,l %r31,3,%r26,%r31
        depd,z %r31,59,60,%r28
        add,l %r31,%r28,%r31
        shladd,l %r31,2,%r26,%r31
        depd,z %r31,60,61,%r28
        sub %r28,%r31,%r28
        depd,z %r28,60,61,%r31
        sub %r31,%r28,%r31
        depd,z %r31,57,58,%r28
        add,l %r31,%r28,%r28
        depd,z %r28,61,62,%r28
        sub %r28,%r26,%r28
        shladd,l %r28,3,%r28,%r28

We're going to work on that.  Either finding a better code sequence,
or using the 32-bit version of hash_64() with the nice code I've already
found for multiplies by GOLDEN_RATIO_32.


(But thanks, Peter, for looking at the code!)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 06/10] fs/namei.c: Improve dcache hash function
  2016-05-30 16:06           ` George Spelvin
@ 2016-05-30 16:27             ` Peter Zijlstra
  2016-05-30 18:10               ` George Spelvin
  0 siblings, 1 reply; 55+ messages in thread
From: Peter Zijlstra @ 2016-05-30 16:27 UTC (permalink / raw)
  To: George Spelvin; +Cc: bfields, linux-kernel, torvalds

On Mon, May 30, 2016 at 12:06:18PM -0400, George Spelvin wrote:
> Not quite.  The fold_hash() you quote is used only on 64-bit systems,
> which can be assumed to have a reasonable 64-bit multiply.  On 32-bit
> platforms, I avoid using GOLDEN_RATIO_64 at all, since 64x64-bit
> multiplies are so expensive.

Right; as stated performance really isn't a goal here.

> You actually have only 96 bits of input.  The correct prototype is:
> 
> static inline u64 iterate_chain_key(u64 key, u32 idx)

Indeed, although I conveniently ignored that because I didn't think it'd
matter :-)

> If performance mattered, I'd be inclined to use one or two iterations
> of the 32-bit HASH_MIX() function, which is specifically designed
> to add 32 bits to a 64-bit hash value.

Ah, I missed that HASH_MIX() had 64-bit state; so much for being able to
read, it seems.  Also, should we not move that entire section of
fs/namei.c into <linux/hash.h>?

These two primitives seem generally useful.

> A more thorough mixing would be achieved by __jhash_mix().  Basically:
> 
> static inline u64 iterate_chain_key(u64 key, u32 idx)
> {
> 	u32 k0 = key, k1 = key >> 32;
> 
> 	__jhash_mix(idx, k0, k1)	/* Macro that modifies arguments! */
> 
> 	return k0 | (u64)k1 << 32;
> }
> 
> (The order of arguments is chosen to preserve the two "most-hashed" values.)

(I'd never have managed to deduce that property given the information in
jhash.h)

OK, that looks good to me, thanks! I'll go use that then.

> Also, I just had contact from the hppa folks who have brought to my
> attention that it's an example of an out-of-order superscalar CPU that
> *doesn't* have a good integer multiplier.  For general multiplies,
> you have to move values to the FPU and the code is a pain.

Egads, that's horrible, but sounds exactly like the thing you 'like'
given these patches :-) Good luck with that.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 06/10] fs/namei.c: Improve dcache hash function
  2016-05-30 16:27             ` Peter Zijlstra
@ 2016-05-30 18:10               ` George Spelvin
  2016-06-02  1:18                 ` Linus Torvalds
  0 siblings, 1 reply; 55+ messages in thread
From: George Spelvin @ 2016-05-30 18:10 UTC (permalink / raw)
  To: linux, peterz; +Cc: bfields, linux-kernel, torvalds

On Mon, 30 May 2016 at 18:27:21 +0200, Peter Zijlstra wrote:
> On Mon, May 30, 2016 at 12:06:18PM -0400, George Spelvin wrote:
> Right; as stated performance really isn't a goal here.

I understand, but 64x64-bit multiply on 32-bit is pretty annoyingly
expensive.  In time, code size, and register pressure which bloats
surrounding code.

>> If performance mattered, I'd be inclined to use one or two iterations
>> of the 32-bit HASH_MIX() function, which is specifically designed
>> to add 32 bits to a 64-bit hash value.
> 
> Ah, I missed that HASH_MIX() had 64 bit state, so much for being able to
> read it seems. Also; should we not move that entire section of
> fs/namei.c into linux/hash.h ?
> 
> These two primitives seem generally useful.

Actually, the state is 2*sizeof(long), which is 128 bits on 64-bit.

I thought about moving it out to <linux/hash.h> as you suggest, but given
the tight coupling to the dcache hash, I decided not to until another
user showed up.

Remember, HASH_MIX() is *heavily* optimized for speed and just-barely-
adequate hash mixing for the dcache use case.  Other users should think
carefully about using it.

In particular, it's designed for 32 bits of output.  It does *not* achieve
full-width mixing, but rather achieves mixing to at least 32 bits of
output in the two rounds it has before cancellation can occur.  If you
want 64 bits of hash, as in your application, it's kind of marginal.
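
(For reference, the 32-bit HASH_MIX() from the patch under discussion
looks roughly like the following; quoted from memory, so see the patch
itself for the authoritative constants:)

/* x and y are unsigned long, so 2x32 = 64 bits of state on 32-bit */
#define HASH_MIX(x, y, a)	\
	(	x ^= (a),	\
	y ^= x,	x = rol32(x, 7),\
	x += y,	y = rol32(y,20),\
	y *= 9			)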

>> A more thorough mixing would be achieved by __jhash_mix().  Basically:
>> 
>> static inline u64 iterate_chain_key(u64 key, u32 idx)
>> {
>> 	u32 k0 = key, k1 = key >> 32;
>> 
>> 	__jhash_mix(idx, k0, k1)	/* Macro that modifies arguments! */
>> 
>> 	return k0 | (u64)k1 << 32;
>> }
>> 
>> (The order of arguments is chosen to perserve the two "most-hashed" values.)
> 
> (I'd never have managed to deduce that property given the information in
> jhash.h)

The last line of __jhash_mix(a,b,c) is
	c -= b;  c ^= rol32(b, 4);  b += a;

Thus, b and c are the last variables assigned to.  If you had dropped
one of them and returned a instead, you'd have created dead code.
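
(For reference, the whole mix from <linux/jhash.h>, i.e. Bob Jenkins'
lookup3 mix; quoted here for context, check the header for the
authoritative version:)

#define __jhash_mix(a, b, c)			\
{						\
	a -= c;  a ^= rol32(c, 4);  c += b;	\
	b -= a;  b ^= rol32(a, 6);  a += c;	\
	c -= b;  c ^= rol32(b, 8);  b += a;	\
	a -= c;  a ^= rol32(c, 16); c += b;	\
	b -= a;  b ^= rol32(a, 19); a += c;	\
	c -= b;  c ^= rol32(b, 4);  b += a;	\
}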

>> Also, I just had contact from the hppa folks who have brought to my
>> attention that it's an example of an out-of-order superscalar CPU that
>> *doesn't* have a good integer multiplier.  For general multiplies,
>> you have to move values to the FPU and the code is a pain.
> 
> Egads, that's horrible, but sounds exactly like the thing you 'like'
> given these patches :-) Good luck with that.

Well, low-level bit-twiddling can be kind of fun.

In this case, the level of effort was required to improve the hash
mixing from "embarrassingly bad" (did you *see* what it was before
0fed3ac866?) without adding delay to a scorchingly hot code path that
Linus watches like a hawk.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 06/10] fs/namei.c: Improve dcache hash function
  2016-05-30 18:10               ` George Spelvin
@ 2016-06-02  1:18                 ` Linus Torvalds
  2016-06-02  2:31                   ` George Spelvin
  0 siblings, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2016-06-02  1:18 UTC (permalink / raw)
  To: George Spelvin; +Cc: Peter Zijlstra, J. Bruce Fields, Linux Kernel Mailing List

On Mon, May 30, 2016 at 11:10 AM, George Spelvin
<linux@sciencehorizons.net> wrote:
>
> I understand, but 64x64-bit multiply on 32-bit is pretty annoyingly
> expensive.  In time, code size, and register pressure which bloats
> surrounding code.

Side note, the code seems to work fairly well, but I do worry a bit
about the three large multiplies in link_path_walk().

There's two in fold_hash(), and one comes from "find_zero()".

It turns out to work fairly well on at least modern big-core x86
CPU's, because the multiplier is fairly beefy: low latency (3-4 cycles
in the current crop) and fully pipelined.

Even atom should be 5 cycles and a multiplication result every two
cycles for 64-bit results.

Maybe we don't care, because looking around the modern ARM and POWER
cores do similarly, but I just wanted to point out that that code does
seem to fairly heavily rely on "everybody has big and pipelined hw
multipliers" for performance.

.. and it's probably true that transistors are cheap, and crypto and
other uses have made CPU designers spend the effort on good
multipliers. I just remember a time when you definitely couldn't rely
on fast multiplies.

                  Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 06/10] fs/namei.c: Improve dcache hash function
  2016-06-02  1:18                 ` Linus Torvalds
@ 2016-06-02  2:31                   ` George Spelvin
  2016-06-02 16:35                     ` Linus Torvalds
  0 siblings, 1 reply; 55+ messages in thread
From: George Spelvin @ 2016-06-02  2:31 UTC (permalink / raw)
  To: linux, torvalds; +Cc: bfields, linux-kernel, peterz

Linus Torvalds wrote:
> On Mon, May 30, 2016 at 11:10 AM, George Spelvin wrote:
>>
>> I understand, but 64x64-bit multiply on 32-bit is pretty annoyingly
>> expensive.  In time, code size, and register pressure which bloats
>> surrounding code.

> Side note, the code seems to work fairly well, but I do worry a bit
> about the three large multiplies in link_path_walk().
> 
> There's two in fold_hash(), and one comes from "find_zero()".

I do wonder about the second multiply in fold_hash().

For the 32-bit version, the outer __hash_32() could safely be deleted.
The 32-bit hash gets fed to hash_32() to reduce it to a hash table
index anyway.

(Specifically, it can be deleted from fs/namei.c and moved into
hash_str() and hash_mem() where it's useful in folding the hash value to
less than 32 bits.)
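
(That is, after this series hash_32() already multiplies by
GOLDEN_RATIO_32 before taking the top bits; roughly:)

static inline u32 __hash_32_generic(u32 val)
{
	return val * GOLDEN_RATIO_32;
}

static inline u32 hash_32(u32 val, unsigned int bits)
{
	/* The high bits are the best mixed, so keep those. */
	return __hash_32(val) >> (32 - bits);
}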

It's the 64-bit version that's an issue.  I need to reduce 128 bits of
weakly mixed hash state to 32, and on x86 and PPC, two multiplies seem
like the fastest way.  The second one could be done with a 32-bit multiply
instead, but 64-bit has been the same latency as 32 ever since Prescott
and Saltwell (Agner Fog says it's one cycle *faster* in many cases, which
I find hard to believe), so re-using the large immediate is a net win.

I could use two more iterations of HASH_MIX() or something similar,
then just take the x value, but that's 6 cycles.  If a multiply is
4 or 5 cycles, that's a net loss.


An issue with the 64-bit version which I hadn't thought through is
false sharing with the length.  As the comments say, nobody actually
uses the hash value until after some code that checks for special cases
like . and .. using the length.

On a 32-bit machine, the length and hash are in separate registers (%edx
and %eax) and scoreboarded separately, so it's possible to examine the
length without stalling on the hash.

But on a 64-bit machine, they're merged in %rax, and it's not possible to
extract the length without waiting for the hash.  :-(

That puts the hash folding on the critical path, so maybe it needs
more attention.
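
(The packing in question is the combined hash-and-length word from
<linux/stringhash.h>; roughly:)

#define hashlen_hash(hashlen)	((u32)(hashlen))
#define hashlen_len(hashlen)	((u32)((hashlen) >> 32))
#define hashlen_create(hash, len)	((hash) + ((u64)(len) << 32))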

> It turns out to work fairly well on at least modern big-core x86
> CPU's, because the multiplier is fairly beefy: low latency (3-4 cycles
> in the current crop) and fully pipelined.
> 
> Even atom should be 5 cycles and a multiplication result every two
> cycles for 64-bit results.
> 
> Maybe we don't care, because looking around the modern ARM and POWER
> cores do similarly, but I just wanted to point out that that code does
> seem to fairly heavily rely on "everybody has big and pipelined hw
> multipliers" for performance.

The problem is it's so damn useful as a mixing function.  When a multiplier
*is* available, with 3-4 cycle latency, it's hard to beat.

But worrying about that is the reason I left provision for arch-specific
hooks, and I'm already working on the first: the PA-RISC doesn't have
an integer multiplier at all, although the FPU can do 32-bit integer
multiplies.

(So much for my theory that 64-bit OOO CPUs always have grunty
multipliers!  That said, the last PA-RISC came out in 2005.)

But it tries to be good at shift-and-add sequences for multiplies by
fixed integers.

Unfortunately, the best 64-bit multiply sequence I've come up with is
13 cycles, which is a mite painful.  A few more HASH_MIX rounds looks
attractive in that case.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 06/10] fs/namei.c: Improve dcache hash function
  2016-06-02  2:31                   ` George Spelvin
@ 2016-06-02 16:35                     ` Linus Torvalds
  2016-06-02 18:23                       ` George Spelvin
  0 siblings, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2016-06-02 16:35 UTC (permalink / raw)
  To: George Spelvin; +Cc: J. Bruce Fields, Linux Kernel Mailing List, Peter Zijlstra

On Wed, Jun 1, 2016 at 7:31 PM, George Spelvin
<linux@sciencehorizons.net> wrote:
>
> I could use two more iterations of HASH_MIX() or something similar,
> then just take the x value, but that's 6 cycles.  If a multiply is
> 4 or 5 cycles, that's a net loss.

Yes. Especially since the multiply will often end up more able to be
run in parallel with other things.

> But worrying about that is the reason I left provision for arch-specific
> hooks, and I'm already working on the first: the PA-RISC doesn't have
> an integer multiplier at all, although the FPU can do 32-bit integer
> multiplies.

Don't worry about pa-risc. There may be a handful of users, where even
"users" is more of a "boot up occasionally just for perverse fun"
rather than anything else.

That's true of at least half the architectures we support - the only
ones that really matter and where performance is a real issue are
currently x86, arm and powerpc.

                  Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 06/10] fs/namei.c: Improve dcache hash function
  2016-06-02 16:35                     ` Linus Torvalds
@ 2016-06-02 18:23                       ` George Spelvin
  0 siblings, 0 replies; 55+ messages in thread
From: George Spelvin @ 2016-06-02 18:23 UTC (permalink / raw)
  To: linux, torvalds; +Cc: bfields, deller, linux-kernel, peterz

Linus Torvalds wrote:
> Don't worry about pa-risc. There may be a handful of users, where even
> "users" is more of a "boot up occasionally just for perverse fun"
> rather than anything else.

Yes, I'm quite aware that, like alpha and ia64, it's purely of historical
interest.  It's not even in the second tier of chips that are still being
bought to get real work done, like MIPS (a lot of consumer routers!) and
SPARC.

(Offended Itanium fanbois, please go back to sleep until Kittson is
actually for sale.)

But it's an interesting architecture (in the same sense that a platypus
is an "interesting animal"), the maintainers are happy to test patches,
and I presume I'm permitted to amuse myself messing with it as long as it
doesn't cause problems for the non-Jamaican bobsled teams.

(Note that PA-RISC doesn't support unaligned loads, so it doesn't use the
word-at-a-time code path at all.)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/10] String hash improvements
  2016-05-25 16:08   ` Linus Torvalds
  2016-05-28 19:57     ` [PATCH v3 " George Spelvin
@ 2016-06-02 22:59     ` Fubo Chen
  1 sibling, 0 replies; 55+ messages in thread
From: Fubo Chen @ 2016-06-02 22:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: George Spelvin, lkml, alistair.francis, Bruce Fields, geert,
	gerg, jlayton, linux-m68k, linux-nfs, michal.simek,
	Thomas Gleixner, uclinux-h8-devel, ysato

On Wed, May 25, 2016 at 9:08 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Ok, thanks. For some odd reason all your emails in this series got
> marked as spam. Every single one, including the cover letter (but not
> your replies to the replies to this).

I have added the following filter to my gmail account: "never send
e-mails with [PATCH in the subject to the spam folder". That helps a
lot.

Fubo.

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2016-06-02 22:59 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CA+55aFxPSW+84KfQ1N_WmND-wtvgj2zQm8nFPkRcc+gyU=uing@mail.gmail.com>
2016-05-25  7:20 ` [PATCH 00/10] String hash improvements George Spelvin
2016-05-25  8:00   ` Geert Uytterhoeven
2016-05-25  8:11     ` George Spelvin
2016-05-25  8:50       ` Geert Uytterhoeven
2016-05-25  9:07         ` George Spelvin
2016-05-25 16:08   ` Linus Torvalds
2016-05-28 19:57     ` [PATCH v3 " George Spelvin
2016-05-28 19:57       ` [PATCH v3 01/10] Pull out string hash to <linux/stringhash.h> George Spelvin
2016-05-28 19:57       ` [PATCH v3 02/10] fs/namei.c: Add hashlen_string() function George Spelvin
2016-05-28 19:57       ` [PATCH v3 03/10] <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string() George Spelvin
2016-05-28 19:57       ` [PATCH v3 04/10] Change hash_64() return value to 32 bits George Spelvin
2016-05-28 19:57       ` [PATCH v3 05/10] Eliminate bad hash multipliers from hash_32() and hash_64() George Spelvin
2016-05-28 19:57       ` [PATCH v3 06/10] fs/namei.c: Improve dcache hash function George Spelvin
2016-05-30 15:11         ` Peter Zijlstra
2016-05-30 16:06           ` George Spelvin
2016-05-30 16:27             ` Peter Zijlstra
2016-05-30 18:10               ` George Spelvin
2016-06-02  1:18                 ` Linus Torvalds
2016-06-02  2:31                   ` George Spelvin
2016-06-02 16:35                     ` Linus Torvalds
2016-06-02 18:23                       ` George Spelvin
2016-05-28 19:57       ` [PATCH v3 07/10] <linux/hash.h>: Add support for architecture-specific functions George Spelvin
2016-05-29  7:57         ` Geert Uytterhoeven
2016-05-28 19:57       ` [PATCH v3 08/10] m68k: Add <asm/hash.h> George Spelvin
2016-05-28 19:57       ` [PATCH v3 09/10] microblaze: " George Spelvin
2016-05-28 19:57       ` [PATCH v3 10/10] h8300: " George Spelvin
2016-05-28 20:47       ` [PATCH v3 00/10] String hash improvements Linus Torvalds
2016-05-28 20:54         ` George Spelvin
2016-06-02 22:59     ` [PATCH " Fubo Chen
2016-05-26 17:09   ` [PATCH v2 " George Spelvin
2016-05-25  7:21 ` [PATCH 01/10] Pull out string hash to <linux/stringhash.h> George Spelvin
2016-05-25  7:22 ` [PATCH 02/10] fs/namei.c: Add hash_string() function George Spelvin
2016-05-25  7:26 ` [PATCH 03/10] <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hash_string() George Spelvin
2016-05-25  7:28 ` [PATCH 04/10] Change hash_64() return value to 32 bits George Spelvin
2016-05-25  7:29 ` [PATCH 05/10] Eliminate bad hash multipliers from hash_32() and hash_64() George Spelvin
2016-05-25  7:31 ` [PATCH 06/10] fs/namei.c: Improve dcache hash function George Spelvin
2016-05-25  7:33 ` [PATCH 07/10] <linux/hash.h>: Add support for architecture-specific functions George Spelvin
2016-05-26 17:16   ` [PATCH v2 " George Spelvin
2016-05-25  7:34 ` [PATCH 08/10] m68k: Add <asm/archhash.h> George Spelvin
2016-05-25  7:34 ` George Spelvin
2016-05-25  8:07   ` Geert Uytterhoeven
2016-05-25  8:19     ` George Spelvin
2016-05-25  8:24     ` [PATCH 08v2/10] " George Spelvin
2016-05-25  8:48       ` Geert Uytterhoeven
2016-05-25  8:56   ` [PATCH 08/10] " Philippe De Muyter
2016-05-25  9:14     ` George Spelvin
2016-05-25  9:31       ` Andreas Schwab
2016-05-25  9:51       ` Philippe De Muyter
2016-05-25 13:24   ` Philippe De Muyter
2016-05-25 13:42     ` George Spelvin
2016-05-26 17:19   ` [PATCH v2 08/10] m68k: Add <asm/hash.h> George Spelvin
2016-05-25  7:37 ` [PATCH 09/10] microblaze: Add <asm/archhash.h> George Spelvin
2016-05-26 17:21   ` [PATCH v2 09/10] microblaze: Add <asm/hash.h> George Spelvin
2016-05-25  7:38 ` [PATCH 10/10] h8300: Add <asm/archhash.h> George Spelvin
2016-05-26 17:23   ` [PATCH v2 10/10] h8300: Add <asm/hash.h> George Spelvin
