* [lustre-devel] Lustre use of hash_long() & cfs_hash_u32_hash()
@ 2016-05-01 23:43 George Spelvin
  2016-05-12  0:28 ` James Simmons
  0 siblings, 1 reply; 8+ messages in thread
From: George Spelvin @ 2016-05-01 23:43 UTC (permalink / raw)
  To: lustre-devel

I notice you have copies of the <linux/hash.h> functions in libcfs_hash.h,
and use a mixture of the generic and copied functions in the Lustre code.

I'm revising the hash_64() function to fix some horrible collision
problems that Thomas Gleixner found, and I wanted to give you a heads-up.

If there are cases where you use the generic ones but need
reproducible outputs, they'll break.

If there are places where you're using the copied ones and *don't*
need reproducible outputs, you're getting bad performance.


I made a brief attempt to figure out what is used for what, but it's
too confusing for me.  It's hard to figure out code like:

static unsigned lu_obj_hop_hash(struct cfs_hash *hs,
                                const void *key, unsigned mask)
{
	struct lu_fid  *fid = (struct lu_fid *)key;
	__u32	   hash;

	hash = fid_flatten32(fid);
	hash += (hash >> 4) + (hash << 12); /* mixing oid and seq */
	hash = hash_long(hash, hs->hs_bkt_bits);

	/* give me another random factor */
	hash -= hash_long((unsigned long)hs, fid_oid(fid) % 11 + 3);

	hash <<= hs->hs_cur_bits - hs->hs_bkt_bits;
	hash |= (fid_seq(fid) + fid_oid(fid)) & (CFS_HASH_NBKT(hs) - 1);

	return hash & mask;
}

The whole business of selecting the number of bits of the hash based
on a mod-11 hash of something else seems quite peculiar.

Second, hash_long is multiplicative, meaning hash_long(a) - hash_long(b)
= hash_long(a-b).  The different bit shifts complicate it, but subtracting
two such values from each other is not a great way to mix.

Third, you're using the functions strangely.  The first hash_long() takes
a 32-bit input and would be more efficient on 64-bit platforms if it
were hash_32().  The second could be hash_ptr().


* [lustre-devel] Lustre use of hash_long() & cfs_hash_u32_hash()
  2016-05-01 23:43 [lustre-devel] Lustre use of hash_long() & cfs_hash_u32_hash() George Spelvin
@ 2016-05-12  0:28 ` James Simmons
  2016-05-12  1:33   ` George Spelvin
  0 siblings, 1 reply; 8+ messages in thread
From: James Simmons @ 2016-05-12  0:28 UTC (permalink / raw)
  To: lustre-devel


> I notice you have copies of the <linux/hash.h> functions in libcfs_hash.h,
> and use a mixture of the generic and copied functions in the Lustre code.
> 
> I'm revising the hash_64() function to fix some horrible collision
> problems that Thomas Gleixner found, and I wanted to give you a heads-up.
> 
> If there are cases where you use the generic ones but need
> reproducible outputs, they'll break.
> 
> If there are places where you're using the copied ones and *don't*
> need reproducible outputs, you're getting bad performance. 
> 
> I made a brief attempt to figure out what is used for what, but it's
> too confusing for me.  It's hard to figure out code like:

Thanks for looking into this. Do you have a tree somewhere we can look
at for the changes?

We opened a ticket to help track the progress for this at

https://jira.hpdd.intel.com/browse/LU-8130

Opening tickets is what gets people to look at these types of changes.
Again, thanks for letting us know about these changes.

> static unsigned lu_obj_hop_hash(struct cfs_hash *hs,
>                                 const void *key, unsigned mask)
> {
> 	struct lu_fid  *fid = (struct lu_fid *)key;
> 	__u32	   hash;
> 
> 	hash = fid_flatten32(fid);
> 	hash += (hash >> 4) + (hash << 12); /* mixing oid and seq */
> 	hash = hash_long(hash, hs->hs_bkt_bits);
> 
> 	/* give me another random factor */
> 	hash -= hash_long((unsigned long)hs, fid_oid(fid) % 11 + 3);
> 
> 	hash <<= hs->hs_cur_bits - hs->hs_bkt_bits;
> 	hash |= (fid_seq(fid) + fid_oid(fid)) & (CFS_HASH_NBKT(hs) - 1);
> 
> 	return hash & mask;
> }
> 
> The whole business of selecting the number of bits of the hash based
> on a mod-11 hash of something else seems quite peculiar.
> 
> Second, hash_long is multiplicative, meaning hash_long(a) - hash_long(b)
> = hash_long(a-b).  The different bit shifts complicate it, but subtracting
> two such values from each other is not a great way to mix.
> 
> Third, you're using the functions strangely.  The first hash_long() takes
> a 32-bit input and would be more efficient on 64-bit platforms if it
> were hash_32().  The second could be hash_ptr().

Details about this change are at

https://jira.hpdd.intel.com/browse/LU-143

and the original patch with those changes is at

http://review.whamcloud.com/#/c/374

Is this the only code that doesn't make sense or is there more?
Looking at the source tree I see the hash code being used in

lnet/lnet/api-ni.c
lnet/lnet/lib-ptl.c
lustre/include/lustre_fid.h
lustre/ldlm/ldlm_resource.c
lustre/obdclass/lu_object.c


* [lustre-devel] Lustre use of hash_long() & cfs_hash_u32_hash()
  2016-05-12  0:28 ` James Simmons
@ 2016-05-12  1:33   ` George Spelvin
  2016-05-13 19:38     ` James Simmons
  0 siblings, 1 reply; 8+ messages in thread
From: George Spelvin @ 2016-05-12  1:33 UTC (permalink / raw)
  To: lustre-devel

Thank you very much for the response!

> Thanks for looking into this. Do you have a tree somewhere we can look
> at for the changes?

Linus committed the most important fix to 4.6-rc7 as
689de1d6ca95b3b5bd8ee446863bf81a4883ea25

I'm following up with additional, less urgent cleanups of kernel hashing.

(Fundamentally, I'm trying to reduce the number of different hash
functions, and find a good balance between execution-time performance
and mixing performance of what remains.)

> Details about this change are at
> 
> https://jira.hpdd.intel.com/browse/LU-143
> 
> and the original patch with those changes is at
> 
> http://review.whamcloud.com/#/c/374

Thanks a lot for the pointers!

> Is this the only code that doesn't make sense or is there more?

I expect there's more; that was just what first jumped out at me.
My primary goal was elsewhere, so once it became clear that Lustre would
be a time-consuming side quest, I gave up and sent that e-mail.

But if Lustre would appreciate more attention, I can take another look.


I did do a pass over the kernel and fixed improper use of the current
<linux/hash.h> functions.  I posted it as

Subject: [RFC PATCH 3/2] (Rant) Fix various hash abuses
https://marc.info/?l=linux-kernel&m=146218484201133

Since Lustre was the single biggest culprit (about 25% of that patch),
I was planning on sending a broken-out patch.


* [lustre-devel] Lustre use of hash_long() & cfs_hash_u32_hash()
  2016-05-12  1:33   ` George Spelvin
@ 2016-05-13 19:38     ` James Simmons
  2016-05-13 20:25       ` George Spelvin
  0 siblings, 1 reply; 8+ messages in thread
From: James Simmons @ 2016-05-13 19:38 UTC (permalink / raw)
  To: lustre-devel


> Thank you very much for the response!
> 
> > Thanks for looking into this. Do you have a tree somewhere we can look
> > at for the changes?
> 
> Linus committed the most important fix to 4.6-rc7 as
> 689de1d6ca95b3b5bd8ee446863bf81a4883ea25

Not too huge of a change. Looks like I need to move to using these
functions.

> I'm following up with additional, less urgent cleanups of kernel hashing.
> 
> (Fundamentally, I'm trying to reduce the number of different hash
> functions, and find a good balance between execution-time performance
> and mixing performance of what remains.)

Looking at our code I see our duplication is cfs_hash_u32_hash and
cfs_hash_u64_hash which could be replaced by the standard linux functions.
Am I missing anything else?

> > Details about this change are at
> > 
> > https://jira.hpdd.intel.com/browse/LU-143
> > 
> > and the original patch with those changes is at
> > 
> > http://review.whamcloud.com/#/c/374
> 
> Thanks a lot for the pointers!

Any time. I also CC Liang Zhen who wrote the bulk of the hashing code.

> > Is this the only code that doesn't make sense or is there more?
> 
> I expect there's more; that was just what first jumped out at me.
> My primary goal was elsewhere, so once it became clear that Lustre would
> be a time-consuming side quest, I gave up and sent that e-mail.
> 
> But if Lustre would appreciate more attention, I can take another look.

Yep. We are here to help. 

> I did do a pass over the kernel and fixed improper use of the current
> <linux/hash.h> functions.  I posted it as
> 
> Subject: [RFC PATCH 3/2] (Rant) Fix various hash abuses
> https://marc.info/?l=linux-kernel&m=146218484201133
> 
> Since Lustre was the single biggest culprit (about 25% of that patch),
> I was planning on sending a broken-out patch.

I expect this is not all the changes needed. Do you have a newer patch or
should I run with this patch? Also I will look into replacing the
cfs_hash_u[32|64]_* code with standard linux hash code.


* [lustre-devel] Lustre use of hash_long() & cfs_hash_u32_hash()
  2016-05-13 19:38     ` James Simmons
@ 2016-05-13 20:25       ` George Spelvin
  2016-05-13 20:35         ` Drokin, Oleg
  2016-05-16 17:28         ` James Simmons
  0 siblings, 2 replies; 8+ messages in thread
From: George Spelvin @ 2016-05-13 20:25 UTC (permalink / raw)
  To: lustre-devel

> Looking at our code I see our duplication is cfs_hash_u32_hash and
> cfs_hash_u64_hash which could be replaced by the standard linux functions.
> Am I missing anything else?

The question is, why were they copied in the first place?

The <linux/hash.h> functions are for in-memory hash tables and *not*
guaranteed stable between boots.  They can depend on kernel version,
architecture, kernel configuration, and boot-time feature detection.

I thought the reason for the copy might have been to make a stable hash
that could e.g. be sent between hosts.

(em28xx_hash_mem() in drivers/media/usb/em28xx/em28xx-i2c.c is an example
of this.  It's a copy of hash_mem() from <linux/sunrpc/svcauth.h>.)

If there's no reason for the duplication, then yes, merge them.

>> Since Lustre was the single biggest culprit (about 25% of that patch),
>> I was planning on sending a broken-out patch.

> I expect this is not all the changes needed. Do you have a newer patch or 
> should I run with this patch?  Also I will look into replacing the 
> cfs_hash_u[32|64]_* code with standard linux hash code.

I don't have anything newer ATM.  I agree there's probably more; if
nothing else I didn't check the copied hash functions at all.  What I
posted was just what I noticed during a search through the kernel for
all users of the functions I was planning on changing.


* [lustre-devel] Lustre use of hash_long() & cfs_hash_u32_hash()
  2016-05-13 20:25       ` George Spelvin
@ 2016-05-13 20:35         ` Drokin, Oleg
  2016-05-16 17:28         ` James Simmons
  1 sibling, 0 replies; 8+ messages in thread
From: Drokin, Oleg @ 2016-05-13 20:35 UTC (permalink / raw)
  To: lustre-devel


On May 13, 2016, at 4:25 PM, George Spelvin wrote:

>> Looking at our code I see our duplication is cfs_hash_u32_hash and
>> cfs_hash_u64_hash which could be replaced by the standard linux functions.
>> Am I missing anything else?
> 
> The question is, why were they copied in the first place?
> 
> The <linux/hash.h> functions are for in-memory hash tables and *not*
> guaranteed stable between boots.  They can depend on kernel version,
> architecture, kernel configuration, and boot-time feature detection.
> 
> I thought the reason for the copy might have been to make a stable hash
> that could e.g. be sent between hosts.

They were copied because of other platforms. There used to be a
userspace library, a Mac port and so on. That's why we have this
(now trimmed down) compatibility layer that we are chipping away at.
The hashes are purely node-local and are not sent over the wire.

Thanks!

Bye,
    Oleg


* [lustre-devel] Lustre use of hash_long() & cfs_hash_u32_hash()
  2016-05-13 20:25       ` George Spelvin
  2016-05-13 20:35         ` Drokin, Oleg
@ 2016-05-16 17:28         ` James Simmons
  2016-05-17  8:52           ` George Spelvin
  1 sibling, 1 reply; 8+ messages in thread
From: James Simmons @ 2016-05-16 17:28 UTC (permalink / raw)
  To: lustre-devel


> > Looking at our code I see our duplication is cfs_hash_u32_hash and
> > cfs_hash_u64_hash which could be replaced by the standard linux functions.
> > Am I missing anything else?
> 
> The question is, why were they copied in the first place?
...
> If there's no reason for the duplication, then yes, merge them.

As Oleg pointed out there is no need for the duplication.

> >> Since Lustre was the single biggest culprit (about 25% of that patch),
> >> I was planning on sending a broken-out patch.
> 
> > I expect this is not all the changes needed. Do you have a newer patch or 
> > should I run with this patch?  Also I will look into replacing the 
> > cfs_hash_u[32|64]_* code with standard linux hash code.
> 
> I don't have anything newer ATM.  I agree there's probably more; if
> nothing else I didn't check the copied hash functions at all.  What I
> posted was just what I noticed during a search through the kernel for
> all users of the functions I was planning on changing.

I started a patch with your cleanups, and began looking at replacing
cfs_hash_u[64|32]_hash() with the standard hash_[64|32]() functions.
First I need to explain the code path to make it clear what is going on.
From the top level, the code path in which the cfs_hash_uXX_hash might
be called is:

cfs_hash_bd_get() -> cfs_hash_bd_from_key() -> cfs_hash_id()

cfs_hash_id() is defined as:

static inline unsigned
cfs_hash_id(struct cfs_hash *hs, const void *key, unsigned mask)
{
        return hs->hs_ops->hs_hash(hs, key, mask);
}

and hs_hash is 

struct cfs_hash_ops {
        /** return hashed value from @key */
        unsigned (*hs_hash)(struct cfs_hash *hs, const void *key,
                            unsigned mask);
        ...
}

So ops->hs_hash() is filled in by various lustre layers to generate
some hash value. It is these hs_hash() callbacks that sometimes use
cfs_hash_u64_hash() and cfs_hash_u32_hash(). An example of this is
in cl_object.c:

struct cfs_hash_ops pool_hash_operations = {
        .hs_hash        = cl_env_hops_hash,
	...
};

which is defined as:

static unsigned cl_env_hops_hash(struct cfs_hash *lh,
                                 const void *key, unsigned mask)
{
#if BITS_PER_LONG == 64
        return cfs_hash_u64_hash((__u64)key, mask);
#else
        return cfs_hash_u32_hash((__u32)key, mask);
#endif
}

If we change to using hash_64() or hash_32() instead this
will change the behavior we currently have.

Now cfs_hash_u64_hash() is defined as

static inline unsigned
cfs_hash_u64_hash(const __u64 key, unsigned mask)
{
        return ((unsigned)(key * CFS_GOLDEN_RATIO_PRIME_64) & mask);
}

which uses a mask directly. That mask is generated at the cfs_hash_id()
call site with:

unsigned int index = cfs_hash_id(hs, key, (1U << bits) - 1);

So the mask uses the lower bits. We can't change this behavior
since other hs_ops->hs_hash functions that don't use
cfs_hash_u[64|32]_hash() at all treat this as a mask.

Now for using the standard functions we have, for example:

static __always_inline u64 hash_64(u64 val, unsigned int bits)
{
        u64 hash = val;

#if BITS_PER_LONG == 64
        hash = hash * GOLDEN_RATIO_64;
#else
        ....
#endif

        /* High bits are more random, so use them. */
        return hash >> (64 - bits);
}

which in effect selects the higher bits instead of masking the lower ones.
I can work the code so the mask can be turned into a bit offset
but the mask will still be different. Will this difference
break things?


* [lustre-devel] Lustre use of hash_long() & cfs_hash_u32_hash()
  2016-05-16 17:28         ` James Simmons
@ 2016-05-17  8:52           ` George Spelvin
  0 siblings, 0 replies; 8+ messages in thread
From: George Spelvin @ 2016-05-17  8:52 UTC (permalink / raw)
  To: lustre-devel

> If we change to using hash_64() or hash_32() instead this
> will change the behavior we currently have.

There is hash_long for exactly this purpose.
The problem was that Lustre was using hash_long() on values that are 64
bits even on 32-bit platforms.  I wasn't trying to get rid of all
calls to hash_long, just the buggy ones.

hash_32: Hash 32 bits
hash_64: Hash 64 bits
hash_long: Alias for one of the preceding, depending on BITS_PER_LONG.

The thing is to use the correct one for the argument.

> Now cfs_hash_u64_hash() is defined as
>
> static inline unsigned
> cfs_hash_u64_hash(const __u64 key, unsigned mask)
> {
>         return ((unsigned)(key * CFS_GOLDEN_RATIO_PRIME_64) & mask);
> }
> 
> which uses a mask directly. That mask is generated in cfs_hash_id() with :
> 
> unsigned int index = cfs_hash_id(hs, key, (1U << bits) - 1);
> 
> So the mask uses the lower bits. We can't change this behavior
> since other hp_ops->hs_hash functions that don't use 
> cfs_hash_u[64|32]_hash() at all treat this as a mask.

This is a bit of a problem.  My hash changes won't make it any worse,
but it's a *current* problem that you probably want to fix.

The thing is, a multiply has lower bits of the input affect higher bits
of the output, but there is no propagation at all in the other direction.
(See http://burtleburtle.net/bob/hash/integer.html for a discussion of
"half-avalanche".)

In other words, for a mask which is a power of 2 minus 1, a multiplicative
hash satisfies ((x * K) & mask) == (((x & mask) * K) & mask).

(This is NOT true if the mask has trailing zero bits.)

Given that a typical mask has far fewer than 64 bits set, you can
see how this will lead to huge numbers of collisions.


There are a few possible solutions depending on the typical
size of mask:

- Since you're returning only 32 bits at most, shift down 32 bits
  before masking.  This still has you ignoring some high-order bits,
  but it improves things a lot.  (One of my pending patches to hash_64()
  does exactly this.)
- bswap() the result.  May still ignore input bits if the mask is less
  than 8 bits, but it's getting decent.
- (Less efficient on some embedded platforms, but none likely to run Lustre
  in the first place): Use the low bits of the *high* word of the product.
  This has high bits of the hash value unaffected by low bits of the
  argument (except by carry propagation, which is unlikely to
  propagate many bits), but gets you what you need.
- Use a totally different hash function like jhash_2words() that
  achieves more mixing (but is slower).
- Change the Lustre code to use the high bits everywhere.

One alternative to jhash_2words() is Thomas Wang's
64-to-32 bit hash from
https://web.archive.org/web/20071223173210/http://www.concentric.net/~Ttwang/tech/inthash.htm

Translated from Java to C, that's:

u32 hash6432shift(u64 key)
{
	key = ~key + (key << 18);	// key = (key << 18) - key - 1;
	key ^= key >> 31;
	key *= 21;		// key = key + (key << 2) + (key << 4);
	key ^= key >> 11;
	key += key << 6;
	key ^= key >> 22;
	return (u32) key;
}

> I can work the code so the mask can be turned into a bit offset
> but the mask will still be different. Will this difference
> break things?

I'm not sure I understand the question.  As long as the hashing is
always done the same way, the details of *how* the hashing is
done are opaque to the rest of the code.

The problem arises if someone assumes a mathematical relationship
between hash_64(x, bits) and hash_64(x, bits+1), such as when
growing or shrinking a dynamic hash table.  

If the calling code "knows" that hash_64(x, mask) can be ANDed with
mask/2 to produce the same number as hash_64(x, mask/2), then things
will make a mess.

