* [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-15 10:43 ` [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash Herbert Xu
@ 2015-03-15 10:44 ` Herbert Xu
2015-03-17 10:51 ` David Laight
2015-03-15 10:44 ` [v1 PATCH 2/14] rhashtable: Introduce max_size/min_size Herbert Xu
` (19 subsequent siblings)
20 siblings, 1 reply; 113+ messages in thread
From: Herbert Xu @ 2015-03-15 10:44 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
Keeping both size and shift is silly. We only need one.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
include/linux/rhashtable.h | 2 --
lib/rhashtable.c | 5 ++---
2 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 1695378..f16e856 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -51,7 +51,6 @@ struct rhash_head {
* @size: Number of hash buckets
* @rehash: Current bucket being rehashed
* @hash_rnd: Random seed to fold into hash
- * @shift: Current size (1 << shift)
* @locks_mask: Mask to apply before accessing locks[]
* @locks: Array of spinlocks protecting individual buckets
* @walkers: List of active walkers
@@ -63,7 +62,6 @@ struct bucket_table {
unsigned int size;
unsigned int rehash;
u32 hash_rnd;
- u32 shift;
unsigned int locks_mask;
spinlock_t *locks;
struct list_head walkers;
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index c523d3a..4a99c0c 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -162,7 +162,6 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
return NULL;
tbl->size = nbuckets;
- tbl->shift = ilog2(nbuckets);
if (alloc_bucket_locks(ht, tbl) < 0) {
bucket_table_free(tbl);
@@ -189,7 +188,7 @@ static bool rht_grow_above_75(const struct rhashtable *ht,
{
/* Expand table when exceeding 75% load */
return atomic_read(&ht->nelems) > (tbl->size / 4 * 3) &&
- (!ht->p.max_shift || tbl->shift < ht->p.max_shift);
+ (!ht->p.max_shift || tbl->size < (1 << ht->p.max_shift));
}
/**
@@ -202,7 +201,7 @@ static bool rht_shrink_below_30(const struct rhashtable *ht,
{
/* Shrink table beneath 30% load */
return atomic_read(&ht->nelems) < (tbl->size * 3 / 10) &&
- tbl->shift > ht->p.min_shift;
+ tbl->size > (1 << ht->p.min_shift);
}
static int rhashtable_rehash_one(struct rhashtable *ht, unsigned old_hash)
^ permalink raw reply related [flat|nested] 113+ messages in thread
* RE: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-15 10:44 ` [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table Herbert Xu
@ 2015-03-17 10:51 ` David Laight
2015-03-17 10:56 ` tgraf
0 siblings, 1 reply; 113+ messages in thread
From: David Laight @ 2015-03-17 10:51 UTC (permalink / raw)
To: 'Herbert Xu', David Miller, tgraf, netdev, Eric Dumazet
From: Herbert Xu
> Sent: 15 March 2015 10:44
> Keeping both size and shift is silly. We only need one.
...
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -162,7 +162,6 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
> return NULL;
>
> tbl->size = nbuckets;
> - tbl->shift = ilog2(nbuckets);
>
> if (alloc_bucket_locks(ht, tbl) < 0) {
> bucket_table_free(tbl);
> @@ -189,7 +188,7 @@ static bool rht_grow_above_75(const struct rhashtable *ht,
> {
> /* Expand table when exceeding 75% load */
> return atomic_read(&ht->nelems) > (tbl->size / 4 * 3) &&
> - (!ht->p.max_shift || tbl->shift < ht->p.max_shift);
> + (!ht->p.max_shift || tbl->size < (1 << ht->p.max_shift));
Looks like you could pre-calculate the 'grow_at' size.
The test above would then be:
return atomic_read(&ht->nelems) > tbl->grow_at_size;
Similarly for the shrink.
David
* Re: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-17 10:51 ` David Laight
@ 2015-03-17 10:56 ` tgraf
2015-03-17 11:00 ` Herbert Xu
0 siblings, 1 reply; 113+ messages in thread
From: tgraf @ 2015-03-17 10:56 UTC (permalink / raw)
To: David Laight; +Cc: 'Herbert Xu', David Miller, netdev, Eric Dumazet
On 03/17/15 at 10:51am, David Laight wrote:
> Looks like you could pre-calculate the 'grow_at' size.
> The test above would then be:
> return atomic_read(&ht->nelems) > tbl->grow_at_size;
>
> Similarly for the shrink.
Given the discussions, the grow decision will likely change to
a max bucket length limit anyway.
* Re: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-17 10:56 ` tgraf
@ 2015-03-17 11:00 ` Herbert Xu
2015-03-17 11:22 ` tgraf
2015-03-17 11:22 ` David Laight
0 siblings, 2 replies; 113+ messages in thread
From: Herbert Xu @ 2015-03-17 11:00 UTC (permalink / raw)
To: tgraf; +Cc: David Laight, David Miller, netdev, Eric Dumazet
On Tue, Mar 17, 2015 at 10:56:57AM +0000, tgraf@suug.ch wrote:
>
> Given the discussions, the grow decision will likely change to
> a max bucket length limit anyway.
Actually no. In my patches the chain length is only used to
force an immediate rehash. Growing is still based on the number
of elements.
The reason is that the maximum (not average) chain length actually
grows with the hash table size, even at 75% utilisation.
So unless we want ever decreasing utilisation as the table grows,
we have to cope with a few chains with many elements. The limit
I'm currently using is 16 which shouldn't be hit even if you had
a 2^32 table (which you currently cannot).
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-17 11:00 ` Herbert Xu
@ 2015-03-17 11:22 ` tgraf
2015-03-17 11:27 ` Herbert Xu
2015-03-17 11:22 ` David Laight
1 sibling, 1 reply; 113+ messages in thread
From: tgraf @ 2015-03-17 11:22 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Laight, David Miller, netdev, Eric Dumazet
On 03/17/15 at 10:00pm, Herbert Xu wrote:
> On Tue, Mar 17, 2015 at 10:56:57AM +0000, tgraf@suug.ch wrote:
> >
> > Given the discussions, the grow decision will likely change to
> > a max bucket length limit anyway.
>
> Actually no. In my patches the chain length is only used to
> force an immediate rehash. Growing is still based on the number
> of elements.
>
> The reason is that the maximum (not average) chain length actually
> grows with the hash table size, even at 75% utilisation.
>
> So unless we want ever decreasing utilisation as the table grows,
> we have to cope with a few chains with many elements. The limit
> I'm currently using is 16 which shouldn't be hit even if you had
> a 2^32 table (which you currently cannot).
Not sure I understand the downside of a bucket length based grow
decision with optional forced rehashing plus synchronous table
realloc if we hit a 2nd watermark as you proposed earlier. Shouldn't
we consider deterministic lookup and insert behaviour more important
than overall table utilization? Given the rehashing, the grow decision
should not be attackable.
* Re: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-17 11:22 ` tgraf
@ 2015-03-17 11:27 ` Herbert Xu
2015-03-17 11:57 ` tgraf
0 siblings, 1 reply; 113+ messages in thread
From: Herbert Xu @ 2015-03-17 11:27 UTC (permalink / raw)
To: tgraf; +Cc: David Laight, David Miller, netdev, Eric Dumazet
On Tue, Mar 17, 2015 at 11:22:03AM +0000, tgraf@suug.ch wrote:
>
> Not sure I understand the downside of a bucket length based grow
> decision with optional forced rehashing plus synchronous table
> realloc if we hit a 2nd watermark as you proposed earlier. Shouldn't
> we consider deterministic lookup and insert behaviour more important
> than overall table utilization? Given the rehashing, the grow decision
> should not be attackable.
Do you really want to double the table size when 0.1% of the buckets
have a chain length > 4 but still < 16?
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-17 11:27 ` Herbert Xu
@ 2015-03-17 11:57 ` tgraf
2015-03-17 12:13 ` David Laight
0 siblings, 1 reply; 113+ messages in thread
From: tgraf @ 2015-03-17 11:57 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Laight, David Miller, netdev, Eric Dumazet
On 03/17/15 at 10:27pm, Herbert Xu wrote:
> On Tue, Mar 17, 2015 at 11:22:03AM +0000, tgraf@suug.ch wrote:
> >
> > Not sure I understand the downside of a bucket length based grow
> > decision with optional forced rehashing plus synchronous table
> > realloc if we hit a 2nd watermark as you proposed earlier. Shouldn't
> > we consider deterministic lookup and insert behaviour more important
> > than overall table utilization? Given the rehashing, the grow decision
> > should not be attackable.
>
> Do you really want to double the table size when 0.1% of the buckets
> have a chain length > 4 but still < 16?
If we constantly hit that bucket because we are handling just a few
TCP flows, it would be worth doubling the size & rehashing to avoid the
additional cache misses of the linked list.
Although a limit of 4 would be too high. Ideally we should resize and
rehash when we add the 2nd entry to a bucket to stay < 100% utilization.
It seems likely though that we'll always have a bucket with >=2
entries so we would end up constantly doubling and rehashing. This is
the only thing that speaks for a table-wide nelems counter in my
opinion.
Another option would be to continue resizing & rehashing as long as we
have at least one bucket with >= 4 entries but allow a table size
dependent limit of (length > 1 && length < 4) buckets. This may be
overengineered again though ;-)
* RE: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-17 11:57 ` tgraf
@ 2015-03-17 12:13 ` David Laight
2015-03-17 12:18 ` 'tgraf@suug.ch'
2015-03-17 12:20 ` Herbert Xu
0 siblings, 2 replies; 113+ messages in thread
From: David Laight @ 2015-03-17 12:13 UTC (permalink / raw)
To: 'tgraf@suug.ch', Herbert Xu; +Cc: David Miller, netdev, Eric Dumazet
From: Thomas Graf
> Sent: 17 March 2015 11:58
...
> > Do you really want to double the table size when 0.1% of the buckets
> > have a chain length > 4 but still < 16?
>
> If we constantly hit that bucket because we are handling just a few
> TCP flows it would be worth to double the size & rehash to avoid the
> additional cache misses of the linked list.
>
> Although a limit of 4 would be too high. Ideally we should resize and
> rehash when we add the 2nd entry to a bucket to stay < 100% utilization.
> It seems likely though that we'll always have a bucket with >=2
> entries so we would end up constantly doubling and rehashing. This is
> the only thing that speaks for a table wide nelems counters in my
> opinion.
I think you are seriously overestimating the 'efficiency' of the hash function.
And not doing the 'birthday paradox' maths at all.
The only way you'll get a 'hash' that good is if you can pick the input
value in order to generate a perfect hash.
However you aren't going to manage that for inbound TCP connections since
none of the inputs to the hash can be chosen by the listening system
(unless IPv6 has something that can help you).
You may have to live with a few % of the items being on long chains.
Maybe count the number of items on chains longer than (say) 4 and
rehash or increase the table size if this exceeds a few % of the
table size.
(Or count the number of items that are further than 4 from the start
of the hash chain.)
David
* Re: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-17 12:13 ` David Laight
@ 2015-03-17 12:18 ` 'tgraf@suug.ch'
2015-03-17 12:20 ` Herbert Xu
1 sibling, 0 replies; 113+ messages in thread
From: 'tgraf@suug.ch' @ 2015-03-17 12:18 UTC (permalink / raw)
To: David Laight; +Cc: Herbert Xu, David Miller, netdev, Eric Dumazet
On 03/17/15 at 12:13pm, David Laight wrote:
> From: Thomas Graf
> > Sent: 17 March 2015 11:58
> ...
> > > Do you really want to double the table size when 0.1% of the buckets
> > > have a chain length > 4 but still < 16?
> >
> > If we constantly hit that bucket because we are handling just a few
> > TCP flows it would be worth to double the size & rehash to avoid the
> > additional cache misses of the linked list.
> >
> > Although a limit of 4 would be too high. Ideally we should resize and
> > rehash when we add the 2nd entry to a bucket to stay < 100% utilization.
> > It seems likely though that we'll always have a bucket with >=2
> > entries so we would end up constantly doubling and rehashing. This is
> > the only thing that speaks for a table wide nelems counters in my
> > opinion.
>
> I think you are seriously overestimating the 'efficiency' of the hash function.
> And not doing the 'birthday paradox' maths at all.
>
> The only way you'll get a 'hash' that good is if you can pick the input
> value in order to generate a perfect hash.
> However you aren't going to manage that for inbound TCP connections since
> none of the inputs to the hash can be chosen by the listening system
> (unless IPv6 has something than can help you).
>
> You may have to live with a few % of the items being on long chains.
> Maybe count the number of items on chains longer than (say) 4 and
> rehash or increase the table size if this exceeds a few % of the
> table size.
> (Or count the number of items that are further than 4 from the start
> of the hash chain.)
Did you read my email all the way? You basically omitted to quote and
then rephrased the 2nd half of my email.
* Re: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-17 12:13 ` David Laight
2015-03-17 12:18 ` 'tgraf@suug.ch'
@ 2015-03-17 12:20 ` Herbert Xu
2015-03-17 12:40 ` 'tgraf@suug.ch'
1 sibling, 1 reply; 113+ messages in thread
From: Herbert Xu @ 2015-03-17 12:20 UTC (permalink / raw)
To: David Laight; +Cc: 'tgraf@suug.ch', David Miller, netdev, Eric Dumazet
On Tue, Mar 17, 2015 at 12:13:42PM +0000, David Laight wrote:
> From: Thomas Graf
> > Sent: 17 March 2015 11:58
> ...
> > > Do you really want to double the table size when 0.1% of the buckets
> > > have a chain length > 4 but still < 16?
> >
> > If we constantly hit that bucket because we are handling just a few
> > TCP flows it would be worth to double the size & rehash to avoid the
> > additional cache misses of the linked list.
> >
> > Although a limit of 4 would be too high. Ideally we should resize and
> > rehash when we add the 2nd entry to a bucket to stay < 100% utilization.
> > It seems likely though that we'll always have a bucket with >=2
> > entries so we would end up constantly doubling and rehashing. This is
> > the only thing that speaks for a table wide nelems counters in my
> > opinion.
>
> I think you are seriously overestimating the 'efficiency' of the hash function.
> And not doing the 'birthday paradox' maths at all.
Agreed. Thomas, an easy test to do is to pump the integers from
0 to 65535 into jhash, then mask it with 65535 and run sort |
uniq -c | sort -n on it; I think you'll see what we're talking
about.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-17 12:20 ` Herbert Xu
@ 2015-03-17 12:40 ` 'tgraf@suug.ch'
2015-03-17 13:06 ` David Laight
2015-03-17 21:56 ` Herbert Xu
0 siblings, 2 replies; 113+ messages in thread
From: 'tgraf@suug.ch' @ 2015-03-17 12:40 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Laight, David Miller, netdev, Eric Dumazet
On 03/17/15 at 11:20pm, Herbert Xu wrote:
> On Tue, Mar 17, 2015 at 12:13:42PM +0000, David Laight wrote:
> > From: Thomas Graf
> > > Sent: 17 March 2015 11:58
> > ...
> > > > Do you really want to double the table size when 0.1% of the buckets
> > > > have a chain length > 4 but still < 16?
> > >
> > > If we constantly hit that bucket because we are handling just a few
> > > TCP flows it would be worth to double the size & rehash to avoid the
> > > additional cache misses of the linked list.
> > >
> > > Although a limit of 4 would be too high. Ideally we should resize and
> > > rehash when we add the 2nd entry to a bucket to stay < 100% utilization.
> > > It seems likely though that we'll always have a bucket with >=2
> > > entries so we would end up constantly doubling and rehashing. This is
> > > the only thing that speaks for a table wide nelems counters in my
> > > opinion.
> >
> > I think you are seriously overestimating the 'efficiency' of the hash function.
> > And not doing the 'birthday paradox' maths at all.
>
> Agreed. Thomas, an easy test to do is to pump the integers from
> 0 to 65535 into jhash, then mask it with 65535 and run sort |
> uniq -c | sort -n on it, I think you'll see what we're talking
> about.
I'm not claiming perfect hash functions and this is exactly why I
think average utilization is not an optimal growth criterion because
it gives very limited view into the actual chain lengths.
What you describe above is a 100% utilization scenario. Initially
we talked about 0.1% utilization and whether to resize & rehash if a
single chain has length > 4. My answer is: yes we should resize &
rehash or at least rehash in that case.
My point here is that a chain length of 4 may already be a serious
performance bottleneck, and that it might be worth trying to detect
bad hashing distribution and fix it at an earlier stage, while
ruling out the possibility of endless rehashes.
* RE: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-17 12:40 ` 'tgraf@suug.ch'
@ 2015-03-17 13:06 ` David Laight
2015-03-17 21:56 ` Herbert Xu
1 sibling, 0 replies; 113+ messages in thread
From: David Laight @ 2015-03-17 13:06 UTC (permalink / raw)
To: 'tgraf@suug.ch', Herbert Xu; +Cc: David Miller, netdev, Eric Dumazet
From: Thomas Graf
> Sent: 17 March 2015 12:40
...
> I'm not claiming perfect hash functions and this is exactly why I
> think average utilization is not an optimal growth criteria because
> it gives very limited view into the actual chain lengths.
If you assume that all the entries are being looked up equally
often, the interesting number ought to be the number of failed compares.
So you want to sum 'length * (length - 1)/2' and compare against the
total number of items (or the table size).
> What you describe above is a 100% utilization scenario. Initially
> we talked about 0.1% utilization and whether to resize & rehash if a
> single chain has length > 4. My answer is: yes we should resize &
> rehash or at least rehash in that case.
0.1% utilisation is about as realistic as 100%.
50-75% is probably a reasonable limit.
This will give some short chains.
> My point here is that a chain length of 4 may be a serious
> performance bottleneck already and that it might be worth to try
> and detect bad hashing distribution and attempt to fix it at an
> earlier stage while ruling out the possibility of endless rehashes.
If you have 4 items and they are all on one chain then I suspect that
nothing you do will actually split them.
OTOH if you have 1000 items and one chain of length 4 it really doesn't
matter. Unless, of course, someone arranges to flood the last item with
traffic - in which case you are going to lose anyway.
David
* Re: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-17 12:40 ` 'tgraf@suug.ch'
2015-03-17 13:06 ` David Laight
@ 2015-03-17 21:56 ` Herbert Xu
2015-03-18 9:51 ` 'tgraf@suug.ch'
1 sibling, 1 reply; 113+ messages in thread
From: Herbert Xu @ 2015-03-17 21:56 UTC (permalink / raw)
To: 'tgraf@suug.ch'; +Cc: David Laight, David Miller, netdev, Eric Dumazet
On Tue, Mar 17, 2015 at 12:40:12PM +0000, 'tgraf@suug.ch' wrote:
>
> > Agreed. Thomas, an easy test to do is to pump the integers from
> > 0 to 65535 into jhash, then mask it with 65535 and run sort |
> > uniq -c | sort -n on it, I think you'll see what we're talking
> > about.
>
> I'm not claiming perfect hash functions and this is exactly why I
> think average utilization is not an optimal growth criteria because
> it gives very limited view into the actual chain lengths.
>
> What you describe above is a 100% utilization scenario. Initially
Actually 75% is just as bad. To test that just repeat my script
above and add head -n $((65536 / 4 * 3)) before the first sort.
My point is that to get below anything like a chain length limit
of 2 (or even 4), your hash table is going to be mostly empty.
OK 0.1% might be an exaggeration but it's certainly nowhere near 50%
as your hash table size grows.
Please actually try out my test, here is a C program that will help
you do it:
#include <stdio.h>
#include <stdlib.h>
typedef unsigned char u8;
typedef unsigned int u32;
static inline u32 rol32(u32 word, unsigned int shift)
{
return (word << shift) | (word >> (32 - shift));
}
/* Best hash sizes are of power of two */
#define jhash_size(n) ((u32)1<<(n))
/* Mask the hash value, i.e (value & jhash_mask(n)) instead of (value % n) */
#define jhash_mask(n) (jhash_size(n)-1)
/* __jhash_mix -- mix 3 32-bit values reversibly. */
#define __jhash_mix(a, b, c) \
{ \
a -= c; a ^= rol32(c, 4); c += b; \
b -= a; b ^= rol32(a, 6); a += c; \
c -= b; c ^= rol32(b, 8); b += a; \
a -= c; a ^= rol32(c, 16); c += b; \
b -= a; b ^= rol32(a, 19); a += c; \
c -= b; c ^= rol32(b, 4); b += a; \
}
/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
#define __jhash_final(a, b, c) \
{ \
c ^= b; c -= rol32(b, 14); \
a ^= c; a -= rol32(c, 11); \
b ^= a; b -= rol32(a, 25); \
c ^= b; c -= rol32(b, 16); \
a ^= c; a -= rol32(c, 4); \
b ^= a; b -= rol32(a, 14); \
c ^= b; c -= rol32(b, 24); \
}
#define __get_unaligned_cpu32(x) (*(u32 *)(x))
/* An arbitrary initial parameter */
#define JHASH_INITVAL 0xdeadbeef
/* jhash - hash an arbitrary key
* @k: sequence of bytes as key
* @length: the length of the key
* @initval: the previous hash, or an arbitrary value
*
* The generic version, hashes an arbitrary sequence of bytes.
* No alignment or length assumptions are made about the input key.
*
* Returns the hash value of the key. The result depends on endianness.
*/
static inline u32 jhash(const void *key, u32 length, u32 initval)
{
u32 a, b, c;
const u8 *k = key;
/* Set up the internal state */
a = b = c = JHASH_INITVAL + length + initval;
/* All but the last block: affect some 32 bits of (a,b,c) */
while (length > 12) {
a += __get_unaligned_cpu32(k);
b += __get_unaligned_cpu32(k + 4);
c += __get_unaligned_cpu32(k + 8);
__jhash_mix(a, b, c);
length -= 12;
k += 12;
}
/* Last block: affect all 32 bits of (c) */
/* All the case statements fall through */
switch (length) {
case 12: c += (u32)k[11]<<24;
case 11: c += (u32)k[10]<<16;
case 10: c += (u32)k[9]<<8;
case 9: c += k[8];
case 8: b += (u32)k[7]<<24;
case 7: b += (u32)k[6]<<16;
case 6: b += (u32)k[5]<<8;
case 5: b += k[4];
case 4: a += (u32)k[3]<<24;
case 3: a += (u32)k[2]<<16;
case 2: a += (u32)k[1]<<8;
case 1: a += k[0];
__jhash_final(a, b, c);
case 0: /* Nothing left to add */
break;
}
return c;
}
/* jhash2 - hash an array of u32's
* @k: the key which must be an array of u32's
* @length: the number of u32's in the key
* @initval: the previous hash, or an arbitrary value
*
* Returns the hash value of the key.
*/
static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
{
u32 a, b, c;
/* Set up the internal state */
a = b = c = JHASH_INITVAL + (length<<2) + initval;
/* Handle most of the key */
while (length > 3) {
a += k[0];
b += k[1];
c += k[2];
__jhash_mix(a, b, c);
length -= 3;
k += 3;
}
/* Handle the last 3 u32's: all the case statements fall through */
switch (length) {
case 3: c += k[2];
case 2: b += k[1];
case 1: a += k[0];
__jhash_final(a, b, c);
case 0: /* Nothing left to add */
break;
}
return c;
}
/* jhash_3words - hash exactly 3, 2 or 1 word(s) */
static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
{
a += JHASH_INITVAL;
b += JHASH_INITVAL;
c += initval;
__jhash_final(a, b, c);
return c;
}
static inline u32 jhash_2words(u32 a, u32 b, u32 initval)
{
return jhash_3words(a, b, 0, initval);
}
static inline u32 jhash_1word(u32 a, u32 initval)
{
return jhash_3words(a, 0, 0, initval);
}
int main(int argc, char **argv)
{
int i;
struct {
void *s;
void *t;
u32 l;
} k = { .s = 0 };
int total;
total = atoi(argv[1]);
for (i = 0; i < total; i++) {
k.l = i;
printf("0x%x\n", jhash2((u32 *)&k, 3, 12345) & (total - 1));
}
return 0;
}
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-17 21:56 ` Herbert Xu
@ 2015-03-18 9:51 ` 'tgraf@suug.ch'
2015-03-18 9:55 ` Herbert Xu
0 siblings, 1 reply; 113+ messages in thread
From: 'tgraf@suug.ch' @ 2015-03-18 9:51 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Laight, David Miller, netdev, Eric Dumazet
On 03/18/15 at 08:56am, Herbert Xu wrote:
> Actually 75% is just as bad. To test that just repeat my script
> above and add head -n $((65536 / 4 * 3)) before the first sort.
>
> My point is that to get below anything like a chain length limit
> of 2 (or even 4), your hash table is going to be mostly empty.
> OK 0.1% might be an exaggeration but it's certainly nowhere near 50%
> as your hash table size grows.
>
> Please actually try out my test, here is a C program that will help
> you do it:
Thanks for providing that C program. I played around with it and
noticed a small typo which I fixed up. I'm attaching the version
I've used at the end of the mail.
Maybe there is a thinko on my part but the results I'm getting are
very impressive. I have a very hard time producing more than a
single hash conflict for sequences up to 0..2^27.
I also tried with a pseudo random value for the u32 part of the key
instead of the sequence number and got the same results.
I'm running it like this:
for i in $(seq 1 27); do \
echo $i: && ./jhash $((2**$i)) | uniq -D | uniq -c \
done
1:
2:
3:
3 0x1
4:
2 0xc
5:
6:
2 0x2b
7:
8:
9:
10:
11:
2 0x1c9
12:
13:
2 0x9c7
2 0x4e9
14:
15:
2 0x5c8
16:
17:
18:
2 0x34cb2
2 0x21fd0
19:
2 0x61fd0
2 0x6e82f
20:
2 0x61fd0
2 0x6e82f
2 0xfd571
21:
2 0x1a0e02
2 0x1776ea
22:
2 0x379099
23:
2 0x2650b6
2 0x6dc5b5
2 0x648fe1
2 0x543446
24:
2 0x8d60e0
2 0x8726e5
25:
2 0x1225f30
26:
2 0x3a136f4
27:
--- jhash.c ---
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
typedef unsigned char u8;
typedef unsigned int u32;
static inline u32 rol32(u32 word, unsigned int shift)
{
return (word << shift) | (word >> (32 - shift));
}
/* Best hash sizes are of power of two */
#define jhash_size(n) ((u32)1<<(n))
/* Mask the hash value, i.e (value & jhash_mask(n)) instead of (value % n) */
#define jhash_mask(n) (jhash_size(n)-1)
/* __jhash_mix -- mix 3 32-bit values reversibly. */
#define __jhash_mix(a, b, c) \
{ \
a -= c; a ^= rol32(c, 4); c += b; \
b -= a; b ^= rol32(a, 6); a += c; \
c -= b; c ^= rol32(b, 8); b += a; \
a -= c; a ^= rol32(c, 16); c += b; \
b -= a; b ^= rol32(a, 19); a += c; \
c -= b; c ^= rol32(b, 4); b += a; \
}
/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
#define __jhash_final(a, b, c) \
{ \
c ^= b; c -= rol32(b, 14); \
a ^= c; a -= rol32(c, 11); \
b ^= a; b -= rol32(a, 25); \
c ^= b; c -= rol32(b, 16); \
a ^= c; a -= rol32(c, 4); \
b ^= a; b -= rol32(a, 14); \
c ^= b; c -= rol32(b, 24); \
}
#define __get_unaligned_cpu32(x) (*(u32 *)(x))
/* An arbitrary initial parameter */
#define JHASH_INITVAL 0xdeadbeef
/* jhash - hash an arbitrary key
* @k: sequence of bytes as key
* @length: the length of the key
* @initval: the previous hash, or an arbitrary value
*
* The generic version, hashes an arbitrary sequence of bytes.
* No alignment or length assumptions are made about the input key.
*
* Returns the hash value of the key. The result depends on endianness.
*/
static inline u32 jhash(const void *key, u32 length, u32 initval)
{
u32 a, b, c;
const u8 *k = key;
/* Set up the internal state */
a = b = c = JHASH_INITVAL + length + initval;
/* All but the last block: affect some 32 bits of (a,b,c) */
while (length > 12) {
a += __get_unaligned_cpu32(k);
b += __get_unaligned_cpu32(k + 4);
c += __get_unaligned_cpu32(k + 8);
__jhash_mix(a, b, c);
length -= 12;
k += 12;
}
/* Last block: affect all 32 bits of (c) */
/* All the case statements fall through */
switch (length) {
case 12: c += (u32)k[11]<<24;
case 11: c += (u32)k[10]<<16;
case 10: c += (u32)k[9]<<8;
case 9: c += k[8];
case 8: b += (u32)k[7]<<24;
case 7: b += (u32)k[6]<<16;
case 6: b += (u32)k[5]<<8;
case 5: b += k[4];
case 4: a += (u32)k[3]<<24;
case 3: a += (u32)k[2]<<16;
case 2: a += (u32)k[1]<<8;
case 1: a += k[0];
__jhash_final(a, b, c);
case 0: /* Nothing left to add */
break;
}
return c;
}
/* jhash2 - hash an array of u32's
* @k: the key which must be an array of u32's
* @length: the number of u32's in the key
* @initval: the previous hash, or an arbitrary value
*
* Returns the hash value of the key.
*/
static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
{
u32 a, b, c;
/* Set up the internal state */
a = b = c = JHASH_INITVAL + (length<<2) + initval;
/* Handle most of the key */
while (length > 3) {
a += k[0];
b += k[1];
c += k[2];
__jhash_mix(a, b, c);
length -= 3;
k += 3;
}
/* Handle the last 3 u32's: all the case statements fall through */
switch (length) {
case 3: c += k[2];
case 2: b += k[1];
case 1: a += k[0];
__jhash_final(a, b, c);
case 0: /* Nothing left to add */
break;
}
return c;
}
/* jhash_3words - hash exactly 3, 2 or 1 word(s) */
static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
{
a += JHASH_INITVAL;
b += JHASH_INITVAL;
c += initval;
__jhash_final(a, b, c);
return c;
}
static inline u32 jhash_2words(u32 a, u32 b, u32 initval)
{
return jhash_3words(a, b, 0, initval);
}
static inline u32 jhash_1word(u32 a, u32 initval)
{
return jhash_3words(a, 0, 0, initval);
}
int main(int argc, char **argv)
{
int i;
struct {
void *s;
void *t;
u32 l;
} k = { .s = 0 };
int total;
u32 initval;
total = atoi(argv[1]);
srandom(time(0));
initval = random();
for (i = 0; i < total; i++) {
k.l = i;
printf("0x%x\n", jhash2((u32 *)&k, sizeof(k)/4, initval) & (total - 1));
}
return 0;
}
* Re: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-18 9:51 ` 'tgraf@suug.ch'
@ 2015-03-18 9:55 ` Herbert Xu
2015-03-18 10:08 ` 'tgraf@suug.ch'
0 siblings, 1 reply; 113+ messages in thread
From: Herbert Xu @ 2015-03-18 9:55 UTC (permalink / raw)
To: 'tgraf@suug.ch'; +Cc: David Laight, David Miller, netdev, Eric Dumazet
On Wed, Mar 18, 2015 at 09:51:02AM +0000, 'tgraf@suug.ch' wrote:
>
> I'm running it like this:
> for i in $(seq 1 27); do \
> echo $i: && ./jhash $((2**$i)) | uniq -D | uniq -c \
uniq -D doesn't do what you think it does:
$ { echo a; echo b; echo a; } | uniq -D
$
So you should use sort | uniq -c
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-18 9:55 ` Herbert Xu
@ 2015-03-18 10:08 ` 'tgraf@suug.ch'
2015-03-18 10:12 ` Herbert Xu
0 siblings, 1 reply; 113+ messages in thread
From: 'tgraf@suug.ch' @ 2015-03-18 10:08 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Laight, David Miller, netdev, Eric Dumazet
On 03/18/15 at 08:55pm, Herbert Xu wrote:
> On Wed, Mar 18, 2015 at 09:51:02AM +0000, 'tgraf@suug.ch' wrote:
> >
> > I'm running it like this:
> > for i in $(seq 1 27); do \
> > echo $i: && ./jhash $((2**$i)) | uniq -D | uniq -c \
>
> uniq -D doesn't do what you think it does:
>
> $ { echo a; echo b; echo a; } | uniq -D
> $
>
> So you should use sort | uniq -c
Thanks. The important keyword is "adjacent" here ;-)
With this fixed up I can see what you mean. So if we are
to only do a chain length based decision, the limit would
have to grow together with the table size.
for i in $(seq 1 27); do
echo $i: && ./jhash $((2**$i)) | sort | uniq -D | uniq -c | sort -n -r | head -5;
done
1:
2 0x0
2:
3 0x0
3:
3 0x0
2 0x2
4:
2 0xb
2 0x8
2 0x4
2 0x2
2 0x0
5:
3 0xe
3 0x13
2 0xc
2 0xa
2 0x8
6:
3 0x38
3 0x37
3 0x32
3 0x2f
3 0x2e
7:
4 0x37
3 0x7d
3 0x79
3 0x6f
3 0x69
8:
6 0xb7
5 0x79
4 0xf4
4 0xe4
4 0xa3
9:
5 0xb7
5 0x9e
5 0x15b
5 0x13b
4 0xdd
10:
5 0xa4
5 0x79
5 0x29e
5 0x22a
4 0xf4
11:
5 0x92
5 0x4a4
5 0x479
5 0x47
5 0x379
12:
6 0xf1
6 0x751
5 0xf00
5 0xe26
5 0xc26
13:
7 0xf1
6 0x520
6 0x1e06
6 0x1c44
6 0x11ac
14:
7 0x3174
7 0x2f12
6 0x520
6 0x3763
6 0x3615
15:
7 0xf4
7 0x7d0b
7 0x179
6 0x807
6 0x69c9
16:
7 0xf516
7 0xe659
7 0xdf00
7 0x5f5a
7 0x5cc3
17:
7 0xe659
7 0xde5a
7 0xcaf
7 0x9cce
7 0x7fe3
18:
9 0x17f9b
8 0x3852d
7 0x992b
7 0x5e38
7 0x3d39e
19:
8 0x5f22b
8 0x47d66
8 0x436ba
8 0x182aa
7 0xe095
20:
8 0xf87dd
8 0xf464d
8 0xef51d
8 0xc08d4
8 0x91a96
21:
9 0xa7a90
9 0x1fc5ad
8 0xfae2f
8 0xd4123
8 0xb5686
22:
9 0x97b11
9 0x3fcfa3
8 0xffa9e
8 0xfa437
8 0xef7e6
23:
10 0x3b3115
9 0x5cb5ba
9 0x568fbf
9 0x54d790
9 0x4c2393
24:
10 0x86373a
10 0x35794f
9 0xf6d21e
9 0xde5301
9 0xb36b9d
^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-18 10:08 ` 'tgraf@suug.ch'
@ 2015-03-18 10:12 ` Herbert Xu
2015-03-18 10:26 ` David Laight
2015-03-18 10:44 ` 'tgraf@suug.ch'
0 siblings, 2 replies; 113+ messages in thread
From: Herbert Xu @ 2015-03-18 10:12 UTC (permalink / raw)
To: 'tgraf@suug.ch'; +Cc: David Laight, David Miller, netdev, Eric Dumazet
On Wed, Mar 18, 2015 at 10:08:12AM +0000, 'tgraf@suug.ch' wrote:
>
> With this fixed up I can see what you mean. So if we are
> to only do a chain length based decision, the limit would
> have to grow together with the table size.
Actually we could just use a flat limit of 16, since our maximum
table size is 2^32 (actually a bit less) and 16 should be plenty
for that.
Remember this limit is only there to detect when somebody has
stolen our secret and is trying to DoS us. So if we can keep them
under 16 per bucket it should be sufficient to defeat any attack.
Of course I will also add a patch to limit the number of elements
to the table size (so maximum utilisation is 100%). This will come
after we allow insertions to fail.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 113+ messages in thread
* RE: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-18 10:12 ` Herbert Xu
@ 2015-03-18 10:26 ` David Laight
2015-03-18 10:44 ` 'tgraf@suug.ch'
1 sibling, 0 replies; 113+ messages in thread
From: David Laight @ 2015-03-18 10:26 UTC (permalink / raw)
To: 'Herbert Xu', 'tgraf@suug.ch'
Cc: David Miller, netdev, Eric Dumazet
From: Herbert Xu
..
> Of course I will also add a patch to limit the number of elements
> to the table size (so maximum utilisation is 100%). This will come
> after we allow insertions to fail.
There may be some uses where short hash chains are acceptable,
in which case a utilisation well above 100% may make sense.
If a table is likely to have repeated lookups for the same item
then it can make sense to cache the last looked up item for each chain.
This cached value can be written without any locking - the only
place care is needed is ensuring it doesn't reference an item
that has just been deleted.
(You'd want a separate array to not dirty the cache lines containing
the head pointers. Although for a large table they are misses anyway.)
David
^ permalink raw reply [flat|nested] 113+ messages in thread
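David's last-looked-up-item cache can be sketched as below. This is an illustrative userspace sketch with hypothetical names (`struct entry`, `cached_table`, `cached_lookup`), not kernel code; a real version would need RCU so the unlocked `last_hit` write never leaves a pointer to a just-deleted entry, as the email notes.

```c
#include <stddef.h>

/* Hypothetical single-key entry; real rhashtable objects embed a
 * struct rhash_head instead. */
struct entry {
	unsigned int key;
	struct entry *next;
};

struct cached_table {
	struct entry **heads;		/* hash chain heads */
	struct entry **last_hit;	/* last item found, per chain; kept
					 * in a separate array so writes do
					 * not dirty the cache lines that
					 * hold the head pointers */
	unsigned int mask;		/* table size - 1 */
};

static struct entry *cached_lookup(struct cached_table *t,
				   unsigned int hash, unsigned int key)
{
	unsigned int b = hash & t->mask;
	struct entry *e = t->last_hit[b];

	if (e && e->key == key)
		return e;		/* cache hit: no chain walk */

	for (e = t->heads[b]; e; e = e->next) {
		if (e->key == key) {
			/* Unlocked write: deletion must ensure this
			 * slot never references a freed entry. */
			t->last_hit[b] = e;
			return e;
		}
	}
	return NULL;
}
```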
* Re: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-18 10:12 ` Herbert Xu
2015-03-18 10:26 ` David Laight
@ 2015-03-18 10:44 ` 'tgraf@suug.ch'
1 sibling, 0 replies; 113+ messages in thread
From: 'tgraf@suug.ch' @ 2015-03-18 10:44 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Laight, David Miller, netdev, Eric Dumazet
On 03/18/15 at 09:12pm, Herbert Xu wrote:
> On Wed, Mar 18, 2015 at 10:08:12AM +0000, 'tgraf@suug.ch' wrote:
> >
> > With this fixed up I can see what you mean. So if we are
> > to only do a chain length based decision, the limit would
> > have to grow together with the table size.
>
> Actually we could just use a flat limit of 16, since our maximum
> table size is 2^32 (actually a bit less) and 16 should be plenty
> for that.
>
> Remember this limit is only there to detect when somebody has
> stolen our secret and is trying to DoS us. So if we can keep them
> under 16 per bucket it should be sufficient to defeat any attack.
Agreed, that is certainly sufficient to avoid attacks. I didn't
want to give up on getting rid of the need for a table-wide
nelems count *inside* rhashtable if we have to maintain chain
length anyway, but I can see the difficulties.
The only approach that seems to work is to count the number of
buckets whose chain length exceeds a limit relative to the table
size, or something similar. That requires counting at the table
level, and if we have to do that we might as well just count the
number of elements in the table.
> Of course I will also add a patch to limit the number of elements
> to the table size (so maximum utilisation is 100%). This will come
> after we allow insertions to fail.
I'm also attaching the distribution of chain lengths for each
table size, for completeness.
Percentage of buckets with more than 1 entry:
1: 0%
2: 50%
3: 25%
4: 25%
5: 28%
6: 28%
7: 31%
8: 26%
9: 25%
10: 27%
11: 26%
12: 26%
13: 26%
14: 26%
15: 26%
16: 26%
17: 26%
18: 26%
19: 26%
20: 26%
21: 26%
22: 26%
23: 26%
Distribution of chain lengths:
1:
1 2
2:
2 2
3:
1 2
1 3
4:
3 2
1 3
5:
6 2
4 3
6:
11 2
4 3
1 4
7:
27 2
8 3
2 4
8:
50 2
15 3
3 4
1 5
9:
101 2
33 3
6 4
1 5
10:
201 2
61 3
17 4
2 5
11:
361 2
134 3
29 4
5 5
1 6
12:
784 2
242 3
49 4
14 5
3 6
13:
1552 2
472 3
118 4
28 5
5 6
1 7
14:
2965 2
996 3
262 4
51 5
9 6
1 7
1 8
15:
6010 2
2029 3
510 4
90 5
21 6
2 7
16:
11993 2
4068 3
1014 4
175 5
40 6
5 7
1 8
17:
24123 2
8100 3
1969 4
403 5
79 6
10 7
3 8
18:
48127 2
16145 3
3930 4
825 5
137 6
25 7
1 8
1 9
19:
96558 2
32250 3
7990 4
1619 5
264 6
40 7
7 8
20:
193267 2
64476 3
16051 4
3140 5
544 6
71 7
9 8
1 9
21:
385293 2
128850 3
32366 4
6344 5
1017 6
128 7
21 8
2 9
22:
769535 2
257779 3
64534 4
13052 5
2177 6
322 7
25 8
1 9
23:
1540606 2
514743 3
130192 4
26207 5
4403 6
635 7
73 8
11 9
24:
3073982 2
1032727 3
262044 4
53475 5
9179 6
1353 7
184 8
14 9
2 10
1 11
25:
6124258 2
2071733 3
534130 4
111862 5
19754 6
3030 7
419 8
56 9
5 10
1 12
^ permalink raw reply [flat|nested] 113+ messages in thread
* RE: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table
2015-03-17 11:00 ` Herbert Xu
2015-03-17 11:22 ` tgraf
@ 2015-03-17 11:22 ` David Laight
2015-03-17 11:25 ` Herbert Xu
1 sibling, 1 reply; 113+ messages in thread
From: David Laight @ 2015-03-17 11:22 UTC (permalink / raw)
To: 'Herbert Xu', tgraf; +Cc: David Miller, netdev, Eric Dumazet
From: Herbert Xu
> Sent: 17 March 2015 11:01
> On Tue, Mar 17, 2015 at 10:56:57AM +0000, tgraf@suug.ch wrote:
> >
> > Given the discussions, the grow decision will likely change to
> > a max bucket length limit anyway.
>
> Actually no. In my patches the chain length is only used to
> force an immediate rehash. Growing is still based on the number
> of elements.
>
> The reason is that the maximum (not average) chain length actually
> grows with the hash table size, even at 75% utilisation.
That doesn't surprise me.
But won't the rehashed table be just as likely to have a long list?
So you are likely to get an immediate rehash?
David
^ permalink raw reply [flat|nested] 113+ messages in thread
* [v1 PATCH 2/14] rhashtable: Introduce max_size/min_size
2015-03-15 10:43 ` [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash Herbert Xu
2015-03-15 10:44 ` [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table Herbert Xu
@ 2015-03-15 10:44 ` Herbert Xu
2015-03-15 15:12 ` Sergei Shtylyov
2015-03-15 10:44 ` [v1 PATCH 3/14] netlink: Use rhashtable max_size instead of max_shift Herbert Xu
` (18 subsequent siblings)
20 siblings, 1 reply; 113+ messages in thread
From: Herbert Xu @ 2015-03-15 10:44 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
This patch adds the parameters max_size and min_size which are
meant to replace max_shift and min_shift.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
include/linux/rhashtable.h | 4 ++++
lib/rhashtable.c | 12 ++++++++----
2 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index f16e856..81267fe 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -85,6 +85,8 @@ struct rhashtable;
* @head_offset: Offset of rhash_head in struct to be hashed
* @max_shift: Maximum number of shifts while expanding
* @min_shift: Minimum number of shifts while shrinking
+ * @max_size: Maximum size while expanding
+ * @min_size: Minimum size while shrinking
* @nulls_base: Base value to generate nulls marker
* @locks_mul: Number of bucket locks to allocate per cpu (default: 128)
* @hashfn: Function to hash key
@@ -97,6 +99,8 @@ struct rhashtable_params {
size_t head_offset;
size_t max_shift;
size_t min_shift;
+ unsigned int max_size;
+ unsigned int min_size;
u32 nulls_base;
size_t locks_mul;
rht_hashfn_t hashfn;
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 4a99c0c..5101e18 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -27,7 +27,7 @@
#include <linux/err.h>
#define HASH_DEFAULT_SIZE 64UL
-#define HASH_MIN_SIZE 4UL
+#define HASH_MIN_SIZE 4U
#define BUCKET_LOCKS_PER_CPU 128UL
/* Base bits plus 1 bit for nulls marker */
@@ -188,7 +188,8 @@ static bool rht_grow_above_75(const struct rhashtable *ht,
{
/* Expand table when exceeding 75% load */
return atomic_read(&ht->nelems) > (tbl->size / 4 * 3) &&
- (!ht->p.max_shift || tbl->size < (1 << ht->p.max_shift));
+ (!ht->p.max_shift || tbl->size < (1 << ht->p.max_shift)) &&
+ (!ht->p.max_size || tbl->size < ht->p.max_size);
}
/**
@@ -201,7 +202,8 @@ static bool rht_shrink_below_30(const struct rhashtable *ht,
{
/* Shrink table beneath 30% load */
return atomic_read(&ht->nelems) < (tbl->size * 3 / 10) &&
- tbl->size > (1 << ht->p.min_shift);
+ tbl->size > (1 << ht->p.min_shift) &&
+ tbl->size > ht->p.min_size;
}
static int rhashtable_rehash_one(struct rhashtable *ht, unsigned old_hash)
@@ -872,7 +874,8 @@ EXPORT_SYMBOL_GPL(rhashtable_walk_stop);
static size_t rounded_hashtable_size(struct rhashtable_params *params)
{
return max(roundup_pow_of_two(params->nelem_hint * 4 / 3),
- 1UL << params->min_shift);
+ max(1UL << params->min_shift,
+ (unsigned long)params->min_size));
}
/**
@@ -934,6 +937,7 @@ int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params)
params->min_shift = max_t(size_t, params->min_shift,
ilog2(HASH_MIN_SIZE));
+ params->min_size = max(params->min_size, HASH_MIN_SIZE);
if (params->nelem_hint)
size = rounded_hashtable_size(params);
^ permalink raw reply related [flat|nested] 113+ messages in thread
* Re: [v1 PATCH 2/14] rhashtable: Introduce max_size/min_size
2015-03-15 10:44 ` [v1 PATCH 2/14] rhashtable: Introduce max_size/min_size Herbert Xu
@ 2015-03-15 15:12 ` Sergei Shtylyov
2015-03-15 20:21 ` Herbert Xu
0 siblings, 1 reply; 113+ messages in thread
From: Sergei Shtylyov @ 2015-03-15 15:12 UTC (permalink / raw)
To: Herbert Xu, David Miller, tgraf, netdev, Eric Dumazet
Hello.
On 3/15/2015 1:44 PM, Herbert Xu wrote:
> This patch adds the parameters max_size and min_size which are
> meant to replace max_shift and min_shift.
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[...]
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index 4a99c0c..5101e18 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
[...]
> @@ -872,7 +874,8 @@ EXPORT_SYMBOL_GPL(rhashtable_walk_stop);
> static size_t rounded_hashtable_size(struct rhashtable_params *params)
> {
> return max(roundup_pow_of_two(params->nelem_hint * 4 / 3),
> - 1UL << params->min_shift);
> + max(1UL << params->min_shift,
max_t(), perhaps?
> + (unsigned long)params->min_size));
> }
>
> /**
[...]
WBR, Sergei
^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [v1 PATCH 2/14] rhashtable: Introduce max_size/min_size
2015-03-15 15:12 ` Sergei Shtylyov
@ 2015-03-15 20:21 ` Herbert Xu
0 siblings, 0 replies; 113+ messages in thread
From: Herbert Xu @ 2015-03-15 20:21 UTC (permalink / raw)
To: Sergei Shtylyov; +Cc: David Miller, tgraf, netdev, Eric Dumazet
On Sun, Mar 15, 2015 at 06:12:02PM +0300, Sergei Shtylyov wrote:
> Hello.
>
> On 3/15/2015 1:44 PM, Herbert Xu wrote:
>
> >This patch adds the parameters max_size and min_size which are
> >meant to replace max_shift and min_shift.
>
> >Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> [...]
> >diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> >index 4a99c0c..5101e18 100644
> >--- a/lib/rhashtable.c
> >+++ b/lib/rhashtable.c
> [...]
> >@@ -872,7 +874,8 @@ EXPORT_SYMBOL_GPL(rhashtable_walk_stop);
> > static size_t rounded_hashtable_size(struct rhashtable_params *params)
> > {
> > return max(roundup_pow_of_two(params->nelem_hint * 4 / 3),
> >- 1UL << params->min_shift);
> >+ max(1UL << params->min_shift,
>
> max_t(), perhaps?
max works just fine. Besides, this max goes away a few patches
down.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 113+ messages in thread
* [v1 PATCH 3/14] netlink: Use rhashtable max_size instead of max_shift
2015-03-15 10:43 ` [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash Herbert Xu
2015-03-15 10:44 ` [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table Herbert Xu
2015-03-15 10:44 ` [v1 PATCH 2/14] rhashtable: Introduce max_size/min_size Herbert Xu
@ 2015-03-15 10:44 ` Herbert Xu
2015-03-15 10:44 ` [v1 PATCH 4/14] tipc: " Herbert Xu
` (17 subsequent siblings)
20 siblings, 0 replies; 113+ messages in thread
From: Herbert Xu @ 2015-03-15 10:44 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
This patch converts netlink to use rhashtable max_size instead
of the obsolete max_shift.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
net/netlink/af_netlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 6b0f219..d97aed6 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -3123,7 +3123,7 @@ static int __init netlink_proto_init(void)
.key_offset = offsetof(struct netlink_sock, portid),
.key_len = sizeof(u32), /* portid */
.hashfn = jhash,
- .max_shift = 16, /* 64K */
+ .max_size = 65536,
};
if (err != 0)
^ permalink raw reply related [flat|nested] 113+ messages in thread
* [v1 PATCH 4/14] tipc: Use rhashtable max_size instead of max_shift
2015-03-15 10:43 ` [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash Herbert Xu
` (2 preceding siblings ...)
2015-03-15 10:44 ` [v1 PATCH 3/14] netlink: Use rhashtable max_size instead of max_shift Herbert Xu
@ 2015-03-15 10:44 ` Herbert Xu
2015-03-15 15:13 ` Sergei Shtylyov
2015-03-15 10:44 ` [v1 PATCH 5/14] test_rhashtable: " Herbert Xu
` (16 subsequent siblings)
20 siblings, 1 reply; 113+ messages in thread
From: Herbert Xu @ 2015-03-15 10:44 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
This patch converts tipc to use rhashtable max_size instead of
the obsolete max_shift.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
net/tipc/socket.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 934947f..64eb669 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2361,8 +2361,8 @@ int tipc_sk_rht_init(struct net *net)
.key_offset = offsetof(struct tipc_sock, portid),
.key_len = sizeof(u32), /* portid */
.hashfn = jhash,
- .max_shift = 20, /* 1M */
- .min_shift = 8, /* 256 */
+ .max_size = 1048576,
+ .min_size = 256,
};
return rhashtable_init(&tn->sk_rht, &rht_params);
^ permalink raw reply related [flat|nested] 113+ messages in thread
* Re: [v1 PATCH 4/14] tipc: Use rhashtable max_size instead of max_shift
2015-03-15 10:44 ` [v1 PATCH 4/14] tipc: " Herbert Xu
@ 2015-03-15 15:13 ` Sergei Shtylyov
0 siblings, 0 replies; 113+ messages in thread
From: Sergei Shtylyov @ 2015-03-15 15:13 UTC (permalink / raw)
To: Herbert Xu, David Miller, tgraf, netdev, Eric Dumazet
On 3/15/2015 1:44 PM, Herbert Xu wrote:
> This patch converts tipc to use rhashtable max_size instead of
> the obsolete max_shift.
You're converting 'min_shift' to 'min_size' as well...
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> ---
> net/tipc/socket.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 934947f..64eb669 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -2361,8 +2361,8 @@ int tipc_sk_rht_init(struct net *net)
> .key_offset = offsetof(struct tipc_sock, portid),
> .key_len = sizeof(u32), /* portid */
> .hashfn = jhash,
> - .max_shift = 20, /* 1M */
> - .min_shift = 8, /* 256 */
> + .max_size = 1048576,
> + .min_size = 256,
> };
>
> return rhashtable_init(&tn->sk_rht, &rht_params);
WBR, Sergei
^ permalink raw reply [flat|nested] 113+ messages in thread
* [v1 PATCH 5/14] test_rhashtable: Use rhashtable max_size instead of max_shift
2015-03-15 10:43 ` [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash Herbert Xu
` (3 preceding siblings ...)
2015-03-15 10:44 ` [v1 PATCH 4/14] tipc: " Herbert Xu
@ 2015-03-15 10:44 ` Herbert Xu
2015-03-16 3:50 ` David Miller
2015-03-15 10:44 ` [v1 PATCH 6/14] rhashtable: Remove max_shift and min_shift Herbert Xu
` (15 subsequent siblings)
20 siblings, 1 reply; 113+ messages in thread
From: Herbert Xu @ 2015-03-15 10:44 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
This patch converts test_rhashtable to use rhashtable max_size
instead of the obsolete max_shift.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
lib/test_rhashtable.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index 16974fd..2bc403d 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -201,7 +201,7 @@ static int __init test_rht_init(void)
.key_offset = offsetof(struct test_obj, value),
.key_len = sizeof(int),
.hashfn = jhash,
- .max_shift = 1, /* we expand/shrink manually here */
+ .max_size = 2, /* we expand/shrink manually here */
.nulls_base = (3U << RHT_BASE_SHIFT),
};
int err;
^ permalink raw reply related [flat|nested] 113+ messages in thread
* [v1 PATCH 6/14] rhashtable: Remove max_shift and min_shift
2015-03-15 10:43 ` [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash Herbert Xu
` (4 preceding siblings ...)
2015-03-15 10:44 ` [v1 PATCH 5/14] test_rhashtable: " Herbert Xu
@ 2015-03-15 10:44 ` Herbert Xu
2015-03-15 10:44 ` [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare Herbert Xu
` (14 subsequent siblings)
20 siblings, 0 replies; 113+ messages in thread
From: Herbert Xu @ 2015-03-15 10:44 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
Now that nobody uses max_shift and min_shift, we can safely remove
them.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
include/linux/rhashtable.h | 4 ----
lib/rhashtable.c | 7 +------
2 files changed, 1 insertion(+), 10 deletions(-)
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 81267fe..99425f2 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -83,8 +83,6 @@ struct rhashtable;
* @key_len: Length of key
* @key_offset: Offset of key in struct to be hashed
* @head_offset: Offset of rhash_head in struct to be hashed
- * @max_shift: Maximum number of shifts while expanding
- * @min_shift: Minimum number of shifts while shrinking
* @max_size: Maximum size while expanding
* @min_size: Minimum size while shrinking
* @nulls_base: Base value to generate nulls marker
@@ -97,8 +95,6 @@ struct rhashtable_params {
size_t key_len;
size_t key_offset;
size_t head_offset;
- size_t max_shift;
- size_t min_shift;
unsigned int max_size;
unsigned int min_size;
u32 nulls_base;
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 5101e18..a74ba72 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -188,7 +188,6 @@ static bool rht_grow_above_75(const struct rhashtable *ht,
{
/* Expand table when exceeding 75% load */
return atomic_read(&ht->nelems) > (tbl->size / 4 * 3) &&
- (!ht->p.max_shift || tbl->size < (1 << ht->p.max_shift)) &&
(!ht->p.max_size || tbl->size < ht->p.max_size);
}
@@ -202,7 +201,6 @@ static bool rht_shrink_below_30(const struct rhashtable *ht,
{
/* Shrink table beneath 30% load */
return atomic_read(&ht->nelems) < (tbl->size * 3 / 10) &&
- tbl->size > (1 << ht->p.min_shift) &&
tbl->size > ht->p.min_size;
}
@@ -874,8 +872,7 @@ EXPORT_SYMBOL_GPL(rhashtable_walk_stop);
static size_t rounded_hashtable_size(struct rhashtable_params *params)
{
return max(roundup_pow_of_two(params->nelem_hint * 4 / 3),
- max(1UL << params->min_shift,
- (unsigned long)params->min_size));
+ (unsigned long)params->min_size);
}
/**
@@ -935,8 +932,6 @@ int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params)
if (params->nulls_base && params->nulls_base < (1U << RHT_BASE_SHIFT))
return -EINVAL;
- params->min_shift = max_t(size_t, params->min_shift,
- ilog2(HASH_MIN_SIZE));
params->min_size = max(params->min_size, HASH_MIN_SIZE);
if (params->nelem_hint)
^ permalink raw reply related [flat|nested] 113+ messages in thread
* [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-15 10:43 ` [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash Herbert Xu
` (5 preceding siblings ...)
2015-03-15 10:44 ` [v1 PATCH 6/14] rhashtable: Remove max_shift and min_shift Herbert Xu
@ 2015-03-15 10:44 ` Herbert Xu
2015-03-16 8:28 ` Thomas Graf
2015-03-15 10:44 ` [v1 PATCH 8/14] rhashtable: Fix support of objects with no accessible keys Herbert Xu
` (13 subsequent siblings)
20 siblings, 1 reply; 113+ messages in thread
From: Herbert Xu @ 2015-03-15 10:44 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
The use of rhashtable_lookup_compare in nft_hash is gratuitous
since the comparison function is just doing memcmp. Furthermore,
there is cruft embedded in the comparison function that should
instead be moved into the caller of the lookup.
This patch moves that cruft over and replaces the call to
rhashtable_lookup_compare with rhashtable_lookup.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
net/netfilter/nft_hash.c | 40 ++++++++++------------------------------
1 file changed, 10 insertions(+), 30 deletions(-)
diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index c82df0a..1fcae5e 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -89,41 +89,21 @@ static void nft_hash_remove(const struct nft_set *set,
kfree(elem->cookie);
}
-struct nft_compare_arg {
- const struct nft_set *set;
- struct nft_set_elem *elem;
-};
-
-static bool nft_hash_compare(void *ptr, void *arg)
-{
- struct nft_hash_elem *he = ptr;
- struct nft_compare_arg *x = arg;
-
- if (!nft_data_cmp(&he->key, &x->elem->key, x->set->klen)) {
- x->elem->cookie = he;
- x->elem->flags = 0;
- if (x->set->flags & NFT_SET_MAP)
- nft_data_copy(&x->elem->data, he->data);
-
- return true;
- }
-
- return false;
-}
-
static int nft_hash_get(const struct nft_set *set, struct nft_set_elem *elem)
{
struct rhashtable *priv = nft_set_priv(set);
- struct nft_compare_arg arg = {
- .set = set,
- .elem = elem,
- };
+ struct nft_hash_elem *he;
- if (rhashtable_lookup_compare(priv, &elem->key,
- &nft_hash_compare, &arg))
- return 0;
+ he = rhashtable_lookup(priv, &elem->key);
+ if (!he)
+ return -ENOENT;
- return -ENOENT;
+ elem->cookie = he;
+ elem->flags = 0;
+ if (set->flags & NFT_SET_MAP)
+ nft_data_copy(&elem->data, he->data);
+
+ return 0;
}
static void nft_hash_walk(const struct nft_ctx *ctx, const struct nft_set *set,
^ permalink raw reply related [flat|nested] 113+ messages in thread
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-15 10:44 ` [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare Herbert Xu
@ 2015-03-16 8:28 ` Thomas Graf
2015-03-16 9:14 ` Herbert Xu
0 siblings, 1 reply; 113+ messages in thread
From: Thomas Graf @ 2015-03-16 8:28 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, netdev, Eric Dumazet, kaber
On 03/15/15 at 09:44pm, Herbert Xu wrote:
> The use of rhashtable_lookup_compare in nft_hash is gratuitous
> since the comparison function is just doing memcmp. Furthermore,
> there is cruft embedded in the comparison function that should
> instead be moved into the caller of the lookup.
>
> This patch moves that cruft over and replaces the call to
> rhashtable_lookup_compare with rhashtable_lookup.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The reason for the indirection was to not bypass the abstraction
nft_data_cmp() provides.
No objection to the change but maybe leave a comment in
nft_data_cmp() that if one changes nft_data_cmp() one needs to
look at nft_hash and see if the direct use of rhashtable_lookup()
is still valid.
Copying Patrick as well.
^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-16 8:28 ` Thomas Graf
@ 2015-03-16 9:14 ` Herbert Xu
2015-03-16 9:28 ` Thomas Graf
` (2 more replies)
0 siblings, 3 replies; 113+ messages in thread
From: Herbert Xu @ 2015-03-16 9:14 UTC (permalink / raw)
To: Thomas Graf; +Cc: David Miller, netdev, Eric Dumazet, kaber
On Mon, Mar 16, 2015 at 08:28:42AM +0000, Thomas Graf wrote:
> On 03/15/15 at 09:44pm, Herbert Xu wrote:
> > The use of rhashtable_lookup_compare in nft_hash is gratuitous
> > since the comparison function is just doing memcmp. Furthermore,
> > > there is cruft embedded in the comparison function that should
> > instead be moved into the caller of the lookup.
> >
> > > This patch moves that cruft over and replaces the call to
> > rhashtable_lookup_compare with rhashtable_lookup.
> >
> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> The reason for the indirection was to not bypass the abstraction
> nft_data_cmp() provides.
>
> No objection to the change but maybe leave a comment in
> nft_data_cmp() that if one changes nft_data_cmp() one needs to
> look at nft_hash and see if the direct use of rhashtable_lookup()
> is still valid.
Well we could preserve nft_data_cmp if necessary. It'll just
mean an extra indirect call to do the compare when needed.
Patrick?
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-16 9:14 ` Herbert Xu
@ 2015-03-16 9:28 ` Thomas Graf
2015-03-16 11:13 ` Patrick McHardy
2015-03-20 9:36 ` Herbert Xu
2 siblings, 0 replies; 113+ messages in thread
From: Thomas Graf @ 2015-03-16 9:28 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, netdev, Eric Dumazet, kaber
On 03/16/15 at 08:14pm, Herbert Xu wrote:
> On Mon, Mar 16, 2015 at 08:28:42AM +0000, Thomas Graf wrote:
> > On 03/15/15 at 09:44pm, Herbert Xu wrote:
> > > The use of rhashtable_lookup_compare in nft_hash is gratuitous
> > > since the comparison function is just doing memcmp. Furthermore,
> > > > there is cruft embedded in the comparison function that should
> > > instead be moved into the caller of the lookup.
> > >
> > > > This patch moves that cruft over and replaces the call to
> > > rhashtable_lookup_compare with rhashtable_lookup.
> > >
> > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> >
> > The reason for the indirection was to not bypass the abstraction
> > nft_data_cmp() provides.
> >
> > No objection to the change but maybe leave a comment in
> > nft_data_cmp() that if one changes nft_data_cmp() one needs to
> > look at nft_hash and see if the direct use of rhashtable_lookup()
> > is still valid.
>
> Well we could preserve nft_data_cmp if necessary. It'll just
> mean an extra indirect call to do the compare when needed.
I like Dave's idea of implementing the lookup as a macro to avoid
the indirection, so that both rhashtable and API users requiring
their own compare function can inline the compare and hash functions:
#define rhashtable_lookup_compare(ht, obj, compare_fn, obj_hash_fn, arg)
Leaving indirect hash calls to the slow path only. This was also
the initial design of the compare lookup variation until it
"evolved" away from that due to lack of users of that API ;-)
^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-16 9:14 ` Herbert Xu
2015-03-16 9:28 ` Thomas Graf
@ 2015-03-16 11:13 ` Patrick McHardy
2015-03-20 8:55 ` Herbert Xu
2015-03-20 9:36 ` Herbert Xu
2 siblings, 1 reply; 113+ messages in thread
From: Patrick McHardy @ 2015-03-16 11:13 UTC (permalink / raw)
To: Herbert Xu; +Cc: Thomas Graf, David Miller, netdev, Eric Dumazet
On 16.03, Herbert Xu wrote:
> On Mon, Mar 16, 2015 at 08:28:42AM +0000, Thomas Graf wrote:
> > On 03/15/15 at 09:44pm, Herbert Xu wrote:
> > > The use of rhashtable_lookup_compare in nft_hash is gratuitous
> > > since the comparison function is just doing memcmp. Furthermore,
> > > there is cruft embedded in the comparison function that should
> > > instead be moved into the caller of the lookup.
> > >
> > > This patch moves that cruft over and replaces the call to
> > > rhashtable_lookup_compare with rhashtable_lookup.
> > >
> > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> >
> > The reason for the indirection was to not bypass the abstraction
> > nft_data_cmp() provides.
> >
> > No objection to the change but maybe leave a comment in
> > nft_data_cmp() that if one changes nft_data_cmp() one needs to
> > look at nft_hash and see if the direct use of rhashtable_lookup()
> > is still valid.
>
> Well we could preserve nft_data_cmp if necessary. It'll just
> mean an extra indirect call to do the compare when needed.
>
> Patrick?
An upcoming patchset will add transaction support to sets, which needs
the callback. So please leave it for now since it will only cause
unnecessary churn.
^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-16 11:13 ` Patrick McHardy
@ 2015-03-20 8:55 ` Herbert Xu
2015-03-20 9:22 ` Patrick McHardy
0 siblings, 1 reply; 113+ messages in thread
From: Herbert Xu @ 2015-03-20 8:55 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Thomas Graf, David Miller, netdev, Eric Dumazet
On Mon, Mar 16, 2015 at 11:13:46AM +0000, Patrick McHardy wrote:
>
> An upcoming patchset will add transaction support to sets, which needs
> the callback. So please leave it for know since it will only cause
> unnecessary churn.
Patrick, can you explain what you mean by the callback?
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-20 8:55 ` Herbert Xu
@ 2015-03-20 9:22 ` Patrick McHardy
2015-03-20 9:27 ` Herbert Xu
0 siblings, 1 reply; 113+ messages in thread
From: Patrick McHardy @ 2015-03-20 9:22 UTC (permalink / raw)
To: Herbert Xu; +Cc: Thomas Graf, David Miller, netdev, Eric Dumazet
On 20.03, Herbert Xu wrote:
> On Mon, Mar 16, 2015 at 11:13:46AM +0000, Patrick McHardy wrote:
> >
> > An upcoming patchset will add transaction support to sets, which needs
> > the callback. So please leave it for now since it will only cause
> > unnecessary churn.
>
> Patrick, can you explain what you mean by the callback?
I need the compare functions for transaction support, to decide whether
an element is already activated or has been deactivated, and, with a
further patch, for timeouts. Inactive and timed-out elements are treated
as non-existent but, in the case of transactions, are present in the hash
until the transaction is committed, and in the case of timeouts until GC.
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-20 9:22 ` Patrick McHardy
@ 2015-03-20 9:27 ` Herbert Xu
2015-03-20 9:59 ` Patrick McHardy
0 siblings, 1 reply; 113+ messages in thread
From: Herbert Xu @ 2015-03-20 9:27 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Thomas Graf, David Miller, netdev, Eric Dumazet
On Fri, Mar 20, 2015 at 09:22:17AM +0000, Patrick McHardy wrote:
>
> I need the compare functions for transaction support, to decide whether
> an element is already activated or has been deactivated, and, with a
> further patch, for timeouts. Inactive and timed-out elements are treated
> as non-existent but, in the case of transactions, are present in the hash
> until the transaction is committed, and in the case of timeouts until GC.
I still don't understand what is in the compare callback. Can
you provide some code example?
More importantly, why is it that you can't lookup and then just do
whatever you need to do in the caller of lookup? If you're planning
on having multiple objects in the hash table with the same key then
I'm afraid I'll have to say no because we want to use the chain
length to determine whether we're being attacked and having multiple
objects with the same key defeats that mechanism.
Of course many hash table users need to be able to keep multiple
objects under the same key. My suggestion would be to allocate
your own linked list and have the linked list be the object that
is inserted into the hash table.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-20 9:27 ` Herbert Xu
@ 2015-03-20 9:59 ` Patrick McHardy
2015-03-20 10:16 ` Herbert Xu
0 siblings, 1 reply; 113+ messages in thread
From: Patrick McHardy @ 2015-03-20 9:59 UTC (permalink / raw)
To: Herbert Xu; +Cc: Thomas Graf, David Miller, netdev, Eric Dumazet
On 20.03, Herbert Xu wrote:
> On Fri, Mar 20, 2015 at 09:22:17AM +0000, Patrick McHardy wrote:
> >
> > I need the compare functions for transaction support to decide whether
> > an element is already activated or has been deactivated and, with a
> > further patch, for timeouts. Inactive and timed out elements are treated
> > as non-existant but are in case of transactions present in the hash
> > until the transaction is committed, in case of timeouts until GC.
>
> I still don't understand what is in the compare callback. Can
> you provide some code example?
Sure, for lookup / insert:
static bool nft_hash_cmp(void *ptr, void *arg)
{
	struct nft_hash_cmp_arg *x = arg;
	struct nft_hash_elem *he = ptr;

	if (nft_data_cmp(nft_set_ext_key(&he->ext), x->key, x->set->klen))
		return false;
	if (nft_hash_elem_expired(he))
		return false;
	if (!nft_hash_is_active(he, x->genmask))
		return false;
	return true;
}

static bool nft_hash_lookup(const struct nft_set *set,
			    const struct nft_data *key,
			    const struct nft_set_ext **ext)
{
	struct nft_hash *priv = nft_set_priv(set);
	const struct nft_hash_elem *he;
	struct nft_hash_cmp_arg arg = {
		.genmask = nft_genmask_cur(set->net) << NFT_HASH_GEN_SHIFT,
		.set = set,
		.key = key,
	};

	he = rhashtable_lookup_compare(&priv->ht, key, &nft_hash_cmp, &arg);
	if (he != NULL)
		*ext = &he->ext;
	return !!he;
}

static int nft_hash_insert(const struct nft_set *set,
			   const struct nft_set_elem *elem)
{
	struct nft_hash *priv = nft_set_priv(set);
	struct nft_hash_elem *he = elem->priv;
	struct nft_hash_cmp_arg arg = {
		.genmask = nft_genmask_next(set->net) << NFT_HASH_GEN_SHIFT,
		.set = set,
		.key = &elem->key,
	};

	nft_hash_set_inactive(set, he);
	if (!rhashtable_lookup_compare_insert(&priv->ht, &he->node,
					      nft_hash_cmp, &arg))
		return -EEXIST;
	return 0;
}
> More importantly, why is it that you can't lookup and then just do
> whatever you need to do in the caller of lookup? If you're planning
> on having multiple objects in the hash table with the same key then
> I'm afraid I'll have to say no because we want to use the chain
> length to determine whether we're being attacked and having multiple
> objects with the same key defeats that mechanism.
It's exactly that: there are multiple similar keys in the hash. Timed-out
and inactive elements are treated as non-existent. Timeouts could
be handled differently, but for transactions there is no other way
to implement this while providing atomic transaction semantics.
Consider, for instance, the sequence:
- delete element X
- add element X
If we fail, we want the first X to still be active; otherwise the second
X becomes active and the first one inactive, atomically. For this to work,
both need to be present in the hash already. Transactions can cover
an arbitrary number of elements, so even if the case of a single similar
key could be handled otherwise, it's not possible for multiple keys.
Regarding the chain length as trigger - I'm sorry, but this doesn't work
for us. I don't see why you would have to look at chain length. That
implies that you don't trust your hash function - why not fix that
instead?
> Of course many hash table users need to be able to keep multiple
> objects under the same key. My suggestion would be to allocate
> your own linked list and have the linked list be the object that
> is inserted into the hash table.
That would require a huge amount of extra memory per element and having
millions of them is not unrealistic for our use case.
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-20 9:59 ` Patrick McHardy
@ 2015-03-20 10:16 ` Herbert Xu
2015-03-20 10:27 ` Patrick McHardy
0 siblings, 1 reply; 113+ messages in thread
From: Herbert Xu @ 2015-03-20 10:16 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Thomas Graf, David Miller, netdev, Eric Dumazet
On Fri, Mar 20, 2015 at 09:59:09AM +0000, Patrick McHardy wrote:
>
> Regarding the chain length as trigger - I'm sorry, but this doesn't work
> for us. I don't see why you would have to look at chain length. That
> implies that you don't trust your hash function - why not fix that
> instead?
Any hash function can be attacked. That's why we need to be able
to rehash it. And the best way to decide when to rehash is based
on chain length (otherwise you'd waste time rehashing periodically
like we used to do). With name spaces these days anyone could be
an adversary.
Besides, putting multiple objects with the same key into a hash
table defeats the whole point of hashing.
> > Of course many hash table users need to be able to keep multiple
> > objects under the same key. My suggestion would be to allocate
> > your own linked list and have the linked list be the object that
> > is inserted into the hash table.
>
> That would require a huge amount of extra memory per element and having
> millions of them is not unrealistic for our use case.
You should be able to do it with just 8 extra bytes per unique
hash table key.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-20 10:16 ` Herbert Xu
@ 2015-03-20 10:27 ` Patrick McHardy
2015-03-20 21:47 ` Herbert Xu
0 siblings, 1 reply; 113+ messages in thread
From: Patrick McHardy @ 2015-03-20 10:27 UTC (permalink / raw)
To: Herbert Xu; +Cc: Thomas Graf, David Miller, netdev, Eric Dumazet
On 20.03, Herbert Xu wrote:
> On Fri, Mar 20, 2015 at 09:59:09AM +0000, Patrick McHardy wrote:
> >
> > Regarding the chain length as trigger - I'm sorry, but this doesn't work
> > for us. I don't see why you would have to look at chain length. That
> > implies that you don't trust your hash function - why not fix that
> > instead?
>
> Any hash function can be attacked. That's why we need to be able
> to rehash it. And the best way to decide when to rehash is based
> on chain length (otherwise you'd waste time rehashing periodically
> like we used to do). With name spaces these days anyone could be
> an adversary.
We already had this discussion. I strongly do not believe this is
the right way to fix namespace problems. There are millions of ways
of creating CPU intensive workloads. You need to be able to put
bounds on the entire namespace. Fixing individual spots will not
solve that problem.
> Besides, putting multiple objects with the same key into a hash
> table defeats the whole point of hashing.
They exist only for (very) short periods of time. It's simply not a
problem in our case. We could even put hard bounds on them, meaning
an element would exist at most twice during that period.
> > > Of course many hash table users need to be able to keep multiple
> > > objects under the same key. My suggestion would be to allocate
> > > your own linked list and have the linked list be the object that
> > > is inserted into the hash table.
> >
> > That would require a huge amount of extra memory per element and having
> > millions of them is not unrealistic for our use case.
>
> You should be able to do it with just 8 extra bytes per unique
> hash table key.
That's something like 25% more memory usage for us in common cases. We try
very hard to keep the active memory size small. I don't want to waste
that amount of memory just for the very short periods while transactions
are unconfirmed.
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-20 10:27 ` Patrick McHardy
@ 2015-03-20 21:47 ` Herbert Xu
2015-03-20 21:56 ` Thomas Graf
2015-03-21 5:23 ` Patrick McHardy
0 siblings, 2 replies; 113+ messages in thread
From: Herbert Xu @ 2015-03-20 21:47 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Thomas Graf, David Miller, netdev, Eric Dumazet
On Fri, Mar 20, 2015 at 10:27:01AM +0000, Patrick McHardy wrote:
> On 20.03, Herbert Xu wrote:
>
> > Any hash function can be attacked. That's why we need to be able
> > to rehash it. And the best way to decide when to rehash is based
> > on chain length (otherwise you'd waste time rehashing periodically
> > like we used to do). With name spaces these days anyone could be
> > an adversary.
>
> We already had this discussion. I strongly do not believe this is
> the right way to fix namespace problems. There are millions of ways
> of creating CPU intensive workloads. You need to be able to put
> bounds on the entire namespace. Fixing individual spots will not
> solve that problem.
A CPU intensive workload that can be rescheduled is completely
different from one that is running under spin lock with BH disabled.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-20 21:47 ` Herbert Xu
@ 2015-03-20 21:56 ` Thomas Graf
2015-03-20 21:57 ` Herbert Xu
2015-03-21 5:23 ` Patrick McHardy
1 sibling, 1 reply; 113+ messages in thread
From: Thomas Graf @ 2015-03-20 21:56 UTC (permalink / raw)
To: Herbert Xu; +Cc: Patrick McHardy, David Miller, netdev, Eric Dumazet
On 03/21/15 at 08:47am, Herbert Xu wrote:
> On Fri, Mar 20, 2015 at 10:27:01AM +0000, Patrick McHardy wrote:
> > On 20.03, Herbert Xu wrote:
> >
> > > Any hash function can be attacked. That's why we need to be able
> > > to rehash it. And the best way to decide when to rehash is based
> > > on chain length (otherwise you'd waste time rehashing periodically
> > > like we used to do). With name spaces these days anyone could be
> > > an adversary.
> >
> > We already had this discussion. I strongly do not believe this is
> > the right way to fix namespace problems. There are millions of ways
> > of creating CPU intensive workloads. You need to be able to put
> > bounds on the entire namespace. Fixing individual spots will not
> > solve that problem.
>
> A CPU intensive workload that can be rescheduled is completely
> different from one that is running under spin lock with BH disabled.
Just make the chain length based growth function configurable
and nft_hash can disable it. nft_hash entries are not created by
unprivileged users so attacking the table is out of the question.
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-20 21:56 ` Thomas Graf
@ 2015-03-20 21:57 ` Herbert Xu
2015-03-20 22:07 ` Thomas Graf
0 siblings, 1 reply; 113+ messages in thread
From: Herbert Xu @ 2015-03-20 21:57 UTC (permalink / raw)
To: Thomas Graf; +Cc: Patrick McHardy, David Miller, netdev, Eric Dumazet
On Fri, Mar 20, 2015 at 09:56:12PM +0000, Thomas Graf wrote:
> On 03/21/15 at 08:47am, Herbert Xu wrote:
> > On Fri, Mar 20, 2015 at 10:27:01AM +0000, Patrick McHardy wrote:
> > > On 20.03, Herbert Xu wrote:
> > >
> > > > Any hash function can be attacked. That's why we need to be able
> > > > to rehash it. And the best way to decide when to rehash is based
> > > > on chain length (otherwise you'd waste time rehashing periodically
> > > > like we used to do). With name spaces these days anyone could be
> > > > an adversary.
> > >
> > > We already had this discussion. I strongly do not believe this is
> > > the right way to fix namespace problems. There are millions of ways
> > > of creating CPU intensive workloads. You need to be able to put
> > > bounds on the entire namespace. Fixing individual spots will not
> > > solve that problem.
> >
> > A CPU intensive workload that can be rescheduled is completely
> > different from one that is running under spin lock with BH disabled.
>
> Just make the chain length based growth function configurable
> and nft_hash can disable it. nft_hash entries are not created by
> unprivileged users so attacking the table is out of the question.
Please read the quoted text, we're talking about potential attacks
on nft_hash.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-20 21:57 ` Herbert Xu
@ 2015-03-20 22:07 ` Thomas Graf
2015-03-20 22:10 ` Herbert Xu
0 siblings, 1 reply; 113+ messages in thread
From: Thomas Graf @ 2015-03-20 22:07 UTC (permalink / raw)
To: Herbert Xu; +Cc: Patrick McHardy, David Miller, netdev, Eric Dumazet
On 03/21/15 at 08:57am, Herbert Xu wrote:
> On Fri, Mar 20, 2015 at 09:56:12PM +0000, Thomas Graf wrote:
> > On 03/21/15 at 08:47am, Herbert Xu wrote:
> > > On Fri, Mar 20, 2015 at 10:27:01AM +0000, Patrick McHardy wrote:
> > > > We already had this discussion. I strongly do not believe this is
> > > > the right way to fix namespace problems. There are millions of ways
> > > > of creating CPU intensive workloads. You need to be able to put
> > > > bounds on the entire namespace. Fixing individual spots will not
> > > > solve that problem.
> > >
> > > A CPU intensive workload that can be rescheduled is completely
> > > different from one that is running under spin lock with BH disabled.
> >
> > Just make the chain length based growth function configurable
> > and nft_hash can disable it. nft_hash entries are not created by
> > unprivileged users so attacking the table is out of the question.
>
> Please read the quoted text, we're talking about potential attacks
> on nft_hash.
Attack by whom? If I read the nft_set code correctly then the only
way to add to an nft_set is via nfnetlink which requires
CAP_NET_ADMIN. My understanding was that the chain length based
growth is to counter hash seed attacks.
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-20 22:07 ` Thomas Graf
@ 2015-03-20 22:10 ` Herbert Xu
2015-03-20 22:23 ` Thomas Graf
0 siblings, 1 reply; 113+ messages in thread
From: Herbert Xu @ 2015-03-20 22:10 UTC (permalink / raw)
To: Thomas Graf; +Cc: Patrick McHardy, David Miller, netdev, Eric Dumazet
On Fri, Mar 20, 2015 at 10:07:51PM +0000, Thomas Graf wrote:
>
> Attack by whom? If I read the nft_set code correctly then the only
> way to add to an nft_set is via nfnetlink which requires
> CAP_NET_ADMIN. My understanding was that the chain length based
> growth is to counter hash seed attacks.
You cannot trust root in a namespace.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-20 22:10 ` Herbert Xu
@ 2015-03-20 22:23 ` Thomas Graf
2015-03-20 22:25 ` Herbert Xu
0 siblings, 1 reply; 113+ messages in thread
From: Thomas Graf @ 2015-03-20 22:23 UTC (permalink / raw)
To: Herbert Xu; +Cc: Patrick McHardy, David Miller, netdev, Eric Dumazet
On 03/21/15 at 09:10am, Herbert Xu wrote:
> On Fri, Mar 20, 2015 at 10:07:51PM +0000, Thomas Graf wrote:
> >
> > Attack by whom? If I read the nft_set code correctly then the only
> > way to add to an nft_set is via nfnetlink which requires
> > CAP_NET_ADMIN. My understanding was that the chain length based
> > growth is to counter hash seed attacks.
>
> You cannot trust root in a namespace.
He might as well just run for (;;) to burn cycles in his namespace.
If you give away virtualized local privileges you better be ready
to restrict the resources consumed.
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-20 22:23 ` Thomas Graf
@ 2015-03-20 22:25 ` Herbert Xu
2015-03-20 22:36 ` Thomas Graf
0 siblings, 1 reply; 113+ messages in thread
From: Herbert Xu @ 2015-03-20 22:25 UTC (permalink / raw)
To: Thomas Graf; +Cc: Patrick McHardy, David Miller, netdev, Eric Dumazet
On Fri, Mar 20, 2015 at 10:23:11PM +0000, Thomas Graf wrote:
>
> He might as well just run for (;;) to burn cycles in his namespace.
> If you give away virtualized local privileges you better be ready
> to restrict the resources consumed.
Please reread the first email that you replied to, let me quote:
A CPU intensive workload that can be rescheduled is
completely different from one that is running under spin
lock with BH disabled.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-20 22:25 ` Herbert Xu
@ 2015-03-20 22:36 ` Thomas Graf
2015-03-21 5:25 ` Patrick McHardy
0 siblings, 1 reply; 113+ messages in thread
From: Thomas Graf @ 2015-03-20 22:36 UTC (permalink / raw)
To: Herbert Xu; +Cc: Patrick McHardy, David Miller, netdev, Eric Dumazet
On 03/21/15 at 09:25am, Herbert Xu wrote:
> On Fri, Mar 20, 2015 at 10:23:11PM +0000, Thomas Graf wrote:
> >
> > He might as well just run for (;;) to burn cycles in his namespace.
> > If you give away virtualized local privileges you better be ready
> > to restrict the resources consumed.
>
> Please reread the first email that you replied to, let me quote:
>
> A CPU intensive workload that can be rescheduled is
> completely different from one that is running under spin
> lock with BH disabled.
We have countless ways to create linear lists of things like classifiers,
qdiscs, multicast memberships, net_devices, fib_rules, etc. All taking
spin locks or write locks, most of them with BH disabled. Some at
least use hash tables, though most of those are fixed size.
I don't want to downplay this but do you *really* want to run
untrusted workloads with CAP_NET_ADMIN privileges?
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-20 22:36 ` Thomas Graf
@ 2015-03-21 5:25 ` Patrick McHardy
0 siblings, 0 replies; 113+ messages in thread
From: Patrick McHardy @ 2015-03-21 5:25 UTC (permalink / raw)
To: Thomas Graf; +Cc: Herbert Xu, David Miller, netdev, Eric Dumazet
On 20.03, Thomas Graf wrote:
> On 03/21/15 at 09:25am, Herbert Xu wrote:
> > On Fri, Mar 20, 2015 at 10:23:11PM +0000, Thomas Graf wrote:
> > >
> > > He might as well just run for (;;) to burn cycles in his namespace.
> > > If you give away virtualized local privileges you better be ready
> > > to restrict the resources consumed.
> >
> > Please reread the first email that you replied to, let me quote:
> >
> > A CPU intensive workload that can be rescheduled is
> > completely different from one that is running under spin
> > lock with BH disabled.
>
> We have countless ways to create linear list of things like classifiers,
> qdiscs, multicast memberships, net_devices, fib_rules, etc. All taking
> spin locks or write locks. Most of them with BH disabled. Some at
> least use hashtables with most of them fixed size.
That's my point. It's impossible to fix this by restricting data
structures; that just removes a valid use case.
> I don't want to downplay this but do you *really* want to run
> untrusted workloads with CAP_NET_ADMIN privileges?
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-20 21:47 ` Herbert Xu
2015-03-20 21:56 ` Thomas Graf
@ 2015-03-21 5:23 ` Patrick McHardy
1 sibling, 0 replies; 113+ messages in thread
From: Patrick McHardy @ 2015-03-21 5:23 UTC (permalink / raw)
To: Herbert Xu; +Cc: Thomas Graf, David Miller, netdev, Eric Dumazet
On 21.03, Herbert Xu wrote:
> On Fri, Mar 20, 2015 at 10:27:01AM +0000, Patrick McHardy wrote:
> > On 20.03, Herbert Xu wrote:
> >
> > > Any hash function can be attacked. That's why we need to be able
> > > to rehash it. And the best way to decide when to rehash is based
> > > on chain length (otherwise you'd waste time rehashing periodically
> > > like we used to do). With name spaces these days anyone could be
> > > an adversary.
> >
> > We already had this discussion. I strongly do not believe this is
> > the right way to fix namespace problems. There are millions of ways
> > of creating CPU intensive workloads. You need to be able to put
> > bounds on the entire namespace. Fixing individual spots will not
> > solve that problem.
>
> A CPU intensive workload that can be rescheduled is completely
> different from one that is running under spin lock with BH disabled.
Sure, that's what I actually meant of course.
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-16 9:14 ` Herbert Xu
2015-03-16 9:28 ` Thomas Graf
2015-03-16 11:13 ` Patrick McHardy
@ 2015-03-20 9:36 ` Herbert Xu
2015-03-20 10:02 ` Patrick McHardy
2 siblings, 1 reply; 113+ messages in thread
From: Herbert Xu @ 2015-03-20 9:36 UTC (permalink / raw)
To: Thomas Graf; +Cc: David Miller, netdev, Eric Dumazet, kaber
On Mon, Mar 16, 2015 at 08:14:15PM +1100, Herbert Xu wrote:
> On Mon, Mar 16, 2015 at 08:28:42AM +0000, Thomas Graf wrote:
>
> > The reason for the indirection was to not bypass the abstraction
> > nft_data_cmp() provides.
> >
> > No objection to the change but maybe leave a comment in
> > nft_data_cmp() that if one changes nft_data_cmp() one needs to
> > look at nft_hash and see if the direct use of rhashtable_lookup()
> > is still valid.
>
> Well we could preserve nft_data_cmp if necessary. It'll just
> mean an extra indirect call to do the compare when needed.
FWIW I've decided to ditch nft_data_cmp unless someone can show
me that it's really necessary. The reason is that nft_hash_lookup
already uses memcmp instead of nft_data_cmp.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare
2015-03-20 9:36 ` Herbert Xu
@ 2015-03-20 10:02 ` Patrick McHardy
0 siblings, 0 replies; 113+ messages in thread
From: Patrick McHardy @ 2015-03-20 10:02 UTC (permalink / raw)
To: Herbert Xu; +Cc: Thomas Graf, David Miller, netdev, Eric Dumazet
On 20.03, Herbert Xu wrote:
> On Mon, Mar 16, 2015 at 08:14:15PM +1100, Herbert Xu wrote:
> > On Mon, Mar 16, 2015 at 08:28:42AM +0000, Thomas Graf wrote:
> >
> > > The reason for the indirection was to not bypass the abstraction
> > > nft_data_cmp() provides.
> > >
> > > No objection to the change but maybe leave a comment in
> > > nft_data_cmp() that if one changes nft_data_cmp() one needs to
> > > look at nft_hash and see if the direct use of rhashtable_lookup()
> > > is still valid.
> >
> > Well we could preserve nft_data_cmp if necessary. It'll just
> > mean an extra indirect call to do the compare when needed.
>
> FWIW I've decided to ditch nft_data_cmp unless someone can show
> me that it's really necessary. The reason is that nft_hash_lookup
> already uses memcmp instead of nft_data_cmp.
I don't care about nft_data_cmp, but I do care about being able to
override the compare function.
* [v1 PATCH 8/14] rhashtable: Fix support of objects with no accessible keys
2015-03-15 10:43 ` [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash Herbert Xu
` (6 preceding siblings ...)
2015-03-15 10:44 ` [v1 PATCH 7/14] netfilter: Use rhashtable_lookup instead of lookup_compare Herbert Xu
@ 2015-03-15 10:44 ` Herbert Xu
2015-03-15 10:44 ` [v1 PATCH 9/14] netlink: Move namespace into hash key Herbert Xu
` (12 subsequent siblings)
20 siblings, 0 replies; 113+ messages in thread
From: Herbert Xu @ 2015-03-15 10:44 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
The current support of so-called (a misnomer actually) variable-
length keys is broken. I found out about this after trying to
actually use it for netlink by incorporating the namespace into
the key.
The crux of the matter is that all the code that is needed to
make it work (such as netlink_lookup_insert_compare) only works
when key_len != 0. However, the object hash function is only
used when key_len == 0.
In fact key_len should always be non-zero since even for these
objects with inaccessible keys, we need to use a key to perform
a table lookup.
This patch fixes this by directly using the object hash function
as the indicator of whether the key is accessible or not. It
also adds a new function obj_cmpfn to compare a key against an
object. This means that the caller no longer needs to supply
explicit compare functions.
All this is done in a backwards compatible manner so we can rip
out the existing users safely after this patch.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
include/linux/rhashtable.h | 5 +
lib/rhashtable.c | 158 +++++++++++++++++++++++++++++++++++++++------
2 files changed, 143 insertions(+), 20 deletions(-)
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 99425f2..de7ac49 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -74,6 +74,7 @@ struct bucket_table {
typedef u32 (*rht_hashfn_t)(const void *data, u32 len, u32 seed);
typedef u32 (*rht_obj_hashfn_t)(const void *data, u32 seed);
+typedef bool (*rht_obj_cmpfn_t)(const void *key, const void *obj);
struct rhashtable;
@@ -89,6 +90,7 @@ struct rhashtable;
* @locks_mul: Number of bucket locks to allocate per cpu (default: 128)
* @hashfn: Function to hash key
* @obj_hashfn: Function to hash object
+ * @obj_cmpfn: Function to compare key with object
*/
struct rhashtable_params {
size_t nelem_hint;
@@ -101,6 +103,7 @@ struct rhashtable_params {
size_t locks_mul;
rht_hashfn_t hashfn;
rht_obj_hashfn_t obj_hashfn;
+ rht_obj_cmpfn_t obj_cmpfn;
};
/**
@@ -198,6 +201,8 @@ bool rhashtable_lookup_compare_insert(struct rhashtable *ht,
struct rhash_head *obj,
bool (*compare)(void *, void *),
void *arg);
+int rhashtable_lookup_insert_key(struct rhashtable *ht,
+ const void *key, struct rhash_head *obj);
int rhashtable_walk_init(struct rhashtable *ht, struct rhashtable_iter *iter);
void rhashtable_walk_exit(struct rhashtable_iter *iter);
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index a74ba72..8b3cae3 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -74,7 +74,7 @@ static u32 head_hashfn(struct rhashtable *ht,
{
const char *ptr = rht_obj(ht, he);
- return likely(ht->p.key_len) ?
+ return likely(!ht->p.obj_hashfn) ?
key_hashfn(ht, tbl, ptr + ht->p.key_offset) :
rht_bucket_index(tbl, ht->p.obj_hashfn(ptr, tbl->hash_rnd));
}
@@ -376,6 +376,69 @@ unlock:
mutex_unlock(&ht->mutex);
}
+static int __rhashtable_lookup_insert_key(struct rhashtable *ht,
+					   const void *key,
+					   struct rhash_head *obj)
+{
+	struct bucket_table *tbl, *old_tbl;
+	struct rhash_head *head;
+	bool no_resize_running;
+	unsigned hash;
+	int err = -EEXIST;
+
+	rcu_read_lock();
+
+	old_tbl = rht_dereference_rcu(ht->tbl, ht);
+	hash = head_hashfn(ht, old_tbl, obj);
+
+	spin_lock_bh(bucket_lock(old_tbl, hash));
+
+	/* Because we have already taken the bucket lock in old_tbl,
+	 * if we find that future_tbl is not yet visible then that
+	 * guarantees all other insertions of the same entry will
+	 * also grab the bucket lock in old_tbl because until the
+	 * rehash completes ht->tbl won't be changed.
+	 */
+	tbl = rht_dereference_rcu(old_tbl->future_tbl, ht) ?: old_tbl;
+	if (tbl != old_tbl) {
+		hash = head_hashfn(ht, tbl, obj);
+		spin_lock_nested(bucket_lock(tbl, hash), SINGLE_DEPTH_NESTING);
+	}
+
+	if (key && rhashtable_lookup(ht, key))
+		goto exit;
+
+	err = 0;
+
+	no_resize_running = tbl == old_tbl;
+
+	head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
+
+	if (rht_is_a_nulls(head))
+		INIT_RHT_NULLS_HEAD(obj->next, ht, hash);
+	else
+		RCU_INIT_POINTER(obj->next, head);
+
+	rcu_assign_pointer(tbl->buckets[hash], obj);
+
+	atomic_inc(&ht->nelems);
+	if (no_resize_running && rht_grow_above_75(ht, tbl))
+		schedule_work(&ht->run_work);
+
+exit:
+	if (tbl != old_tbl) {
+		hash = head_hashfn(ht, tbl, obj);
+		spin_unlock(bucket_lock(tbl, hash));
+	}
+
+	hash = head_hashfn(ht, old_tbl, obj);
+	spin_unlock_bh(bucket_lock(old_tbl, hash));
+
+	rcu_read_unlock();
+
+	return err;
+}
+
static bool __rhashtable_insert(struct rhashtable *ht, struct rhash_head *obj,
bool (*compare)(void *, void *), void *arg)
{
@@ -457,7 +520,9 @@ exit:
*/
void rhashtable_insert(struct rhashtable *ht, struct rhash_head *obj)
{
- __rhashtable_insert(ht, obj, NULL, NULL);
+ BUG_ON(ht->p.obj_hashfn);
+
+ __rhashtable_lookup_insert_key(ht, NULL, obj);
}
EXPORT_SYMBOL_GPL(rhashtable_insert);
@@ -543,9 +608,9 @@ struct rhashtable_compare_arg {
const void *key;
};
-static bool rhashtable_compare(void *ptr, void *arg)
+static bool rhashtable_compare(const void *arg, const void *ptr)
{
- struct rhashtable_compare_arg *x = arg;
+ const struct rhashtable_compare_arg *x = arg;
struct rhashtable *ht = x->ht;
return !memcmp(ptr + ht->p.key_offset, x->key, ht->p.key_len);
@@ -559,9 +624,6 @@ static bool rhashtable_compare(void *ptr, void *arg)
* Computes the hash value for the key and traverses the bucket chain looking
* for a entry with an identical key. The first matching entry is returned.
*
- * This lookup function may only be used for fixed key hash table (key_len
- * parameter set). It will BUG() if used inappropriately.
- *
* Lookups may occur in parallel with hashtable mutations and resizing.
*/
void *rhashtable_lookup(struct rhashtable *ht, const void *key)
@@ -570,10 +632,41 @@ void *rhashtable_lookup(struct rhashtable *ht, const void *key)
.ht = ht,
.key = key,
};
+ const struct bucket_table *tbl;
+ rht_obj_cmpfn_t compare;
+ const void *compare_key;
+ struct rhash_head *he;
+ u32 hash;
+
+ if (ht->p.obj_cmpfn) {
+ compare = ht->p.obj_cmpfn;
+ compare_key = key;
+ } else {
+ compare = rhashtable_compare;
+ compare_key = &arg;
+ }
+
+ rcu_read_lock();
+
+ tbl = rht_dereference_rcu(ht->tbl, ht);
+restart:
+ hash = key_hashfn(ht, tbl, key);
+ rht_for_each_rcu(he, tbl, hash) {
+ if (!compare(compare_key, rht_obj(ht, he)))
+ continue;
+ rcu_read_unlock();
+ return rht_obj(ht, he);
+ }
- BUG_ON(!ht->p.key_len);
+ /* Ensure we see any new tables. */
+ smp_rmb();
+
+ tbl = rht_dereference_rcu(tbl->future_tbl, ht);
+ if (unlikely(tbl))
+ goto restart;
+ rcu_read_unlock();
- return rhashtable_lookup_compare(ht, key, &rhashtable_compare, &arg);
+ return NULL;
}
EXPORT_SYMBOL_GPL(rhashtable_lookup);
@@ -644,15 +737,12 @@ EXPORT_SYMBOL_GPL(rhashtable_lookup_compare);
*/
bool rhashtable_lookup_insert(struct rhashtable *ht, struct rhash_head *obj)
{
- struct rhashtable_compare_arg arg = {
- .ht = ht,
- .key = rht_obj(ht, obj) + ht->p.key_offset,
- };
+ const char *key = rht_obj(ht, obj);
- BUG_ON(!ht->p.key_len);
+ BUG_ON(ht->p.obj_hashfn);
+
+ return !__rhashtable_lookup_insert_key(ht, key + ht->p.key_offset, obj);
- return rhashtable_lookup_compare_insert(ht, obj, &rhashtable_compare,
- &arg);
}
EXPORT_SYMBOL_GPL(rhashtable_lookup_insert);
@@ -681,13 +771,41 @@ bool rhashtable_lookup_compare_insert(struct rhashtable *ht,
bool (*compare)(void *, void *),
void *arg)
{
- BUG_ON(!ht->p.key_len);
-
return __rhashtable_insert(ht, obj, compare, arg);
}
EXPORT_SYMBOL_GPL(rhashtable_lookup_compare_insert);
/**
+ * rhashtable_lookup_insert_key - search and insert object to hash table
+ * with explicit key
+ * @ht: hash table
+ * @key: key
+ * @obj: pointer to hash head inside object
+ *
+ * Locks down the bucket chain in both the old and new table if a resize
+ * is in progress to ensure that writers can't remove from the old table
+ * and can't insert to the new table during the atomic operation of search
+ * and insertion. Searches for duplicates in both the old and new table if
+ * a resize is in progress.
+ *
+ * Lookups may occur in parallel with hashtable mutations and resizing.
+ *
+ * Will trigger an automatic deferred table resizing if the size grows
+ * beyond the watermark indicated by grow_decision() which can be passed
+ * to rhashtable_init().
+ *
+ * Returns zero on success.
+ */
+int rhashtable_lookup_insert_key(struct rhashtable *ht,
+ const void *key, struct rhash_head *obj)
+{
+ BUG_ON(!ht->p.obj_hashfn || !key);
+
+ return __rhashtable_lookup_insert_key(ht, key, obj);
+}
+EXPORT_SYMBOL_GPL(rhashtable_lookup_insert_key);
+
+/**
* rhashtable_walk_init - Initialise an iterator
* @ht: Table to walk over
* @iter: Hash table Iterator
@@ -925,8 +1043,8 @@ int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params)
size = HASH_DEFAULT_SIZE;
- if ((params->key_len && !params->hashfn) ||
- (!params->key_len && !params->obj_hashfn))
+ if (!params->hashfn || !params->key_len ||
+ (params->obj_hashfn && !params->obj_cmpfn))
return -EINVAL;
if (params->nulls_base && params->nulls_base < (1U << RHT_BASE_SHIFT))
^ permalink raw reply related [flat|nested] 113+ messages in thread
* [v1 PATCH 9/14] netlink: Move namespace into hash key
@ 2015-03-15 10:44 ` Herbert Xu
From: Herbert Xu @ 2015-03-15 10:44 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
Currently the namespace is a de facto key because it has to match
before we find an object in the hash table. However, it isn't part of
the hash value, so all objects from different namespaces with the
same port ID hash to the same bucket.
This is bad because the number of namespaces is unbounded.
This patch fixes that by folding the namespace into the hash.
Because the namespace field doesn't lie next to the portid field
in the netlink socket, this patch switches over to the rhashtable
interface without a fixed key.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
net/netlink/af_netlink.c | 66 +++++++++++++++++++++++++++++------------------
1 file changed, 41 insertions(+), 25 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index d97aed6..ca048f3 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -970,41 +970,46 @@ netlink_unlock_table(void)
struct netlink_compare_arg
{
- struct net *net;
+ possible_net_t pnet;
u32 portid;
+ char trailer[];
};
-static bool netlink_compare(void *ptr, void *arg)
+#define netlink_compare_arg_len offsetof(struct netlink_compare_arg, trailer)
+
+static bool netlink_compare(const void *arg, const void *ptr)
{
- struct netlink_compare_arg *x = arg;
- struct sock *sk = ptr;
+ const struct netlink_compare_arg *x = arg;
+ const struct netlink_sock *nlk = ptr;
- return nlk_sk(sk)->portid == x->portid &&
- net_eq(sock_net(sk), x->net);
+ return nlk->portid == x->portid &&
+ net_eq(sock_net(&nlk->sk), read_pnet(&x->pnet));
+}
+
+static void netlink_compare_arg_init(struct netlink_compare_arg *arg,
+ struct net *net, u32 portid)
+{
+ memset(arg, 0, sizeof(*arg));
+ write_pnet(&arg->pnet, net);
+ arg->portid = portid;
}
static struct sock *__netlink_lookup(struct netlink_table *table, u32 portid,
struct net *net)
{
- struct netlink_compare_arg arg = {
- .net = net,
- .portid = portid,
- };
+ struct netlink_compare_arg arg;
- return rhashtable_lookup_compare(&table->hash, &portid,
- &netlink_compare, &arg);
+ netlink_compare_arg_init(&arg, net, portid);
+ return rhashtable_lookup(&table->hash, &arg);
}
-static bool __netlink_insert(struct netlink_table *table, struct sock *sk)
+static int __netlink_insert(struct netlink_table *table, struct sock *sk)
{
- struct netlink_compare_arg arg = {
- .net = sock_net(sk),
- .portid = nlk_sk(sk)->portid,
- };
+ struct netlink_compare_arg arg;
- return rhashtable_lookup_compare_insert(&table->hash,
- &nlk_sk(sk)->node,
- &netlink_compare, &arg);
+ netlink_compare_arg_init(&arg, sock_net(sk), nlk_sk(sk)->portid);
+ return rhashtable_lookup_insert_key(&table->hash, &arg,
+ &nlk_sk(sk)->node);
}
static struct sock *netlink_lookup(struct net *net, int protocol, u32 portid)
@@ -1066,9 +1071,10 @@ static int netlink_insert(struct sock *sk, u32 portid)
nlk_sk(sk)->portid = portid;
sock_hold(sk);
- err = 0;
- if (!__netlink_insert(table, sk)) {
- err = -EADDRINUSE;
+ err = __netlink_insert(table, sk);
+ if (err) {
+ if (err == -EEXIST)
+ err = -EADDRINUSE;
sock_put(sk);
}
@@ -3114,15 +3120,25 @@ static struct pernet_operations __net_initdata netlink_net_ops = {
.exit = netlink_net_exit,
};
+static u32 netlink_hash(const void *data, u32 seed)
+{
+ const struct netlink_sock *nlk = data;
+ struct netlink_compare_arg arg;
+
+ netlink_compare_arg_init(&arg, sock_net(&nlk->sk), nlk->portid);
+ return jhash(&arg, netlink_compare_arg_len, seed);
+}
+
static int __init netlink_proto_init(void)
{
int i;
int err = proto_register(&netlink_proto, 0);
struct rhashtable_params ht_params = {
.head_offset = offsetof(struct netlink_sock, node),
- .key_offset = offsetof(struct netlink_sock, portid),
- .key_len = sizeof(u32), /* portid */
+ .key_len = netlink_compare_arg_len,
.hashfn = jhash,
+ .obj_hashfn = netlink_hash,
+ .obj_cmpfn = netlink_compare,
.max_size = 65536,
};
* [v1 PATCH 10/14] rhashtable: Rip out obsolete compare interface
@ 2015-03-15 10:44 ` Herbert Xu
From: Herbert Xu @ 2015-03-15 10:44 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
Now that the only user of rhashtable_lookup_compare_insert and
rhashtable_lookup_compare (i.e., netlink) has switched over to
the new obj_hashfn based interface, we can rip them out safely.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
include/linux/rhashtable.h | 6 -
lib/rhashtable.c | 143 +--------------------------------------------
2 files changed, 4 insertions(+), 145 deletions(-)
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index de7ac49..00fe294 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -193,14 +193,8 @@ int rhashtable_expand(struct rhashtable *ht);
int rhashtable_shrink(struct rhashtable *ht);
void *rhashtable_lookup(struct rhashtable *ht, const void *key);
-void *rhashtable_lookup_compare(struct rhashtable *ht, const void *key,
- bool (*compare)(void *, void *), void *arg);
bool rhashtable_lookup_insert(struct rhashtable *ht, struct rhash_head *obj);
-bool rhashtable_lookup_compare_insert(struct rhashtable *ht,
- struct rhash_head *obj,
- bool (*compare)(void *, void *),
- void *arg);
int rhashtable_lookup_insert_key(struct rhashtable *ht,
const void *key, struct rhash_head *obj);
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 8b3cae3..e33e7b9 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -1,13 +1,16 @@
/*
* Resizable, Scalable, Concurrent Hash Table
*
+ * Copyright (c) 2015 Herbert Xu <herbert@gondor.apana.org.au>
* Copyright (c) 2014-2015 Thomas Graf <tgraf@suug.ch>
* Copyright (c) 2008-2014 Patrick McHardy <kaber@trash.net>
*
* Based on the following paper:
* https://www.usenix.org/legacy/event/atc11/tech/final_files/Triplett.pdf
*
- * Code partially derived from nft_hash
+ * Code originally partially derived from nft_hash
+ * Rewritten with rehash code from br_multicast plus single list
+ * pointer as suggested by Josh Triplett
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
@@ -439,70 +442,6 @@ exit:
return err;
}
-static bool __rhashtable_insert(struct rhashtable *ht, struct rhash_head *obj,
- bool (*compare)(void *, void *), void *arg)
-{
- struct bucket_table *tbl, *old_tbl;
- struct rhash_head *head;
- bool no_resize_running;
- unsigned hash;
- bool success = true;
-
- rcu_read_lock();
-
- old_tbl = rht_dereference_rcu(ht->tbl, ht);
- hash = head_hashfn(ht, old_tbl, obj);
-
- spin_lock_bh(bucket_lock(old_tbl, hash));
-
- /* Because we have already taken the bucket lock in old_tbl,
- * if we find that future_tbl is not yet visible then that
- * guarantees all other insertions of the same entry will
- * also grab the bucket lock in old_tbl because until the
- * rehash completes ht->tbl won't be changed.
- */
- tbl = rht_dereference_rcu(old_tbl->future_tbl, ht) ?: old_tbl;
- if (tbl != old_tbl) {
- hash = head_hashfn(ht, tbl, obj);
- spin_lock_nested(bucket_lock(tbl, hash), SINGLE_DEPTH_NESTING);
- }
-
- if (compare &&
- rhashtable_lookup_compare(ht, rht_obj(ht, obj) + ht->p.key_offset,
- compare, arg)) {
- success = false;
- goto exit;
- }
-
- no_resize_running = tbl == old_tbl;
-
- head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
-
- if (rht_is_a_nulls(head))
- INIT_RHT_NULLS_HEAD(obj->next, ht, hash);
- else
- RCU_INIT_POINTER(obj->next, head);
-
- rcu_assign_pointer(tbl->buckets[hash], obj);
-
- atomic_inc(&ht->nelems);
- if (no_resize_running && rht_grow_above_75(ht, tbl))
- schedule_work(&ht->run_work);
-
-exit:
- if (tbl != old_tbl) {
- hash = head_hashfn(ht, tbl, obj);
- spin_unlock(bucket_lock(tbl, hash));
- }
-
- hash = head_hashfn(ht, old_tbl, obj);
- spin_unlock_bh(bucket_lock(old_tbl, hash));
-
- rcu_read_unlock();
-
- return success;
-}
-
/**
* rhashtable_insert - insert object into hash table
* @ht: hash table
@@ -671,51 +610,6 @@ restart:
EXPORT_SYMBOL_GPL(rhashtable_lookup);
/**
- * rhashtable_lookup_compare - search hash table with compare function
- * @ht: hash table
- * @key: the pointer to the key
- * @compare: compare function, must return true on match
- * @arg: argument passed on to compare function
- *
- * Traverses the bucket chain behind the provided hash value and calls the
- * specified compare function for each entry.
- *
- * Lookups may occur in parallel with hashtable mutations and resizing.
- *
- * Returns the first entry on which the compare function returned true.
- */
-void *rhashtable_lookup_compare(struct rhashtable *ht, const void *key,
- bool (*compare)(void *, void *), void *arg)
-{
- const struct bucket_table *tbl;
- struct rhash_head *he;
- u32 hash;
-
- rcu_read_lock();
-
- tbl = rht_dereference_rcu(ht->tbl, ht);
-restart:
- hash = key_hashfn(ht, tbl, key);
- rht_for_each_rcu(he, tbl, hash) {
- if (!compare(rht_obj(ht, he), arg))
- continue;
- rcu_read_unlock();
- return rht_obj(ht, he);
- }
-
- /* Ensure we see any new tables. */
- smp_rmb();
-
- tbl = rht_dereference_rcu(tbl->future_tbl, ht);
- if (unlikely(tbl))
- goto restart;
- rcu_read_unlock();
-
- return NULL;
-}
-EXPORT_SYMBOL_GPL(rhashtable_lookup_compare);
-
-/**
* rhashtable_lookup_insert - lookup and insert object into hash table
* @ht: hash table
* @obj: pointer to hash head inside object
@@ -747,35 +641,6 @@ bool rhashtable_lookup_insert(struct rhashtable *ht, struct rhash_head *obj)
EXPORT_SYMBOL_GPL(rhashtable_lookup_insert);
/**
- * rhashtable_lookup_compare_insert - search and insert object to hash table
- * with compare function
- * @ht: hash table
- * @obj: pointer to hash head inside object
- * @compare: compare function, must return true on match
- * @arg: argument passed on to compare function
- *
- * Locks down the bucket chain in both the old and new table if a resize
- * is in progress to ensure that writers can't remove from the old table
- * and can't insert to the new table during the atomic operation of search
- * and insertion. Searches for duplicates in both the old and new table if
- * a resize is in progress.
- *
- * Lookups may occur in parallel with hashtable mutations and resizing.
- *
- * Will trigger an automatic deferred table resizing if the size grows
- * beyond the watermark indicated by grow_decision() which can be passed
- * to rhashtable_init().
- */
-bool rhashtable_lookup_compare_insert(struct rhashtable *ht,
- struct rhash_head *obj,
- bool (*compare)(void *, void *),
- void *arg)
-{
- return __rhashtable_insert(ht, obj, compare, arg);
-}
-EXPORT_SYMBOL_GPL(rhashtable_lookup_compare_insert);
-
-/**
* rhashtable_lookup_insert_key - search and insert object to hash table
* with explicit key
* @ht: hash table
* Re: [v1 PATCH 10/14] rhashtable: Rip out obsolete compare interface
@ 2015-03-16 9:35 ` Thomas Graf
From: Thomas Graf @ 2015-03-16 9:35 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, netdev, Eric Dumazet
On 03/15/15 at 09:44pm, Herbert Xu wrote:
> Now that the only user of rhashtable_lookup_compare_insert and
> rhashtable_lookup_compare (i.e., netlink) has switched over to
> the new obj_hashfn based interface, we can rip them out safely.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
As pointed out in the previous patch, I think we should keep this
API but convert it to a macro for inlining. rhashtable_lookup() would
then just be a user of this macro.
It might even be worth hardcoding rhashtable_lookup() to jhash
and requiring users that need a different hash function to
use rhashtable_lookup_compare().
* [v1 PATCH 11/14] rhashtable: Allow hashfn to be unset
@ 2015-03-15 10:44 ` Herbert Xu
From: Herbert Xu @ 2015-03-15 10:44 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
Since every current rhashtable user uses jhash as its hash
function, the fact that jhash is an inline function causes each
user to generate a copy of its code.
This patch provides a solution by allowing hashfn to be unset, in
which case rhashtable will automatically set it to jhash.
Furthermore, if the key length is a multiple of 4, we will switch
over to jhash2.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
include/linux/rhashtable.h | 2 ++
lib/rhashtable.c | 19 +++++++++++++++++--
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 00fe294..295580e 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -110,6 +110,7 @@ struct rhashtable_params {
* struct rhashtable - Hash table handle
* @tbl: Bucket table
* @nelems: Number of elements in table
+ * @key_len: Key length for hashfn
* @p: Configuration parameters
* @run_work: Deferred worker to expand/shrink asynchronously
* @mutex: Mutex to protect current/future table swapping
@@ -119,6 +120,7 @@ struct rhashtable {
struct bucket_table __rcu *tbl;
atomic_t nelems;
bool being_destroyed;
+ unsigned int hashfn_key_len;
struct rhashtable_params p;
struct work_struct run_work;
struct mutex mutex;
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index e33e7b9..a2dc855 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -67,7 +67,7 @@ static u32 rht_bucket_index(const struct bucket_table *tbl, u32 hash)
static u32 key_hashfn(struct rhashtable *ht, const struct bucket_table *tbl,
const void *key)
{
- return rht_bucket_index(tbl, ht->p.hashfn(key, ht->p.key_len,
+ return rht_bucket_index(tbl, ht->p.hashfn(key, ht->hashfn_key_len,
tbl->hash_rnd));
}
@@ -858,6 +858,11 @@ static size_t rounded_hashtable_size(struct rhashtable_params *params)
(unsigned long)params->min_size);
}
+static u32 rhashtable_jhash2(const void *key, u32 length, u32 seed)
+{
+ return jhash2(key, length, seed);
+}
+
/**
* rhashtable_init - initialize a new hash table
* @ht: hash table to be initialized
@@ -908,7 +913,7 @@ int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params)
size = HASH_DEFAULT_SIZE;
- if (!params->hashfn || !params->key_len ||
+ if (!params->key_len ||
(params->obj_hashfn && !params->obj_cmpfn))
return -EINVAL;
@@ -924,6 +929,16 @@ int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params)
mutex_init(&ht->mutex);
memcpy(&ht->p, params, sizeof(*params));
+ ht->hashfn_key_len = ht->p.key_len;
+ if (!params->hashfn) {
+ ht->p.hashfn = jhash;
+
+ if (!(ht->hashfn_key_len & (sizeof(u32) - 1))) {
+ ht->hashfn_key_len /= sizeof(u32);
+ ht->p.hashfn = rhashtable_jhash2;
+ }
+ }
+
if (params->locks_mul)
ht->p.locks_mul = roundup_pow_of_two(params->locks_mul);
else
* [v1 PATCH 12/14] netlink: Use default rhashtable hashfn
@ 2015-03-15 10:44 ` Herbert Xu
From: Herbert Xu @ 2015-03-15 10:44 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
This patch removes the explicit jhash value for the hashfn parameter
of rhashtable. The default is now jhash so removing the setting
makes no difference apart from making one less copy of jhash in
the kernel.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
net/netlink/af_netlink.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ca048f3..f521f35 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -3126,7 +3126,7 @@ static u32 netlink_hash(const void *data, u32 seed)
struct netlink_compare_arg arg;
netlink_compare_arg_init(&arg, sock_net(&nlk->sk), nlk->portid);
- return jhash(&arg, netlink_compare_arg_len, seed);
+ return jhash2((u32 *)&arg, netlink_compare_arg_len / sizeof(u32), seed);
}
static int __init netlink_proto_init(void)
@@ -3136,7 +3136,6 @@ static int __init netlink_proto_init(void)
struct rhashtable_params ht_params = {
.head_offset = offsetof(struct netlink_sock, node),
.key_len = netlink_compare_arg_len,
- .hashfn = jhash,
.obj_hashfn = netlink_hash,
.obj_cmpfn = netlink_compare,
.max_size = 65536,
* [v1 PATCH 13/14] tipc: Use default rhashtable hashfn
@ 2015-03-15 10:44 ` Herbert Xu
From: Herbert Xu @ 2015-03-15 10:44 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
This patch removes the explicit jhash value for the hashfn parameter
of rhashtable. The default is now jhash so removing the setting
makes no difference apart from making one less copy of jhash in
the kernel.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
net/tipc/socket.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 64eb669..f462498 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -35,7 +35,6 @@
*/
#include <linux/rhashtable.h>
-#include <linux/jhash.h>
#include "core.h"
#include "name_table.h"
#include "node.h"
@@ -2360,7 +2359,6 @@ int tipc_sk_rht_init(struct net *net)
.head_offset = offsetof(struct tipc_sock, node),
.key_offset = offsetof(struct tipc_sock, portid),
.key_len = sizeof(u32), /* portid */
- .hashfn = jhash,
.max_size = 1048576,
.min_size = 256,
};
* [v1 PATCH 14/14] netfilter: Use default rhashtable hashfn
@ 2015-03-15 10:44 ` Herbert Xu
From: Herbert Xu @ 2015-03-15 10:44 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
This patch removes the explicit jhash value for the hashfn parameter
of rhashtable. The default is now jhash so removing the setting
makes no difference apart from making one less copy of jhash in
the kernel.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
net/netfilter/nft_hash.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index 1fcae5e..397bdd3 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -13,7 +13,6 @@
#include <linux/module.h>
#include <linux/list.h>
#include <linux/log2.h>
-#include <linux/jhash.h>
#include <linux/netlink.h>
#include <linux/rhashtable.h>
#include <linux/netfilter.h>
@@ -171,7 +170,6 @@ static int nft_hash_init(const struct nft_set *set,
.head_offset = offsetof(struct nft_hash_elem, node),
.key_offset = offsetof(struct nft_hash_elem, key),
.key_len = set->klen,
- .hashfn = jhash,
};
return rhashtable_init(priv, &params);
* Re: [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash
@ 2015-03-16 4:01 ` David Miller
From: David Miller @ 2015-03-16 4:01 UTC (permalink / raw)
To: herbert; +Cc: tgraf, netdev, eric.dumazet
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sun, 15 Mar 2015 21:43:06 +1100
> PS I'd love to kill the indirect call on the hash but I think
> you guys need something other than jhash for sockets. Presumably
> they are not exposed to untrusted parties. But I really wonder
> whether that is still the case given things like namespaces where
> even root may be untrusted.
In my opinion the biggest architectural fault of rhashtables is
these damn callbacks, look at the assembler for a simple hash
lookup, it's disgusting.
Two callbacks, _two_.
That also means the hash lookup can't be a leaf function, and we take
two mispredicted indirect calls as well. Not acceptable.
This is pure overhead from overengineering in my opinion and I'd much
rather have code duplication than this.
We should have rhashtable_lookup_compare() be a macro, for which the
caller provides the hash function and compare operation inline.
Otherwise when we convert things like inet_hashtables.c to rhashtable
it's going to be a regression that will definitely show up in
microbenchmarks.
So essentially I think the rhashtables interface to how the keys are
described to the user is inside out. rhashtable should just provide
the outer-framework, and the user should provide the minute details of
the key hashing and comparison in something that gets expanded inline.
* Re: [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash
@ 2015-03-16 4:18 ` Herbert Xu
From: Herbert Xu @ 2015-03-16 4:18 UTC (permalink / raw)
To: David Miller; +Cc: tgraf, netdev, eric.dumazet
On Mon, Mar 16, 2015 at 12:01:13AM -0400, David Miller wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Sun, 15 Mar 2015 21:43:06 +1100
>
> > PS I'd love to kill the indirect call on the hash but I think
> > you guys need something other than jhash for sockets. Presumably
> > they are not exposed to untrusted parties. But I really wonder
> > whether that is still the case given things like namespaces where
> > even root may be untrusted.
>
> In my opinion the biggest architectural fault of rhashtables is
> these damn callbacks, look at the assembler for a simple hash
> lookup, it's disgusting.
Well if everybody used jhash the indirect call would go away...
> So essentially I think the rhashtables interface to how the keys are
> described to the user is inside out. rhashtable should just provide
> the outer-framework, and the user should provide the minute details of
> the key hashing and comparison in something that gets expanded inline.
Sure I'll try to work something out that eliminates the indirect
calls on the fast path.
However, the indirect call is still needed for rehashing but that
should be OK since rehashing is supposed to be rare.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash
@ 2015-03-16 4:30 ` David Miller
From: David Miller @ 2015-03-16 4:30 UTC (permalink / raw)
To: herbert; +Cc: tgraf, netdev, eric.dumazet
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 16 Mar 2015 15:18:05 +1100
> On Mon, Mar 16, 2015 at 12:01:13AM -0400, David Miller wrote:
>> From: Herbert Xu <herbert@gondor.apana.org.au>
>> Date: Sun, 15 Mar 2015 21:43:06 +1100
>>
>> > PS I'd love to kill the indirect call on the hash but I think
>> > you guys need something other than jhash for sockets. Presumably
>> > they are not exposed to untrusted parties. But I really wonder
>> > whether that is still the case given things like namespaces where
>> > even root may be untrusted.
>>
>> In my opinion the biggest architectural fault of rhashtables is
>> these damn callbacks, look at the assembler for a simple hash
>> lookup, it's disgusting.
>
> Well if everybody used jhash the indirect call would go away...
The other issue is that some hashes are per-ns and thus don't
need the namespace hash and comparison, whilst others do.
* Re: [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash
@ 2015-03-16 4:33 ` Herbert Xu
From: Herbert Xu @ 2015-03-16 4:33 UTC (permalink / raw)
To: David Miller; +Cc: tgraf, netdev, eric.dumazet
On Mon, Mar 16, 2015 at 12:30:49AM -0400, David Miller wrote:
>
> > Well if everybody used jhash the indirect call would go away...
>
> The other issue is that some hashes are per-ns and thus don't
> need the namespace hash and comparison, whilst others do.
That's just a matter of what goes into the key. IOW if your
table is global then you include the namespace in the hash,
otherwise you don't. It shouldn't change the hash function.
Have a look at what I did to netlink in patch 9. If you wanted
a per-namespace hash table it is as simple as getting rid of
obj_hashfn and just reducing the key to have just the portid.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash
2015-03-16 4:33 ` Herbert Xu
@ 2015-03-16 4:40 ` David Miller
2015-03-16 11:26 ` Herbert Xu
0 siblings, 1 reply; 113+ messages in thread
From: David Miller @ 2015-03-16 4:40 UTC (permalink / raw)
To: herbert; +Cc: tgraf, netdev, eric.dumazet
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 16 Mar 2015 15:33:59 +1100
> On Mon, Mar 16, 2015 at 12:30:49AM -0400, David Miller wrote:
>>
>> > Well if everybody used jhash the indirect call would go away...
>>
>> The other issue is that some hashes are per-ns and thus don't
>> need the namespace hash and comparison, whilst others do.
>
> That's just a matter of what goes into the key. IOW if your
> table is global then you include the namespace in the hash,
> otherwise you don't. It shouldn't change the hash function.
>
> Have a look at what I did to netlink in patch 9. If you wanted
> a per-namespace hash table it is as simple as getting rid of
> obj_hashfn and just reducing the key to have just the portid.
I know, reading the hash-callback-adjusting patches is what
prompted me to start this conversation about inlining the
lookup.
Patrick McHardy had mentioned to me his consternation about this issue
offhand the other week, and I've just failed to get around to bringing
it up :-)
^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash
2015-03-16 4:40 ` David Miller
@ 2015-03-16 11:26 ` Herbert Xu
2015-03-16 20:25 ` David Miller
0 siblings, 1 reply; 113+ messages in thread
From: Herbert Xu @ 2015-03-16 11:26 UTC (permalink / raw)
To: David Miller; +Cc: tgraf, netdev, eric.dumazet, Patrick McHardy
On Mon, Mar 16, 2015 at 12:40:17AM -0400, David Miller wrote:
>
> Patrick McHardy had mentioned to me his consternation about this issue
> offhand the other week, and I've just failed to get around to bringing
> it up :-)
How about something like this? Compile tested only but at least
it does inline correctly.
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index c5104aa..79df647 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -42,6 +42,9 @@
#define RHT_HASH_BITS 27
#define RHT_BASE_SHIFT RHT_HASH_BITS
+/* Base bits plus 1 bit for nulls marker */
+#define RHT_HASH_RESERVED_SPACE (RHT_BASE_BITS + 1)
+
struct rhash_head {
struct rhash_head __rcu *next;
};
@@ -172,6 +175,24 @@ static inline unsigned long rht_get_nulls_value(const struct rhash_head *ptr)
return ((unsigned long) ptr) >> 1;
}
+static inline void *rht_obj(const struct rhashtable *ht, const struct rhash_head *he)
+{
+ return (void *) he - ht->p.head_offset;
+}
+
+static inline u32 rht_bucket_index(const struct bucket_table *tbl, u32 hash)
+{
+ return (hash >> RHT_HASH_RESERVED_SPACE) & (tbl->size - 1);
+}
+
+static inline u32 rht_key_hashfn(const struct bucket_table *tbl,
+ const void *key,
+ unsigned int key_len,
+ rht_hashfn_t hashfn)
+{
+ return rht_bucket_index(tbl, hashfn(key, key_len, tbl->hash_rnd));
+}
+
#ifdef CONFIG_PROVE_LOCKING
int lockdep_rht_mutex_is_held(struct rhashtable *ht);
int lockdep_rht_bucket_is_held(const struct bucket_table *tbl, u32 hash);
@@ -197,6 +218,7 @@ int rhashtable_expand(struct rhashtable *ht);
int rhashtable_shrink(struct rhashtable *ht);
void *rhashtable_lookup(struct rhashtable *ht, const void *key);
+void *rhashtable_lookup_slow(struct rhashtable *ht, const void *key);
int rhashtable_lookup_insert(struct rhashtable *ht, struct rhash_head *obj);
int rhashtable_lookup_insert_key(struct rhashtable *ht,
@@ -358,4 +380,30 @@ void rhashtable_destroy(struct rhashtable *ht);
rht_for_each_entry_rcu_continue(tpos, pos, (tbl)->buckets[hash],\
tbl, hash, member)
+static inline void *rhashtable_lookup_fast(struct rhashtable *ht,
+ const void *key,
+ unsigned int key_len,
+ rht_hashfn_t hashfn,
+ rht_obj_cmpfn_t obj_cmpfn)
+{
+ const struct bucket_table *tbl;
+ struct rhash_head *he;
+ u32 hash;
+
+ rcu_read_lock();
+
+ tbl = rht_dereference_rcu(ht->tbl, ht);
+ hash = rht_key_hashfn(tbl, key, key_len, hashfn);
+ rht_for_each_rcu(he, tbl, hash) {
+ if (!obj_cmpfn(key, rht_obj(ht, he)))
+ continue;
+ rcu_read_unlock();
+ return rht_obj(ht, he);
+ }
+
+ rcu_read_unlock();
+
+ return rhashtable_lookup(ht, key);
+}
+
#endif /* _LINUX_RHASHTABLE_H */
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index d22da74..7486e6a 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -33,9 +33,6 @@
#define HASH_MIN_SIZE 4U
#define BUCKET_LOCKS_PER_CPU 128UL
-/* Base bits plus 1 bit for nulls marker */
-#define HASH_RESERVED_SPACE (RHT_BASE_BITS + 1)
-
/* The bucket lock is selected based on the hash and protects mutations
* on a group of hash buckets.
*
@@ -54,21 +51,10 @@ static spinlock_t *bucket_lock(const struct bucket_table *tbl, u32 hash)
return &tbl->locks[hash & tbl->locks_mask];
}
-static void *rht_obj(const struct rhashtable *ht, const struct rhash_head *he)
-{
- return (void *) he - ht->p.head_offset;
-}
-
-static u32 rht_bucket_index(const struct bucket_table *tbl, u32 hash)
-{
- return (hash >> HASH_RESERVED_SPACE) & (tbl->size - 1);
-}
-
static u32 key_hashfn(struct rhashtable *ht, const struct bucket_table *tbl,
const void *key)
{
- return rht_bucket_index(tbl, ht->p.hashfn(key, ht->hashfn_key_len,
- tbl->hash_rnd));
+ return rht_key_hashfn(tbl, key, ht->p.key_len, ht->p.hashfn);
}
static u32 head_hashfn(struct rhashtable *ht,
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index f521f35..4f225c7 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1000,7 +1000,7 @@ static struct sock *__netlink_lookup(struct netlink_table *table, u32 portid,
struct netlink_compare_arg arg;
netlink_compare_arg_init(&arg, net, portid);
- return rhashtable_lookup(&table->hash, &arg);
+ return rhashtable_lookup_fast(&table->hash, &arg, netlink_compare_arg_len, jhash2, netlink_compare);
}
static int __netlink_insert(struct netlink_table *table, struct sock *sk)
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply related [flat|nested] 113+ messages in thread
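The optimization the patch above relies on can be seen in miniature: pass the hash function as a compile-time-constant pointer into an `inline` helper, and the compiler can turn the indirect call into a direct (even inlined) one. All names here are toy names, not the kernel API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*hashfn_t)(const void *key, size_t len, uint32_t seed);

/* Toy byte-wise hash standing in for jhash. */
static uint32_t toy_jhash(const void *key, size_t len, uint32_t seed)
{
	const unsigned char *p = key;
	uint32_t h = seed;

	while (len--)
		h = h * 31 + *p++;
	return h;
}

/* Because bucket_of() is inline and its callers pass a constant
 * function pointer, the optimizer can replace the indirect call
 * with a direct call to toy_jhash. */
static inline uint32_t bucket_of(const void *key, size_t len,
				 uint32_t seed, uint32_t nbuckets,
				 hashfn_t hashfn)
{
	return hashfn(key, len, seed) & (nbuckets - 1);
}

static uint32_t lookup_bucket(uint32_t portid)
{
	/* Constant callee: the compiler sees exactly which function runs. */
	return bucket_of(&portid, sizeof(portid), 0, 64, toy_jhash);
}
```

This is why `rhashtable_lookup_fast()` lives in the header rather than in lib/rhashtable.c.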
* Re: [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash
2015-03-16 11:26 ` Herbert Xu
@ 2015-03-16 20:25 ` David Miller
0 siblings, 0 replies; 113+ messages in thread
From: David Miller @ 2015-03-16 20:25 UTC (permalink / raw)
To: herbert; +Cc: tgraf, netdev, eric.dumazet, kaber
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 16 Mar 2015 22:26:28 +1100
> On Mon, Mar 16, 2015 at 12:40:17AM -0400, David Miller wrote:
>>
>> Patrick McHardy had mentioned to me his consternation about this issue
>> offhand the other week, and I've just failed to get around to bringing
>> it up :-)
>
> How about something like this? Compile tested only but at least
> it does inline correctly.
I like it.
^ permalink raw reply [flat|nested] 113+ messages in thread
* [v2 PATCH 1/6] rhashtable: Remove shift from bucket_table
2015-03-15 10:43 ` [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash Herbert Xu
` (14 preceding siblings ...)
2015-03-16 4:01 ` [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash David Miller
@ 2015-03-18 9:01 ` Herbert Xu
2015-03-18 9:01 ` [v2 PATCH 2/6] rhashtable: Introduce max_size/min_size Herbert Xu
` (4 subsequent siblings)
20 siblings, 0 replies; 113+ messages in thread
From: Herbert Xu @ 2015-03-18 9:01 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
Keeping both size and shift is silly. We only need one.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
include/linux/rhashtable.h | 2 --
lib/rhashtable.c | 5 ++---
2 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 1695378..f16e856 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -51,7 +51,6 @@ struct rhash_head {
* @size: Number of hash buckets
* @rehash: Current bucket being rehashed
* @hash_rnd: Random seed to fold into hash
- * @shift: Current size (1 << shift)
* @locks_mask: Mask to apply before accessing locks[]
* @locks: Array of spinlocks protecting individual buckets
* @walkers: List of active walkers
@@ -63,7 +62,6 @@ struct bucket_table {
unsigned int size;
unsigned int rehash;
u32 hash_rnd;
- u32 shift;
unsigned int locks_mask;
spinlock_t *locks;
struct list_head walkers;
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 09a7ada..0974003 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -162,7 +162,6 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
return NULL;
tbl->size = nbuckets;
- tbl->shift = ilog2(nbuckets);
if (alloc_bucket_locks(ht, tbl) < 0) {
bucket_table_free(tbl);
@@ -189,7 +188,7 @@ static bool rht_grow_above_75(const struct rhashtable *ht,
{
/* Expand table when exceeding 75% load */
return atomic_read(&ht->nelems) > (tbl->size / 4 * 3) &&
- (!ht->p.max_shift || tbl->shift < ht->p.max_shift);
+ (!ht->p.max_shift || tbl->size < (1 << ht->p.max_shift));
}
/**
@@ -202,7 +201,7 @@ static bool rht_shrink_below_30(const struct rhashtable *ht,
{
/* Shrink table beneath 30% load */
return atomic_read(&ht->nelems) < (tbl->size * 3 / 10) &&
- tbl->shift > ht->p.min_shift;
+ tbl->size > (1 << ht->p.min_shift);
}
static int rhashtable_rehash_one(struct rhashtable *ht, unsigned old_hash)
^ permalink raw reply related [flat|nested] 113+ messages in thread
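The substitution in the patch above is safe because the bucket count is always a power of two, so `tbl->size == 1 << tbl->shift` and the two growth checks are equivalent. A minimal user-space check of that equivalence (`ilog2_pow2` is a stand-in for the kernel's `ilog2()` on power-of-two input):

```c
#include <assert.h>

/* Stand-in for the kernel's ilog2() on a power-of-two value:
 * the position of the single set bit. */
static unsigned int ilog2_pow2(unsigned int n)
{
	unsigned int shift = 0;

	while (n > 1) {
		n >>= 1;
		shift++;
	}
	return shift;
}

/* old check: tbl->shift < max_shift */
static int may_grow_by_shift(unsigned int size, unsigned int max_shift)
{
	return ilog2_pow2(size) < max_shift;
}

/* new check: tbl->size < (1 << max_shift) */
static int may_grow_by_size(unsigned int size, unsigned int max_shift)
{
	return size < (1u << max_shift);
}
```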
* [v2 PATCH 2/6] rhashtable: Introduce max_size/min_size
2015-03-15 10:43 ` [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash Herbert Xu
` (15 preceding siblings ...)
2015-03-18 9:01 ` [v2 PATCH 1/6] rhashtable: Remove shift from bucket_table Herbert Xu
@ 2015-03-18 9:01 ` Herbert Xu
2015-03-18 10:55 ` Thomas Graf
2015-03-18 9:01 ` [v2 PATCH 3/6] netlink: Use rhashtable max_size instead of max_shift Herbert Xu
` (3 subsequent siblings)
20 siblings, 1 reply; 113+ messages in thread
From: Herbert Xu @ 2015-03-18 9:01 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
This patch adds the parameters max_size and min_size which are
meant to replace max_shift and min_shift.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
include/linux/rhashtable.h | 4 ++++
lib/rhashtable.c | 12 ++++++++----
2 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index f16e856..81267fe 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -85,6 +85,8 @@ struct rhashtable;
* @head_offset: Offset of rhash_head in struct to be hashed
* @max_shift: Maximum number of shifts while expanding
* @min_shift: Minimum number of shifts while shrinking
+ * @max_size: Maximum size while expanding
+ * @min_size: Minimum size while shrinking
* @nulls_base: Base value to generate nulls marker
* @locks_mul: Number of bucket locks to allocate per cpu (default: 128)
* @hashfn: Function to hash key
@@ -97,6 +99,8 @@ struct rhashtable_params {
size_t head_offset;
size_t max_shift;
size_t min_shift;
+ unsigned int max_size;
+ unsigned int min_size;
u32 nulls_base;
size_t locks_mul;
rht_hashfn_t hashfn;
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 0974003..c4061bb 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -27,7 +27,7 @@
#include <linux/err.h>
#define HASH_DEFAULT_SIZE 64UL
-#define HASH_MIN_SIZE 4UL
+#define HASH_MIN_SIZE 4U
#define BUCKET_LOCKS_PER_CPU 128UL
/* Base bits plus 1 bit for nulls marker */
@@ -188,7 +188,8 @@ static bool rht_grow_above_75(const struct rhashtable *ht,
{
/* Expand table when exceeding 75% load */
return atomic_read(&ht->nelems) > (tbl->size / 4 * 3) &&
- (!ht->p.max_shift || tbl->size < (1 << ht->p.max_shift));
+ (!ht->p.max_shift || tbl->size < (1 << ht->p.max_shift)) &&
+ (!ht->p.max_size || tbl->size < ht->p.max_size);
}
/**
@@ -201,7 +202,8 @@ static bool rht_shrink_below_30(const struct rhashtable *ht,
{
/* Shrink table beneath 30% load */
return atomic_read(&ht->nelems) < (tbl->size * 3 / 10) &&
- tbl->size > (1 << ht->p.min_shift);
+ tbl->size > (1 << ht->p.min_shift) &&
+ tbl->size > ht->p.min_size;
}
static int rhashtable_rehash_one(struct rhashtable *ht, unsigned old_hash)
@@ -873,7 +875,8 @@ EXPORT_SYMBOL_GPL(rhashtable_walk_stop);
static size_t rounded_hashtable_size(struct rhashtable_params *params)
{
return max(roundup_pow_of_two(params->nelem_hint * 4 / 3),
- 1UL << params->min_shift);
+ max(1UL << params->min_shift,
+ (unsigned long)params->min_size));
}
/**
@@ -935,6 +938,7 @@ int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params)
params->min_shift = max_t(size_t, params->min_shift,
ilog2(HASH_MIN_SIZE));
+ params->min_size = max(params->min_size, HASH_MIN_SIZE);
if (params->nelem_hint)
size = rounded_hashtable_size(params);
^ permalink raw reply related [flat|nested] 113+ messages in thread
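The initial-sizing logic in the patch above can be sketched in user space. This mirrors `rounded_hashtable_size()` with the `min_shift` branch dropped for brevity (`roundup_pow_of_two` is a stand-in for the kernel helper): pick a power of two big enough to hold `nelem_hint` entries at 75% load, clamped below by `min_size`:

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's roundup_pow_of_two(). */
static size_t roundup_pow_of_two(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

#define HASH_MIN_SIZE 4u

static size_t initial_table_size(size_t nelem_hint, size_t min_size)
{
	size_t want;

	if (min_size < HASH_MIN_SIZE)
		min_size = HASH_MIN_SIZE;       /* rhashtable_init() clamp */

	/* Room for nelem_hint entries at a 75% load factor. */
	want = roundup_pow_of_two(nelem_hint * 4 / 3);
	return want > min_size ? want : min_size;
}
```

For example, a hint of 100 entries needs at least 134 buckets for 75% load, which rounds up to 256.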
* Re: [v2 PATCH 2/6] rhashtable: Introduce max_size/min_size
2015-03-18 9:01 ` [v2 PATCH 2/6] rhashtable: Introduce max_size/min_size Herbert Xu
@ 2015-03-18 10:55 ` Thomas Graf
2015-03-18 16:47 ` David Miller
2015-03-18 16:51 ` David Laight
0 siblings, 2 replies; 113+ messages in thread
From: Thomas Graf @ 2015-03-18 10:55 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, netdev, Eric Dumazet
On 03/18/15 at 08:01pm, Herbert Xu wrote:
> @@ -935,6 +938,7 @@ int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params)
>
> params->min_shift = max_t(size_t, params->min_shift,
> ilog2(HASH_MIN_SIZE));
> + params->min_size = max(params->min_size, HASH_MIN_SIZE);
>
> if (params->nelem_hint)
> size = rounded_hashtable_size(params);
The only change I would add on top is to ensure that min_size
and max_size are a power of two as otherwise the table size
used will end up being greater or smaller than specified.
I can do that as a follow-up though.
^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [v2 PATCH 2/6] rhashtable: Introduce max_size/min_size
2015-03-18 10:55 ` Thomas Graf
@ 2015-03-18 16:47 ` David Miller
2015-03-18 16:51 ` David Laight
1 sibling, 0 replies; 113+ messages in thread
From: David Miller @ 2015-03-18 16:47 UTC (permalink / raw)
To: tgraf; +Cc: herbert, netdev, eric.dumazet
From: Thomas Graf <tgraf@suug.ch>
Date: Wed, 18 Mar 2015 10:55:28 +0000
> On 03/18/15 at 08:01pm, Herbert Xu wrote:
>> @@ -935,6 +938,7 @@ int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params)
>>
>> params->min_shift = max_t(size_t, params->min_shift,
>> ilog2(HASH_MIN_SIZE));
>> + params->min_size = max(params->min_size, HASH_MIN_SIZE);
>>
>> if (params->nelem_hint)
>> size = rounded_hashtable_size(params);
>
> The only change I would add on top is to ensure that min_size
> and max_size are a power of two as otherwise the table size
> used will end up being greater or smaller than specified.
> I can do that as a follow-up though.
Feel free.
^ permalink raw reply [flat|nested] 113+ messages in thread
* RE: [v2 PATCH 2/6] rhashtable: Introduce max_size/min_size
2015-03-18 10:55 ` Thomas Graf
2015-03-18 16:47 ` David Miller
@ 2015-03-18 16:51 ` David Laight
1 sibling, 0 replies; 113+ messages in thread
From: David Laight @ 2015-03-18 16:51 UTC (permalink / raw)
To: 'Thomas Graf', Herbert Xu; +Cc: David Miller, netdev, Eric Dumazet
From: Thomas Graf
> On 03/18/15 at 08:01pm, Herbert Xu wrote:
> > @@ -935,6 +938,7 @@ int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params)
> >
> > params->min_shift = max_t(size_t, params->min_shift,
> > ilog2(HASH_MIN_SIZE));
> > + params->min_size = max(params->min_size, HASH_MIN_SIZE);
> >
> > if (params->nelem_hint)
> > size = rounded_hashtable_size(params);
>
> The only change I would add on top is to ensure that min_size
> and max_size are a power of two as otherwise the table size
> used will end up being greater or smaller than specified.
I'd just make sure that 'something sensible' happens if they aren't.
You don't really want to error out of table creation if some sysctl (etc.)
that controls the sizes isn't a power of 2.
David
^ permalink raw reply [flat|nested] 113+ messages in thread
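Since table sizes are always powers of two, a non-power-of-two `max_size` such as 1000 can never be reached exactly: growth simply stops at 512. One "something sensible" policy in the spirit of the reply above is to clamp the bound down to a power of two up front. This is a sketch only, not what the posted patches do:

```c
#include <assert.h>

/* Largest power of two <= n, for n >= 1. */
static unsigned int rounddown_pow_of_two(unsigned int n)
{
	unsigned int p = 1;

	while (p <= n / 2)
		p *= 2;
	return p;
}

/* 0 keeps its existing meaning of "no limit". */
static unsigned int effective_max_size(unsigned int max_size)
{
	return max_size ? rounddown_pow_of_two(max_size) : 0;
}
```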
* [v2 PATCH 3/6] netlink: Use rhashtable max_size instead of max_shift
2015-03-15 10:43 ` [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash Herbert Xu
` (16 preceding siblings ...)
2015-03-18 9:01 ` [v2 PATCH 2/6] rhashtable: Introduce max_size/min_size Herbert Xu
@ 2015-03-18 9:01 ` Herbert Xu
2015-03-18 9:01 ` [v2 PATCH 4/6] tipc: Use rhashtable max/min_size instead of max/min_shift Herbert Xu
` (2 subsequent siblings)
20 siblings, 0 replies; 113+ messages in thread
From: Herbert Xu @ 2015-03-18 9:01 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
This patch converts netlink to use rhashtable max_size instead
of the obsolete max_shift.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
net/netlink/af_netlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 6b0f219..d97aed6 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -3123,7 +3123,7 @@ static int __init netlink_proto_init(void)
.key_offset = offsetof(struct netlink_sock, portid),
.key_len = sizeof(u32), /* portid */
.hashfn = jhash,
- .max_shift = 16, /* 64K */
+ .max_size = 65536,
};
if (err != 0)
^ permalink raw reply related [flat|nested] 113+ messages in thread
* [v2 PATCH 4/6] tipc: Use rhashtable max/min_size instead of max/min_shift
2015-03-15 10:43 ` [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash Herbert Xu
` (17 preceding siblings ...)
2015-03-18 9:01 ` [v2 PATCH 3/6] netlink: Use rhashtable max_size instead of max_shift Herbert Xu
@ 2015-03-18 9:01 ` Herbert Xu
2015-03-18 9:01 ` [v2 PATCH 5/6] test_rhashtable: Use rhashtable max_size instead of max_shift Herbert Xu
2015-03-18 9:01 ` [v2 PATCH 6/6] rhashtable: Remove max_shift and min_shift Herbert Xu
20 siblings, 0 replies; 113+ messages in thread
From: Herbert Xu @ 2015-03-18 9:01 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
This patch converts tipc to use rhashtable max/min_size instead of
the obsolete max/min_shift.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
net/tipc/socket.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 813847d..d7a6c10 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2286,8 +2286,8 @@ int tipc_sk_rht_init(struct net *net)
.key_offset = offsetof(struct tipc_sock, portid),
.key_len = sizeof(u32), /* portid */
.hashfn = jhash,
- .max_shift = 20, /* 1M */
- .min_shift = 8, /* 256 */
+ .max_size = 1048576,
+ .min_size = 256,
};
return rhashtable_init(&tn->sk_rht, &rht_params);
^ permalink raw reply related [flat|nested] 113+ messages in thread
* [v2 PATCH 5/6] test_rhashtable: Use rhashtable max_size instead of max_shift
2015-03-15 10:43 ` [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash Herbert Xu
` (18 preceding siblings ...)
2015-03-18 9:01 ` [v2 PATCH 4/6] tipc: Use rhashtable max/min_size instead of max/min_shift Herbert Xu
@ 2015-03-18 9:01 ` Herbert Xu
2015-03-18 9:01 ` [v2 PATCH 6/6] rhashtable: Remove max_shift and min_shift Herbert Xu
20 siblings, 0 replies; 113+ messages in thread
From: Herbert Xu @ 2015-03-18 9:01 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
This patch converts test_rhashtable to use rhashtable max_size
instead of the obsolete max_shift.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
lib/test_rhashtable.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index 16974fd..2bc403d 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -201,7 +201,7 @@ static int __init test_rht_init(void)
.key_offset = offsetof(struct test_obj, value),
.key_len = sizeof(int),
.hashfn = jhash,
- .max_shift = 1, /* we expand/shrink manually here */
+ .max_size = 2, /* we expand/shrink manually here */
.nulls_base = (3U << RHT_BASE_SHIFT),
};
int err;
^ permalink raw reply related [flat|nested] 113+ messages in thread
* [v2 PATCH 6/6] rhashtable: Remove max_shift and min_shift
2015-03-15 10:43 ` [v1 PATCH 0/14] rhashtable: Kill shift/Key netlink namespace/Merge jhash Herbert Xu
` (19 preceding siblings ...)
2015-03-18 9:01 ` [v2 PATCH 5/6] test_rhashtable: Use rhashtable max_size instead of max_shift Herbert Xu
@ 2015-03-18 9:01 ` Herbert Xu
20 siblings, 0 replies; 113+ messages in thread
From: Herbert Xu @ 2015-03-18 9:01 UTC (permalink / raw)
To: David Miller, tgraf, netdev, Eric Dumazet
Now that nobody uses max_shift and min_shift, we can safely remove
them.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
include/linux/rhashtable.h | 4 ----
lib/rhashtable.c | 7 +------
2 files changed, 1 insertion(+), 10 deletions(-)
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 81267fe..99425f2 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -83,8 +83,6 @@ struct rhashtable;
* @key_len: Length of key
* @key_offset: Offset of key in struct to be hashed
* @head_offset: Offset of rhash_head in struct to be hashed
- * @max_shift: Maximum number of shifts while expanding
- * @min_shift: Minimum number of shifts while shrinking
* @max_size: Maximum size while expanding
* @min_size: Minimum size while shrinking
* @nulls_base: Base value to generate nulls marker
@@ -97,8 +95,6 @@ struct rhashtable_params {
size_t key_len;
size_t key_offset;
size_t head_offset;
- size_t max_shift;
- size_t min_shift;
unsigned int max_size;
unsigned int min_size;
u32 nulls_base;
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index c4061bb..5f8fe3e 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -188,7 +188,6 @@ static bool rht_grow_above_75(const struct rhashtable *ht,
{
/* Expand table when exceeding 75% load */
return atomic_read(&ht->nelems) > (tbl->size / 4 * 3) &&
- (!ht->p.max_shift || tbl->size < (1 << ht->p.max_shift)) &&
(!ht->p.max_size || tbl->size < ht->p.max_size);
}
@@ -202,7 +201,6 @@ static bool rht_shrink_below_30(const struct rhashtable *ht,
{
/* Shrink table beneath 30% load */
return atomic_read(&ht->nelems) < (tbl->size * 3 / 10) &&
- tbl->size > (1 << ht->p.min_shift) &&
tbl->size > ht->p.min_size;
}
@@ -875,8 +873,7 @@ EXPORT_SYMBOL_GPL(rhashtable_walk_stop);
static size_t rounded_hashtable_size(struct rhashtable_params *params)
{
return max(roundup_pow_of_two(params->nelem_hint * 4 / 3),
- max(1UL << params->min_shift,
- (unsigned long)params->min_size));
+ (unsigned long)params->min_size);
}
/**
@@ -936,8 +933,6 @@ int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params)
if (params->nulls_base && params->nulls_base < (1U << RHT_BASE_SHIFT))
return -EINVAL;
- params->min_shift = max_t(size_t, params->min_shift,
- ilog2(HASH_MIN_SIZE));
params->min_size = max(params->min_size, HASH_MIN_SIZE);
if (params->nelem_hint)
^ permalink raw reply related [flat|nested] 113+ messages in thread
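After this final patch the resize predicates depend only on `max_size`/`min_size`. Their behaviour can be checked in user space (kernel locking and atomics elided; `struct params` is a cut-down stand-in for `rhashtable_params`):

```c
#include <assert.h>

struct params {
	unsigned int max_size;   /* 0 means "no limit" */
	unsigned int min_size;
};

/* Mirrors rht_grow_above_75(): expand past 75% load, unless at max. */
static int grow_above_75(unsigned int nelems, unsigned int size,
			 const struct params *p)
{
	return nelems > size / 4 * 3 &&
	       (!p->max_size || size < p->max_size);
}

/* Mirrors rht_shrink_below_30(): shrink below 30% load, unless at min. */
static int shrink_below_30(unsigned int nelems, unsigned int size,
			   const struct params *p)
{
	return nelems < size * 3 / 10 &&
	       size > p->min_size;
}
```

For a 64-bucket table the thresholds work out to "grow above 48 entries" and "shrink below 19 entries", which leaves hysteresis between the two so the table does not oscillate.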