* [v3 PATCH 0/9] rhashtable: Multiple rehashing
@ 2015-03-23 13:49 Herbert Xu
  2015-03-23 13:50 ` [v3 PATCH 1/9] rhashtable: Add barrier to ensure we see new tables in walker Herbert Xu
                   ` (9 more replies)
  0 siblings, 10 replies; 23+ messages in thread
From: Herbert Xu @ 2015-03-23 13:49 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

Hi:

This series introduces multiple rehashing.

Recall that the original implementation in br_multicast used
two list pointers per hash node and was therefore limited to at
most one rehash at a time, since you need one list pointer for
the old table and one for the new table.

Thanks to Josh Triplett's suggestion of using a single list pointer
we're no longer limited by that.  So it is perfectly OK to have
an arbitrary number of tables in existence at any one time.

Readers and removals simply have to walk from the oldest table
to the newest table in order not to miss anything.  Insertions
without a lookup are just as easy: we simply go to the last table
that we can find and add the entry there.
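
To make that concrete, here is a minimal userspace-only sketch of
the walk; the structures, the mask-based bucket index and the helper
name are simplified stand-ins, not the kernel's rhashtable types:

/* Simplified model: each table points at its successor via future_tbl.
 * Readers and removals check every table from oldest to newest so that
 * entries not yet moved to a newer table are still found. */
#include <stdio.h>
#include <stddef.h>

struct node { int key; struct node *next; };

struct table {
	struct table *future_tbl;	/* newer table, or NULL */
	unsigned int size;		/* number of buckets, power of two */
	struct node **buckets;
};

static struct node *lookup(struct table *tbl, int key)
{
	for (; tbl; tbl = tbl->future_tbl) {
		struct node *n = tbl->buckets[(unsigned int)key & (tbl->size - 1)];

		for (; n; n = n->next)
			if (n->key == key)
				return n;
	}
	return NULL;
}

int main(void)
{
	struct node a = { .key = 7, .next = NULL };
	struct node *oldb[4] = { NULL, NULL, NULL, &a };
	struct node *newb[8] = { NULL };
	struct table newt = { NULL, 8, newb };
	struct table oldt = { &newt, 4, oldb };

	/* The entry still lives in the old table but is found anyway. */
	printf("key 7 %s\n", lookup(&oldt, 7) ? "found" : "missing");
	return 0;
}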

However, insertion with uniqueness lookup is more complicated
because we need to ensure that two simultaneous insertions of the
same key do not both succeed.  To achieve this, all insertions,
including those without lookups, are required to obtain the bucket
lock from the oldest hash table that is still alive.  This is
determined by having the rehasher (there is only one rehashing
thread in the system) keep a marker of how far it has progressed.  If
a bucket has already been rehashed then it is dead, i.e., there
cannot be any more insertions into it; otherwise it is considered
alive.  This guarantees that the same key cannot be inserted
into two different tables in parallel.
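
A rough sketch of the "oldest table whose bucket is still alive" rule,
again userspace-only and with illustrative names (the real code also
takes and drops the bucket spinlock around each check):

#include <stdio.h>

struct table {
	struct table *future_tbl;	/* newer table, or NULL */
	unsigned int size;		/* number of buckets, power of two */
	unsigned int rehash;		/* buckets below this index are dead */
};

/* Walk towards newer tables until we reach one in which the bucket for
 * this hash has not been rehashed yet.  The bucket index has to be
 * recomputed per table because the table sizes differ. */
static struct table *oldest_alive(struct table *tbl, unsigned int hashval)
{
	for (;;) {
		unsigned int bucket = hashval & (tbl->size - 1);

		if (tbl->rehash <= bucket)
			return tbl;
		tbl = tbl->future_tbl;
	}
}

int main(void)
{
	struct table newt = { NULL, 8, 0 };
	struct table oldt = { &newt, 4, 2 };	/* buckets 0 and 1 already moved */

	printf("hash 1 -> %s table\n", oldest_alive(&oldt, 1) == &oldt ? "old" : "new");
	printf("hash 3 -> %s table\n", oldest_alive(&oldt, 3) == &oldt ? "old" : "new");
	return 0;
}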

Patch 1 is actually a bug fix for the walker.

Patches 2-5 eliminate unnecessary out-of-line copies of jhash.

Patch 6 makes rhashtable_shrink shrink to fit.

Patch 7 introduces multiple rehashing.  This means that if we
decide to grow then we will grow regardless of whether the previous
rehash has finished.  However, this is still asynchronous, meaning
that if insertions come fast enough we may still end up with a
table that is overutilised.

Patch 8 adds support for GFP_ATOMIC allocations of struct bucket_table.

Finally, patch 9 enables immediate rehashing.  This is done either
when the table reaches 100% utilisation, or when the chain length
exceeds 16 (the latter can be disabled on request, e.g., for
nft_hash).

With these patches the system should no longer have any trouble
dealing with fast insertions on a small table.  In the worst
case you end up with a list of tables that's log N in length
while the rehasher catches up.

v3 restores rhashtable_shrink and fixes a number of bugs in the
multiple rehashing patches (7 and 9).

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* [v3 PATCH 1/9] rhashtable: Add barrier to ensure we see new tables in walker
  2015-03-23 13:49 [v3 PATCH 0/9] rhashtable: Multiple rehashing Herbert Xu
@ 2015-03-23 13:50 ` Herbert Xu
  2015-03-23 13:50 ` [v3 PATCH 2/9] rhashtable: Eliminate unnecessary branch in rht_key_hashfn Herbert Xu
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Herbert Xu @ 2015-03-23 13:50 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

The walker is a lockless reader so it too needs an smp_rmb before
reading the future_tbl field in order to see any new tables that
may contain elements that we should have walked over.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
---

 lib/rhashtable.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 83cfedd..618a3f0 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -477,6 +477,9 @@ next:
 		iter->skip = 0;
 	}
 
+	/* Ensure we see any new tables. */
+	smp_rmb();
+
 	iter->walker->tbl = rht_dereference_rcu(tbl->future_tbl, ht);
 	if (iter->walker->tbl) {
 		iter->slot = 0;


* [v3 PATCH 2/9] rhashtable: Eliminate unnecessary branch in rht_key_hashfn
  2015-03-23 13:49 [v3 PATCH 0/9] rhashtable: Multiple rehashing Herbert Xu
  2015-03-23 13:50 ` [v3 PATCH 1/9] rhashtable: Add barrier to ensure we see new tables in walker Herbert Xu
@ 2015-03-23 13:50 ` Herbert Xu
  2015-03-23 13:58   ` Thomas Graf
  2015-03-23 13:50 ` [v3 PATCH 3/9] rhashtable: Allow hashfn to be unset Herbert Xu
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Herbert Xu @ 2015-03-23 13:50 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

When rht_key_hashfn is called from rhashtable itself and params
is equal to ht->p, there is no point in checking params.key_len
and falling back to ht->p.key_len.

For some reason gcc couldn't figure out that params is the same
as ht->p.  So let's help it by only checking params.key_len when
it's a constant.
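
For readers unfamiliar with the builtin, a small standalone
demonstration (a GCC/Clang extension; what it reports depends on
optimisation, which is exactly the property being exploited here):

#include <stdio.h>

static inline int reported_constant(unsigned int x)
{
	return __builtin_constant_p(x);
}

int main(void)
{
	unsigned int runtime_len;

	if (scanf("%u", &runtime_len) != 1)
		runtime_len = 8;

	/* With optimisation enabled the literal is typically reported as
	 * a constant and the value read at run time is not, so a
	 * "params.key_len ?: fallback" expression can be folded away in
	 * the constant case and skipped entirely in the other. */
	printf("literal 8   : %d\n", reported_constant(8));
	printf("runtime read: %d\n", reported_constant(runtime_len));
	return 0;
}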

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/rhashtable.h |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 18a6488..e316f10 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -199,8 +199,12 @@ static inline unsigned int rht_key_hashfn(
 	struct rhashtable *ht, const struct bucket_table *tbl,
 	const void *key, const struct rhashtable_params params)
 {
-	return rht_bucket_index(tbl, params.hashfn(key, params.key_len ?:
-							ht->p.key_len,
+	/* params must be equal to ht->p if it isn't constant. */
+	unsigned key_len = __builtin_constant_p(params.key_len) ?
+			   (params.key_len ?: ht->p.key_len) :
+			   params.key_len;
+
+	return rht_bucket_index(tbl, params.hashfn(key, key_len,
 						   tbl->hash_rnd));
 }
 


* [v3 PATCH 3/9] rhashtable: Allow hashfn to be unset
  2015-03-23 13:49 [v3 PATCH 0/9] rhashtable: Multiple rehashing Herbert Xu
  2015-03-23 13:50 ` [v3 PATCH 1/9] rhashtable: Add barrier to ensure we see new tables in walker Herbert Xu
  2015-03-23 13:50 ` [v3 PATCH 2/9] rhashtable: Eliminate unnecessary branch in rht_key_hashfn Herbert Xu
@ 2015-03-23 13:50 ` Herbert Xu
  2015-03-23 15:36   ` Thomas Graf
  2015-03-23 13:50 ` [v3 PATCH 4/9] netlink: Use default rhashtable hashfn Herbert Xu
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Herbert Xu @ 2015-03-23 13:50 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

Since every current rhashtable user uses jhash as their hash
function, the fact that jhash is an inline function causes each
user to generate a copy of its code.

This patch provides a solution to this problem by allowing
hashfn to be unset, in which case rhashtable will automatically
set it to jhash.  Furthermore, if the key length is a multiple
of 4, we will switch over to jhash2.
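
As a sketch of what this enables on the user side, modelled on the
tipc parameters later in this series (struct my_obj and my_params are
made-up names, not code from any patch):

#include <linux/stddef.h>
#include <linux/rhashtable.h>

struct my_obj {
	u32			portid;		/* the key */
	struct rhash_head	node;
};

static const struct rhashtable_params my_params = {
	.head_offset	= offsetof(struct my_obj, node),
	.key_offset	= offsetof(struct my_obj, portid),
	.key_len	= sizeof(u32),		/* multiple of 4, so jhash2 */
	/* .hashfn left unset: rhashtable picks jhash/jhash2 itself */
};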

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/rhashtable.h |   33 +++++++++++++++++++++++++++------
 lib/rhashtable.c           |   17 ++++++++++++++++-
 2 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index e316f10..53465dc 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -19,6 +19,7 @@
 
 #include <linux/compiler.h>
 #include <linux/errno.h>
+#include <linux/jhash.h>
 #include <linux/list_nulls.h>
 #include <linux/workqueue.h>
 #include <linux/mutex.h>
@@ -103,7 +104,7 @@ struct rhashtable;
  * @min_size: Minimum size while shrinking
  * @nulls_base: Base value to generate nulls marker
  * @locks_mul: Number of bucket locks to allocate per cpu (default: 128)
- * @hashfn: Function to hash key
+ * @hashfn: Hash function (default: jhash2 if !(key_len % 4), or jhash)
  * @obj_hashfn: Function to hash object
  * @obj_cmpfn: Function to compare key with object
  */
@@ -125,6 +126,7 @@ struct rhashtable_params {
  * struct rhashtable - Hash table handle
  * @tbl: Bucket table
  * @nelems: Number of elements in table
+ * @key_len: Key length for hashfn
  * @p: Configuration parameters
  * @run_work: Deferred worker to expand/shrink asynchronously
  * @mutex: Mutex to protect current/future table swapping
@@ -134,6 +136,7 @@ struct rhashtable {
 	struct bucket_table __rcu	*tbl;
 	atomic_t			nelems;
 	bool                            being_destroyed;
+	unsigned int			key_len;
 	struct rhashtable_params	p;
 	struct work_struct		run_work;
 	struct mutex                    mutex;
@@ -199,13 +202,31 @@ static inline unsigned int rht_key_hashfn(
 	struct rhashtable *ht, const struct bucket_table *tbl,
 	const void *key, const struct rhashtable_params params)
 {
+	unsigned hash;
+
 	/* params must be equal to ht->p if it isn't constant. */
-	unsigned key_len = __builtin_constant_p(params.key_len) ?
-			   (params.key_len ?: ht->p.key_len) :
-			   params.key_len;
+	if (!__builtin_constant_p(params.key_len))
+		hash = ht->p.hashfn(key, ht->key_len, tbl->hash_rnd);
+	else if (params.key_len) {
+		unsigned key_len = params.key_len;
+
+		if (params.hashfn)
+			hash = params.hashfn(key, key_len, tbl->hash_rnd);
+		else if (key_len & (sizeof(u32) - 1))
+			hash = jhash(key, key_len, tbl->hash_rnd);
+		else
+			hash = jhash2(key, key_len / sizeof(u32),
+				      tbl->hash_rnd);
+	} else {
+		unsigned key_len = ht->p.key_len;
+
+		if (params.hashfn)
+			hash = params.hashfn(key, key_len, tbl->hash_rnd);
+		else
+			hash = jhash(key, key_len, tbl->hash_rnd);
+	}
 
-	return rht_bucket_index(tbl, params.hashfn(key, key_len,
-						   tbl->hash_rnd));
+	return rht_bucket_index(tbl, hash);
 }
 
 static inline unsigned int rht_head_hashfn(
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 618a3f0..798f01d 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -532,6 +532,11 @@ static size_t rounded_hashtable_size(const struct rhashtable_params *params)
 		   (unsigned long)params->min_size);
 }
 
+static u32 rhashtable_jhash2(const void *key, u32 length, u32 seed)
+{
+	return jhash2(key, length, seed);
+}
+
 /**
  * rhashtable_init - initialize a new hash table
  * @ht:		hash table to be initialized
@@ -583,7 +588,7 @@ int rhashtable_init(struct rhashtable *ht,
 
 	size = HASH_DEFAULT_SIZE;
 
-	if ((!(params->key_len && params->hashfn) && !params->obj_hashfn) ||
+	if ((!params->key_len && !params->obj_hashfn) ||
 	    (params->obj_hashfn && !params->obj_cmpfn))
 		return -EINVAL;
 
@@ -610,6 +615,16 @@ int rhashtable_init(struct rhashtable *ht,
 	else
 		ht->p.locks_mul = BUCKET_LOCKS_PER_CPU;
 
+	ht->key_len = ht->p.key_len;
+	if (!params->hashfn) {
+		ht->p.hashfn = jhash;
+
+		if (!(ht->key_len & (sizeof(u32) - 1))) {
+			ht->key_len /= sizeof(u32);
+			ht->p.hashfn = rhashtable_jhash2;
+		}
+	}
+
 	tbl = bucket_table_alloc(ht, size);
 	if (tbl == NULL)
 		return -ENOMEM;


* [v3 PATCH 4/9] netlink: Use default rhashtable hashfn
  2015-03-23 13:49 [v3 PATCH 0/9] rhashtable: Multiple rehashing Herbert Xu
                   ` (2 preceding siblings ...)
  2015-03-23 13:50 ` [v3 PATCH 3/9] rhashtable: Allow hashfn to be unset Herbert Xu
@ 2015-03-23 13:50 ` Herbert Xu
  2015-03-23 13:50 ` [v3 PATCH 5/9] tipc: " Herbert Xu
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Herbert Xu @ 2015-03-23 13:50 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

This patch removes the explicit jhash value for the hashfn parameter
of rhashtable.  As the key length is a multiple of 4, we will
actually end up using jhash2.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
---

 net/netlink/af_netlink.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 6517921..e2f7f28 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -3133,13 +3133,12 @@ static inline u32 netlink_hash(const void *data, u32 seed)
 	struct netlink_compare_arg arg;
 
 	netlink_compare_arg_init(&arg, sock_net(&nlk->sk), nlk->portid);
-	return jhash(&arg, netlink_compare_arg_len, seed);
+	return jhash2((u32 *)&arg, netlink_compare_arg_len / sizeof(u32), seed);
 }
 
 static const struct rhashtable_params netlink_rhashtable_params = {
 	.head_offset = offsetof(struct netlink_sock, node),
 	.key_len = netlink_compare_arg_len,
-	.hashfn = jhash,
 	.obj_hashfn = netlink_hash,
 	.obj_cmpfn = netlink_compare,
 	.max_size = 65536,


* [v3 PATCH 5/9] tipc: Use default rhashtable hashfn
  2015-03-23 13:49 [v3 PATCH 0/9] rhashtable: Multiple rehashing Herbert Xu
                   ` (3 preceding siblings ...)
  2015-03-23 13:50 ` [v3 PATCH 4/9] netlink: Use default rhashtable hashfn Herbert Xu
@ 2015-03-23 13:50 ` Herbert Xu
  2015-03-23 13:50 ` [v3 PATCH 6/9] rhashtable: Shrink to fit Herbert Xu
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Herbert Xu @ 2015-03-23 13:50 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

This patch removes the explicit jhash value for the hashfn parameter
of rhashtable.  The default is now jhash, so removing the setting
makes no difference apart from eliminating one copy of jhash from
the kernel.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
---

 net/tipc/socket.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 73c2f51..6dd5bd9 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -35,7 +35,6 @@
  */
 
 #include <linux/rhashtable.h>
-#include <linux/jhash.h>
 #include "core.h"
 #include "name_table.h"
 #include "node.h"
@@ -2294,7 +2293,6 @@ static const struct rhashtable_params tsk_rht_params = {
 	.head_offset = offsetof(struct tipc_sock, node),
 	.key_offset = offsetof(struct tipc_sock, portid),
 	.key_len = sizeof(u32), /* portid */
-	.hashfn = jhash,
 	.max_size = 1048576,
 	.min_size = 256,
 };


* [v3 PATCH 6/9] rhashtable: Shrink to fit
  2015-03-23 13:49 [v3 PATCH 0/9] rhashtable: Multiple rehashing Herbert Xu
                   ` (4 preceding siblings ...)
  2015-03-23 13:50 ` [v3 PATCH 5/9] tipc: " Herbert Xu
@ 2015-03-23 13:50 ` Herbert Xu
  2015-03-23 15:38   ` Thomas Graf
  2015-03-23 13:50 ` [v3 PATCH 7/9] rhashtable: Add multiple rehash support Herbert Xu
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Herbert Xu @ 2015-03-23 13:50 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

This patch changes rhashtable_shrink to shrink to the smallest
size possible rather than halving the table.  This is needed
because with multiple rehashing we will defer shrinking until
all other rehashing is done, meaning that when we do shrink
we may be able to shrink a lot.
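
In rough userspace terms the new target size works out like this
(the helper names mirror the kernel ones, but this is only a model):

#include <stdio.h>

static unsigned int roundup_pow_of_two(unsigned int n)
{
	unsigned int r = 1;

	while (r < n)
		r <<= 1;
	return r;
}

/* A power-of-two size small enough to fit yet large enough not to
 * trigger an immediate re-expansion (which fires above 75%
 * utilisation), clamped at min_size. */
static unsigned int shrink_target(unsigned int nelems, unsigned int min_size)
{
	unsigned int size = roundup_pow_of_two(nelems * 3 / 2);

	return size < min_size ? min_size : size;
}

int main(void)
{
	/* A table that grew to 1024 buckets but now holds only 40 entries
	 * shrinks straight to 64 buckets rather than halving to 512. */
	printf("target size: %u\n", shrink_target(40, 16));
	return 0;
}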

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 lib/rhashtable.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 798f01d..9623be3 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -261,8 +261,8 @@ EXPORT_SYMBOL_GPL(rhashtable_expand);
  * rhashtable_shrink - Shrink hash table while allowing concurrent lookups
  * @ht:		the hash table to shrink
  *
- * This function may only be called in a context where it is safe to call
- * synchronize_rcu(), e.g. not within a rcu_read_lock() section.
+ * This function shrinks the hash table to fit, i.e., the smallest
+ * size would not cause it to expand right away automatically.
  *
  * The caller must ensure that no concurrent resizing occurs by holding
  * ht->mutex.
@@ -276,10 +276,17 @@ EXPORT_SYMBOL_GPL(rhashtable_expand);
 int rhashtable_shrink(struct rhashtable *ht)
 {
 	struct bucket_table *new_tbl, *old_tbl = rht_dereference(ht->tbl, ht);
+	unsigned size = roundup_pow_of_two(atomic_read(&ht->nelems) * 3 / 2);
 
 	ASSERT_RHT_MUTEX(ht);
 
-	new_tbl = bucket_table_alloc(ht, old_tbl->size / 2);
+	if (size < ht->p.min_size)
+		size = ht->p.min_size;
+
+	if (old_tbl->size <= size)
+		return 0;
+
+	new_tbl = bucket_table_alloc(ht, size);
 	if (new_tbl == NULL)
 		return -ENOMEM;
 


* [v3 PATCH 7/9] rhashtable: Add multiple rehash support
  2015-03-23 13:49 [v3 PATCH 0/9] rhashtable: Multiple rehashing Herbert Xu
                   ` (5 preceding siblings ...)
  2015-03-23 13:50 ` [v3 PATCH 6/9] rhashtable: Shrink to fit Herbert Xu
@ 2015-03-23 13:50 ` Herbert Xu
  2015-03-23 16:37   ` Thomas Graf
  2015-03-23 13:50 ` [v3 PATCH 8/9] rhashtable: Allow GFP_ATOMIC bucket table allocation Herbert Xu
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Herbert Xu @ 2015-03-23 13:50 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

This patch adds the missing bits to allow multiple rehashes.  The
read side, as well as removal, already handles this correctly.  So it's
only the rehasher and insertion that need modification to handle
this.

Note that this patch doesn't actually enable it, so for now rehashing
is still only performed by the worker thread.

This patch also disables the explicit expand/shrink interface because
the table is meant to expand and shrink automatically, and continuing
to export these interfaces unnecessarily complicates the life of the
rehasher since the rehash process is now composed of two parts.
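
Schematically, the helper that the rehasher and the slow insert path
now share just follows the chain to its newest member (not the kernel
code, which also goes through the RCU accessors):

struct table { struct table *future_tbl; /* plus buckets, size, locks ... */ };

/* Return the newest table in the chain; rehashing and slow-path
 * insertions always start from here. */
static struct table *last_table(struct table *tbl)
{
	while (tbl->future_tbl)
		tbl = tbl->future_tbl;
	return tbl;
}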

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/rhashtable.h |   26 +++++++------
 lib/rhashtable.c           |   87 +++++++++++++++++++++++++++++++++++++--------
 lib/test_rhashtable.c      |   24 ------------
 3 files changed, 86 insertions(+), 51 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 53465dc..97fa904 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -308,9 +308,6 @@ int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 			   struct rhash_head *obj,
 			   struct bucket_table *old_tbl);
 
-int rhashtable_expand(struct rhashtable *ht);
-int rhashtable_shrink(struct rhashtable *ht);
-
 int rhashtable_walk_init(struct rhashtable *ht, struct rhashtable_iter *iter);
 void rhashtable_walk_exit(struct rhashtable_iter *iter);
 int rhashtable_walk_start(struct rhashtable_iter *iter) __acquires(RCU);
@@ -541,17 +538,22 @@ static inline int __rhashtable_insert_fast(
 	rcu_read_lock();
 
 	tbl = rht_dereference_rcu(ht->tbl, ht);
-	hash = rht_head_hashfn(ht, tbl, obj, params);
-	lock = rht_bucket_lock(tbl, hash);
-
-	spin_lock_bh(lock);
 
-	/* Because we have already taken the bucket lock in tbl,
-	 * if we find that future_tbl is not yet visible then
-	 * that guarantees all other insertions of the same entry
-	 * will also grab the bucket lock in tbl because until
-	 * the rehash completes ht->tbl won't be changed.
+	/* All insertions must grab the oldest table containing
+	 * the hashed bucket that is yet to be rehashed.
 	 */
+	for (;;) {
+		hash = rht_head_hashfn(ht, tbl, obj, params);
+		lock = rht_bucket_lock(tbl, hash);
+		spin_lock_bh(lock);
+
+		if (tbl->rehash <= hash)
+			break;
+
+		spin_unlock_bh(lock);
+		tbl = rht_dereference_rcu(tbl->future_tbl, ht);
+	}
+
 	new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
 	if (unlikely(new_tbl)) {
 		err = rhashtable_insert_slow(ht, key, obj, new_tbl);
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 9623be3..5e04403 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -136,11 +136,24 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 	return tbl;
 }
 
+static struct bucket_table *rhashtable_last_table(struct rhashtable *ht,
+						  struct bucket_table *tbl)
+{
+	struct bucket_table *new_tbl;
+
+	do {
+		new_tbl = tbl;
+		tbl = rht_dereference_rcu(tbl->future_tbl, ht);
+	} while (tbl);
+
+	return new_tbl;
+}
+
 static int rhashtable_rehash_one(struct rhashtable *ht, unsigned old_hash)
 {
 	struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
-	struct bucket_table *new_tbl =
-		rht_dereference(old_tbl->future_tbl, ht) ?: old_tbl;
+	struct bucket_table *new_tbl = rhashtable_last_table(ht,
+		rht_dereference_rcu(old_tbl->future_tbl, ht));
 	struct rhash_head __rcu **pprev = &old_tbl->buckets[old_hash];
 	int err = -ENOENT;
 	struct rhash_head *head, *next, *entry;
@@ -196,12 +209,18 @@ static void rhashtable_rehash_chain(struct rhashtable *ht, unsigned old_hash)
 	spin_unlock_bh(old_bucket_lock);
 }
 
-static void rhashtable_rehash(struct rhashtable *ht,
-			      struct bucket_table *new_tbl)
+static int rhashtable_rehash_attach(struct rhashtable *ht,
+				    struct bucket_table *old_tbl,
+				    struct bucket_table *new_tbl)
 {
-	struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
-	struct rhashtable_walker *walker;
-	unsigned old_hash;
+	/* Protect future_tbl using the first bucket lock. */
+	spin_lock_bh(old_tbl->locks);
+
+	/* Did somebody beat us to it? */
+	if (rcu_access_pointer(old_tbl->future_tbl)) {
+		spin_unlock_bh(old_tbl->locks);
+		return -EEXIST;
+	}
 
 	/* Make insertions go into the new, empty table right away. Deletions
 	 * and lookups will be attempted in both tables until we synchronize.
@@ -211,6 +230,22 @@ static void rhashtable_rehash(struct rhashtable *ht,
 	/* Ensure the new table is visible to readers. */
 	smp_wmb();
 
+	spin_unlock_bh(old_tbl->locks);
+
+	return 0;
+}
+
+static int rhashtable_rehash_table(struct rhashtable *ht)
+{
+	struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
+	struct bucket_table *new_tbl;
+	struct rhashtable_walker *walker;
+	unsigned old_hash;
+
+	new_tbl = rht_dereference(old_tbl->future_tbl, ht);
+	if (!new_tbl)
+		return 0;
+
 	for (old_hash = 0; old_hash < old_tbl->size; old_hash++)
 		rhashtable_rehash_chain(ht, old_hash);
 
@@ -225,6 +260,8 @@ static void rhashtable_rehash(struct rhashtable *ht,
 	 * remain.
 	 */
 	call_rcu(&old_tbl->rcu, bucket_table_free_rcu);
+
+	return rht_dereference(new_tbl->future_tbl, ht) ? -EAGAIN : 0;
 }
 
 /**
@@ -242,20 +279,25 @@ static void rhashtable_rehash(struct rhashtable *ht,
  * It is valid to have concurrent insertions and deletions protected by per
  * bucket locks or concurrent RCU protected lookups and traversals.
  */
-int rhashtable_expand(struct rhashtable *ht)
+static int rhashtable_expand(struct rhashtable *ht)
 {
 	struct bucket_table *new_tbl, *old_tbl = rht_dereference(ht->tbl, ht);
+	int err;
 
 	ASSERT_RHT_MUTEX(ht);
 
+	old_tbl = rhashtable_last_table(ht, old_tbl);
+
 	new_tbl = bucket_table_alloc(ht, old_tbl->size * 2);
 	if (new_tbl == NULL)
 		return -ENOMEM;
 
-	rhashtable_rehash(ht, new_tbl);
-	return 0;
+	err = rhashtable_rehash_attach(ht, old_tbl, new_tbl);
+	if (err)
+		bucket_table_free(new_tbl);
+
+	return err;
 }
-EXPORT_SYMBOL_GPL(rhashtable_expand);
 
 /**
  * rhashtable_shrink - Shrink hash table while allowing concurrent lookups
@@ -273,10 +315,11 @@ EXPORT_SYMBOL_GPL(rhashtable_expand);
  * It is valid to have concurrent insertions and deletions protected by per
  * bucket locks or concurrent RCU protected lookups and traversals.
  */
-int rhashtable_shrink(struct rhashtable *ht)
+static int rhashtable_shrink(struct rhashtable *ht)
 {
 	struct bucket_table *new_tbl, *old_tbl = rht_dereference(ht->tbl, ht);
 	unsigned size = roundup_pow_of_two(atomic_read(&ht->nelems) * 3 / 2);
+	int err;
 
 	ASSERT_RHT_MUTEX(ht);
 
@@ -286,19 +329,25 @@ int rhashtable_shrink(struct rhashtable *ht)
 	if (old_tbl->size <= size)
 		return 0;
 
+	if (rht_dereference(old_tbl->future_tbl, ht))
+		return -EEXIST;
+
 	new_tbl = bucket_table_alloc(ht, size);
 	if (new_tbl == NULL)
 		return -ENOMEM;
 
-	rhashtable_rehash(ht, new_tbl);
-	return 0;
+	err = rhashtable_rehash_attach(ht, old_tbl, new_tbl);
+	if (err)
+		bucket_table_free(new_tbl);
+
+	return err;
 }
-EXPORT_SYMBOL_GPL(rhashtable_shrink);
 
 static void rht_deferred_worker(struct work_struct *work)
 {
 	struct rhashtable *ht;
 	struct bucket_table *tbl;
+	int err = 0;
 
 	ht = container_of(work, struct rhashtable, run_work);
 	mutex_lock(&ht->mutex);
@@ -306,13 +355,20 @@ static void rht_deferred_worker(struct work_struct *work)
 		goto unlock;
 
 	tbl = rht_dereference(ht->tbl, ht);
+	tbl = rhashtable_last_table(ht, tbl);
 
 	if (rht_grow_above_75(ht, tbl))
 		rhashtable_expand(ht);
 	else if (rht_shrink_below_30(ht, tbl))
 		rhashtable_shrink(ht);
+
+	err = rhashtable_rehash_table(ht);
+
 unlock:
 	mutex_unlock(&ht->mutex);
+
+	if (err)
+		schedule_work(&ht->run_work);
 }
 
 int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
@@ -323,6 +379,7 @@ int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 	unsigned hash;
 	int err = -EEXIST;
 
+	tbl = rhashtable_last_table(ht, tbl);
 	hash = head_hashfn(ht, tbl, obj);
 	spin_lock_nested(rht_bucket_lock(tbl, hash), SINGLE_DEPTH_NESTING);
 
diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index a2ba6ad..a42a0d4 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -155,30 +155,6 @@ static int __init test_rhashtable(struct rhashtable *ht)
 	test_rht_lookup(ht);
 	rcu_read_unlock();
 
-	for (i = 0; i < TEST_NEXPANDS; i++) {
-		pr_info("  Table expansion iteration %u...\n", i);
-		mutex_lock(&ht->mutex);
-		rhashtable_expand(ht);
-		mutex_unlock(&ht->mutex);
-
-		rcu_read_lock();
-		pr_info("  Verifying lookups...\n");
-		test_rht_lookup(ht);
-		rcu_read_unlock();
-	}
-
-	for (i = 0; i < TEST_NEXPANDS; i++) {
-		pr_info("  Table shrinkage iteration %u...\n", i);
-		mutex_lock(&ht->mutex);
-		rhashtable_shrink(ht);
-		mutex_unlock(&ht->mutex);
-
-		rcu_read_lock();
-		pr_info("  Verifying lookups...\n");
-		test_rht_lookup(ht);
-		rcu_read_unlock();
-	}
-
 	rcu_read_lock();
 	test_bucket_stats(ht, true);
 	rcu_read_unlock();


* [v3 PATCH 8/9] rhashtable: Allow GFP_ATOMIC bucket table allocation
  2015-03-23 13:49 [v3 PATCH 0/9] rhashtable: Multiple rehashing Herbert Xu
                   ` (6 preceding siblings ...)
  2015-03-23 13:50 ` [v3 PATCH 7/9] rhashtable: Add multiple rehash support Herbert Xu
@ 2015-03-23 13:50 ` Herbert Xu
  2015-03-23 16:38   ` Thomas Graf
  2015-03-23 13:50 ` [v3 PATCH 9/9] rhashtable: Add immediate rehash during insertion Herbert Xu
  2015-03-24  2:09 ` [v3 PATCH 0/9] rhashtable: Multiple rehashing David Miller
  9 siblings, 1 reply; 23+ messages in thread
From: Herbert Xu @ 2015-03-23 13:50 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

This patch adds the ability to allocate bucket table with GFP_ATOMIC
instead of GFP_KERNEL.  This is needed when we perform an immediate
rehash during insertion.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 lib/rhashtable.c |   26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 5e04403..220a11a 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -58,7 +58,8 @@ EXPORT_SYMBOL_GPL(lockdep_rht_bucket_is_held);
 #endif
 
 
-static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl)
+static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl,
+			      gfp_t gfp)
 {
 	unsigned int i, size;
 #if defined(CONFIG_PROVE_LOCKING)
@@ -75,12 +76,13 @@ static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl)
 
 	if (sizeof(spinlock_t) != 0) {
 #ifdef CONFIG_NUMA
-		if (size * sizeof(spinlock_t) > PAGE_SIZE)
+		if (size * sizeof(spinlock_t) > PAGE_SIZE &&
+		    gfp == GFP_KERNEL)
 			tbl->locks = vmalloc(size * sizeof(spinlock_t));
 		else
 #endif
 		tbl->locks = kmalloc_array(size, sizeof(spinlock_t),
-					   GFP_KERNEL);
+					   gfp);
 		if (!tbl->locks)
 			return -ENOMEM;
 		for (i = 0; i < size; i++)
@@ -105,23 +107,25 @@ static void bucket_table_free_rcu(struct rcu_head *head)
 }
 
 static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
-					       size_t nbuckets)
+					       size_t nbuckets,
+					       gfp_t gfp)
 {
 	struct bucket_table *tbl = NULL;
 	size_t size;
 	int i;
 
 	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
-	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
-		tbl = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
-	if (tbl == NULL)
+	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER) ||
+	    gfp != GFP_KERNEL)
+		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
+	if (tbl == NULL && gfp == GFP_KERNEL)
 		tbl = vzalloc(size);
 	if (tbl == NULL)
 		return NULL;
 
 	tbl->size = nbuckets;
 
-	if (alloc_bucket_locks(ht, tbl) < 0) {
+	if (alloc_bucket_locks(ht, tbl, gfp) < 0) {
 		bucket_table_free(tbl);
 		return NULL;
 	}
@@ -288,7 +292,7 @@ static int rhashtable_expand(struct rhashtable *ht)
 
 	old_tbl = rhashtable_last_table(ht, old_tbl);
 
-	new_tbl = bucket_table_alloc(ht, old_tbl->size * 2);
+	new_tbl = bucket_table_alloc(ht, old_tbl->size * 2, GFP_KERNEL);
 	if (new_tbl == NULL)
 		return -ENOMEM;
 
@@ -332,7 +336,7 @@ static int rhashtable_shrink(struct rhashtable *ht)
 	if (rht_dereference(old_tbl->future_tbl, ht))
 		return -EEXIST;
 
-	new_tbl = bucket_table_alloc(ht, size);
+	new_tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
 	if (new_tbl == NULL)
 		return -ENOMEM;
 
@@ -689,7 +693,7 @@ int rhashtable_init(struct rhashtable *ht,
 		}
 	}
 
-	tbl = bucket_table_alloc(ht, size);
+	tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
 	if (tbl == NULL)
 		return -ENOMEM;
 


* [v3 PATCH 9/9] rhashtable: Add immediate rehash during insertion
  2015-03-23 13:49 [v3 PATCH 0/9] rhashtable: Multiple rehashing Herbert Xu
                   ` (7 preceding siblings ...)
  2015-03-23 13:50 ` [v3 PATCH 8/9] rhashtable: Allow GFP_ATOMIC bucket table allocation Herbert Xu
@ 2015-03-23 13:50 ` Herbert Xu
  2015-03-23 16:50   ` Thomas Graf
  2015-03-24  2:09 ` [v3 PATCH 0/9] rhashtable: Multiple rehashing David Miller
  9 siblings, 1 reply; 23+ messages in thread
From: Herbert Xu @ 2015-03-23 13:50 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

This patch reintroduces immediate rehash during insertion.  If
we find during insertion that the table is full or the chain
length exceeds a set limit (currently 16, but this may be disabled
with insecure_elasticity), then we will force an immediate rehash.
The rehash will contain an expansion if the table utilisation
exceeds 75%.

If this rehash fails then the insertion will fail.  Otherwise the
insertion will be reattempted in the new hash table.
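
A userspace model of the two triggers (names and the tiny chain are
illustrative; in the kernel these checks run under the bucket lock in
the insert path):

#include <stdio.h>
#include <stdbool.h>

struct node { struct node *next; };

/* 100% utilisation check: more elements than buckets. */
static bool grow_above_100(unsigned int nelems, unsigned int nbuckets)
{
	return nelems > nbuckets;
}

/* Elasticity check: walking the chain uses up the budget. */
static bool chain_too_long(const struct node *head, unsigned int elasticity)
{
	for (; head; head = head->next)
		if (!--elasticity)
			return true;
	return false;
}

int main(void)
{
	struct node c = { NULL }, b = { &c }, a = { &b };

	printf("9 elems in 8 buckets -> rehash: %d\n", grow_above_100(9, 8));
	printf("chain of 3, budget 2 -> rehash: %d\n", chain_too_long(&a, 2));
	return 0;
}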

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/rhashtable.h |   42 +++++++++++++++++++++++++++----
 lib/rhashtable.c           |   60 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 96 insertions(+), 6 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 97fa904..57af1e9 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -103,6 +103,7 @@ struct rhashtable;
  * @max_size: Maximum size while expanding
  * @min_size: Minimum size while shrinking
  * @nulls_base: Base value to generate nulls marker
+ * @insecure_elasticity: Set to true to disable chain length checks
  * @locks_mul: Number of bucket locks to allocate per cpu (default: 128)
  * @hashfn: Hash function (default: jhash2 if !(key_len % 4), or jhash)
  * @obj_hashfn: Function to hash object
@@ -116,6 +117,7 @@ struct rhashtable_params {
 	unsigned int		max_size;
 	unsigned int		min_size;
 	u32			nulls_base;
+	bool			insecure_elasticity;
 	size_t			locks_mul;
 	rht_hashfn_t		hashfn;
 	rht_obj_hashfn_t	obj_hashfn;
@@ -127,6 +129,7 @@ struct rhashtable_params {
  * @tbl: Bucket table
  * @nelems: Number of elements in table
  * @key_len: Key length for hashfn
+ * @elasticity: Maximum chain length before rehash
  * @p: Configuration parameters
  * @run_work: Deferred worker to expand/shrink asynchronously
  * @mutex: Mutex to protect current/future table swapping
@@ -137,6 +140,7 @@ struct rhashtable {
 	atomic_t			nelems;
 	bool                            being_destroyed;
 	unsigned int			key_len;
+	unsigned int			elasticity;
 	struct rhashtable_params	p;
 	struct work_struct		run_work;
 	struct mutex                    mutex;
@@ -266,6 +270,17 @@ static inline bool rht_shrink_below_30(const struct rhashtable *ht,
 	       tbl->size > ht->p.min_size;
 }
 
+/**
+ * rht_grow_above_100 - returns true if nelems > table-size
+ * @ht:		hash table
+ * @tbl:	current table
+ */
+static inline bool rht_grow_above_100(const struct rhashtable *ht,
+				      const struct bucket_table *tbl)
+{
+	return atomic_read(&ht->nelems) > tbl->size;
+}
+
 /* The bucket lock is selected based on the hash and protects mutations
  * on a group of hash buckets.
  *
@@ -307,6 +322,7 @@ int rhashtable_init(struct rhashtable *ht,
 int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 			   struct rhash_head *obj,
 			   struct bucket_table *old_tbl);
+int rhashtable_insert_rehash(struct rhashtable *ht);
 
 int rhashtable_walk_init(struct rhashtable *ht, struct rhashtable_iter *iter);
 void rhashtable_walk_exit(struct rhashtable_iter *iter);
@@ -529,12 +545,14 @@ static inline int __rhashtable_insert_fast(
 		.ht = ht,
 		.key = key,
 	};
-	int err = -EEXIST;
 	struct bucket_table *tbl, *new_tbl;
 	struct rhash_head *head;
 	spinlock_t *lock;
+	unsigned elasticity;
 	unsigned hash;
+	int err;
 
+restart:
 	rcu_read_lock();
 
 	tbl = rht_dereference_rcu(ht->tbl, ht);
@@ -557,20 +575,34 @@ static inline int __rhashtable_insert_fast(
 	new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
 	if (unlikely(new_tbl)) {
 		err = rhashtable_insert_slow(ht, key, obj, new_tbl);
+		if (err == -EAGAIN)
+			goto slow_path;
 		goto out;
 	}
 
-	if (!key)
-		goto skip_lookup;
+	if (unlikely(rht_grow_above_100(ht, tbl))) {
+slow_path:
+		spin_unlock_bh(lock);
+		rcu_read_unlock();
+		err = rhashtable_insert_rehash(ht);
+		if (err)
+			return err;
+
+		goto restart;
+	}
 
+	err = -EEXIST;
+	elasticity = ht->elasticity;
 	rht_for_each(head, tbl, hash) {
-		if (unlikely(!(params.obj_cmpfn ?
+		if (key &&
+		    unlikely(!(params.obj_cmpfn ?
 			       params.obj_cmpfn(&arg, rht_obj(ht, head)) :
 			       rhashtable_compare(&arg, rht_obj(ht, head)))))
 			goto out;
+		if (!--elasticity)
+			goto slow_path;
 	}
 
-skip_lookup:
 	err = 0;
 
 	head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 220a11a..7686c1e 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -375,21 +375,76 @@ unlock:
 		schedule_work(&ht->run_work);
 }
 
+static bool rhashtable_check_elasticity(struct rhashtable *ht,
+					struct bucket_table *tbl,
+					unsigned hash)
+{
+	unsigned elasticity = ht->elasticity;
+	struct rhash_head *head;
+
+	rht_for_each(head, tbl, hash)
+		if (!--elasticity)
+			return true;
+
+	return false;
+}
+
+int rhashtable_insert_rehash(struct rhashtable *ht)
+{
+	struct bucket_table *old_tbl;
+	struct bucket_table *new_tbl;
+	struct bucket_table *tbl;
+	unsigned int size;
+	int err;
+
+	old_tbl = rht_dereference_rcu(ht->tbl, ht);
+	tbl = rhashtable_last_table(ht, old_tbl);
+
+	size = tbl->size;
+
+	if (rht_grow_above_75(ht, tbl))
+		size *= 2;
+	/* More than two rehashes (not resizes) detected. */
+	else if (WARN_ON(old_tbl != tbl && old_tbl->size == size))
+		return -EBUSY;
+
+	new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC);
+	if (new_tbl == NULL)
+		return -ENOMEM;
+
+	err = rhashtable_rehash_attach(ht, tbl, new_tbl);
+	if (err) {
+		bucket_table_free(new_tbl);
+		if (err == -EEXIST)
+			err = 0;
+	} else
+		schedule_work(&ht->run_work);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(rhashtable_insert_rehash);
+
 int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 			   struct rhash_head *obj,
 			   struct bucket_table *tbl)
 {
 	struct rhash_head *head;
 	unsigned hash;
-	int err = -EEXIST;
+	int err;
 
 	tbl = rhashtable_last_table(ht, tbl);
 	hash = head_hashfn(ht, tbl, obj);
 	spin_lock_nested(rht_bucket_lock(tbl, hash), SINGLE_DEPTH_NESTING);
 
+	err = -EEXIST;
 	if (key && rhashtable_lookup_fast(ht, key, ht->p))
 		goto exit;
 
+	err = -EAGAIN;
+	if (rhashtable_check_elasticity(ht, tbl, hash) ||
+	    rht_grow_above_100(ht, tbl))
+		goto exit;
+
 	err = 0;
 
 	head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
@@ -678,6 +733,9 @@ int rhashtable_init(struct rhashtable *ht,
 
 	ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE);
 
+	if (!params->insecure_elasticity)
+		ht->elasticity = 16;
+
 	if (params->locks_mul)
 		ht->p.locks_mul = roundup_pow_of_two(params->locks_mul);
 	else


* Re: [v3 PATCH 2/9] rhashtable: Eliminate unnecessary branch in rht_key_hashfn
  2015-03-23 13:50 ` [v3 PATCH 2/9] rhashtable: Eliminate unnecessary branch in rht_key_hashfn Herbert Xu
@ 2015-03-23 13:58   ` Thomas Graf
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Graf @ 2015-03-23 13:58 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/24/15 at 12:50am, Herbert Xu wrote:
> When rht_key_hashfn is called from rhashtable itself and params
> is equal to ht->p, there is no point in checking params.key_len
> and falling back to ht->p.key_len.
> 
> For some reason gcc couldn't figure out that params is the same
> as ht->p.  So let's help it by only checking params.key_len when
> it's a constant.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Acked-by: Thomas Graf <tgraf@suug.ch>


* Re: [v3 PATCH 3/9] rhashtable: Allow hashfn to be unset
  2015-03-23 13:50 ` [v3 PATCH 3/9] rhashtable: Allow hashfn to be unset Herbert Xu
@ 2015-03-23 15:36   ` Thomas Graf
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Graf @ 2015-03-23 15:36 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/24/15 at 12:50am, Herbert Xu wrote:
> Since every current rhashtable user uses jhash as their hash
> function, the fact that jhash is an inline function causes each
> user to generate a copy of its code.
> 
> This patch provides a solution to this problem by allowing
> hashfn to be unset, in which case rhashtable will automatically
> set it to jhash.  Furthermore, if the key length is a multiple
> of 4, we will switch over to jhash2.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Acked-by: Thomas Graf <tgraf@suug.ch>

I'll add some documentation when the API stabilizes and multiple
rehashing is complete.


* Re: [v3 PATCH 6/9] rhashtable: Shrink to fit
  2015-03-23 13:50 ` [v3 PATCH 6/9] rhashtable: Shrink to fit Herbert Xu
@ 2015-03-23 15:38   ` Thomas Graf
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Graf @ 2015-03-23 15:38 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/24/15 at 12:50am, Herbert Xu wrote:
> This patch changes rhashtable_shrink to shrink to the smallest
> size possible rather than halving the table.  This is needed
> because with multiple rehashing we will defer shrinking until
> all other rehashing is done, meaning that when we do shrink
> we may be able to shrink a lot.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Acked-by: Thomas Graf <tgraf@suug.ch>


* Re: [v3 PATCH 7/9] rhashtable: Add multiple rehash support
  2015-03-23 13:50 ` [v3 PATCH 7/9] rhashtable: Add multiple rehash support Herbert Xu
@ 2015-03-23 16:37   ` Thomas Graf
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Graf @ 2015-03-23 16:37 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/24/15 at 12:50am, Herbert Xu wrote:
> This patch adds the missing bits to allow multiple rehashes.  The
> read side, as well as removal, already handles this correctly.  So it's
> only the rehasher and insertion that need modification to handle
> this.
> 
> Note that this patch doesn't actually enable it, so for now rehashing
> is still only performed by the worker thread.
> 
> This patch also disables the explicit expand/shrink interface because
> the table is meant to expand and shrink automatically, and continuing
> to export these interfaces unnecessarily complicates the life of the
> rehasher since the rehash process is now composed of two parts.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Acked-by: Thomas Graf <tgraf@suug.ch>


* Re: [v3 PATCH 8/9] rhashtable: Allow GFP_ATOMIC bucket table allocation
  2015-03-23 13:50 ` [v3 PATCH 8/9] rhashtable: Allow GFP_ATOMIC bucket table allocation Herbert Xu
@ 2015-03-23 16:38   ` Thomas Graf
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Graf @ 2015-03-23 16:38 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/24/15 at 12:50am, Herbert Xu wrote:
> This patch adds the ability to allocate bucket table with GFP_ATOMIC
> instead of GFP_KERNEL.  This is needed when we perform an immediate
> rehash during insertion.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Acked-by: Thomas Graf <tgraf@suug.ch>


* Re: [v3 PATCH 9/9] rhashtable: Add immediate rehash during insertion
  2015-03-23 13:50 ` [v3 PATCH 9/9] rhashtable: Add immediate rehash during insertion Herbert Xu
@ 2015-03-23 16:50   ` Thomas Graf
  2015-03-23 16:54     ` David Laight
  2015-03-23 21:44     ` Herbert Xu
  0 siblings, 2 replies; 23+ messages in thread
From: Thomas Graf @ 2015-03-23 16:50 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/24/15 at 12:50am, Herbert Xu wrote:
> @@ -116,6 +117,7 @@ struct rhashtable_params {
>  	unsigned int		max_size;
>  	unsigned int		min_size;
>  	u32			nulls_base;
> +	bool			insecure_elasticity;
>  	size_t			locks_mul;
>  	rht_hashfn_t		hashfn;
>  	rht_obj_hashfn_t	obj_hashfn;

First of all, love the naming of this variable ;-)

> @@ -127,6 +129,7 @@ struct rhashtable_params {
>   * @tbl: Bucket table
>   * @nelems: Number of elements in table
>   * @key_len: Key length for hashfn
> + * @elasticity: Maximum chain length before rehash
>   * @p: Configuration parameters
>   * @run_work: Deferred worker to expand/shrink asynchronously
>   * @mutex: Mutex to protect current/future table swapping

2nd: Seems like you rely on an underflow to allow to "disable"
the elasticity limit. Fair enough, but it would be great to
have the limit configurable as well.

How about making elasticity a signed int, default to 16 if user
specifies 0 and require it to be set to -1 (through a define)
to actually disable the behaviour. That would avoid requiring
two variables to implement this and makes the limit configurable
at the same time.

Otherwise this looks good to me.


* RE: [v3 PATCH 9/9] rhashtable: Add immediate rehash during insertion
  2015-03-23 16:50   ` Thomas Graf
@ 2015-03-23 16:54     ` David Laight
  2015-03-23 21:44     ` Herbert Xu
  1 sibling, 0 replies; 23+ messages in thread
From: David Laight @ 2015-03-23 16:54 UTC (permalink / raw)
  To: 'Thomas Graf', Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

From: Thomas Graf
> On 03/24/15 at 12:50am, Herbert Xu wrote:
> > @@ -116,6 +117,7 @@ struct rhashtable_params {
> >  	unsigned int		max_size;
> >  	unsigned int		min_size;
> >  	u32			nulls_base;
> > +	bool			insecure_elasticity;
> >  	size_t			locks_mul;
> >  	rht_hashfn_t		hashfn;
> >  	rht_obj_hashfn_t	obj_hashfn;
> 
> First of all, love the naming of this variable ;-)
> 
> > @@ -127,6 +129,7 @@ struct rhashtable_params {
> >   * @tbl: Bucket table
> >   * @nelems: Number of elements in table
> >   * @key_len: Key length for hashfn
> > + * @elasticity: Maximum chain length before rehash
> >   * @p: Configuration parameters
> >   * @run_work: Deferred worker to expand/shrink asynchronously
> >   * @mutex: Mutex to protect current/future table swapping
> 
> 2nd: Seems like you rely on an underflow to allow to "disable"
> the elasticity limit. Fair enough, but it would be great to
> have the limit configurable as well.
> 
> How about making elasticity a signed int, default to 16 if user
> specifies 0 and require it to be set to -1 (through a define)
> to actually disable the behaviour. That would avoid requiring
> two variables to implement this and makes the limit configurable
> at the same time.

Or set the value to MAXINT to disable it.
That way you don't even need two tests in the code.

	David


* Re: [v3 PATCH 9/9] rhashtable: Add immediate rehash during insertion
  2015-03-23 16:50   ` Thomas Graf
  2015-03-23 16:54     ` David Laight
@ 2015-03-23 21:44     ` Herbert Xu
  2015-03-23 21:56       ` Thomas Graf
  1 sibling, 1 reply; 23+ messages in thread
From: Herbert Xu @ 2015-03-23 21:44 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On Mon, Mar 23, 2015 at 04:50:33PM +0000, Thomas Graf wrote:
>
> 2nd: Seems like you rely on an underflow to allow to "disable"
> the elasticity limit. Fair enough, but it would be great to
> have the limit configurable as well.
> 
> How about making elasticity a signed int, default to 16 if user
> specifies 0 and require it to be set to -1 (through a define)
> to actually disable the behaviour. That would avoid requiring
> two variables to implement this and makes the limit configurable
> at the same time.

I specifically made it this way because I don't think the users
should be touching the actual limit.  This is something that you
always want to enable unless you're in the situation of netfilter
where you don't care.

If you did care then 16 is sort of intrinsic to the 32-bit hash
that we're using.  Going below doesn't make much sense because
you may run into false warnings due to double rehashes (that's
how I discovered 4 was useless, very quickly :)  Remember the
rehash is only there to detect pathological cases where people
are actively attacking us.  Otherwise the 100% utilisation check
will kick in.
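
As a rough back-of-the-envelope check (constants glossed over): with a
uniform hash the expected maximum chain length for N entries in N
buckets grows roughly like log N / log log N, so even at the full 2^32
range of the hash it stays well below 16:

#include <stdio.h>
#include <math.h>

int main(void)
{
	double n = 4294967296.0;	/* 2^32 buckets */

	/* Prints roughly 7.2; build with -lm. */
	printf("expected max chain ~ %.1f\n", log(n) / log(log(n)));
	return 0;
}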

Going above 16 means that you're hashing multiple objects with
the same key.  Then you'd want to disable this altogether.

The rhashtable already has too many knobs and I don't want to
add anything unnecessary to rhashtable_params.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [v3 PATCH 9/9] rhashtable: Add immediate rehash during insertion
  2015-03-23 21:44     ` Herbert Xu
@ 2015-03-23 21:56       ` Thomas Graf
  2015-03-23 22:15         ` Herbert Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Graf @ 2015-03-23 21:56 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/24/15 at 08:44am, Herbert Xu wrote:
> I specifically made it this way because I don't think the users
> should be touching the actual limit.  This is something that you
> always want to enable unless you're in the situation of netfilter
> where you don't care.
> 
> If you did care then 16 is sort of intrinsic to the 32-bit hash
> that we're using.  Going below doesn't make much sense because
> you may run into false warnings due to double rehashes (that's
> how I discovered 4 was uesless, very quickly :) Remember the
> rehash is only there to detect pathological cases where people
> are actively attacking us.  Otherwise the 100% utilisation check
> will kick in.
> 
> Going above 16 means that you're hashing multiple objects with
> the same key.  Then you'd want to disable this altogether.
> 
> The rhashtable already has too many knobs and I don't want to
> add anything unnecessary to rhashtable_params.

OK. I can certainly live with this. You are certainly right that
simplicity isn't a bad move here. Thanks for being patient and
answering all the questions.

Acked-by: Thomas Graf <tgraf@suug.ch>


* Re: [v3 PATCH 9/9] rhashtable: Add immediate rehash during insertion
  2015-03-23 21:56       ` Thomas Graf
@ 2015-03-23 22:15         ` Herbert Xu
  0 siblings, 0 replies; 23+ messages in thread
From: Herbert Xu @ 2015-03-23 22:15 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On Mon, Mar 23, 2015 at 09:56:32PM +0000, Thomas Graf wrote:
>
> OK. I can certainly live with this. You are certainly right that
> simplicity isn't a bad move here. Thanks for being patient and
> answering all the questions.
> 
> Acked-by: Thomas Graf <tgraf@suug.ch>

Thank you for your reviewing and critiques!
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [v3 PATCH 0/9] rhashtable: Multiple rehashing
  2015-03-23 13:49 [v3 PATCH 0/9] rhashtable: Multiple rehashing Herbert Xu
                   ` (8 preceding siblings ...)
  2015-03-23 13:50 ` [v3 PATCH 9/9] rhashtable: Add immediate rehash during insertion Herbert Xu
@ 2015-03-24  2:09 ` David Miller
  2015-03-24  2:37   ` Herbert Xu
  9 siblings, 1 reply; 23+ messages in thread
From: David Miller @ 2015-03-24  2:09 UTC (permalink / raw)
  To: herbert; +Cc: tgraf, eric.dumazet, kaber, josh, paulmck, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 24 Mar 2015 00:49:55 +1100

> This series introduces multiple rehashing.

This looks great, nice work.  All applied to net-next.

Could you please add a comment next to the "16" from patch
#9 explaining why that value was chosen and the invariants
that influence it?

Thanks!


* Re: [v3 PATCH 0/9] rhashtable: Multiple rehashing
  2015-03-24  2:09 ` [v3 PATCH 0/9] rhashtable: Multiple rehashing David Miller
@ 2015-03-24  2:37   ` Herbert Xu
  2015-03-24 18:57     ` David Miller
  0 siblings, 1 reply; 23+ messages in thread
From: Herbert Xu @ 2015-03-24  2:37 UTC (permalink / raw)
  To: David Miller; +Cc: tgraf, eric.dumazet, kaber, josh, paulmck, netdev

On Mon, Mar 23, 2015 at 10:09:13PM -0400, David Miller wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Tue, 24 Mar 2015 00:49:55 +1100
> 
> > This series introduces multiple rehashing.
> 
> This looks great, nice work.  All applied to net-next.

Thanks!

> Could you please add a comment next to the "16" from patch
> #9 explaining why that value was chosen and the invariants
> that influence it?

Sure.

---8<---
rhashtable: Add comment on choice of elasticity value

This patch adds a comment on the choice of the value 16 as the
maximum chain length before we force a rehash.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index e96ad1a..8514f7c 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -736,6 +736,18 @@ int rhashtable_init(struct rhashtable *ht,
 
 	ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE);
 
+	/* The maximum (not average) chain length grows with the
+	 * size of the hash table, at a rate of (log N)/(log log N).
+	 * The value of 16 is selected so that even if the hash
+	 * table grew to 2^32 you would not expect the maximum
+	 * chain length to exceed it unless we are under attack
+	 * (or extremely unlucky).
+	 *
+	 * As this limit is only to detect attacks, we don't need
+	 * to set it to a lower value as you'd need the chain
+	 * length to vastly exceed 16 to have any real effect
+	 * on the system.
+	 */
 	if (!params->insecure_elasticity)
 		ht->elasticity = 16;
 
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [v3 PATCH 0/9] rhashtable: Multiple rehashing
  2015-03-24  2:37   ` Herbert Xu
@ 2015-03-24 18:57     ` David Miller
  0 siblings, 0 replies; 23+ messages in thread
From: David Miller @ 2015-03-24 18:57 UTC (permalink / raw)
  To: herbert; +Cc: tgraf, eric.dumazet, kaber, josh, paulmck, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 24 Mar 2015 13:37:30 +1100

> rhashtable: Add comment on choice of elasticity value
> 
> This patch adds a comment on the choice of the value 16 as the
> maximum chain length before we force a rehash.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks for following up on this, applied.

