* [v1 PATCH 0/10] rhashtable: Multiple rehashing
@ 2015-03-22  7:53 Herbert Xu
  2015-03-22  7:53 ` [v1 PATCH 1/10] rhashtable: Add barrier to ensure we see new tables in walker Herbert Xu
                   ` (10 more replies)
  0 siblings, 11 replies; 17+ messages in thread
From: Herbert Xu @ 2015-03-22  7:53 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

Hi:

This series introduces multiple rehashing.

Recall that the original implementation in br_multicast used
two list pointers per hash node and was therefore limited to at
most one rehash at a time, since you need one list pointer for
the old table and one for the new table.

Thanks to Josh Triplett's suggestion of using a single list pointer
we're no longer limited by that.  So it is perfectly OK to have
an arbitrary number of tables in existence at any one time.

Readers and removals simply have to walk from the oldest table
to the newest table in order not to miss anything.  Insertions
without lookups are just as easy: we simply go to the last table
that we can find and add the entry there.
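
As a rough sketch of that walk (illustrative only; rht_dereference_rcu
and the future_tbl field are the real helpers used throughout this
series, while __lookup_one_table is a hypothetical stand-in for a
single-table search):

static void *lookup_sketch(struct rhashtable *ht, const void *key)
{
	struct bucket_table *tbl;
	void *obj = NULL;

	rcu_read_lock();

	/* Start from the oldest table and follow future_tbl to newer
	 * ones so that entries being rehashed are never missed.
	 */
	for (tbl = rht_dereference_rcu(ht->tbl, ht); tbl;
	     tbl = rht_dereference_rcu(tbl->future_tbl, ht)) {
		obj = __lookup_one_table(ht, tbl, key); /* hypothetical */
		if (obj)
			break;
	}

	rcu_read_unlock();
	return obj;
}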

However, insertion with uniqueness lookup is more complicated
because we need to ensure that two simultaneous insertions of the
same key do not both succeed.  To achieve this, all insertions
including those without lookups are required to obtain the bucket
lock from the oldest hash table that is still alive.  This is
determined by having the rehasher (there is only one rehashing
thread in the system) keep track of how far it has progressed.  If
a bucket has already been rehashed then it is dead, i.e., there
cannot be any more insertions to it, otherwise it is considered
alive.  This guarantees that the same key cannot be inserted
in two different tables in parallel.
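
Concretely, the insertion fast path ends up looking for that bucket
with a loop of this shape (essentially the loop added in patch 8
below, with an explanatory comment added here):

	for (;;) {
		hash = rht_head_hashfn(ht, tbl, obj, params);
		lock = rht_bucket_lock(tbl, hash);
		spin_lock_bh(lock);

		/* tbl->rehash records how far the rehasher has got; a
		 * bucket below it is dead, so move on to the next table.
		 */
		if (tbl->rehash <= hash)
			break;

		spin_unlock_bh(lock);
		tbl = rht_dereference_rcu(tbl->future_tbl, ht);
	}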

Patch 1 is actually a bug fix for the walker.

Patches 2-6 eliminate unnecessary out-of-line copies of jhash.

Patch 7 disables automatic shrinking, so shrinking is now only
possible when requested by the user.

Patch 8 introduces multiple rehashing.  This means that if we
decide to grow then we will grow regardless of whether the previous
rehash has finished.  However, this is still asynchronous, meaning
that if insertions come fast enough we may still end up with a
table that is overutilised.

Patch 9 adds support for GFP_ATOMIC allocations of struct bucket_table.

Finally patch 10 enables immediate rehashing.  This is done either
when the table reaches 100% utilisation, or when the chain length
exceeds 16 (the latter can be disabled on request, e.g., for
nft_hash).

With these patches the system should no longer have any trouble
dealing with fast insertions on a small table.  In the worst
case you end up with a list of tables that's log N in length
while the rehasher catches up.
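
For example, assuming a minimum table of 16 buckets and sustained
fast insertion, the outstanding tables would have 16, 32, 64, ...
buckets; since each pending table doubles the previous one, only
about log2(N/16) + 1 of them can exist at once for N entries.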

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* [v1 PATCH 1/10] rhashtable: Add barrier to ensure we see new tables in walker
  2015-03-22  7:53 [v1 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
@ 2015-03-22  7:53 ` Herbert Xu
  2015-03-22  7:53 ` [v1 PATCH 2/10] rhashtable: Eliminate unnecessary branch in rht_key_hashfn Herbert Xu
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Herbert Xu @ 2015-03-22  7:53 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

The walker is a lockless reader so it too needs an smp_rmb before
reading the future_tbl field in order to see any new tables that
may contain elements that we should have walked over.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 lib/rhashtable.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 83cfedd..618a3f0 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -477,6 +477,9 @@ next:
 		iter->skip = 0;
 	}
 
+	/* Ensure we see any new tables. */
+	smp_rmb();
+
 	iter->walker->tbl = rht_dereference_rcu(tbl->future_tbl, ht);
 	if (iter->walker->tbl) {
 		iter->slot = 0;
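
For reference, this smp_rmb pairs with the write-side barrier that
the rehash code issues after publishing future_tbl (see patch 8
below); side by side, the two halves look roughly like this (sketch
only, with the assignment paraphrased):

	/* CPU A (rehasher): publish the new table, then make it visible. */
	rcu_assign_pointer(old_tbl->future_tbl, new_tbl);
	/* Ensure the new table is visible to readers. */
	smp_wmb();

	/* CPU B (walker): ensure the publication above is observed
	 * before loading future_tbl, otherwise a freshly attached table
	 * (and the entries moved into it) could be skipped.
	 */
	smp_rmb();
	tbl = rht_dereference_rcu(tbl->future_tbl, ht);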


* [v1 PATCH 2/10] rhashtable: Eliminate unnecessary branch in rht_key_hashfn
  2015-03-22  7:53 [v1 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
  2015-03-22  7:53 ` [v1 PATCH 1/10] rhashtable: Add barrier to ensure we see new tables in walker Herbert Xu
@ 2015-03-22  7:53 ` Herbert Xu
  2015-03-22  7:53 ` [v1 PATCH 3/10] rhashtable: Allow hashfn to be unset Herbert Xu
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Herbert Xu @ 2015-03-22  7:53 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

When rht_key_hashfn is called from rhashtable itself and params
is equal to ht->p, there is no point in checking params.key_len
and falling back to ht->p.key_len.

For some reason gcc couldn't figure out that params is the same
as ht->p.  So let's help it by only checking params.key_len when
it's a constant.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/rhashtable.h |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 18a6488..287b609 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -199,8 +199,11 @@ static inline unsigned int rht_key_hashfn(
 	struct rhashtable *ht, const struct bucket_table *tbl,
 	const void *key, const struct rhashtable_params params)
 {
-	return rht_bucket_index(tbl, params.hashfn(key, params.key_len ?:
-							ht->p.key_len,
+	unsigned key_len = __builtin_constant_p(params.key_len) ?
+			   (params.key_len ?: ht->p.key_len) :
+			   params.key_len;
+
+	return rht_bucket_index(tbl, params.hashfn(key, key_len,
 						   tbl->hash_rnd));
 }
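
The behaviour of __builtin_constant_p can be demonstrated with a
stand-alone example (illustrative only; build with gcc -O2):

	#include <stdio.h>

	/* When len is a compile-time constant, the ?: fallback is resolved
	 * at build time and no runtime branch remains; otherwise the value
	 * is used as-is without re-checking it.
	 */
	static inline unsigned int pick_len(unsigned int len, unsigned int fallback)
	{
		return __builtin_constant_p(len) ? (len ?: fallback) : len;
	}

	int main(void)
	{
		/* Constant 0: at -O2 gcc resolves this to 8 at compile time. */
		printf("%u\n", pick_len(0, 8));
		return 0;
	}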
 


* [v1 PATCH 3/10] rhashtable: Allow hashfn to be unset
  2015-03-22  7:53 [v1 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
  2015-03-22  7:53 ` [v1 PATCH 1/10] rhashtable: Add barrier to ensure we see new tables in walker Herbert Xu
  2015-03-22  7:53 ` [v1 PATCH 2/10] rhashtable: Eliminate unnecessary branch in rht_key_hashfn Herbert Xu
@ 2015-03-22  7:53 ` Herbert Xu
  2015-03-22  7:53 ` [v1 PATCH 4/10] netlink: Use default rhashtable hashfn Herbert Xu
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Herbert Xu @ 2015-03-22  7:53 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

Since every current rhashtable user uses jhash as their hash
function, the fact that jhash is an inline function causes each
user to generate a copy of its code.

This patch solves the problem by allowing hashfn to be unset, in
which case rhashtable will automatically set it to jhash.
Furthermore, if the key length is a multiple of 4, we will switch
over to jhash2.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/rhashtable.h |   33 +++++++++++++++++++++++++++------
 lib/rhashtable.c           |   17 ++++++++++++++++-
 2 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 287b609..2a98b95 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -19,6 +19,7 @@
 
 #include <linux/compiler.h>
 #include <linux/errno.h>
+#include <linux/jhash.h>
 #include <linux/list_nulls.h>
 #include <linux/workqueue.h>
 #include <linux/mutex.h>
@@ -103,7 +104,7 @@ struct rhashtable;
  * @min_size: Minimum size while shrinking
  * @nulls_base: Base value to generate nulls marker
  * @locks_mul: Number of bucket locks to allocate per cpu (default: 128)
- * @hashfn: Function to hash key
+ * @hashfn: Hash function (default: jhash2 if !(key_len % 4), or jhash)
  * @obj_hashfn: Function to hash object
  * @obj_cmpfn: Function to compare key with object
  */
@@ -125,6 +126,7 @@ struct rhashtable_params {
  * struct rhashtable - Hash table handle
  * @tbl: Bucket table
  * @nelems: Number of elements in table
+ * @key_len: Key length for hashfn
  * @p: Configuration parameters
  * @run_work: Deferred worker to expand/shrink asynchronously
  * @mutex: Mutex to protect current/future table swapping
@@ -134,6 +136,7 @@ struct rhashtable {
 	struct bucket_table __rcu	*tbl;
 	atomic_t			nelems;
 	bool                            being_destroyed;
+	unsigned int			key_len;
 	struct rhashtable_params	p;
 	struct work_struct		run_work;
 	struct mutex                    mutex;
@@ -199,12 +202,30 @@ static inline unsigned int rht_key_hashfn(
 	struct rhashtable *ht, const struct bucket_table *tbl,
 	const void *key, const struct rhashtable_params params)
 {
-	unsigned key_len = __builtin_constant_p(params.key_len) ?
-			   (params.key_len ?: ht->p.key_len) :
-			   params.key_len;
+	unsigned hash;
+
+	if (!__builtin_constant_p(params.key_len))
+		hash = ht->p.hashfn(key, ht->key_len, tbl->hash_rnd);
+	else if (params.key_len) {
+		unsigned key_len = params.key_len;
+
+		if (params.hashfn)
+			hash = params.hashfn(key, key_len, tbl->hash_rnd);
+		else if (key_len & (sizeof(u32) - 1))
+			hash = jhash(key, key_len, tbl->hash_rnd);
+		else
+			hash = jhash2(key, key_len / sizeof(u32),
+				      tbl->hash_rnd);
+	} else {
+		unsigned key_len = ht->p.key_len;
+
+		if (params.hashfn)
+			hash = params.hashfn(key, key_len, tbl->hash_rnd);
+		else
+			hash = jhash(key, key_len, tbl->hash_rnd);
+	}
 
-	return rht_bucket_index(tbl, params.hashfn(key, key_len,
-						   tbl->hash_rnd));
+	return rht_bucket_index(tbl, hash);
 }
 
 static inline unsigned int rht_head_hashfn(
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 618a3f0..798f01d 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -532,6 +532,11 @@ static size_t rounded_hashtable_size(const struct rhashtable_params *params)
 		   (unsigned long)params->min_size);
 }
 
+static u32 rhashtable_jhash2(const void *key, u32 length, u32 seed)
+{
+	return jhash2(key, length, seed);
+}
+
 /**
  * rhashtable_init - initialize a new hash table
  * @ht:		hash table to be initialized
@@ -583,7 +588,7 @@ int rhashtable_init(struct rhashtable *ht,
 
 	size = HASH_DEFAULT_SIZE;
 
-	if ((!(params->key_len && params->hashfn) && !params->obj_hashfn) ||
+	if ((!params->key_len && !params->obj_hashfn) ||
 	    (params->obj_hashfn && !params->obj_cmpfn))
 		return -EINVAL;
 
@@ -610,6 +615,16 @@ int rhashtable_init(struct rhashtable *ht,
 	else
 		ht->p.locks_mul = BUCKET_LOCKS_PER_CPU;
 
+	ht->key_len = ht->p.key_len;
+	if (!params->hashfn) {
+		ht->p.hashfn = jhash;
+
+		if (!(ht->key_len & (sizeof(u32) - 1))) {
+			ht->key_len /= sizeof(u32);
+			ht->p.hashfn = rhashtable_jhash2;
+		}
+	}
+
 	tbl = bucket_table_alloc(ht, size);
 	if (tbl == NULL)
 		return -ENOMEM;
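
With this change, a user with a 4-byte key no longer needs to name a
hash function at all; a minimal parameter block would look like the
following (hypothetical struct my_obj, but the pattern matches the
conversions in patches 4-6):

	struct my_obj {
		u32			key;	/* multiple of 4: jhash2 is chosen */
		struct rhash_head	node;
	};

	static const struct rhashtable_params my_obj_params = {
		.head_offset	= offsetof(struct my_obj, node),
		.key_offset	= offsetof(struct my_obj, key),
		.key_len	= sizeof(u32),
		/* .hashfn left unset: rhashtable_init() picks jhash2 here */
	};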


* [v1 PATCH 4/10] netlink: Use default rhashtable hashfn
  2015-03-22  7:53 [v1 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
                   ` (2 preceding siblings ...)
  2015-03-22  7:53 ` [v1 PATCH 3/10] rhashtable: Allow hashfn to be unset Herbert Xu
@ 2015-03-22  7:53 ` Herbert Xu
  2015-03-22  7:53 ` [v1 PATCH 5/10] Herbert Xu
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Herbert Xu @ 2015-03-22  7:53 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

This patch removes the explicit jhash value for the hashfn parameter
of rhashtable.  The default is now jhash so removing the setting
makes no difference apart from making one less copy of jhash in
the kernel.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/netlink/af_netlink.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 6517921..e2f7f28 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -3133,13 +3133,12 @@ static inline u32 netlink_hash(const void *data, u32 seed)
 	struct netlink_compare_arg arg;
 
 	netlink_compare_arg_init(&arg, sock_net(&nlk->sk), nlk->portid);
-	return jhash(&arg, netlink_compare_arg_len, seed);
+	return jhash2((u32 *)&arg, netlink_compare_arg_len / sizeof(u32), seed);
 }
 
 static const struct rhashtable_params netlink_rhashtable_params = {
 	.head_offset = offsetof(struct netlink_sock, node),
 	.key_len = netlink_compare_arg_len,
-	.hashfn = jhash,
 	.obj_hashfn = netlink_hash,
 	.obj_cmpfn = netlink_compare,
 	.max_size = 65536,


* [v1 PATCH 5/10]
  2015-03-22  7:53 [v1 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
                   ` (3 preceding siblings ...)
  2015-03-22  7:53 ` [v1 PATCH 4/10] netlink: Use default rhashtable hashfn Herbert Xu
@ 2015-03-22  7:53 ` Herbert Xu
  2015-03-22  7:55   ` Herbert Xu
  2015-03-22  7:53 ` [v1 PATCH 6/10] netfilter: Use default rhashtable hashfn Herbert Xu
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Herbert Xu @ 2015-03-22  7:53 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev


This patch removes the explicit jhash value for the hashfn parameter
of rhashtable.  The default is now jhash so removing the setting
makes no difference apart from making one less copy of jhash in
the kernel.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/tipc/socket.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 73c2f51..6dd5bd9 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -35,7 +35,6 @@
  */
 
 #include <linux/rhashtable.h>
-#include <linux/jhash.h>
 #include "core.h"
 #include "name_table.h"
 #include "node.h"
@@ -2294,7 +2293,6 @@ static const struct rhashtable_params tsk_rht_params = {
 	.head_offset = offsetof(struct tipc_sock, node),
 	.key_offset = offsetof(struct tipc_sock, portid),
 	.key_len = sizeof(u32), /* portid */
-	.hashfn = jhash,
 	.max_size = 1048576,
 	.min_size = 256,
 };


* [v1 PATCH 6/10] netfilter: Use default rhashtable hashfn
  2015-03-22  7:53 [v1 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
                   ` (4 preceding siblings ...)
  2015-03-22  7:53 ` [v1 PATCH 5/10] Herbert Xu
@ 2015-03-22  7:53 ` Herbert Xu
  2015-03-22  7:53 ` [v1 PATCH 7/10] rhashtable: Disable automatic shrinking Herbert Xu
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Herbert Xu @ 2015-03-22  7:53 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

This patch removes the explicit jhash value for the hashfn parameter
of rhashtable.  The default is now jhash so removing the setting
makes no difference apart from making one less copy of jhash in
the kernel.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/netfilter/nft_hash.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index 4585c57..3bda4e7 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -13,7 +13,6 @@
 #include <linux/module.h>
 #include <linux/list.h>
 #include <linux/log2.h>
-#include <linux/jhash.h>
 #include <linux/netlink.h>
 #include <linux/rhashtable.h>
 #include <linux/netfilter.h>
@@ -169,7 +168,6 @@ static unsigned int nft_hash_privsize(const struct nlattr * const nla[])
 static const struct rhashtable_params nft_hash_params = {
 	.head_offset = offsetof(struct nft_hash_elem, node),
 	.key_offset = offsetof(struct nft_hash_elem, key),
-	.hashfn = jhash,
 };
 
 static int nft_hash_init(const struct nft_set *set,


* [v1 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-22  7:53 [v1 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
                   ` (5 preceding siblings ...)
  2015-03-22  7:53 ` [v1 PATCH 6/10] netfilter: Use default rhashtable hashfn Herbert Xu
@ 2015-03-22  7:53 ` Herbert Xu
  2015-03-22  7:53 ` [v1 PATCH 8/10] rhashtable: Add multiple rehash support Herbert Xu
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Herbert Xu @ 2015-03-22  7:53 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

Automatic shrinking is dangerous because it provides an easy
way for an adversary to cause us to do unnecessary work, thus
making the resizable hashtable a poor data structure.

This patch disables automatic shrinking but retains a manual
shrink function for those cases where insertions and removals
are overseen by a trusted entity, e.g., nft_hash.

The shrink function will now also shrink to fit rather than halve
the size of the table.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/rhashtable.h |   15 ---------------
 lib/rhashtable.c           |   44 ++++++++++++++++++++++++++++++--------------
 lib/test_rhashtable.c      |   16 ++++++----------
 3 files changed, 36 insertions(+), 39 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 2a98b95..44aa579 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -252,19 +252,6 @@ static inline bool rht_grow_above_75(const struct rhashtable *ht,
 	       (!ht->p.max_size || tbl->size < ht->p.max_size);
 }
 
-/**
- * rht_shrink_below_30 - returns true if nelems < 0.3 * table-size
- * @ht:		hash table
- * @tbl:	current table
- */
-static inline bool rht_shrink_below_30(const struct rhashtable *ht,
-				       const struct bucket_table *tbl)
-{
-	/* Shrink table beneath 30% load */
-	return atomic_read(&ht->nelems) < (tbl->size * 3 / 10) &&
-	       tbl->size > ht->p.min_size;
-}
-
 /* The bucket lock is selected based on the hash and protects mutations
  * on a group of hash buckets.
  *
@@ -745,8 +732,6 @@ static inline int rhashtable_remove_fast(
 		goto out;
 
 	atomic_dec(&ht->nelems);
-	if (rht_shrink_below_30(ht, tbl))
-		schedule_work(&ht->run_work);
 
 out:
 	rcu_read_unlock();
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 798f01d..08a6123 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -261,30 +261,48 @@ EXPORT_SYMBOL_GPL(rhashtable_expand);
  * rhashtable_shrink - Shrink hash table while allowing concurrent lookups
  * @ht:		the hash table to shrink
  *
- * This function may only be called in a context where it is safe to call
- * synchronize_rcu(), e.g. not within a rcu_read_lock() section.
- *
- * The caller must ensure that no concurrent resizing occurs by holding
- * ht->mutex.
- *
- * The caller must ensure that no concurrent table mutations take place.
- * It is however valid to have concurrent lookups if they are RCU protected.
+ * This function shrinks the hash table to fit, i.e., the smallest
+ * size would not cause it to expand right away automatically.
  *
  * It is valid to have concurrent insertions and deletions protected by per
  * bucket locks or concurrent RCU protected lookups and traversals.
  */
 int rhashtable_shrink(struct rhashtable *ht)
 {
-	struct bucket_table *new_tbl, *old_tbl = rht_dereference(ht->tbl, ht);
+	unsigned size = roundup_pow_of_two(atomic_read(&ht->nelems) * 4 / 3);
+	struct bucket_table *new_tbl;
+	struct bucket_table *tbl;
+	int err;
 
-	ASSERT_RHT_MUTEX(ht);
+	if (size < ht->p.min_size)
+		size = ht->p.min_size;
 
-	new_tbl = bucket_table_alloc(ht, old_tbl->size / 2);
+	new_tbl = bucket_table_alloc(ht, size);
 	if (new_tbl == NULL)
 		return -ENOMEM;
 
+	err = -EEXIST;
+
+	mutex_lock(&ht->mutex);
+	tbl = rht_dereference(ht->tbl, ht);
+	if (rht_dereference(tbl->future_tbl, ht))
+		goto out_free_tbl;
+
+	err = 0;
+	if (tbl->size <= size)
+		goto out_free_tbl;
+
 	rhashtable_rehash(ht, new_tbl);
-	return 0;
+
+	mutex_unlock(&ht->mutex);
+
+out:
+	return err;
+
+out_free_tbl:
+	mutex_unlock(&ht->mutex);
+	bucket_table_free(new_tbl);
+	goto out;
 }
 EXPORT_SYMBOL_GPL(rhashtable_shrink);
 
@@ -302,8 +320,6 @@ static void rht_deferred_worker(struct work_struct *work)
 
 	if (rht_grow_above_75(ht, tbl))
 		rhashtable_expand(ht);
-	else if (rht_shrink_below_30(ht, tbl))
-		rhashtable_shrink(ht);
 unlock:
 	mutex_unlock(&ht->mutex);
 }
diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index a2ba6ad..0ceb332 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -167,17 +167,13 @@ static int __init test_rhashtable(struct rhashtable *ht)
 		rcu_read_unlock();
 	}
 
-	for (i = 0; i < TEST_NEXPANDS; i++) {
-		pr_info("  Table shrinkage iteration %u...\n", i);
-		mutex_lock(&ht->mutex);
-		rhashtable_shrink(ht);
-		mutex_unlock(&ht->mutex);
+	pr_info("  Table shrinkage...\n");
+	rhashtable_shrink(ht);
 
-		rcu_read_lock();
-		pr_info("  Verifying lookups...\n");
-		test_rht_lookup(ht);
-		rcu_read_unlock();
-	}
+	rcu_read_lock();
+	pr_info("  Verifying lookups...\n");
+	test_rht_lookup(ht);
+	rcu_read_unlock();
 
 	rcu_read_lock();
 	test_bucket_stats(ht, true);
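
As a worked example of the new shrink-to-fit sizing (mirroring the
computation in the lib/rhashtable.c hunk above, with illustrative
numbers):

	/* 100 elements in a 1024-bucket table:
	 *
	 *   nelems * 4 / 3            = 133
	 *   roundup_pow_of_two(133)   = 256
	 *   max(256, ht->p.min_size)  = 256   (assuming min_size <= 256)
	 *
	 * At 256 buckets utilisation is ~39%, well below the 75% growth
	 * threshold, so the shrunk table will not expand right away.
	 */
	unsigned size = roundup_pow_of_two(atomic_read(&ht->nelems) * 4 / 3);

	if (size < ht->p.min_size)
		size = ht->p.min_size;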


* [v1 PATCH 8/10] rhashtable: Add multiple rehash support
  2015-03-22  7:53 [v1 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
                   ` (6 preceding siblings ...)
  2015-03-22  7:53 ` [v1 PATCH 7/10] rhashtable: Disable automatic shrinking Herbert Xu
@ 2015-03-22  7:53 ` Herbert Xu
  2015-03-22  7:53 ` [v1 PATCH 9/10] rhashtable: Allow GFP_ATOMIC bucket table allocation Herbert Xu
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Herbert Xu @ 2015-03-22  7:53 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

This patch adds the missing bits to allow multiple rehashes.  The
read side as well as removal already handle this correctly, so
only the rehasher and the insertion path need modification.

Note that this patch doesn't actually enable it, so for now
rehashing is still only performed by the worker thread, or by a
user thread when an explicit shrink is requested.

This patch also disables the rhashtable_expand interface because
it is useless since the table is meant to expand automatically.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/rhashtable.h |   24 ++++++----
 lib/rhashtable.c           |   99 ++++++++++++++++++++++++++++++---------------
 lib/test_rhashtable.c      |   23 +---------
 3 files changed, 85 insertions(+), 61 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 44aa579..cba22d8 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -294,7 +294,6 @@ int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 			   struct rhash_head *obj,
 			   struct bucket_table *old_tbl);
 
-int rhashtable_expand(struct rhashtable *ht);
 int rhashtable_shrink(struct rhashtable *ht);
 
 int rhashtable_walk_init(struct rhashtable *ht, struct rhashtable_iter *iter);
@@ -527,17 +526,22 @@ static inline int __rhashtable_insert_fast(
 	rcu_read_lock();
 
 	tbl = rht_dereference_rcu(ht->tbl, ht);
-	hash = rht_head_hashfn(ht, tbl, obj, params);
-	lock = rht_bucket_lock(tbl, hash);
-
-	spin_lock_bh(lock);
 
-	/* Because we have already taken the bucket lock in tbl,
-	 * if we find that future_tbl is not yet visible then
-	 * that guarantees all other insertions of the same entry
-	 * will also grab the bucket lock in tbl because until
-	 * the rehash completes ht->tbl won't be changed.
+	/* All insertions must grab the oldest table containing
+	 * the hashed bucket that is yet to be rehashed.
 	 */
+	for (;;) {
+		hash = rht_head_hashfn(ht, tbl, obj, params);
+		lock = rht_bucket_lock(tbl, hash);
+		spin_lock_bh(lock);
+
+		if (tbl->rehash <= hash)
+			break;
+
+		spin_unlock_bh(lock);
+		tbl = rht_dereference_rcu(tbl->future_tbl, ht);
+	}
+
 	new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
 	if (unlikely(new_tbl)) {
 		err = rhashtable_insert_slow(ht, key, obj, new_tbl);
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 08a6123..c284099 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -136,11 +136,24 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 	return tbl;
 }
 
+static struct bucket_table *rhashtable_last_table(struct rhashtable *ht,
+						  struct bucket_table *tbl)
+{
+	struct bucket_table *new_tbl;
+
+	do {
+		new_tbl = tbl;
+		tbl = rht_dereference_rcu(tbl->future_tbl, ht);
+	} while (tbl);
+
+	return new_tbl;
+}
+
 static int rhashtable_rehash_one(struct rhashtable *ht, unsigned old_hash)
 {
 	struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
-	struct bucket_table *new_tbl =
-		rht_dereference(old_tbl->future_tbl, ht) ?: old_tbl;
+	struct bucket_table *new_tbl = rhashtable_last_table(ht,
+		rht_dereference_rcu(old_tbl->future_tbl, ht));
 	struct rhash_head __rcu **pprev = &old_tbl->buckets[old_hash];
 	int err = -ENOENT;
 	struct rhash_head *head, *next, *entry;
@@ -196,12 +209,18 @@ static void rhashtable_rehash_chain(struct rhashtable *ht, unsigned old_hash)
 	spin_unlock_bh(old_bucket_lock);
 }
 
-static void rhashtable_rehash(struct rhashtable *ht,
-			      struct bucket_table *new_tbl)
+static int rhashtable_rehash_attach(struct rhashtable *ht,
+				    struct bucket_table *old_tbl,
+				    struct bucket_table *new_tbl)
 {
-	struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
-	struct rhashtable_walker *walker;
-	unsigned old_hash;
+	/* Protect future_tbl using the first bucket lock. */
+	spin_lock_bh(old_tbl->locks);
+
+	/* Did somebody beat us to it? */
+	if (rcu_access_pointer(old_tbl->future_tbl)) {
+		spin_unlock_bh(old_tbl->locks);
+		return -EEXIST;
+	}
 
 	/* Make insertions go into the new, empty table right away. Deletions
 	 * and lookups will be attempted in both tables until we synchronize.
@@ -211,6 +230,22 @@ static void rhashtable_rehash(struct rhashtable *ht,
 	/* Ensure the new table is visible to readers. */
 	smp_wmb();
 
+	spin_unlock_bh(old_tbl->locks);
+
+	return 0;
+}
+
+static int rhashtable_rehash_table(struct rhashtable *ht)
+{
+	struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
+	struct bucket_table *new_tbl;
+	struct rhashtable_walker *walker;
+	unsigned old_hash;
+
+	new_tbl = rht_dereference(old_tbl->future_tbl, ht);
+	if (!new_tbl)
+		return 0;
+
 	for (old_hash = 0; old_hash < old_tbl->size; old_hash++)
 		rhashtable_rehash_chain(ht, old_hash);
 
@@ -225,37 +260,29 @@ static void rhashtable_rehash(struct rhashtable *ht,
 	 * remain.
 	 */
 	call_rcu(&old_tbl->rcu, bucket_table_free_rcu);
+
+	return rht_dereference(new_tbl->future_tbl, ht) ? -EAGAIN : 0;
 }
 
-/**
- * rhashtable_expand - Expand hash table while allowing concurrent lookups
- * @ht:		the hash table to expand
- *
- * A secondary bucket array is allocated and the hash entries are migrated.
- *
- * This function may only be called in a context where it is safe to call
- * synchronize_rcu(), e.g. not within a rcu_read_lock() section.
- *
- * The caller must ensure that no concurrent resizing occurs by holding
- * ht->mutex.
- *
- * It is valid to have concurrent insertions and deletions protected by per
- * bucket locks or concurrent RCU protected lookups and traversals.
- */
-int rhashtable_expand(struct rhashtable *ht)
+static int rhashtable_expand(struct rhashtable *ht)
 {
 	struct bucket_table *new_tbl, *old_tbl = rht_dereference(ht->tbl, ht);
+	int err;
 
 	ASSERT_RHT_MUTEX(ht);
 
+	old_tbl = rhashtable_last_table(ht, old_tbl);
+
 	new_tbl = bucket_table_alloc(ht, old_tbl->size * 2);
 	if (new_tbl == NULL)
 		return -ENOMEM;
 
-	rhashtable_rehash(ht, new_tbl);
-	return 0;
+	err = rhashtable_rehash_attach(ht, old_tbl, new_tbl);
+	if (err)
+		bucket_table_free(new_tbl);
+
+	return err;
 }
-EXPORT_SYMBOL_GPL(rhashtable_expand);
 
 /**
  * rhashtable_shrink - Shrink hash table while allowing concurrent lookups
@@ -281,18 +308,19 @@ int rhashtable_shrink(struct rhashtable *ht)
 	if (new_tbl == NULL)
 		return -ENOMEM;
 
-	err = -EEXIST;
-
 	mutex_lock(&ht->mutex);
 	tbl = rht_dereference(ht->tbl, ht);
-	if (rht_dereference(tbl->future_tbl, ht))
-		goto out_free_tbl;
+	tbl = rhashtable_last_table(ht, tbl);
 
 	err = 0;
 	if (tbl->size <= size)
 		goto out_free_tbl;
 
-	rhashtable_rehash(ht, new_tbl);
+	err = rhashtable_rehash_attach(ht, tbl, new_tbl);
+	if (err)
+		goto out_free_tbl;
+
+	schedule_work(&ht->run_work);
 
 	mutex_unlock(&ht->mutex);
 
@@ -310,6 +338,7 @@ static void rht_deferred_worker(struct work_struct *work)
 {
 	struct rhashtable *ht;
 	struct bucket_table *tbl;
+	int err = 0;
 
 	ht = container_of(work, struct rhashtable, run_work);
 	mutex_lock(&ht->mutex);
@@ -317,11 +346,18 @@ static void rht_deferred_worker(struct work_struct *work)
 		goto unlock;
 
 	tbl = rht_dereference(ht->tbl, ht);
+	tbl = rhashtable_last_table(ht, tbl);
 
 	if (rht_grow_above_75(ht, tbl))
 		rhashtable_expand(ht);
+
+	err = rhashtable_rehash_table(ht);
+
 unlock:
 	mutex_unlock(&ht->mutex);
+
+	if (err)
+		schedule_work(&ht->run_work);
 }
 
 int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
@@ -332,6 +368,7 @@ int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 	unsigned hash;
 	int err = -EEXIST;
 
+	tbl = rhashtable_last_table(ht, tbl);
 	hash = head_hashfn(ht, tbl, obj);
 	spin_lock_nested(rht_bucket_lock(tbl, hash), SINGLE_DEPTH_NESTING);
 
diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index 0ceb332..ca78dc2 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -151,26 +151,6 @@ static int __init test_rhashtable(struct rhashtable *ht)
 	}
 
 	rcu_read_lock();
-	test_bucket_stats(ht, true);
-	test_rht_lookup(ht);
-	rcu_read_unlock();
-
-	for (i = 0; i < TEST_NEXPANDS; i++) {
-		pr_info("  Table expansion iteration %u...\n", i);
-		mutex_lock(&ht->mutex);
-		rhashtable_expand(ht);
-		mutex_unlock(&ht->mutex);
-
-		rcu_read_lock();
-		pr_info("  Verifying lookups...\n");
-		test_rht_lookup(ht);
-		rcu_read_unlock();
-	}
-
-	pr_info("  Table shrinkage...\n");
-	rhashtable_shrink(ht);
-
-	rcu_read_lock();
 	pr_info("  Verifying lookups...\n");
 	test_rht_lookup(ht);
 	rcu_read_unlock();
@@ -190,6 +170,9 @@ static int __init test_rhashtable(struct rhashtable *ht)
 		kfree(obj);
 	}
 
+	pr_info("  Table shrinkage...\n");
+	rhashtable_shrink(ht);
+
 	return 0;
 
 error:
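
Stripped to its essentials, the deferred worker in the hunk above
now drains the chain of pending tables like this (sketch only):

	mutex_lock(&ht->mutex);

	tbl = rht_dereference(ht->tbl, ht);
	tbl = rhashtable_last_table(ht, tbl);	/* newest pending table */

	if (rht_grow_above_75(ht, tbl))
		rhashtable_expand(ht);		/* attach yet another table */

	/* Rehash the oldest table into its successor; returns -EAGAIN
	 * if more tables are still queued behind it.
	 */
	err = rhashtable_rehash_table(ht);

	mutex_unlock(&ht->mutex);

	if (err)
		schedule_work(&ht->run_work);	/* come back for the rest */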


* [v1 PATCH 9/10] rhashtable: Allow GFP_ATOMIC bucket table allocation
  2015-03-22  7:53 [v1 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
                   ` (7 preceding siblings ...)
  2015-03-22  7:53 ` [v1 PATCH 8/10] rhashtable: Add multiple rehash support Herbert Xu
@ 2015-03-22  7:53 ` Herbert Xu
  2015-03-22  8:02   ` Herbert Xu
  2015-03-23 12:53   ` David Laight
  2015-03-22  7:53 ` [v1 PATCH 10/10] rhashtable: Add immediate rehash during insertion Herbert Xu
  2015-03-22  7:54 ` [v1 PATCH 5/10] tipc: Use default rhashtable hashfn Herbert Xu
  10 siblings, 2 replies; 17+ messages in thread
From: Herbert Xu @ 2015-03-22  7:53 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

This patch adds the ability to allocate bucket table with GFP_ATOMIC
instead of GFP_KERNEL.  This is needed when we perform an immediate
rehash during insertion.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 lib/rhashtable.c |   24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index c284099..59078ed 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -58,7 +58,8 @@ EXPORT_SYMBOL_GPL(lockdep_rht_bucket_is_held);
 #endif
 
 
-static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl)
+static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl,
+			      gfp_t gfp)
 {
 	unsigned int i, size;
 #if defined(CONFIG_PROVE_LOCKING)
@@ -75,12 +76,13 @@ static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl)
 
 	if (sizeof(spinlock_t) != 0) {
 #ifdef CONFIG_NUMA
-		if (size * sizeof(spinlock_t) > PAGE_SIZE)
+		if (size * sizeof(spinlock_t) > PAGE_SIZE &&
+		    gfp == GFP_KERNEL)
 			tbl->locks = vmalloc(size * sizeof(spinlock_t));
 		else
 #endif
 		tbl->locks = kmalloc_array(size, sizeof(spinlock_t),
-					   GFP_KERNEL);
+					   gfp);
 		if (!tbl->locks)
 			return -ENOMEM;
 		for (i = 0; i < size; i++)
@@ -105,15 +107,17 @@ static void bucket_table_free_rcu(struct rcu_head *head)
 }
 
 static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
-					       size_t nbuckets)
+					       size_t nbuckets,
+					       gfp_t gfp)
 {
 	struct bucket_table *tbl = NULL;
 	size_t size;
 	int i;
 
 	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
-	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
-		tbl = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
+	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER) ||
+	    gfp != GFP_KERNEL)
+		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
 	if (tbl == NULL)
 		tbl = vzalloc(size);
 	if (tbl == NULL)
@@ -121,7 +125,7 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 
 	tbl->size = nbuckets;
 
-	if (alloc_bucket_locks(ht, tbl) < 0) {
+	if (alloc_bucket_locks(ht, tbl, gfp) < 0) {
 		bucket_table_free(tbl);
 		return NULL;
 	}
@@ -273,7 +277,7 @@ static int rhashtable_expand(struct rhashtable *ht)
 
 	old_tbl = rhashtable_last_table(ht, old_tbl);
 
-	new_tbl = bucket_table_alloc(ht, old_tbl->size * 2);
+	new_tbl = bucket_table_alloc(ht, old_tbl->size * 2, GFP_KERNEL);
 	if (new_tbl == NULL)
 		return -ENOMEM;
 
@@ -304,7 +308,7 @@ int rhashtable_shrink(struct rhashtable *ht)
 	if (size < ht->p.min_size)
 		size = ht->p.min_size;
 
-	new_tbl = bucket_table_alloc(ht, size);
+	new_tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
 	if (new_tbl == NULL)
 		return -ENOMEM;
 
@@ -678,7 +682,7 @@ int rhashtable_init(struct rhashtable *ht,
 		}
 	}
 
-	tbl = bucket_table_alloc(ht, size);
+	tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
 	if (tbl == NULL)
 		return -ENOMEM;
 


* [v1 PATCH 10/10] rhashtable: Add immediate rehash during insertion
  2015-03-22  7:53 [v1 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
                   ` (8 preceding siblings ...)
  2015-03-22  7:53 ` [v1 PATCH 9/10] rhashtable: Allow GFP_ATOMIC bucket table allocation Herbert Xu
@ 2015-03-22  7:53 ` Herbert Xu
  2015-03-22  7:54 ` [v1 PATCH 5/10] tipc: Use default rhashtable hashfn Herbert Xu
  10 siblings, 0 replies; 17+ messages in thread
From: Herbert Xu @ 2015-03-22  7:53 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

This patch reintroduces immediate rehash during insertion.  If
we find during insertion that the table is full or the chain
length exceeds a set limit (currently 16 but may be disabled
with insecure_elasticity) then we will force an immediate rehash.
The rehash will contain an expansion if the table utilisation
exceeds 75%.

If this rehash fails then the insertion will fail.  Otherwise the
insertion will be reattempted in the new hash table.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/rhashtable.h |   32 +++++++++++++++-----
 lib/rhashtable.c           |   71 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 95 insertions(+), 8 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index cba22d8..c19cf89 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -103,6 +103,7 @@ struct rhashtable;
  * @max_size: Maximum size while expanding
  * @min_size: Minimum size while shrinking
  * @nulls_base: Base value to generate nulls marker
+ * @insecure_elasticity: Set to true to disable chain length checks
  * @locks_mul: Number of bucket locks to allocate per cpu (default: 128)
  * @hashfn: Hash function (default: jhash2 if !(key_len % 4), or jhash)
  * @obj_hashfn: Function to hash object
@@ -116,6 +117,7 @@ struct rhashtable_params {
 	unsigned int		max_size;
 	unsigned int		min_size;
 	u32			nulls_base;
+	bool			insecure_elasticity;
 	size_t			locks_mul;
 	rht_hashfn_t		hashfn;
 	rht_obj_hashfn_t	obj_hashfn;
@@ -127,6 +129,7 @@ struct rhashtable_params {
  * @tbl: Bucket table
  * @nelems: Number of elements in table
  * @key_len: Key length for hashfn
+ * @elasticity: Maximum chain length before rehash
  * @p: Configuration parameters
  * @run_work: Deferred worker to expand/shrink asynchronously
  * @mutex: Mutex to protect current/future table swapping
@@ -137,6 +140,7 @@ struct rhashtable {
 	atomic_t			nelems;
 	bool                            being_destroyed;
 	unsigned int			key_len;
+	unsigned int			elasticity;
 	struct rhashtable_params	p;
 	struct work_struct		run_work;
 	struct mutex                    mutex;
@@ -252,6 +256,17 @@ static inline bool rht_grow_above_75(const struct rhashtable *ht,
 	       (!ht->p.max_size || tbl->size < ht->p.max_size);
 }
 
+/**
+ * rht_grow_above_100 - returns true if nelems > table-size
+ * @ht:		hash table
+ * @tbl:	current table
+ */
+static inline bool rht_grow_above_100(const struct rhashtable *ht,
+				      const struct bucket_table *tbl)
+{
+	return atomic_read(&ht->nelems) > tbl->size;
+}
+
 /* The bucket lock is selected based on the hash and protects mutations
  * on a group of hash buckets.
  *
@@ -517,11 +532,12 @@ static inline int __rhashtable_insert_fast(
 		.ht = ht,
 		.key = key,
 	};
-	int err = -EEXIST;
 	struct bucket_table *tbl, *new_tbl;
 	struct rhash_head *head;
 	spinlock_t *lock;
+	unsigned elasticity;
 	unsigned hash;
+	int err;
 
 	rcu_read_lock();
 
@@ -543,22 +559,24 @@ static inline int __rhashtable_insert_fast(
 	}
 
 	new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
-	if (unlikely(new_tbl)) {
+	if (unlikely(new_tbl || rht_grow_above_100(ht, tbl))) {
+slow_path:
 		err = rhashtable_insert_slow(ht, key, obj, new_tbl);
 		goto out;
 	}
 
-	if (!key)
-		goto skip_lookup;
-
+	err = -EEXIST;
+	elasticity = ht->elasticity;
 	rht_for_each(head, tbl, hash) {
-		if (unlikely(!(params.obj_cmpfn ?
+		if (key &&
+		    unlikely(!(params.obj_cmpfn ?
 			       params.obj_cmpfn(&arg, rht_obj(ht, head)) :
 			       rhashtable_compare(&arg, rht_obj(ht, head)))))
 			goto out;
+		if (!--elasticity)
+			goto slow_path;
 	}
 
-skip_lookup:
 	err = 0;
 
 	head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 59078ed..6494b2e 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -364,21 +364,80 @@ unlock:
 		schedule_work(&ht->run_work);
 }
 
+static bool rhashtable_check_elasticity(struct rhashtable *ht,
+					struct bucket_table *tbl,
+					unsigned hash)
+{
+	unsigned elasticity = ht->elasticity;
+	struct rhash_head *head;
+
+	rht_for_each(head, tbl, hash)
+		if (!--elasticity)
+			return true;
+
+	return false;
+}
+
+int rhashtable_expand_or_rehash(struct rhashtable *ht,
+				struct bucket_table *tbl)
+{
+	struct bucket_table *old_tbl;
+	struct bucket_table *new_tbl;
+	unsigned int size;
+	int err;
+
+	old_tbl = rht_dereference_rcu(ht->tbl, ht);
+	if (!tbl)
+		tbl = rhashtable_last_table(ht, old_tbl);
+
+	size = tbl->size;
+
+	if (rht_grow_above_75(ht, tbl))
+		size *= 2;
+	/* More than two rehashes (not resizes) detected. */
+	else if (WARN_ON(old_tbl != tbl && old_tbl->size == size))
+		return -EBUSY;
+
+	new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC);
+	if (new_tbl == NULL)
+		return -ENOMEM;
+
+	err = rhashtable_rehash_attach(ht, tbl, new_tbl);
+	if (err) {
+		bucket_table_free(new_tbl);
+		if (err == -EEXIST)
+			err = 0;
+	} else
+		schedule_work(&ht->run_work);
+
+	return err;
+}
+
 int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 			   struct rhash_head *obj,
 			   struct bucket_table *tbl)
 {
 	struct rhash_head *head;
 	unsigned hash;
-	int err = -EEXIST;
+	int err;
+
+	if (!tbl)
+		goto rehash;
 
+restart:
 	tbl = rhashtable_last_table(ht, tbl);
 	hash = head_hashfn(ht, tbl, obj);
 	spin_lock_nested(rht_bucket_lock(tbl, hash), SINGLE_DEPTH_NESTING);
 
+	err = -EEXIST;
 	if (key && rhashtable_lookup_fast(ht, key, ht->p))
 		goto exit;
 
+	err = -EAGAIN;
+	if (rhashtable_check_elasticity(ht, tbl, hash) ||
+	    rht_grow_above_100(ht, tbl))
+		goto exit;
+
 	err = 0;
 
 	head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
@@ -391,6 +450,13 @@ int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 
 exit:
 	spin_unlock(rht_bucket_lock(tbl, hash));
+ 
+	if (err == -EAGAIN) {
+rehash:
+		err = rhashtable_expand_or_rehash(ht, tbl);
+		if (!err)
+			goto restart;
+	}
 
 	return err;
 }
@@ -667,6 +733,9 @@ int rhashtable_init(struct rhashtable *ht,
 
 	ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE);
 
+	if (!params->insecure_elasticity)
+		ht->elasticity = 16;
+
 	if (params->locks_mul)
 		ht->p.locks_mul = roundup_pow_of_two(params->locks_mul);
 	else


* [v1 PATCH 5/10] tipc: Use default rhashtable hashfn
  2015-03-22  7:53 [v1 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
                   ` (9 preceding siblings ...)
  2015-03-22  7:53 ` [v1 PATCH 10/10] rhashtable: Add immediate rehash during insertion Herbert Xu
@ 2015-03-22  7:54 ` Herbert Xu
  10 siblings, 0 replies; 17+ messages in thread
From: Herbert Xu @ 2015-03-22  7:54 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

This patch removes the explicit jhash value for the hashfn parameter
of rhashtable.  The default is now jhash so removing the setting
makes no difference apart from making one less copy of jhash in
the kernel.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/tipc/socket.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 73c2f51..6dd5bd9 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -35,7 +35,6 @@
  */
 
 #include <linux/rhashtable.h>
-#include <linux/jhash.h>
 #include "core.h"
 #include "name_table.h"
 #include "node.h"
@@ -2294,7 +2293,6 @@ static const struct rhashtable_params tsk_rht_params = {
 	.head_offset = offsetof(struct tipc_sock, node),
 	.key_offset = offsetof(struct tipc_sock, portid),
 	.key_len = sizeof(u32), /* portid */
-	.hashfn = jhash,
 	.max_size = 1048576,
 	.min_size = 256,
 };


* Re: [v1 PATCH 5/10]
  2015-03-22  7:53 ` [v1 PATCH 5/10] Herbert Xu
@ 2015-03-22  7:55   ` Herbert Xu
  0 siblings, 0 replies; 17+ messages in thread
From: Herbert Xu @ 2015-03-22  7:55 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

On Sun, Mar 22, 2015 at 06:53:48PM +1100, Herbert Xu wrote:
> 
> This patch removes the explicit jhash value for the hashfn parameter
> of rhashtable.  The default is now jhash so removing the setting
> makes no difference apart from making one less copy of jhash in
> the kernel.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Oops, due to an extra blank line in the commit the Subject went
missing.  Resent with the correct subject.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [v1 PATCH 9/10] rhashtable: Allow GFP_ATOMIC bucket table allocation
  2015-03-22  7:53 ` [v1 PATCH 9/10] rhashtable: Allow GFP_ATOMIC bucket table allocation Herbert Xu
@ 2015-03-22  8:02   ` Herbert Xu
  2015-03-23 12:53   ` David Laight
  1 sibling, 0 replies; 17+ messages in thread
From: Herbert Xu @ 2015-03-22  8:02 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

On Sun, Mar 22, 2015 at 06:53:52PM +1100, Herbert Xu wrote:
.
>  	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
> -	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
> -		tbl = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
> +	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER) ||
> +	    gfp != GFP_KERNEL)
> +		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
>  	if (tbl == NULL)
>  		tbl = vzalloc(size);

Oops, this will still attempt vzalloc.  I will send a v2.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
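
A sketch of one possible fix (illustrative only, not necessarily
what v2 does) is to guard the vzalloc fallback on the gfp flags as
well, since vzalloc may sleep:

	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER) ||
	    gfp != GFP_KERNEL)
		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
	if (tbl == NULL && gfp == GFP_KERNEL)
		tbl = vzalloc(size);	/* only when sleeping is allowed */
	if (tbl == NULL)
		return NULL;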


* RE: [v1 PATCH 9/10] rhashtable: Allow GFP_ATOMIC bucket table allocation
  2015-03-22  7:53 ` [v1 PATCH 9/10] rhashtable: Allow GFP_ATOMIC bucket table allocation Herbert Xu
  2015-03-22  8:02   ` Herbert Xu
@ 2015-03-23 12:53   ` David Laight
  2015-03-24  3:09     ` Herbert Xu
  1 sibling, 1 reply; 17+ messages in thread
From: David Laight @ 2015-03-23 12:53 UTC (permalink / raw)
  To: 'Herbert Xu',
	David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

From: Herbert Xu
> This patch adds the ability to allocate bucket table with GFP_ATOMIC
> instead of GFP_KERNEL.  This is needed when we perform an immediate
> rehash during insertion.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> ---
> 
>  lib/rhashtable.c |   24 ++++++++++++++----------
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index c284099..59078ed 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -58,7 +58,8 @@ EXPORT_SYMBOL_GPL(lockdep_rht_bucket_is_held);
>  #endif
> 
> 
> -static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl)
> +static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl,
> +			      gfp_t gfp)
>  {
>  	unsigned int i, size;
>  #if defined(CONFIG_PROVE_LOCKING)
> @@ -75,12 +76,13 @@ static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl)
> 
>  	if (sizeof(spinlock_t) != 0) {
>  #ifdef CONFIG_NUMA
> -		if (size * sizeof(spinlock_t) > PAGE_SIZE)
> +		if (size * sizeof(spinlock_t) > PAGE_SIZE &&
> +		    gfp == GFP_KERNEL)
>  			tbl->locks = vmalloc(size * sizeof(spinlock_t));
>  		else
>  #endif
>  		tbl->locks = kmalloc_array(size, sizeof(spinlock_t),
> -					   GFP_KERNEL);
> +					   gfp);
>  		if (!tbl->locks)
>  			return -ENOMEM;
>  		for (i = 0; i < size; i++)
...

If the lock array can't be allocated, then it is probably best to use
a lock array that is half the size rather than failing the expand.

I'm not sure your current version would allow the old lock array to be used.

Does Linux have any architectures where someone has decided to make a
spinlock consume an entire cache line rather than just a single word?
If so then the lock array can be gigantic.

Given the lock is only used for insert and delete, I'm also not at
all clear why you allocate 128 locks per cpu for very large tables.
With the locks in their own array I don't think there can be 'false
sharing', the worst that can happen is two cpus spinning on locks
in the same cache line.

It isn't as though the locks are likely to be held for any length
of time either.

	David


* Re: [v1 PATCH 9/10] rhashtable: Allow GFP_ATOMIC bucket table allocation
  2015-03-23 12:53   ` David Laight
@ 2015-03-24  3:09     ` Herbert Xu
  2015-03-24  4:24       ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Herbert Xu @ 2015-03-24  3:09 UTC (permalink / raw)
  To: David Laight
  Cc: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

On Mon, Mar 23, 2015 at 12:53:11PM +0000, David Laight wrote:
>
> Given the lock is only used for insert and delete, I'm also not at
> all clear why you allocate 128 locks per cpu for very large tables.
> With the locks in their own array I don't think there can be 'false
> sharing', the worst that can happen is two cpus spinning on locks
> in the same cache line.

Personally I'm totally against bucket locks.  If you have a
scalability problem you really need to solve it at a higher
level, e.g., multiqueue transmission in networking.  Bucket
locks are simply kicking the can down the road, it'll come back
to bite you sooner or later in terms of scalability.

So no I'm not going to waste my time fixing up something that
in my opinion shouldn't even exist :)

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [v1 PATCH 9/10] rhashtable: Allow GFP_ATOMIC bucket table allocation
  2015-03-24  3:09     ` Herbert Xu
@ 2015-03-24  4:24       ` Eric Dumazet
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2015-03-24  4:24 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David Laight, David S. Miller, Thomas Graf, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

On Tue, 2015-03-24 at 14:09 +1100, Herbert Xu wrote:
> On Mon, Mar 23, 2015 at 12:53:11PM +0000, David Laight wrote:
> >
> > Given the lock is only used for insert and delete, I'm also not at
> > all clear why you allocate 128 locks per cpu for very large tables.
> > With the locks in their own array I don't think there can be 'false
> > sharing', the worst that can happen is two cpus spinning on locks
> > in the same cache line.
> 
> Personally I'm totally against bucket locks.  If you have a
> scalability problem you really need to solve it at a higher
> level, e.g., multiqueue transmission in networking.  Bucket
> locks are simply kicking the can down the road, it'll come back
> to bite you sooner or later in terms of scalability.


Well, keep in mind a lock can be very big with LOCKDEP.

One lock per bucket is totally overkill.

A hash array of locks is a good compromise.

128 locks per cpu is also a good compromise, because one cache line can
hold 16 locks.

Number of locks has nothing to do with number of buckets,
unless you have unlimited memory and can afford one lock per bucket,
and don't care about memory thrashing when dumping the whole hash table.

I thought this kind of solution was well understood among network
developers.
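
To put rough numbers on that (assuming a 4-byte spinlock_t without
debug options and 64-byte cache lines): 64 / 4 = 16 locks per cache
line, and 128 locks per cpu * 4 bytes = 512 bytes of locks per cpu,
independent of the number of buckets.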

