* [v2 PATCH 0/10] rhashtable: Multiple rehashing
@ 2015-03-22  8:03 Herbert Xu
  2015-03-22  8:03 ` [v2 PATCH 1/10] rhashtable: Add barrier to ensure we see new tables in walker Herbert Xu
                   ` (9 more replies)
  0 siblings, 10 replies; 44+ messages in thread
From: Herbert Xu @ 2015-03-22  8:03 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

Hi:

This series introduces multiple rehashing.

Recall that the original implementation in br_multicast used
two list pointers per hash node and was therefore limited to at
most one rehash at a time, since you need one list pointer for
the old table and one for the new table.

Thanks to Josh Triplett's suggestion of using a single list pointer
we're no longer limited by that.  So it is perfectly OK to have
an arbitrary number of tables in existence at any one time.

Readers and removals simply have to walk from the oldest table
to the newest table in order not to miss anything.  Insertions
without lookup are just as easy: we simply go to the last table
that we can find and add the entry there.
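
In rough pseudo-code the read side therefore looks like this.  This
is a simplified sketch only, not the kernel code itself: struct entry
and lookup_one_table are hypothetical stand-ins for the real object
type and the per-bucket search, and RCU read locking is omitted.

	static struct entry *walk_lookup(struct rhashtable *ht,
					 const void *key)
	{
		/* Start at the oldest table and follow the
		 * future_tbl chain towards the newest.
		 */
		struct bucket_table *tbl = rht_dereference_rcu(ht->tbl, ht);

		do {
			struct entry *e = lookup_one_table(tbl, key);

			if (e)
				return e;
			tbl = rht_dereference_rcu(tbl->future_tbl, ht);
		} while (tbl);

		return NULL;
	}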

However, insertion with uniqueness lookup is more complicated
because we need to ensure that two simultaneous insertions of the
same key do not both succeed.  To achieve this, all insertions
including those without lookups are required to obtain the bucket
lock from the oldest hash table that is still alive.  This is
determined by having the rehasher (there is only one rehashing
thread in the system) keep a pointer to how far it has progressed.
If a bucket has already been rehashed then it is dead, i.e., there
can be no more insertions to it; otherwise it is considered
alive.  This guarantees that the same key cannot be inserted
in two different tables in parallel.
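
Condensed from what patch 8 below adds to __rhashtable_insert_fast,
the locking loop for an insertion looks roughly like this:

	/* Walk forward until we hold the bucket lock in the oldest
	 * table whose bucket is still alive, i.e., the rehasher's
	 * progress pointer tbl->rehash has not yet passed our hash.
	 */
	for (;;) {
		hash = rht_head_hashfn(ht, tbl, obj, params);
		lock = rht_bucket_lock(tbl, hash);
		spin_lock_bh(lock);

		if (tbl->rehash <= hash)
			break;

		spin_unlock_bh(lock);
		tbl = rht_dereference_rcu(tbl->future_tbl, ht);
	}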

Patch 1 is actually a bug fix for the walker.

Patches 2-6 eliminate unnecessary out-of-line copies of jhash.

Patch 7 disables automatic shrinking, so shrinking is now only
possible when requested by the user.

Patch 8 introduces multiple rehashing.  This means that if we
decide to grow then we will grow regardless of whether the previous
rehash has finished.  However, this is still asynchronous, meaning
that if insertions come fast enough we may still end up with a
table that is overutilised.

Patch 9 adds support for GFP_ATOMIC allocations of struct bucket_table.

Finally patch 10 enables immediate rehashing.  This is done either
when the table reaches 100% utilisation, or when the chain length
exceeds 16 (the latter can be disabled on request, e.g., for
nft_hash).
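
Condensed from patch 10, the check on the insertion path is roughly:

	/* Rehash immediately if the table is over 100% utilised or
	 * the destination chain exceeds ht->elasticity (16 unless
	 * insecure_elasticity is set).
	 */
	if (rhashtable_check_elasticity(ht, tbl, hash) ||
	    rht_grow_above_100(ht, tbl))
		goto rehash;	/* GFP_ATOMIC table alloc + rehash */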

With these patches the system should no longer have any trouble
dealing with fast insertions on a small table.  In the worst
case you end up with a list of tables that's log N in length
while the rehasher catches up.
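
For instance, growing from 4 buckets to 4096 buckets before the
rehasher gets a chance to run leaves a chain of eleven tables
(4, 8, ..., 4096) for the worker thread to collapse one rehash
at a time.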

v2 fixes the blank subject of patch 5 and prevents vzalloc
for GFP_ATOMIC callers in patch 9.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* [v2 PATCH 1/10] rhashtable: Add barrier to ensure we see new tables in walker
  2015-03-22  8:03 [v2 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
@ 2015-03-22  8:03 ` Herbert Xu
  2015-03-22 10:47   ` Thomas Graf
  2015-03-22  8:04 ` [v2 PATCH 2/10] rhashtable: Eliminate unnecessary branch in rht_key_hashfn Herbert Xu
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Herbert Xu @ 2015-03-22  8:03 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

The walker is a lockless reader so it too needs an smp_rmb before
reading the future_tbl field in order to see any new tables that
may contain elements that we should have walked over.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 lib/rhashtable.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 83cfedd..618a3f0 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -477,6 +477,9 @@ next:
 		iter->skip = 0;
 	}
 
+	/* Ensure we see any new tables. */
+	smp_rmb();
+
 	iter->walker->tbl = rht_dereference_rcu(tbl->future_tbl, ht);
 	if (iter->walker->tbl) {
 		iter->slot = 0;

* [v2 PATCH 2/10] rhashtable: Eliminate unnecessary branch in rht_key_hashfn
  2015-03-22  8:03 [v2 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
  2015-03-22  8:03 ` [v2 PATCH 1/10] rhashtable: Add barrier to ensure we see new tables in walker Herbert Xu
@ 2015-03-22  8:04 ` Herbert Xu
  2015-03-22 11:07   ` Thomas Graf
  2015-03-22  8:04 ` [v2 PATCH 3/10] rhashtable: Allow hashfn to be unset Herbert Xu
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Herbert Xu @ 2015-03-22  8:04 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

When rht_key_hashfn is called from rhashtable itself and params
is equal to ht->p, there is no point in checking params.key_len
and falling back to ht->p.key_len.

For some reason gcc couldn't figure out that params is the same
as ht->p.  So let's help it by only checking params.key_len when
it's a constant.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/rhashtable.h |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 18a6488..287b609 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -199,8 +199,11 @@ static inline unsigned int rht_key_hashfn(
 	struct rhashtable *ht, const struct bucket_table *tbl,
 	const void *key, const struct rhashtable_params params)
 {
-	return rht_bucket_index(tbl, params.hashfn(key, params.key_len ?:
-							ht->p.key_len,
+	unsigned key_len = __builtin_constant_p(params.key_len) ?
+			   (params.key_len ?: ht->p.key_len) :
+			   params.key_len;
+
+	return rht_bucket_index(tbl, params.hashfn(key, key_len,
 						   tbl->hash_rnd));
 }
 

* [v2 PATCH 3/10] rhashtable: Allow hashfn to be unset
  2015-03-22  8:03 [v2 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
  2015-03-22  8:03 ` [v2 PATCH 1/10] rhashtable: Add barrier to ensure we see new tables in walker Herbert Xu
  2015-03-22  8:04 ` [v2 PATCH 2/10] rhashtable: Eliminate unnecessary branch in rht_key_hashfn Herbert Xu
@ 2015-03-22  8:04 ` Herbert Xu
  2015-03-22 11:55   ` Thomas Graf
  2015-03-23 14:29   ` David Laight
  2015-03-22  8:04 ` [v2 PATCH 4/10] netlink: Use default rhashtable hashfn Herbert Xu
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 44+ messages in thread
From: Herbert Xu @ 2015-03-22  8:04 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

Since every current rhashtable user uses jhash as their hash
function, the fact that jhash is an inline function causes each
user to generate a copy of its code.

This patch provides a solution to this problem by allowing
hashfn to be unset, in which case rhashtable will automatically
set it to jhash.  Furthermore, if the key length is a multiple
of 4, we will switch over to jhash2.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/rhashtable.h |   33 +++++++++++++++++++++++++++------
 lib/rhashtable.c           |   17 ++++++++++++++++-
 2 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 287b609..2a98b95 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -19,6 +19,7 @@
 
 #include <linux/compiler.h>
 #include <linux/errno.h>
+#include <linux/jhash.h>
 #include <linux/list_nulls.h>
 #include <linux/workqueue.h>
 #include <linux/mutex.h>
@@ -103,7 +104,7 @@ struct rhashtable;
  * @min_size: Minimum size while shrinking
  * @nulls_base: Base value to generate nulls marker
  * @locks_mul: Number of bucket locks to allocate per cpu (default: 128)
- * @hashfn: Function to hash key
+ * @hashfn: Hash function (default: jhash2 if !(key_len % 4), or jhash)
  * @obj_hashfn: Function to hash object
  * @obj_cmpfn: Function to compare key with object
  */
@@ -125,6 +126,7 @@ struct rhashtable_params {
  * struct rhashtable - Hash table handle
  * @tbl: Bucket table
  * @nelems: Number of elements in table
+ * @key_len: Key length for hashfn
  * @p: Configuration parameters
  * @run_work: Deferred worker to expand/shrink asynchronously
  * @mutex: Mutex to protect current/future table swapping
@@ -134,6 +136,7 @@ struct rhashtable {
 	struct bucket_table __rcu	*tbl;
 	atomic_t			nelems;
 	bool                            being_destroyed;
+	unsigned int			key_len;
 	struct rhashtable_params	p;
 	struct work_struct		run_work;
 	struct mutex                    mutex;
@@ -199,12 +202,30 @@ static inline unsigned int rht_key_hashfn(
 	struct rhashtable *ht, const struct bucket_table *tbl,
 	const void *key, const struct rhashtable_params params)
 {
-	unsigned key_len = __builtin_constant_p(params.key_len) ?
-			   (params.key_len ?: ht->p.key_len) :
-			   params.key_len;
+	unsigned hash;
+
+	if (!__builtin_constant_p(params.key_len))
+		hash = ht->p.hashfn(key, ht->key_len, tbl->hash_rnd);
+	else if (params.key_len) {
+		unsigned key_len = params.key_len;
+
+		if (params.hashfn)
+			hash = params.hashfn(key, key_len, tbl->hash_rnd);
+		else if (key_len & (sizeof(u32) - 1))
+			hash = jhash(key, key_len, tbl->hash_rnd);
+		else
+			hash = jhash2(key, key_len / sizeof(u32),
+				      tbl->hash_rnd);
+	} else {
+		unsigned key_len = ht->p.key_len;
+
+		if (params.hashfn)
+			hash = params.hashfn(key, key_len, tbl->hash_rnd);
+		else
+			hash = jhash(key, key_len, tbl->hash_rnd);
+	}
 
-	return rht_bucket_index(tbl, params.hashfn(key, key_len,
-						   tbl->hash_rnd));
+	return rht_bucket_index(tbl, hash);
 }
 
 static inline unsigned int rht_head_hashfn(
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 618a3f0..798f01d 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -532,6 +532,11 @@ static size_t rounded_hashtable_size(const struct rhashtable_params *params)
 		   (unsigned long)params->min_size);
 }
 
+static u32 rhashtable_jhash2(const void *key, u32 length, u32 seed)
+{
+	return jhash2(key, length, seed);
+}
+
 /**
  * rhashtable_init - initialize a new hash table
  * @ht:		hash table to be initialized
@@ -583,7 +588,7 @@ int rhashtable_init(struct rhashtable *ht,
 
 	size = HASH_DEFAULT_SIZE;
 
-	if ((!(params->key_len && params->hashfn) && !params->obj_hashfn) ||
+	if ((!params->key_len && !params->obj_hashfn) ||
 	    (params->obj_hashfn && !params->obj_cmpfn))
 		return -EINVAL;
 
@@ -610,6 +615,16 @@ int rhashtable_init(struct rhashtable *ht,
 	else
 		ht->p.locks_mul = BUCKET_LOCKS_PER_CPU;
 
+	ht->key_len = ht->p.key_len;
+	if (!params->hashfn) {
+		ht->p.hashfn = jhash;
+
+		if (!(ht->key_len & (sizeof(u32) - 1))) {
+			ht->key_len /= sizeof(u32);
+			ht->p.hashfn = rhashtable_jhash2;
+		}
+	}
+
 	tbl = bucket_table_alloc(ht, size);
 	if (tbl == NULL)
 		return -ENOMEM;

* [v2 PATCH 4/10] netlink: Use default rhashtable hashfn
  2015-03-22  8:03 [v2 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
                   ` (2 preceding siblings ...)
  2015-03-22  8:04 ` [v2 PATCH 3/10] rhashtable: Allow hashfn to be unset Herbert Xu
@ 2015-03-22  8:04 ` Herbert Xu
  2015-03-22 11:55   ` Thomas Graf
  2015-03-23  1:18   ` Simon Horman
  2015-03-22  8:04 ` [v2 PATCH 5/10] tipc: " Herbert Xu
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 44+ messages in thread
From: Herbert Xu @ 2015-03-22  8:04 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

This patch removes the explicit jhash value for the hashfn parameter
of rhashtable.  The default is now jhash so removing the setting
makes no difference apart from making one less copy of jhash in
the kernel.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/netlink/af_netlink.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 6517921..e2f7f28 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -3133,13 +3133,12 @@ static inline u32 netlink_hash(const void *data, u32 seed)
 	struct netlink_compare_arg arg;
 
 	netlink_compare_arg_init(&arg, sock_net(&nlk->sk), nlk->portid);
-	return jhash(&arg, netlink_compare_arg_len, seed);
+	return jhash2((u32 *)&arg, netlink_compare_arg_len / sizeof(u32), seed);
 }
 
 static const struct rhashtable_params netlink_rhashtable_params = {
 	.head_offset = offsetof(struct netlink_sock, node),
 	.key_len = netlink_compare_arg_len,
-	.hashfn = jhash,
 	.obj_hashfn = netlink_hash,
 	.obj_cmpfn = netlink_compare,
 	.max_size = 65536,

* [v2 PATCH 5/10] tipc: Use default rhashtable hashfn
  2015-03-22  8:03 [v2 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
                   ` (3 preceding siblings ...)
  2015-03-22  8:04 ` [v2 PATCH 4/10] netlink: Use default rhashtable hashfn Herbert Xu
@ 2015-03-22  8:04 ` Herbert Xu
  2015-03-22 11:56   ` Thomas Graf
  2015-03-22  8:04 ` [v2 PATCH 6/10] netfilter: " Herbert Xu
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Herbert Xu @ 2015-03-22  8:04 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

This patch removes the explicit jhash value for the hashfn parameter
of rhashtable.  The default is now jhash so removing the setting
makes no difference apart from making one less copy of jhash in
the kernel.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/tipc/socket.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 73c2f51..6dd5bd9 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -35,7 +35,6 @@
  */
 
 #include <linux/rhashtable.h>
-#include <linux/jhash.h>
 #include "core.h"
 #include "name_table.h"
 #include "node.h"
@@ -2294,7 +2293,6 @@ static const struct rhashtable_params tsk_rht_params = {
 	.head_offset = offsetof(struct tipc_sock, node),
 	.key_offset = offsetof(struct tipc_sock, portid),
 	.key_len = sizeof(u32), /* portid */
-	.hashfn = jhash,
 	.max_size = 1048576,
 	.min_size = 256,
 };

* [v2 PATCH 6/10] netfilter: Use default rhashtable hashfn
  2015-03-22  8:03 [v2 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
                   ` (4 preceding siblings ...)
  2015-03-22  8:04 ` [v2 PATCH 5/10] tipc: " Herbert Xu
@ 2015-03-22  8:04 ` Herbert Xu
  2015-03-22 11:56   ` Thomas Graf
  2015-03-22  8:04 ` [v2 PATCH 7/10] rhashtable: Disable automatic shrinking Herbert Xu
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Herbert Xu @ 2015-03-22  8:04 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

This patch removes the explicit jhash value for the hashfn parameter
of rhashtable.  The default is now jhash so removing the setting
makes no difference apart from making one less copy of jhash in
the kernel.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/netfilter/nft_hash.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index 4585c57..3bda4e7 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -13,7 +13,6 @@
 #include <linux/module.h>
 #include <linux/list.h>
 #include <linux/log2.h>
-#include <linux/jhash.h>
 #include <linux/netlink.h>
 #include <linux/rhashtable.h>
 #include <linux/netfilter.h>
@@ -169,7 +168,6 @@ static unsigned int nft_hash_privsize(const struct nlattr * const nla[])
 static const struct rhashtable_params nft_hash_params = {
 	.head_offset = offsetof(struct nft_hash_elem, node),
 	.key_offset = offsetof(struct nft_hash_elem, key),
-	.hashfn = jhash,
 };
 
 static int nft_hash_init(const struct nft_set *set,

* [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-22  8:03 [v2 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
                   ` (5 preceding siblings ...)
  2015-03-22  8:04 ` [v2 PATCH 6/10] netfilter: " Herbert Xu
@ 2015-03-22  8:04 ` Herbert Xu
  2015-03-22 12:17   ` Thomas Graf
  2015-03-22  8:04 ` [v2 PATCH 8/10] rhashtable: Add multiple rehash support Herbert Xu
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Herbert Xu @ 2015-03-22  8:04 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

Automatic shrinking is dangerous because it provides an easy
way for an adversary to cause us to do unnecessary work, thus
making the resizable hashtable a poor data structure.

This patch disables automatic shrinking but retains a manual
shrink function for those cases where insertions and removals
are overseen by a trusted entity, e.g., nft_hash.

The shrink function will now also shrink to fit rather than halve
the size of the table.
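
For example, with the sizing formula in the patch below, a table
holding 100 elements shrinks to roundup_pow_of_two(100 * 4 / 3) =
roundup_pow_of_two(133) = 256 buckets, i.e., roughly 39% utilisation,
rather than half of whatever the current size happens to be.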

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/rhashtable.h |   15 ---------------
 lib/rhashtable.c           |   44 ++++++++++++++++++++++++++++++--------------
 lib/test_rhashtable.c      |   16 ++++++----------
 3 files changed, 36 insertions(+), 39 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 2a98b95..44aa579 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -252,19 +252,6 @@ static inline bool rht_grow_above_75(const struct rhashtable *ht,
 	       (!ht->p.max_size || tbl->size < ht->p.max_size);
 }
 
-/**
- * rht_shrink_below_30 - returns true if nelems < 0.3 * table-size
- * @ht:		hash table
- * @tbl:	current table
- */
-static inline bool rht_shrink_below_30(const struct rhashtable *ht,
-				       const struct bucket_table *tbl)
-{
-	/* Shrink table beneath 30% load */
-	return atomic_read(&ht->nelems) < (tbl->size * 3 / 10) &&
-	       tbl->size > ht->p.min_size;
-}
-
 /* The bucket lock is selected based on the hash and protects mutations
  * on a group of hash buckets.
  *
@@ -745,8 +732,6 @@ static inline int rhashtable_remove_fast(
 		goto out;
 
 	atomic_dec(&ht->nelems);
-	if (rht_shrink_below_30(ht, tbl))
-		schedule_work(&ht->run_work);
 
 out:
 	rcu_read_unlock();
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 798f01d..08a6123 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -261,30 +261,48 @@ EXPORT_SYMBOL_GPL(rhashtable_expand);
  * rhashtable_shrink - Shrink hash table while allowing concurrent lookups
  * @ht:		the hash table to shrink
  *
- * This function may only be called in a context where it is safe to call
- * synchronize_rcu(), e.g. not within a rcu_read_lock() section.
- *
- * The caller must ensure that no concurrent resizing occurs by holding
- * ht->mutex.
- *
- * The caller must ensure that no concurrent table mutations take place.
- * It is however valid to have concurrent lookups if they are RCU protected.
+ * This function shrinks the hash table to fit, i.e., the smallest
+ * size would not cause it to expand right away automatically.
  *
  * It is valid to have concurrent insertions and deletions protected by per
  * bucket locks or concurrent RCU protected lookups and traversals.
  */
 int rhashtable_shrink(struct rhashtable *ht)
 {
-	struct bucket_table *new_tbl, *old_tbl = rht_dereference(ht->tbl, ht);
+	unsigned size = roundup_pow_of_two(atomic_read(&ht->nelems) * 4 / 3);
+	struct bucket_table *new_tbl;
+	struct bucket_table *tbl;
+	int err;
 
-	ASSERT_RHT_MUTEX(ht);
+	if (size < ht->p.min_size)
+		size = ht->p.min_size;
 
-	new_tbl = bucket_table_alloc(ht, old_tbl->size / 2);
+	new_tbl = bucket_table_alloc(ht, size);
 	if (new_tbl == NULL)
 		return -ENOMEM;
 
+	err = -EEXIST;
+
+	mutex_lock(&ht->mutex);
+	tbl = rht_dereference(ht->tbl, ht);
+	if (rht_dereference(tbl->future_tbl, ht))
+		goto out_free_tbl;
+
+	err = 0;
+	if (tbl->size <= size)
+		goto out_free_tbl;
+
 	rhashtable_rehash(ht, new_tbl);
-	return 0;
+
+	mutex_unlock(&ht->mutex);
+
+out:
+	return err;
+
+out_free_tbl:
+	mutex_unlock(&ht->mutex);
+	bucket_table_free(new_tbl);
+	goto out;
 }
 EXPORT_SYMBOL_GPL(rhashtable_shrink);
 
@@ -302,8 +320,6 @@ static void rht_deferred_worker(struct work_struct *work)
 
 	if (rht_grow_above_75(ht, tbl))
 		rhashtable_expand(ht);
-	else if (rht_shrink_below_30(ht, tbl))
-		rhashtable_shrink(ht);
 unlock:
 	mutex_unlock(&ht->mutex);
 }
diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index a2ba6ad..0ceb332 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -167,17 +167,13 @@ static int __init test_rhashtable(struct rhashtable *ht)
 		rcu_read_unlock();
 	}
 
-	for (i = 0; i < TEST_NEXPANDS; i++) {
-		pr_info("  Table shrinkage iteration %u...\n", i);
-		mutex_lock(&ht->mutex);
-		rhashtable_shrink(ht);
-		mutex_unlock(&ht->mutex);
+	pr_info("  Table shrinkage...\n");
+	rhashtable_shrink(ht);
 
-		rcu_read_lock();
-		pr_info("  Verifying lookups...\n");
-		test_rht_lookup(ht);
-		rcu_read_unlock();
-	}
+	rcu_read_lock();
+	pr_info("  Verifying lookups...\n");
+	test_rht_lookup(ht);
+	rcu_read_unlock();
 
 	rcu_read_lock();
 	test_bucket_stats(ht, true);

* [v2 PATCH 8/10] rhashtable: Add multiple rehash support
  2015-03-22  8:03 [v2 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
                   ` (6 preceding siblings ...)
  2015-03-22  8:04 ` [v2 PATCH 7/10] rhashtable: Disable automatic shrinking Herbert Xu
@ 2015-03-22  8:04 ` Herbert Xu
  2015-03-22  8:04 ` [v2 PATCH 9/10] rhashtable: Allow GFP_ATOMIC bucket table allocation Herbert Xu
  2015-03-22  8:04 ` [v2 PATCH 10/10] rhashtable: Add immediate rehash during insertion Herbert Xu
  9 siblings, 0 replies; 44+ messages in thread
From: Herbert Xu @ 2015-03-22  8:04 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

This patch adds the missing bits to allow multiple rehashes.  The
read side as well as removal already handle this correctly, so it
is only the rehasher and insertion that need modification.

Note that this patch doesn't actually enable it, so for now rehashing
is still only performed by the worker thread, or by a user thread when
an explicit shrink is ordered.

This patch also disables the rhashtable_expand interface because
it is useless since the table is meant to expand automatically.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/rhashtable.h |   24 ++++++----
 lib/rhashtable.c           |   99 ++++++++++++++++++++++++++++++---------------
 lib/test_rhashtable.c      |   23 +---------
 3 files changed, 85 insertions(+), 61 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 44aa579..cba22d8 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -294,7 +294,6 @@ int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 			   struct rhash_head *obj,
 			   struct bucket_table *old_tbl);
 
-int rhashtable_expand(struct rhashtable *ht);
 int rhashtable_shrink(struct rhashtable *ht);
 
 int rhashtable_walk_init(struct rhashtable *ht, struct rhashtable_iter *iter);
@@ -527,17 +526,22 @@ static inline int __rhashtable_insert_fast(
 	rcu_read_lock();
 
 	tbl = rht_dereference_rcu(ht->tbl, ht);
-	hash = rht_head_hashfn(ht, tbl, obj, params);
-	lock = rht_bucket_lock(tbl, hash);
-
-	spin_lock_bh(lock);
 
-	/* Because we have already taken the bucket lock in tbl,
-	 * if we find that future_tbl is not yet visible then
-	 * that guarantees all other insertions of the same entry
-	 * will also grab the bucket lock in tbl because until
-	 * the rehash completes ht->tbl won't be changed.
+	/* All insertions must grab the oldest table containing
+	 * the hashed bucket that is yet to be rehashed.
 	 */
+	for (;;) {
+		hash = rht_head_hashfn(ht, tbl, obj, params);
+		lock = rht_bucket_lock(tbl, hash);
+		spin_lock_bh(lock);
+
+		if (tbl->rehash <= hash)
+			break;
+
+		spin_unlock_bh(lock);
+		tbl = rht_dereference_rcu(tbl->future_tbl, ht);
+	}
+
 	new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
 	if (unlikely(new_tbl)) {
 		err = rhashtable_insert_slow(ht, key, obj, new_tbl);
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 08a6123..c284099 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -136,11 +136,24 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 	return tbl;
 }
 
+static struct bucket_table *rhashtable_last_table(struct rhashtable *ht,
+						  struct bucket_table *tbl)
+{
+	struct bucket_table *new_tbl;
+
+	do {
+		new_tbl = tbl;
+		tbl = rht_dereference_rcu(tbl->future_tbl, ht);
+	} while (tbl);
+
+	return new_tbl;
+}
+
 static int rhashtable_rehash_one(struct rhashtable *ht, unsigned old_hash)
 {
 	struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
-	struct bucket_table *new_tbl =
-		rht_dereference(old_tbl->future_tbl, ht) ?: old_tbl;
+	struct bucket_table *new_tbl = rhashtable_last_table(ht,
+		rht_dereference_rcu(old_tbl->future_tbl, ht));
 	struct rhash_head __rcu **pprev = &old_tbl->buckets[old_hash];
 	int err = -ENOENT;
 	struct rhash_head *head, *next, *entry;
@@ -196,12 +209,18 @@ static void rhashtable_rehash_chain(struct rhashtable *ht, unsigned old_hash)
 	spin_unlock_bh(old_bucket_lock);
 }
 
-static void rhashtable_rehash(struct rhashtable *ht,
-			      struct bucket_table *new_tbl)
+static int rhashtable_rehash_attach(struct rhashtable *ht,
+				    struct bucket_table *old_tbl,
+				    struct bucket_table *new_tbl)
 {
-	struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
-	struct rhashtable_walker *walker;
-	unsigned old_hash;
+	/* Protect future_tbl using the first bucket lock. */
+	spin_lock_bh(old_tbl->locks);
+
+	/* Did somebody beat us to it? */
+	if (rcu_access_pointer(old_tbl->future_tbl)) {
+		spin_unlock_bh(old_tbl->locks);
+		return -EEXIST;
+	}
 
 	/* Make insertions go into the new, empty table right away. Deletions
 	 * and lookups will be attempted in both tables until we synchronize.
@@ -211,6 +230,22 @@ static void rhashtable_rehash(struct rhashtable *ht,
 	/* Ensure the new table is visible to readers. */
 	smp_wmb();
 
+	spin_unlock_bh(old_tbl->locks);
+
+	return 0;
+}
+
+static int rhashtable_rehash_table(struct rhashtable *ht)
+{
+	struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
+	struct bucket_table *new_tbl;
+	struct rhashtable_walker *walker;
+	unsigned old_hash;
+
+	new_tbl = rht_dereference(old_tbl->future_tbl, ht);
+	if (!new_tbl)
+		return 0;
+
 	for (old_hash = 0; old_hash < old_tbl->size; old_hash++)
 		rhashtable_rehash_chain(ht, old_hash);
 
@@ -225,37 +260,29 @@ static void rhashtable_rehash(struct rhashtable *ht,
 	 * remain.
 	 */
 	call_rcu(&old_tbl->rcu, bucket_table_free_rcu);
+
+	return rht_dereference(new_tbl->future_tbl, ht) ? -EAGAIN : 0;
 }
 
-/**
- * rhashtable_expand - Expand hash table while allowing concurrent lookups
- * @ht:		the hash table to expand
- *
- * A secondary bucket array is allocated and the hash entries are migrated.
- *
- * This function may only be called in a context where it is safe to call
- * synchronize_rcu(), e.g. not within a rcu_read_lock() section.
- *
- * The caller must ensure that no concurrent resizing occurs by holding
- * ht->mutex.
- *
- * It is valid to have concurrent insertions and deletions protected by per
- * bucket locks or concurrent RCU protected lookups and traversals.
- */
-int rhashtable_expand(struct rhashtable *ht)
+static int rhashtable_expand(struct rhashtable *ht)
 {
 	struct bucket_table *new_tbl, *old_tbl = rht_dereference(ht->tbl, ht);
+	int err;
 
 	ASSERT_RHT_MUTEX(ht);
 
+	old_tbl = rhashtable_last_table(ht, old_tbl);
+
 	new_tbl = bucket_table_alloc(ht, old_tbl->size * 2);
 	if (new_tbl == NULL)
 		return -ENOMEM;
 
-	rhashtable_rehash(ht, new_tbl);
-	return 0;
+	err = rhashtable_rehash_attach(ht, old_tbl, new_tbl);
+	if (err)
+		bucket_table_free(new_tbl);
+
+	return err;
 }
-EXPORT_SYMBOL_GPL(rhashtable_expand);
 
 /**
  * rhashtable_shrink - Shrink hash table while allowing concurrent lookups
@@ -281,18 +308,19 @@ int rhashtable_shrink(struct rhashtable *ht)
 	if (new_tbl == NULL)
 		return -ENOMEM;
 
-	err = -EEXIST;
-
 	mutex_lock(&ht->mutex);
 	tbl = rht_dereference(ht->tbl, ht);
-	if (rht_dereference(tbl->future_tbl, ht))
-		goto out_free_tbl;
+	tbl = rhashtable_last_table(ht, tbl);
 
 	err = 0;
 	if (tbl->size <= size)
 		goto out_free_tbl;
 
-	rhashtable_rehash(ht, new_tbl);
+	err = rhashtable_rehash_attach(ht, tbl, new_tbl);
+	if (err)
+		goto out_free_tbl;
+
+	schedule_work(&ht->run_work);
 
 	mutex_unlock(&ht->mutex);
 
@@ -310,6 +338,7 @@ static void rht_deferred_worker(struct work_struct *work)
 {
 	struct rhashtable *ht;
 	struct bucket_table *tbl;
+	int err = 0;
 
 	ht = container_of(work, struct rhashtable, run_work);
 	mutex_lock(&ht->mutex);
@@ -317,11 +346,18 @@ static void rht_deferred_worker(struct work_struct *work)
 		goto unlock;
 
 	tbl = rht_dereference(ht->tbl, ht);
+	tbl = rhashtable_last_table(ht, tbl);
 
 	if (rht_grow_above_75(ht, tbl))
 		rhashtable_expand(ht);
+
+	err = rhashtable_rehash_table(ht);
+
 unlock:
 	mutex_unlock(&ht->mutex);
+
+	if (err)
+		schedule_work(&ht->run_work);
 }
 
 int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
@@ -332,6 +368,7 @@ int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 	unsigned hash;
 	int err = -EEXIST;
 
+	tbl = rhashtable_last_table(ht, tbl);
 	hash = head_hashfn(ht, tbl, obj);
 	spin_lock_nested(rht_bucket_lock(tbl, hash), SINGLE_DEPTH_NESTING);
 
diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index 0ceb332..ca78dc2 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -151,26 +151,6 @@ static int __init test_rhashtable(struct rhashtable *ht)
 	}
 
 	rcu_read_lock();
-	test_bucket_stats(ht, true);
-	test_rht_lookup(ht);
-	rcu_read_unlock();
-
-	for (i = 0; i < TEST_NEXPANDS; i++) {
-		pr_info("  Table expansion iteration %u...\n", i);
-		mutex_lock(&ht->mutex);
-		rhashtable_expand(ht);
-		mutex_unlock(&ht->mutex);
-
-		rcu_read_lock();
-		pr_info("  Verifying lookups...\n");
-		test_rht_lookup(ht);
-		rcu_read_unlock();
-	}
-
-	pr_info("  Table shrinkage...\n");
-	rhashtable_shrink(ht);
-
-	rcu_read_lock();
 	pr_info("  Verifying lookups...\n");
 	test_rht_lookup(ht);
 	rcu_read_unlock();
@@ -190,6 +170,9 @@ static int __init test_rhashtable(struct rhashtable *ht)
 		kfree(obj);
 	}
 
+	pr_info("  Table shrinkage...\n");
+	rhashtable_shrink(ht);
+
 	return 0;
 
 error:

* [v2 PATCH 9/10] rhashtable: Allow GFP_ATOMIC bucket table allocation
  2015-03-22  8:03 [v2 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
                   ` (7 preceding siblings ...)
  2015-03-22  8:04 ` [v2 PATCH 8/10] rhashtable: Add multiple rehash support Herbert Xu
@ 2015-03-22  8:04 ` Herbert Xu
  2015-03-22  8:04 ` [v2 PATCH 10/10] rhashtable: Add immediate rehash during insertion Herbert Xu
  9 siblings, 0 replies; 44+ messages in thread
From: Herbert Xu @ 2015-03-22  8:04 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

This patch adds the ability to allocate the bucket table with
GFP_ATOMIC instead of GFP_KERNEL.  This is needed when we perform
an immediate rehash during insertion.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 lib/rhashtable.c |   26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index c284099..2ed3cc1 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -58,7 +58,8 @@ EXPORT_SYMBOL_GPL(lockdep_rht_bucket_is_held);
 #endif
 
 
-static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl)
+static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl,
+			      gfp_t gfp)
 {
 	unsigned int i, size;
 #if defined(CONFIG_PROVE_LOCKING)
@@ -75,12 +76,13 @@ static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl)
 
 	if (sizeof(spinlock_t) != 0) {
 #ifdef CONFIG_NUMA
-		if (size * sizeof(spinlock_t) > PAGE_SIZE)
+		if (size * sizeof(spinlock_t) > PAGE_SIZE &&
+		    gfp == GFP_KERNEL)
 			tbl->locks = vmalloc(size * sizeof(spinlock_t));
 		else
 #endif
 		tbl->locks = kmalloc_array(size, sizeof(spinlock_t),
-					   GFP_KERNEL);
+					   gfp);
 		if (!tbl->locks)
 			return -ENOMEM;
 		for (i = 0; i < size; i++)
@@ -105,23 +107,25 @@ static void bucket_table_free_rcu(struct rcu_head *head)
 }
 
 static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
-					       size_t nbuckets)
+					       size_t nbuckets,
+					       gfp_t gfp)
 {
 	struct bucket_table *tbl = NULL;
 	size_t size;
 	int i;
 
 	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
-	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
-		tbl = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
-	if (tbl == NULL)
+	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER) ||
+	    gfp != GFP_KERNEL)
+		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
+	if (tbl == NULL && gfp == GFP_KERNEL)
 		tbl = vzalloc(size);
 	if (tbl == NULL)
 		return NULL;
 
 	tbl->size = nbuckets;
 
-	if (alloc_bucket_locks(ht, tbl) < 0) {
+	if (alloc_bucket_locks(ht, tbl, gfp) < 0) {
 		bucket_table_free(tbl);
 		return NULL;
 	}
@@ -273,7 +277,7 @@ static int rhashtable_expand(struct rhashtable *ht)
 
 	old_tbl = rhashtable_last_table(ht, old_tbl);
 
-	new_tbl = bucket_table_alloc(ht, old_tbl->size * 2);
+	new_tbl = bucket_table_alloc(ht, old_tbl->size * 2, GFP_KERNEL);
 	if (new_tbl == NULL)
 		return -ENOMEM;
 
@@ -304,7 +308,7 @@ int rhashtable_shrink(struct rhashtable *ht)
 	if (size < ht->p.min_size)
 		size = ht->p.min_size;
 
-	new_tbl = bucket_table_alloc(ht, size);
+	new_tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
 	if (new_tbl == NULL)
 		return -ENOMEM;
 
@@ -678,7 +682,7 @@ int rhashtable_init(struct rhashtable *ht,
 		}
 	}
 
-	tbl = bucket_table_alloc(ht, size);
+	tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
 	if (tbl == NULL)
 		return -ENOMEM;
 

* [v2 PATCH 10/10] rhashtable: Add immediate rehash during insertion
  2015-03-22  8:03 [v2 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
                   ` (8 preceding siblings ...)
  2015-03-22  8:04 ` [v2 PATCH 9/10] rhashtable: Allow GFP_ATOMIC bucket table allocation Herbert Xu
@ 2015-03-22  8:04 ` Herbert Xu
  9 siblings, 0 replies; 44+ messages in thread
From: Herbert Xu @ 2015-03-22  8:04 UTC (permalink / raw)
  To: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

This patch reintroduces immediate rehash during insertion.  If
we find during insertion that the table is full or the chain
length exceeds a set limit (currently 16 but may be disabled
with insecure_elasticity) then we will force an immediate rehash.
The rehash will contain an expansion if the table utilisation
exceeds 75%.

If this rehash fails then the insertion will fail.  Otherwise the
insertion will be reattempted in the new hash table.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/rhashtable.h |   32 +++++++++++++++-----
 lib/rhashtable.c           |   71 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 95 insertions(+), 8 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index cba22d8..c19cf89 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -103,6 +103,7 @@ struct rhashtable;
  * @max_size: Maximum size while expanding
  * @min_size: Minimum size while shrinking
  * @nulls_base: Base value to generate nulls marker
+ * @insecure_elasticity: Set to true to disable chain length checks
  * @locks_mul: Number of bucket locks to allocate per cpu (default: 128)
  * @hashfn: Hash function (default: jhash2 if !(key_len % 4), or jhash)
  * @obj_hashfn: Function to hash object
@@ -116,6 +117,7 @@ struct rhashtable_params {
 	unsigned int		max_size;
 	unsigned int		min_size;
 	u32			nulls_base;
+	bool			insecure_elasticity;
 	size_t			locks_mul;
 	rht_hashfn_t		hashfn;
 	rht_obj_hashfn_t	obj_hashfn;
@@ -127,6 +129,7 @@ struct rhashtable_params {
  * @tbl: Bucket table
  * @nelems: Number of elements in table
  * @key_len: Key length for hashfn
+ * @elasticity: Maximum chain length before rehash
  * @p: Configuration parameters
  * @run_work: Deferred worker to expand/shrink asynchronously
  * @mutex: Mutex to protect current/future table swapping
@@ -137,6 +140,7 @@ struct rhashtable {
 	atomic_t			nelems;
 	bool                            being_destroyed;
 	unsigned int			key_len;
+	unsigned int			elasticity;
 	struct rhashtable_params	p;
 	struct work_struct		run_work;
 	struct mutex                    mutex;
@@ -252,6 +256,17 @@ static inline bool rht_grow_above_75(const struct rhashtable *ht,
 	       (!ht->p.max_size || tbl->size < ht->p.max_size);
 }
 
+/**
+ * rht_grow_above_100 - returns true if nelems > table-size
+ * @ht:		hash table
+ * @tbl:	current table
+ */
+static inline bool rht_grow_above_100(const struct rhashtable *ht,
+				      const struct bucket_table *tbl)
+{
+	return atomic_read(&ht->nelems) > tbl->size;
+}
+
 /* The bucket lock is selected based on the hash and protects mutations
  * on a group of hash buckets.
  *
@@ -517,11 +532,12 @@ static inline int __rhashtable_insert_fast(
 		.ht = ht,
 		.key = key,
 	};
-	int err = -EEXIST;
 	struct bucket_table *tbl, *new_tbl;
 	struct rhash_head *head;
 	spinlock_t *lock;
+	unsigned elasticity;
 	unsigned hash;
+	int err;
 
 	rcu_read_lock();
 
@@ -543,22 +559,24 @@ static inline int __rhashtable_insert_fast(
 	}
 
 	new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
-	if (unlikely(new_tbl)) {
+	if (unlikely(new_tbl || rht_grow_above_100(ht, tbl))) {
+slow_path:
 		err = rhashtable_insert_slow(ht, key, obj, new_tbl);
 		goto out;
 	}
 
-	if (!key)
-		goto skip_lookup;
-
+	err = -EEXIST;
+	elasticity = ht->elasticity;
 	rht_for_each(head, tbl, hash) {
-		if (unlikely(!(params.obj_cmpfn ?
+		if (key &&
+		    unlikely(!(params.obj_cmpfn ?
 			       params.obj_cmpfn(&arg, rht_obj(ht, head)) :
 			       rhashtable_compare(&arg, rht_obj(ht, head)))))
 			goto out;
+		if (!--elasticity)
+			goto slow_path;
 	}
 
-skip_lookup:
 	err = 0;
 
 	head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 2ed3cc1..adfe94c 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -364,21 +364,80 @@ unlock:
 		schedule_work(&ht->run_work);
 }
 
+static bool rhashtable_check_elasticity(struct rhashtable *ht,
+					struct bucket_table *tbl,
+					unsigned hash)
+{
+	unsigned elasticity = ht->elasticity;
+	struct rhash_head *head;
+
+	rht_for_each(head, tbl, hash)
+		if (!--elasticity)
+			return true;
+
+	return false;
+}
+
+int rhashtable_expand_or_rehash(struct rhashtable *ht,
+				struct bucket_table *tbl)
+{
+	struct bucket_table *old_tbl;
+	struct bucket_table *new_tbl;
+	unsigned int size;
+	int err;
+
+	old_tbl = rht_dereference_rcu(ht->tbl, ht);
+	if (!tbl)
+		tbl = rhashtable_last_table(ht, old_tbl);
+
+	size = tbl->size;
+
+	if (rht_grow_above_75(ht, tbl))
+		size *= 2;
+	/* More than two rehashes (not resizes) detected. */
+	else if (WARN_ON(old_tbl != tbl && old_tbl->size == size))
+		return -EBUSY;
+
+	new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC);
+	if (new_tbl == NULL)
+		return -ENOMEM;
+
+	err = rhashtable_rehash_attach(ht, tbl, new_tbl);
+	if (err) {
+		bucket_table_free(new_tbl);
+		if (err == -EEXIST)
+			err = 0;
+	} else
+		schedule_work(&ht->run_work);
+
+	return err;
+}
+
 int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 			   struct rhash_head *obj,
 			   struct bucket_table *tbl)
 {
 	struct rhash_head *head;
 	unsigned hash;
-	int err = -EEXIST;
+	int err;
+
+	if (!tbl)
+		goto rehash;
 
+restart:
 	tbl = rhashtable_last_table(ht, tbl);
 	hash = head_hashfn(ht, tbl, obj);
 	spin_lock_nested(rht_bucket_lock(tbl, hash), SINGLE_DEPTH_NESTING);
 
+	err = -EEXIST;
 	if (key && rhashtable_lookup_fast(ht, key, ht->p))
 		goto exit;
 
+	err = -EAGAIN;
+	if (rhashtable_check_elasticity(ht, tbl, hash) ||
+	    rht_grow_above_100(ht, tbl))
+		goto exit;
+
 	err = 0;
 
 	head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
@@ -391,6 +450,13 @@ int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 
 exit:
 	spin_unlock(rht_bucket_lock(tbl, hash));
+ 
+	if (err == -EAGAIN) {
+rehash:
+		err = rhashtable_expand_or_rehash(ht, tbl);
+		if (!err)
+			goto restart;
+	}
 
 	return err;
 }
@@ -667,6 +733,9 @@ int rhashtable_init(struct rhashtable *ht,
 
 	ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE);
 
+	if (!params->insecure_elasticity)
+		ht->elasticity = 16;
+
 	if (params->locks_mul)
 		ht->p.locks_mul = roundup_pow_of_two(params->locks_mul);
 	else

* Re: [v2 PATCH 1/10] rhashtable: Add barrier to ensure we see new tables in walker
  2015-03-22  8:03 ` [v2 PATCH 1/10] rhashtable: Add barrier to ensure we see new tables in walker Herbert Xu
@ 2015-03-22 10:47   ` Thomas Graf
  0 siblings, 0 replies; 44+ messages in thread
From: Thomas Graf @ 2015-03-22 10:47 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/22/15 at 07:03pm, Herbert Xu wrote:
> The walker is a lockless reader so it too needs an smp_rmb before
> reading the future_tbl field in order to see any new tables that
> may contain elements that we should have walked over.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Acked-by: Thomas Graf <tgraf@suug.ch>

* Re: [v2 PATCH 2/10] rhashtable: Eliminate unnecessary branch in rht_key_hashfn
  2015-03-22  8:04 ` [v2 PATCH 2/10] rhashtable: Eliminate unnecessary branch in rht_key_hashfn Herbert Xu
@ 2015-03-22 11:07   ` Thomas Graf
  0 siblings, 0 replies; 44+ messages in thread
From: Thomas Graf @ 2015-03-22 11:07 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/22/15 at 07:04pm, Herbert Xu wrote:
> When rht_key_hashfn is called from rhashtable itself and params
> is equal to ht->p, there is no point in checking params.key_len
> and falling back to ht->p.key_len.
> 
> For some reason gcc couldn't figure out that params is the same
> as ht->p.  So let's help it by only checking params.key_len when
> it's a constant.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

A comment to document this gcc hack would be nice as it's not
obvious from just reading the code. Shouldn't hold up this series
though.

Acked-by: Thomas Graf <tgraf@suug.ch>

* Re: [v2 PATCH 3/10] rhashtable: Allow hashfn to be unset
  2015-03-22  8:04 ` [v2 PATCH 3/10] rhashtable: Allow hashfn to be unset Herbert Xu
@ 2015-03-22 11:55   ` Thomas Graf
  2015-03-22 12:04     ` Herbert Xu
  2015-03-23 14:29   ` David Laight
  1 sibling, 1 reply; 44+ messages in thread
From: Thomas Graf @ 2015-03-22 11:55 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/22/15 at 07:04pm, Herbert Xu wrote:
> @@ -134,6 +136,7 @@ struct rhashtable {
>  	struct bucket_table __rcu	*tbl;
>  	atomic_t			nelems;
>  	bool                            being_destroyed;
> +	unsigned int			key_len;

Why is this needed? It looks like you're always initializing this
with ht->p.key_len

>  	struct rhashtable_params	p;
>  	struct work_struct		run_work;
>  	struct mutex                    mutex;
> @@ -199,12 +202,30 @@ static inline unsigned int rht_key_hashfn(
>  	struct rhashtable *ht, const struct bucket_table *tbl,
>  	const void *key, const struct rhashtable_params params)
>  {
> -	unsigned key_len = __builtin_constant_p(params.key_len) ?
> -			   (params.key_len ?: ht->p.key_len) :
> -			   params.key_len;
> +	unsigned hash;

unsigned int

In several places as well

> +	if (!__builtin_constant_p(params.key_len))
> +		hash = ht->p.hashfn(key, ht->key_len, tbl->hash_rnd);

I don't understand this. It looks like you only consider
params->key_len if it's constant.

> +	else if (params.key_len) {
> +		unsigned key_len = params.key_len;
> +
> +		if (params.hashfn)
> +			hash = params.hashfn(key, key_len, tbl->hash_rnd);
> +		else if (key_len & (sizeof(u32) - 1))
> +			hash = jhash(key, key_len, tbl->hash_rnd);
> +		else
> +			hash = jhash2(key, key_len / sizeof(u32),
> +				      tbl->hash_rnd);
> +	} else {
> +		unsigned key_len = ht->p.key_len;
> +
> +		if (params.hashfn)
> +			hash = params.hashfn(key, key_len, tbl->hash_rnd);
> +		else
> +			hash = jhash(key, key_len, tbl->hash_rnd);

Why don't we opt-in to jhash2 in this case?

* Re: [v2 PATCH 4/10] netlink: Use default rhashtable hashfn
  2015-03-22  8:04 ` [v2 PATCH 4/10] netlink: Use default rhashtable hashfn Herbert Xu
@ 2015-03-22 11:55   ` Thomas Graf
  2015-03-23  1:18   ` Simon Horman
  1 sibling, 0 replies; 44+ messages in thread
From: Thomas Graf @ 2015-03-22 11:55 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/22/15 at 07:04pm, Herbert Xu wrote:
> This patch removes the explicit jhash value for the hashfn parameter
> of rhashtable.  The default is now jhash so removing the setting
> makes no difference apart from making one less copy of jhash in
> the kernel.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Acked-by: Thomas Graf <tgraf@suug.ch>

* Re: [v2 PATCH 5/10] tipc: Use default rhashtable hashfn
  2015-03-22  8:04 ` [v2 PATCH 5/10] tipc: " Herbert Xu
@ 2015-03-22 11:56   ` Thomas Graf
  0 siblings, 0 replies; 44+ messages in thread
From: Thomas Graf @ 2015-03-22 11:56 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/22/15 at 07:04pm, Herbert Xu wrote:
> This patch removes the explicit jhash value for the hashfn parameter
> of rhashtable.  The default is now jhash so removing the setting
> makes no difference apart from making one less copy of jhash in
> the kernel.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Acked-by: Thomas Graf <tgraf@suug.ch>

* Re: [v2 PATCH 6/10] netfilter: Use default rhashtable hashfn
  2015-03-22  8:04 ` [v2 PATCH 6/10] netfilter: " Herbert Xu
@ 2015-03-22 11:56   ` Thomas Graf
  2015-03-23 11:32     ` Herbert Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Thomas Graf @ 2015-03-22 11:56 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/22/15 at 07:04pm, Herbert Xu wrote:
> This patch removes the explicit jhash value for the hashfn parameter
> of rhashtable.  The default is now jhash so removing the setting
> makes no difference apart from making one less copy of jhash in
> the kernel.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Looks like this is not needed given Patrick's work.

* Re: [v2 PATCH 3/10] rhashtable: Allow hashfn to be unset
  2015-03-22 11:55   ` Thomas Graf
@ 2015-03-22 12:04     ` Herbert Xu
  2015-03-22 12:32       ` Thomas Graf
  0 siblings, 1 reply; 44+ messages in thread
From: Herbert Xu @ 2015-03-22 12:04 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On Sun, Mar 22, 2015 at 11:55:05AM +0000, Thomas Graf wrote:
> On 03/22/15 at 07:04pm, Herbert Xu wrote:
> > @@ -134,6 +136,7 @@ struct rhashtable {
> >  	struct bucket_table __rcu	*tbl;
> >  	atomic_t			nelems;
> >  	bool                            being_destroyed;
> > +	unsigned int			key_len;
> 
> Why is this needed? It looks like you're always initializing this
> with ht->p.key_len

It's ht->p.key_len/4 if we use jhash2.
 
> > +	if (!__builtin_constant_p(params.key_len))
> > +		hash = ht->p.hashfn(key, ht->key_len, tbl->hash_rnd);
> 
> I don't understand this. It looks like you only consider
> params->key_len if it's constant.

If params->key_len is not constant, then params == ht->p.

> > +	else if (params.key_len) {
> > +		unsigned key_len = params.key_len;
> > +
> > +		if (params.hashfn)
> > +			hash = params.hashfn(key, key_len, tbl->hash_rnd);
> > +		else if (key_len & (sizeof(u32) - 1))
> > +			hash = jhash(key, key_len, tbl->hash_rnd);
> > +		else
> > +			hash = jhash2(key, key_len / sizeof(u32),
> > +				      tbl->hash_rnd);
> > +	} else {
> > +		unsigned key_len = ht->p.key_len;
> > +
> > +		if (params.hashfn)
> > +			hash = params.hashfn(key, key_len, tbl->hash_rnd);
> > +		else
> > +			hash = jhash(key, key_len, tbl->hash_rnd);
> 
> Why don't we opt-in to jhash2 in this case?

Because if key_len == 0 it means that key_len is not known at
compile-time.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-22  8:04 ` [v2 PATCH 7/10] rhashtable: Disable automatic shrinking Herbert Xu
@ 2015-03-22 12:17   ` Thomas Graf
  2015-03-22 13:06     ` Thomas Graf
  2015-03-23  0:09     ` Herbert Xu
  0 siblings, 2 replies; 44+ messages in thread
From: Thomas Graf @ 2015-03-22 12:17 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/22/15 at 07:04pm, Herbert Xu wrote:
> Automatic shrinking is dangerous because it provides an easy
> way for an adversary to cause us to do unnecessary work.  Thus
> making the resizable hashtable a poor data structure.
> 
> This patch disables automatic shrinking but retains a manual
> shrink function for those cases where insertions and removals
> are overseen by a trusted entity, e.g., nft_hash.

This is misleading. I agree that unconditional shrinking is dangerous.
Shrinking was an optional feature disabled by default before. The
inlining enabled it by default for all users. What is the benefit of
requiring this logic outside of rhashtable over just adding a flag to
enable shrinking at 30% utilization?

> The shrink function will now also shrink to fit rather than halve
> the size of the table.

I like this part a lot

>  int rhashtable_shrink(struct rhashtable *ht)
>  {
> -	struct bucket_table *new_tbl, *old_tbl = rht_dereference(ht->tbl, ht);
> +	unsigned size = roundup_pow_of_two(atomic_read(&ht->nelems) * 4 / 3);

If rhashtable_shrink() is called near the 75% border it will cause an
immediate expansion again. Maybe make this * 3 / 2 so we shrink near
30% utilization as before?

> +	struct bucket_table *new_tbl;
> +	struct bucket_table *tbl;
> +	int err;
>  
> -	ASSERT_RHT_MUTEX(ht);
> +	if (size < ht->p.min_size)
> +		size = ht->p.min_size;

We should only shrink if size < old_tbl->size

> -	new_tbl = bucket_table_alloc(ht, old_tbl->size / 2);
> +	new_tbl = bucket_table_alloc(ht, size);
>  	if (new_tbl == NULL)
>  		return -ENOMEM;

* Re: [v2 PATCH 3/10] rhashtable: Allow hashfn to be unset
  2015-03-22 12:04     ` Herbert Xu
@ 2015-03-22 12:32       ` Thomas Graf
  2015-03-22 21:12         ` Herbert Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Thomas Graf @ 2015-03-22 12:32 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/22/15 at 11:04pm, Herbert Xu wrote:
> On Sun, Mar 22, 2015 at 11:55:05AM +0000, Thomas Graf wrote:
> > On 03/22/15 at 07:04pm, Herbert Xu wrote:
> > > @@ -134,6 +136,7 @@ struct rhashtable {
> > >  	struct bucket_table __rcu	*tbl;
> > >  	atomic_t			nelems;
> > >  	bool                            being_destroyed;
> > > +	unsigned int			key_len;
> > 
> > Why is this needed? It looks like you're always initializing this
> > with ht->p.key_len
> 
> It's ht->p.key_len/4 if we use jhash2.

Sure but why not just store key_len/4 in ht->p.key_len then if you
opt in to jhash2() in rhashtable_init()?

> > > +	if (!__builtin_constant_p(params.key_len))
> > > +		hash = ht->p.hashfn(key, ht->key_len, tbl->hash_rnd);
> > 
> > I don't understand this. It looks like you only consider
> > params->key_len if it's constant.
> 
> If params->key_len is not constant, then params == ht->p.

I must be missing something obvious. Who guarantees that? I can see
that's true for the current callers but what prevents anybody from
using rhashtable_lookup_fast() with a key length not known at compile
time and passing it as rhashtable_params?

> > > +	else if (params.key_len) {
> > > +		unsigned key_len = params.key_len;
> > > +
> > > +		if (params.hashfn)
> > > +			hash = params.hashfn(key, key_len, tbl->hash_rnd);
> > > +		else if (key_len & (sizeof(u32) - 1))
> > > +			hash = jhash(key, key_len, tbl->hash_rnd);
> > > +		else
> > > +			hash = jhash2(key, key_len / sizeof(u32),
> > > +				      tbl->hash_rnd);
> > > +	} else {
> > > +		unsigned key_len = ht->p.key_len;
> > > +
> > > +		if (params.hashfn)
> > > +			hash = params.hashfn(key, key_len, tbl->hash_rnd);
> > > +		else
> > > +			hash = jhash(key, key_len, tbl->hash_rnd);
> > 
> > Why don't we opt-in to jhash2 in this case?
> 
> Because if key_len == 0 it means that key_len is not known at
> compile-time.

I still don't get this. Why do we fall back to jhash2() if
params.key_len is set but not if only ht->p.key_len is set?

* Re: [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-22 12:17   ` Thomas Graf
@ 2015-03-22 13:06     ` Thomas Graf
  2015-03-23  0:07       ` Herbert Xu
  2015-03-23  0:09     ` Herbert Xu
  1 sibling, 1 reply; 44+ messages in thread
From: Thomas Graf @ 2015-03-22 13:06 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/22/15 at 12:17pm, Thomas Graf wrote:
> On 03/22/15 at 07:04pm, Herbert Xu wrote:
> > +	struct bucket_table *new_tbl;
> > +	struct bucket_table *tbl;
> > +	int err;
> >  
> > -	ASSERT_RHT_MUTEX(ht);
> > +	if (size < ht->p.min_size)
> > +		size = ht->p.min_size;
> 
> We should only shrink if size < old_tbl->size

I found the check further down. Any particular reason why you check
after allocating and then free again? Why do you want to avoid doing
the allocation inside the mutex?

> > -	new_tbl = bucket_table_alloc(ht, old_tbl->size / 2);
> > +	new_tbl = bucket_table_alloc(ht, size);
> >  	if (new_tbl == NULL)
> >  		return -ENOMEM;

* Re: [v2 PATCH 3/10] rhashtable: Allow hashfn to be unset
  2015-03-22 12:32       ` Thomas Graf
@ 2015-03-22 21:12         ` Herbert Xu
  2015-03-23  9:58           ` Thomas Graf
  0 siblings, 1 reply; 44+ messages in thread
From: Herbert Xu @ 2015-03-22 21:12 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On Sun, Mar 22, 2015 at 12:32:57PM +0000, Thomas Graf wrote:
> On 03/22/15 at 11:04pm, Herbert Xu wrote:
> > On Sun, Mar 22, 2015 at 11:55:05AM +0000, Thomas Graf wrote:
> > > On 03/22/15 at 07:04pm, Herbert Xu wrote:
> > > > @@ -134,6 +136,7 @@ struct rhashtable {
> > > >  	struct bucket_table __rcu	*tbl;
> > > >  	atomic_t			nelems;
> > > >  	bool                            being_destroyed;
> > > > +	unsigned int			key_len;
> > > 
> > > Why is this needed? It looks like you're always initializing this
> > > with ht->p.key_len
> > 
> > It's ht->p.key_len/4 if we use jhash2.
> 
> Sure but why not just store key_len/4 in ht->p.key_len then if you
> opt in to jhash2() in rhashtable_init()?

Because that breaks rhashtable_compare/memcmp.

> 
> > > > +	if (!__builtin_constant_p(params.key_len))
> > > > +		hash = ht->p.hashfn(key, ht->key_len, tbl->hash_rnd);
> > > 
> > > I don't understand this. It looks like you only consider
> > > params->key_len if it's constant.
> > 
> > If params->key_len is not constant, then params == ht->p.
> 
> I must be missing something obvious. Who guarantees that? I can see
> that's true for the current callers but what prevents anybody from
> using rhashtable_lookup_fast() with a key length not known at compile
> time and passing it as rhashtable_params?

They shouldn't be doing that.  The whole point of this function
is to have it inlined so all external callers of it should be
supplying a constant parameter.  We could add a __ variant that
is only called by rhashtable if you like so we can enforce this
in rhashtable_lookup_fast.

> I still don't get this. Why do we fall back to jhash2() if
> params.key_len is set but not if only ht->p.key_len is set?

Because if params.key_len is not set then we have no idea whether
we should use jhash or jhash2, since ht->p.key_len cannot be
known at compile time.  This is currently only used by netfilter,
which has its key length set at run-time.
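
To make this concrete, here is a minimal sketch of the dispatch
pattern (illustrative only, not the actual rht_key_hashfn;
example_key_hashfn is a made-up name):

#include <linux/jhash.h>

/* When key_len is a compile-time constant the compiler evaluates
 * __builtin_constant_p() and the length test at build time, so
 * exactly one of the two hash calls survives inlining.  For a
 * run-time length we have to stay with jhash, since we cannot
 * prove the key is a whole number of u32 words.
 */
static inline u32 example_key_hashfn(const void *key,
                                     unsigned int key_len, u32 seed)
{
        if (__builtin_constant_p(key_len) && key_len &&
            !(key_len & (sizeof(u32) - 1)))
                return jhash2(key, key_len / sizeof(u32), seed);

        return jhash(key, key_len, seed);
}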

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-22 13:06     ` Thomas Graf
@ 2015-03-23  0:07       ` Herbert Xu
  2015-03-23  8:37         ` Thomas Graf
  0 siblings, 1 reply; 44+ messages in thread
From: Herbert Xu @ 2015-03-23  0:07 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On Sun, Mar 22, 2015 at 01:06:30PM +0000, Thomas Graf wrote:
> On 03/22/15 at 12:17pm, Thomas Graf wrote:
> > On 03/22/15 at 07:04pm, Herbert Xu wrote:
> > > +	struct bucket_table *new_tbl;
> > > +	struct bucket_table *tbl;
> > > +	int err;
> > >  
> > > -	ASSERT_RHT_MUTEX(ht);
> > > +	if (size < ht->p.min_size)
> > > +		size = ht->p.min_size;
> > 
> > We should only shrink if size < old_tbl->size
> 
> I found the check further down. Any particular reason why you check
> after allocation and then free again? Why do you want to avoid
> the allocation inside the mutex?

It's just a matter of code quality.  You should always try to minimise
the locked sections.
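
For illustration, the general shape of that pattern (a simplified
sketch, assuming the bucket_table_alloc/bucket_table_free helpers
and ht->mutex from lib/rhashtable.c; the publish step glosses over
the real rehash machinery):

static int shrink_sketch(struct rhashtable *ht, unsigned int size)
{
        struct bucket_table *new_tbl, *old_tbl;

        /* Possibly sleeping allocation done with no lock held,
         * so other writers are not serialised behind it.
         */
        new_tbl = bucket_table_alloc(ht, size);
        if (new_tbl == NULL)
                return -ENOMEM;

        mutex_lock(&ht->mutex);
        old_tbl = rht_dereference(ht->tbl, ht);
        if (old_tbl->size <= size) {
                /* Re-check under the mutex failed: nothing to
                 * shrink, drop the speculative allocation.
                 */
                mutex_unlock(&ht->mutex);
                bucket_table_free(new_tbl);
                return 0;
        }
        rcu_assign_pointer(ht->tbl, new_tbl);   /* grossly simplified */
        mutex_unlock(&ht->mutex);
        return 0;
}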

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-22 12:17   ` Thomas Graf
  2015-03-22 13:06     ` Thomas Graf
@ 2015-03-23  0:09     ` Herbert Xu
  2015-03-23  8:33       ` Thomas Graf
  1 sibling, 1 reply; 44+ messages in thread
From: Herbert Xu @ 2015-03-23  0:09 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On Sun, Mar 22, 2015 at 12:17:55PM +0000, Thomas Graf wrote:
> 
> This is misleading. I agree that unconditional shrinking is dangerous.
> Shrinking was an optional feature disabled by default before. The

How was shrinking disabled before? AFAICS it always kicked in at
30%.

> inlining enabled it by default for all users. What is the benefit of
> requiring this logic outside of rhashtable over just adding a flag to
> enable shrinking at 30% utilization?

That would be adding an extra branch on the fast-path for an
operation which almost nobody needs.
 
> >  int rhashtable_shrink(struct rhashtable *ht)
> >  {
> > -	struct bucket_table *new_tbl, *old_tbl = rht_dereference(ht->tbl, ht);
> > +	unsigned size = roundup_pow_of_two(atomic_read(&ht->nelems) * 4 / 3);
> 
> If rhashtable_shrink() is called near the 75% border it will cause an
> immediate expansion again. Maybe make this * 3 / 2 so we shrink near
> 30% utilization as before?

Sure that makes sense.
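
To make the arithmetic concrete, a worked example with illustrative
numbers (assuming the 75% grow trigger discussed in this thread):

        nelems = 190, current table size = 512

        size = roundup_pow_of_two(190 * 4 / 3)
             = roundup_pow_of_two(253) = 256
        -> ~74% utilisation right after the shrink, so a couple of
           inserts crosses the 75% grow trigger and we expand
           straight back to 512.

        size = roundup_pow_of_two(190 * 3 / 2)
             = roundup_pow_of_two(285) = 512
        -> no shrink at this load at all; shrinking only takes
           effect once utilisation has fallen far enough (around
           30%) that the result leaves real headroom.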

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 4/10] netlink: Use default rhashtable hashfn
  2015-03-22  8:04 ` [v2 PATCH 4/10] netlink: Use default rhashtable hashfn Herbert Xu
  2015-03-22 11:55   ` Thomas Graf
@ 2015-03-23  1:18   ` Simon Horman
  2015-03-23 12:56     ` Herbert Xu
  1 sibling, 1 reply; 44+ messages in thread
From: Simon Horman @ 2015-03-23  1:18 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

Hi Herbert,

On Sun, Mar 22, 2015 at 07:04:02PM +1100, Herbert Xu wrote:
> This patch removes the explicit jhash value for the hashfn parameter
> of rhashtable.  The default is now jhash so removing the setting
> makes no difference apart from making one less copy of jhash in
> the kernel.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> ---
> 
>  net/netlink/af_netlink.c |    3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 6517921..e2f7f28 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -3133,13 +3133,12 @@ static inline u32 netlink_hash(const void *data, u32 seed)
>  	struct netlink_compare_arg arg;
>  
>  	netlink_compare_arg_init(&arg, sock_net(&nlk->sk), nlk->portid);
> -	return jhash(&arg, netlink_compare_arg_len, seed);
> +	return jhash2((u32 *)&arg, netlink_compare_arg_len / sizeof(u32), seed);

I understand the above change in the context of the rest of the series;
however, it does not seem to match up with the changelog for this patch.

>  }
>  
>  static const struct rhashtable_params netlink_rhashtable_params = {
>  	.head_offset = offsetof(struct netlink_sock, node),
>  	.key_len = netlink_compare_arg_len,
> -	.hashfn = jhash,
>  	.obj_hashfn = netlink_hash,
>  	.obj_cmpfn = netlink_compare,
>  	.max_size = 65536,
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-23  0:09     ` Herbert Xu
@ 2015-03-23  8:33       ` Thomas Graf
  2015-03-23  9:28         ` Herbert Xu
  2015-03-23 16:44         ` David Miller
  0 siblings, 2 replies; 44+ messages in thread
From: Thomas Graf @ 2015-03-23  8:33 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/23/15 at 11:09am, Herbert Xu wrote:
> On Sun, Mar 22, 2015 at 12:17:55PM +0000, Thomas Graf wrote:
> > 
> > This is misleading. I agree that unconditional shrinking is dangerous.
> > Shrinking was an optional feature disabled by default before. The
> 
> How was shrinking disabled before? AFAICS it always kicked in at
> 30%.

Before Daniel removed the indirection, since all callers enabled
shrinking by default ;-) It was clear that some future users would
eventually not want shrinking and would thus require a conditional.

> > inlining enabled it by default for all users. What is the benefit of
> > requiring this logic outside of rhashtable over just adding a flag to
> > enable shrinking at 30% utilization?
> 
> That would be adding an extra branch on the fast-path for an
> operation which almost nobody needs.

I don't get why almost nobody would want shrinking. I agree that for
tables like TCP hash tables, once you have grown you want to keep that
table size because the load is likely to come back. But we will also
have lots of users such as the Netlink socket with a table per protocol
where not shrinking results in giving the user the ability to waste
memory indefinitely for no gain.

I'm not claiming you always want shrinking but what gain is there by
removing integrated support? Can you show numbers that the additional
branch actually hurts?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-23  0:07       ` Herbert Xu
@ 2015-03-23  8:37         ` Thomas Graf
  2015-03-23  9:29           ` Herbert Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Thomas Graf @ 2015-03-23  8:37 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/23/15 at 11:07am, Herbert Xu wrote:
> On Sun, Mar 22, 2015 at 01:06:30PM +0000, Thomas Graf wrote:
> > On 03/22/15 at 12:17pm, Thomas Graf wrote:
> > > On 03/22/15 at 07:04pm, Herbert Xu wrote:
> > > > +	struct bucket_table *new_tbl;
> > > > +	struct bucket_table *tbl;
> > > > +	int err;
> > > >  
> > > > -	ASSERT_RHT_MUTEX(ht);
> > > > +	if (size < ht->p.min_size)
> > > > +		size = ht->p.min_size;
> > > 
> > > We should only shrink if size < old_tbl->size
> > 
> > I found the check further down. Any particular reason why you check
> > after allocation and then free again? Why do you want to avoid
> > the allocation inside the mutex?
> 
> It's just a matter of code quality.  You should always try to minimise
> the locked sections.

So do you expect the user to replicate the new table size calculation
outside of rhashtable_shrink() to avoid the cost of a possibly massive
memory allocation even when no shrinking will take place?

I think rhashtable_shrink() should fetch ht->tbl in an RCU section to
cheaply get the current table size and only do the allocation and take
the lock if the table size warrants shrinking.
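
Something along these lines (a hypothetical helper, sketched on top
of the existing accessors):

/* Peek at the current table size under RCU so callers can skip
 * both the allocation and the mutex when no shrink is needed.
 */
static bool rht_shrink_needed(struct rhashtable *ht, unsigned int size)
{
        const struct bucket_table *tbl;
        bool shrink;

        rcu_read_lock();
        tbl = rht_dereference_rcu(ht->tbl, ht);
        shrink = size < tbl->size;      /* cheap read, no mutex */
        rcu_read_unlock();

        return shrink;
}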

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-23  8:33       ` Thomas Graf
@ 2015-03-23  9:28         ` Herbert Xu
  2015-03-23  9:36           ` Thomas Graf
  2015-03-23 16:45           ` David Miller
  2015-03-23 16:44         ` David Miller
  1 sibling, 2 replies; 44+ messages in thread
From: Herbert Xu @ 2015-03-23  9:28 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On Mon, Mar 23, 2015 at 08:33:19AM +0000, Thomas Graf wrote:
>
> I'm not claiming you always want shrinking but what gain is there by
> removing integrated support? Can you show numbers that the additional
> branch actually hurts?

You never want automatic shrinking unless all your users are
trusted.  I doubt there would be many rhashtable users where
this would apply.  Even nft_hash is quite tenuous.

Besides, if you really want automatic shrinking, you could always
do it in the caller of rhashtable_remove.  That way only you
would pay for the cost and not everybody else.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-23  8:37         ` Thomas Graf
@ 2015-03-23  9:29           ` Herbert Xu
  2015-03-23  9:43             ` Thomas Graf
  0 siblings, 1 reply; 44+ messages in thread
From: Herbert Xu @ 2015-03-23  9:29 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On Mon, Mar 23, 2015 at 08:37:12AM +0000, Thomas Graf wrote:
>
> I think rhashtable_shrink() should fetch ht->tbl in an RCU section to
> cheaply get the current table size and only do the allocation and take
> the lock if the table size warrants for shrinking.

Well, you should never invoke rhashtable_shrink unless you actually
want to shrink.  So this is something that you should check
before rhashtable_shrink is called.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-23  9:28         ` Herbert Xu
@ 2015-03-23  9:36           ` Thomas Graf
  2015-03-23  9:39             ` Herbert Xu
  2015-03-23 16:45           ` David Miller
  1 sibling, 1 reply; 44+ messages in thread
From: Thomas Graf @ 2015-03-23  9:36 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/23/15 at 08:28pm, Herbert Xu wrote:
> On Mon, Mar 23, 2015 at 08:33:19AM +0000, Thomas Graf wrote:
> >
> > I'm not claiming you always want shrinking but what gain is there by
> > removing integrated support? Can you show numbers that the additional
> > branch actually hurts?
> 
> You never want automatic shrinking unless all your users are
> trusted.  I doubt there would be many rhashtable users where
> this would apply.  Even nft_hash is quite tenuous.

Why?

> Besides, if you really want automatic shrinking, you could always
> do it in the caller of rhashtable_remove.  That way only you
> would pay for the cost and not everybody else.

The same can be said for growing. Why do we differentiate between
the two? Would you expect users requiring shrinking to call
rhashtable_shrink() after every remove? Should they encode their
own logic based on rhashtable internals?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-23  9:36           ` Thomas Graf
@ 2015-03-23  9:39             ` Herbert Xu
  2015-03-23  9:44               ` Herbert Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Herbert Xu @ 2015-03-23  9:39 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On Mon, Mar 23, 2015 at 09:36:32AM +0000, Thomas Graf wrote:
>
> > You never want automatic shrinking unless all your users are
> > trusted.  I doubt there would be many rhashtable users where
> > this would apply.  Even nft_hash is quite tenuous.
> 
> Why?

Because with multiple rehashing it's quite easy to convert your
hash table into a linked list by repeatedly growing and shrinking.

Multiple rehashing simply cannot work unless you get rid of automatic
shrinking for the untrusted case.
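
To spell the scenario out (a sketch; fill_to_75_percent and
drain_to_30_percent are made-up helpers standing in for
attacker-controlled traffic):

/* Bounce the utilisation across both resize triggers faster than
 * the lone rehasher thread can migrate entries; every iteration
 * can append another part-migrated table to the chain that each
 * lookup must walk.
 */
for (;;) {
        fill_to_75_percent(ht);         /* queues a grow */
        drain_to_30_percent(ht);        /* queues a shrink */
}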

> The same can be said for growing. Why do we differentiate between
> the two? Would you expect users requiring shrinking to call
> rhashtable_shrink() after every remove? Should they encode their
> own logic based on rhashtable internals?

No, we must be able to grow or insertions will fail.  The same cannot
be said for shrinking.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-23  9:29           ` Herbert Xu
@ 2015-03-23  9:43             ` Thomas Graf
  0 siblings, 0 replies; 44+ messages in thread
From: Thomas Graf @ 2015-03-23  9:43 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/23/15 at 08:29pm, Herbert Xu wrote:
> On Mon, Mar 23, 2015 at 08:37:12AM +0000, Thomas Graf wrote:
> >
> > I think rhashtable_shrink() should fetch ht->tbl in an RCU section to
> > cheaply get the current table size and only do the allocation and take
> > the lock if the table size warrants shrinking.
> 
> Well, you should never invoke rhashtable_shrink unless you actually
> want to shrink.  So this is something that you should check
> before rhashtable_shrink is called.

How? The calculation of the table size is embedded in
rhashtable_shrink(). Should every user have a copy of that
calculation algorithm?

Why not just:

	unlikely(ht->p.shrink && rht_shrink_below_30(..))

If you really care about that additional conditional we
can also add:

static inline int rhashtable_remove_and_shrink(struct rhashtable *ht,
                                               struct rhash_head *obj,
                                               const struct rhashtable_params params)
{
        struct bucket_table *tbl;
        int err;

        rcu_read_lock();

        tbl = rht_dereference_rcu(ht->tbl, ht);

        err = rhashtable_remove_fast(ht, obj, params);
        /* Only schedule the deferred shrink if the removal
         * succeeded and utilisation dropped below 30%.
         */
        if (unlikely(!err && rht_shrink_below_30(ht, tbl)))
                schedule_work(&ht->run_work);

        rcu_read_unlock();

        return err;
}

I just think it's wrong to rip out all the shrinking logic and
require every single user to re-add their own copy.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-23  9:39             ` Herbert Xu
@ 2015-03-23  9:44               ` Herbert Xu
  2015-03-23 10:08                 ` Thomas Graf
  0 siblings, 1 reply; 44+ messages in thread
From: Herbert Xu @ 2015-03-23  9:44 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On Mon, Mar 23, 2015 at 08:39:52PM +1100, Herbert Xu wrote:
> 
> Because with multiple rehashing it's quite easy to convert your
> hash table into a linked list by repeatedly growing and shrinking.
> 
> Multiple rehashing simply cannot work unless you get rid of automatic
> shrinking for the untrusted case.

Actually what I could do is allow automatic shrinking when there
are no outstanding rehashes.  So maybe we could restore this feature
after all.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 3/10] rhashtable: Allow hashfn to be unset
  2015-03-22 21:12         ` Herbert Xu
@ 2015-03-23  9:58           ` Thomas Graf
  2015-03-23 10:18             ` Herbert Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Thomas Graf @ 2015-03-23  9:58 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/23/15 at 08:12am, Herbert Xu wrote:
> On Sun, Mar 22, 2015 at 12:32:57PM +0000, Thomas Graf wrote:
> > Sure but why not just store key_len/4 in ht->p.key_len then if you
> > opt in to jhash2() in rhashtable_init()?
> 
> Because that breaks rhashtable_compare/memcmp.

Thanks. Didn't see that.

> > > > > +	if (!__builtin_constant_p(params.key_len))
> > > > > +		hash = ht->p.hashfn(key, ht->key_len, tbl->hash_rnd);
> > > > 
> > > > I don't understand this. It looks like you only consider
> > > > params->key_len if it's constant.
> > > 
> > > If params->key_len is not constant, then params == ht->p.
> > 
> > I must be missing something obvious. Who guarantees that? I can see
> > that's true for the current callers but what prevents anybody from
> > using rhashtable_lookup_fast() with a key length not known at compile
> > time and passing it as rhashtable_params?
> 
> They shouldn't be doing that.  The whole point of this function
> is to have it inlined so all external callers of it should be
> supplying a constant parameter.  We could add a __ variant that
> is only called by rhashtable if you like so we can enforce this
> in rhashtable_lookup_fast.

If you add such constraints they must be clearly documented. There
is no way of figuring this out right now without reading the entire
rhashtable code (and talking to you).

> > I still don't get this. Why do we fall back to jhash2() if
> > params.key_len is set but not if only ht->p.key_len is set?
> 
> Because if params.key_len is not set then we have no idea whether
> we should use jhash or jhash2, since ht->p.key_len cannot be
> known at compile time.  This is currently only used by netfilter,
> which has its key length set at run-time.

Sorry, still not getting this ;-)

nft_hash sets key_len to set->klen and passes it to rhashtable_init().
rhashtable_init() should then fall back to jhash() or jhash2() if no
hashfn is provided. Why is the logic in rht_key_hashfn() different?
Actually, in which case is ht->p.hashfn not set in rht_key_hashfn()?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-23  9:44               ` Herbert Xu
@ 2015-03-23 10:08                 ` Thomas Graf
  2015-03-23 10:19                   ` Herbert Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Thomas Graf @ 2015-03-23 10:08 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On 03/23/15 at 08:44pm, Herbert Xu wrote:
> On Mon, Mar 23, 2015 at 08:39:52PM +1100, Herbert Xu wrote:
> > 
> > Because with multiple rehashing it's quite easy to convert your
> > hash table into a linked list by repeatedly growing and shrinking.
> > 
> > Multiple rehashing simply cannot work unless you get rid of automatic
> > shrinking for the untrusted case.
> 
> Actually what I could do is allow automatic shrinking when there
> are no outstanding rehashes.  So maybe we could restore this feature
> after all.

OK. Maybe this patch should be posted in the context of enabling
multiple rehashes then. It is difficult to review without having
the full context. This connection was not clear to me from the
commit message.

I have yet to understand the implications of multiple rehashes.
The idea of having to traverse N tables for each insert, removal
and lookup in a pressure situation is still frightening.

I would like to compare it with an exponential growth strategy.
Eventually both approaches could be combined to limit the chain
length of rehashes.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 3/10] rhashtable: Allow hashfn to be unset
  2015-03-23  9:58           ` Thomas Graf
@ 2015-03-23 10:18             ` Herbert Xu
  0 siblings, 0 replies; 44+ messages in thread
From: Herbert Xu @ 2015-03-23 10:18 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On Mon, Mar 23, 2015 at 09:58:42AM +0000, Thomas Graf wrote:
>
> nft_hash sets key_len to set->klen and passes it to rhashtable_init().
> rhashtable_init() should then fall back to jhash() or jhash2() if no
> hashfn is provided. Why is the logic in rht_key_hashfn() different?
> Actually, in which case is ht->p.hashfn not set in rht_key_hashfn()?

jhash can be used in place of jhash2; the only difference is
that jhash is less optimised and uses unaligned loads.
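
A small sketch of why that fallback is safe (hash_demo is a made-up
name; jhash/jhash2 are the real helpers from <linux/jhash.h>):

#include <linux/jhash.h>

/* For a u32-aligned key whose length is a multiple of 4, jhash()
 * walks the same words as jhash2() and returns the same value;
 * jhash merely pays for unaligned loads.  So jhash is always a
 * correct, if slower, substitute.
 */
static u32 hash_demo(const u32 *key, unsigned int key_len, u32 seed)
{
        if (key_len & (sizeof(u32) - 1))        /* odd length */
                return jhash(key, key_len, seed);

        return jhash2(key, key_len / sizeof(u32), seed);
}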

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-23 10:08                 ` Thomas Graf
@ 2015-03-23 10:19                   ` Herbert Xu
  0 siblings, 0 replies; 44+ messages in thread
From: Herbert Xu @ 2015-03-23 10:19 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On Mon, Mar 23, 2015 at 10:08:52AM +0000, Thomas Graf wrote:
>
> I have yet to understand the implications of multiple rehashes.
> The idea of having to traverse N tables for each insert, removal
> and lookup in a pressure situation is still frightening.

It's simple.  With only expansion you can never have more than
log N tables.
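
Spelling the bound out (a sketch of the argument): with grow-only
resizes every pending table is strictly larger than its predecessor,
and sizes are powers of two:

        chain sizes: s, 2s, 4s, ..., 2^(k-1) * s

        newest table sized for all N entries: 2^(k-1) * s >= N
        => k <= log2(N / s) + 1

Allowing shrinks breaks the monotonicity, and with it the bound.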

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 6/10] netfilter: Use default rhashtable hashfn
  2015-03-22 11:56   ` Thomas Graf
@ 2015-03-23 11:32     ` Herbert Xu
  0 siblings, 0 replies; 44+ messages in thread
From: Herbert Xu @ 2015-03-23 11:32 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Josh Triplett,
	Paul E. McKenney, netdev

On Sun, Mar 22, 2015 at 11:56:59AM +0000, Thomas Graf wrote:
> On 03/22/15 at 07:04pm, Herbert Xu wrote:
> > This patch removes the explicit jhash value for the hashfn parameter
> > of rhashtable.  The default is now jhash so removing the setting
> > makes no difference apart from making one less copy of jhash in
> > the kernel.
> > 
> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
Looks like this is not needed given Patrick's work.

OK dropped.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 4/10] netlink: Use default rhashtable hashfn
  2015-03-23  1:18   ` Simon Horman
@ 2015-03-23 12:56     ` Herbert Xu
  0 siblings, 0 replies; 44+ messages in thread
From: Herbert Xu @ 2015-03-23 12:56 UTC (permalink / raw)
  To: Simon Horman
  Cc: David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

On Mon, Mar 23, 2015 at 10:18:21AM +0900, Simon Horman wrote:
>
> I understand the above change in the context of the rest of the series,
> however, it does not seem to match up with the changelog for this patch.

OK I will update the changelog.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [v2 PATCH 3/10] rhashtable: Allow hashfn to be unset
  2015-03-22  8:04 ` [v2 PATCH 3/10] rhashtable: Allow hashfn to be unset Herbert Xu
  2015-03-22 11:55   ` Thomas Graf
@ 2015-03-23 14:29   ` David Laight
  1 sibling, 0 replies; 44+ messages in thread
From: David Laight @ 2015-03-23 14:29 UTC (permalink / raw)
  To: 'Herbert Xu',
	David S. Miller, Thomas Graf, Eric Dumazet, Patrick McHardy,
	Josh Triplett, Paul E. McKenney, netdev

From: Herbert Xu
> Since every current rhashtable user uses jhash as their hash
> function, the fact that jhash is an inline function causes each
> user to generate a copy of its code.
> 
> This function provides a solution to this problem by allowing
> hashfn to be unset.  In which case rhashtable will automatically
> set it to jhash.  Furthermore, if the key length is a multiple
> of 4, we will switch over to jhash2.

Would it make sense to do this as a run-time check for the NULL
function pointer so that the jhash code itself can be inlined?

The cost of the test is likely to be less than the indirect call.
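
Something like this, perhaps (a sketch of the suggestion; rht_hash
is a made-up wrapper name):

static inline u32 rht_hash(const struct rhashtable *ht, const void *key,
                           u32 len, u32 seed)
{
        /* The NULL test is cheap and well predicted; in the common
         * case the compiler can emit the inlined jhash body instead
         * of an indirect call through ht->p.hashfn.
         */
        if (ht->p.hashfn)
                return ht->p.hashfn(key, len, seed);

        return jhash(key, len, seed);
}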

	David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-23  8:33       ` Thomas Graf
  2015-03-23  9:28         ` Herbert Xu
@ 2015-03-23 16:44         ` David Miller
  2015-03-23 21:48           ` Herbert Xu
  2015-03-23 22:13           ` Thomas Graf
  1 sibling, 2 replies; 44+ messages in thread
From: David Miller @ 2015-03-23 16:44 UTC (permalink / raw)
  To: tgraf; +Cc: herbert, eric.dumazet, kaber, josh, paulmck, netdev

From: Thomas Graf <tgraf@suug.ch>
Date: Mon, 23 Mar 2015 08:33:19 +0000

> I don't get why almost nobody would want shrinking. I agree that for
> tables like TCP hash tables, once you have grown you want to keep that
> table size because the load is likely to come back. But we will also
> have lots of users such as the Netlink socket with a table per protocol
> where not shrinking results in giving the user the ability to waste
> memory indefinitely for no gain.

The user can't do this with TCP?  Why is netlink only susceptible?

The only plausible argument for shrinking I've ever heard of is the
nft_hash case, and there that code can _explicitly_ ask for a shrink
after it has made a major table modification.

That puts all of the smarts for when to shrink where the knowledge
resides, and in this case that's the user.
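
For illustration, the explicit variant in the caller would look
something like this (hypothetical nft_hash-style code; my_set_flush
is made up):

/* After a bulk delete the caller knows the table just lost most
 * of its entries, so it asks for one explicit resize instead of
 * paying for an automatic utilisation check on every remove.
 */
static void my_set_flush(struct rhashtable *ht)
{
        /* ... remove a large number of entries ... */

        rhashtable_shrink(ht);  /* single explicit shrink */
}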

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-23  9:28         ` Herbert Xu
  2015-03-23  9:36           ` Thomas Graf
@ 2015-03-23 16:45           ` David Miller
  1 sibling, 0 replies; 44+ messages in thread
From: David Miller @ 2015-03-23 16:45 UTC (permalink / raw)
  To: herbert; +Cc: tgraf, eric.dumazet, kaber, josh, paulmck, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 23 Mar 2015 20:28:07 +1100

> You never want automatic shrinking unless all your users are
> trusted.

+1

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-23 16:44         ` David Miller
@ 2015-03-23 21:48           ` Herbert Xu
  2015-03-23 22:13           ` Thomas Graf
  1 sibling, 0 replies; 44+ messages in thread
From: Herbert Xu @ 2015-03-23 21:48 UTC (permalink / raw)
  To: David Miller; +Cc: tgraf, eric.dumazet, kaber, josh, paulmck, netdev

On Mon, Mar 23, 2015 at 12:44:34PM -0400, David Miller wrote:
>
> The user can't do this with TCP?  Why is netlink only susceptible?
> 
> The only plausible argument for shrinking I've ever heard of is the
> nft_hash case, and there that code can _explicitly_ ask for a shrink
> after it has made a major table modification.
> 
> That puts all of the smarts for when to shrink where the knowledge
> resides, and in this case that's the user.

One thing I have to say is that automatic shrinking is a really
good stress test, as re-enabling it allowed me to quickly identify
two bugs in the final patch :)

Other than that I totally agree that it should be disabled by
default as otherwise it increases the amortised cost of the hash
table for a paltry saving in memory.

We can do it afterwards once we're sure this whole thing is stable.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [v2 PATCH 7/10] rhashtable: Disable automatic shrinking
  2015-03-23 16:44         ` David Miller
  2015-03-23 21:48           ` Herbert Xu
@ 2015-03-23 22:13           ` Thomas Graf
  1 sibling, 0 replies; 44+ messages in thread
From: Thomas Graf @ 2015-03-23 22:13 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, eric.dumazet, kaber, josh, paulmck, netdev

On 03/23/15 at 12:44pm, David Miller wrote:
> From: Thomas Graf <tgraf@suug.ch>
> Date: Mon, 23 Mar 2015 08:33:19 +0000
> 
> > I don't get why almost nobody would want shrinking. I agree that for
> > tables like TCP hash tables, once you have grown you want to keep that
> > table size because the load is likely to come back. But we will also
> > have lots of users such as the Netlink socket with a table per protocol
> > where not shrinking results in giving the user the ability to waste
> > memory indefinitely for no gain.
> 
> The user can't do this with TCP?  Why is netlink only susceptible?

You are right. Any table that doesn't shrink will eventually waste
memory. I used TCP vs Netlink because I believe it represents the
difference in priorities very well. TCP may go from 0 to 1M flows
within a fraction of a second, so if you've seen that many flows
before you might get hit again, and you prioritize being ready to
handle that over eventually wasting the memory indefinitely.

Whereas with Netlink it seems (I'm glad to be proven wrong) that the
need for instant growth is smaller, as it takes time to create 1M
sockets across many PIDs. So we gain something by releasing the
resources when they are not needed.

> The only plausible argument for shrinking I've ever heard of is the
> nft_hash case, and there that code can _explicitly_ ask for a shrink
> after it has made a major table modification.
> 
> That puts all of the smarts for when to shrink where the knowledge
> resides, and in this case that's the user.

My argument for automatic shrinking is simplicity. I'm absolutely
fine with disabling it by default and to require enabling it
explicitly.

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread

Thread overview: 44+ messages
2015-03-22  8:03 [v2 PATCH 0/10] rhashtable: Multiple rehashing Herbert Xu
2015-03-22  8:03 ` [v2 PATCH 1/10] rhashtable: Add barrier to ensure we see new tables in walker Herbert Xu
2015-03-22 10:47   ` Thomas Graf
2015-03-22  8:04 ` [v2 PATCH 2/10] rhashtable: Eliminate unnecessary branch in rht_key_hashfn Herbert Xu
2015-03-22 11:07   ` Thomas Graf
2015-03-22  8:04 ` [v2 PATCH 3/10] rhashtable: Allow hashfn to be unset Herbert Xu
2015-03-22 11:55   ` Thomas Graf
2015-03-22 12:04     ` Herbert Xu
2015-03-22 12:32       ` Thomas Graf
2015-03-22 21:12         ` Herbert Xu
2015-03-23  9:58           ` Thomas Graf
2015-03-23 10:18             ` Herbert Xu
2015-03-23 14:29   ` David Laight
2015-03-22  8:04 ` [v2 PATCH 4/10] netlink: Use default rhashtable hashfn Herbert Xu
2015-03-22 11:55   ` Thomas Graf
2015-03-23  1:18   ` Simon Horman
2015-03-23 12:56     ` Herbert Xu
2015-03-22  8:04 ` [v2 PATCH 5/10] tipc: " Herbert Xu
2015-03-22 11:56   ` Thomas Graf
2015-03-22  8:04 ` [v2 PATCH 6/10] netfilter: " Herbert Xu
2015-03-22 11:56   ` Thomas Graf
2015-03-23 11:32     ` Herbert Xu
2015-03-22  8:04 ` [v2 PATCH 7/10] rhashtable: Disable automatic shrinking Herbert Xu
2015-03-22 12:17   ` Thomas Graf
2015-03-22 13:06     ` Thomas Graf
2015-03-23  0:07       ` Herbert Xu
2015-03-23  8:37         ` Thomas Graf
2015-03-23  9:29           ` Herbert Xu
2015-03-23  9:43             ` Thomas Graf
2015-03-23  0:09     ` Herbert Xu
2015-03-23  8:33       ` Thomas Graf
2015-03-23  9:28         ` Herbert Xu
2015-03-23  9:36           ` Thomas Graf
2015-03-23  9:39             ` Herbert Xu
2015-03-23  9:44               ` Herbert Xu
2015-03-23 10:08                 ` Thomas Graf
2015-03-23 10:19                   ` Herbert Xu
2015-03-23 16:45           ` David Miller
2015-03-23 16:44         ` David Miller
2015-03-23 21:48           ` Herbert Xu
2015-03-23 22:13           ` Thomas Graf
2015-03-22  8:04 ` [v2 PATCH 8/10] rhashtable: Add multiple rehash support Herbert Xu
2015-03-22  8:04 ` [v2 PATCH 9/10] rhashtable: Allow GFP_ATOMIC bucket table allocation Herbert Xu
2015-03-22  8:04 ` [v2 PATCH 10/10] rhashtable: Add immediate rehash during insertion Herbert Xu
