* [PATCH 0/3] Three rhashtable improvements
@ 2019-03-14  5:05 NeilBrown
  2019-03-14  5:05 ` [PATCH 3/3] rhashtable: rename rht_for_each*continue as *from NeilBrown
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: NeilBrown @ 2019-03-14  5:05 UTC (permalink / raw)
  To: Thomas Graf, Herbert Xu; +Cc: netdev, Paul E. McKenney, linux-kernel

These three patches have previously been posted, but at the end of a
set where some early patches were contentious.
These patches did not depend on the others, so I'm posting them
separately.
The second patch has been changed slightly to make use of the new
API that Paul McKenney provided to check whether call_rcu() has been
called yet or not.

Thanks,
NeilBrown


---

NeilBrown (3):
      rhashtable: use cmpxchg() in nested_table_alloc()
      rhashtable: don't hold lock on first table throughout insertion.
      rhashtable: rename rht_for_each*continue as *from.


 .clang-format              |    8 +++---
 include/linux/rhashtable.h |   53 +++++++++++++++------------------------
 lib/rhashtable.c           |   60 +++++++++++++++-----------------------------
 3 files changed, 45 insertions(+), 76 deletions(-)

--
Signature


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] rhashtable: use cmpxchg() in nested_table_alloc()
  2019-03-14  5:05 [PATCH 0/3] Three rhashtable improvements NeilBrown
  2019-03-14  5:05 ` [PATCH 3/3] rhashtable: rename rht_for_each*continue as *from NeilBrown
  2019-03-14  5:05 ` [PATCH 2/3] rhashtable: don't hold lock on first table throughout insertion NeilBrown
@ 2019-03-14  5:05 ` NeilBrown
  2019-03-15  5:10   ` Herbert Xu
  2 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2019-03-14  5:05 UTC (permalink / raw)
  To: Thomas Graf, Herbert Xu; +Cc: netdev, Paul E. McKenney, linux-kernel

nested_table_alloc() relies on the fact that there is
at most one spinlock allocated for every slot in the top
level nested table, so it is not possible for two threads
to try to allocate the same table at the same time.

This assumption is a little fragile (it is not explicit) and is
unnecessary.  We can instead protect against
a race using cmpxchg() - if the cmpxchg() fails, free the table
that was just allocated.
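
For reference, the same publish-or-free idiom in isolation - a minimal
sketch using a hypothetical structure and helper name, not the
rhashtable code itself:

#include <linux/atomic.h>
#include <linux/slab.h>

struct foo {
        int payload;
};

/* Return the object published in *slot, allocating one if the slot is
 * still empty.  If we lose the race to publish, free ours and return
 * the winner's.
 */
static struct foo *foo_get_or_alloc(struct foo **slot)
{
        struct foo *nobj, *old;

        nobj = kzalloc(sizeof(*nobj), GFP_ATOMIC);
        if (!nobj)
                return NULL;

        old = cmpxchg(slot, NULL, nobj);        /* publish iff still NULL */
        if (!old)
                return nobj;                    /* we won the race */

        kfree(nobj);            /* lost: another thread published first */
        return old;
}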

Signed-off-by: NeilBrown <neilb@suse.com>
---
 lib/rhashtable.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 0a105d4af166..c983c0ee15d5 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -131,9 +131,11 @@ static union nested_table *nested_table_alloc(struct rhashtable *ht,
 			INIT_RHT_NULLS_HEAD(ntbl[i].bucket);
 	}
 
-	rcu_assign_pointer(*prev, ntbl);
-
-	return ntbl;
+	if (cmpxchg(prev, NULL, ntbl) == NULL)
+		return ntbl;
+	/* Raced with another thread. */
+	kfree(ntbl);
+	return rcu_dereference(*prev);
 }
 
 static struct bucket_table *nested_bucket_table_alloc(struct rhashtable *ht,



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] rhashtable: don't hold lock on first table throughout insertion.
  2019-03-14  5:05 [PATCH 0/3] Three rhashtable improvements NeilBrown
  2019-03-14  5:05 ` [PATCH 3/3] rhashtable: rename rht_for_each*continue as *from NeilBrown
@ 2019-03-14  5:05 ` NeilBrown
  2019-03-14 14:58   ` Paul E. McKenney
  2019-03-15  5:47   ` Herbert Xu
  2019-03-14  5:05 ` [PATCH 1/3] rhashtable: use cmpxchg() in nested_table_alloc() NeilBrown
  2 siblings, 2 replies; 14+ messages in thread
From: NeilBrown @ 2019-03-14  5:05 UTC (permalink / raw)
  To: Thomas Graf, Herbert Xu; +Cc: netdev, Paul E. McKenney, linux-kernel

rhashtable_try_insert() currently holds a lock on the bucket in
the first table, while also locking buckets in subsequent tables.
This is unnecessary and looks like a hold-over from some earlier
version of the implementation.

As insert and remove always lock a bucket in each table in turn, and
as insert only inserts in the final table, there cannot be any races
that are not covered by simply locking a bucket in each table in turn.

When an insert call reaches the last table it can be sure that there
is no matching entry in any other table as it has searched them all, and
insertion never happens anywhere but in the last table.  The fact that
the code tests for the existence of future_tbl while holding a lock on
the relevant bucket ensures that two threads inserting the same key
will make compatible decisions about which is the "last" table.

This simplifies the code and allows the ->rehash field to be
discarded.

We still need a way to ensure that a dead bucket_table is never
re-linked by rhashtable_walk_stop().  This can be achieved by calling
call_rcu() inside the locked region, and checking with
rcu_head_after_call_rcu() in rhashtable_walk_stop() to see if the
bucket table is empty and dead.
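
For a rough sketch of that idiom with a hypothetical object (illustrative
only; the helper names below are made up, and the check also relies on
rcu_head_init() having been called when the object was allocated):

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct obj {
        struct rcu_head rcu;
        int payload;
};

static DEFINE_SPINLOCK(obj_lock);

static void obj_free_rcu(struct rcu_head *head)
{
        kfree(container_of(head, struct obj, rcu));
}

static struct obj *obj_alloc(void)
{
        struct obj *o = kzalloc(sizeof(*o), GFP_KERNEL);

        if (o)
                rcu_head_init(&o->rcu); /* makes the later check valid */
        return o;
}

/* Retire side: queue the free while still holding the lock. */
static void obj_retire(struct obj *o)
{
        spin_lock(&obj_lock);
        call_rcu(&o->rcu, obj_free_rcu);
        spin_unlock(&obj_lock);
}

/* Check side: must hold the same lock, and must also be inside an RCU
 * read-side critical section covering @o so that @o cannot be freed
 * while we look at it.
 */
static bool obj_is_dying(struct obj *o)
{
        lockdep_assert_held(&obj_lock);
        return rcu_head_after_call_rcu(&o->rcu, obj_free_rcu);
}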

Signed-off-by: NeilBrown <neilb@suse.com>
---
 include/linux/rhashtable.h |   13 -----------
 lib/rhashtable.c           |   50 +++++++++++++-------------------------------
 2 files changed, 15 insertions(+), 48 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index ae9c0f71f311..3864193d5e2e 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -63,7 +63,6 @@
 struct bucket_table {
 	unsigned int		size;
 	unsigned int		nest;
-	unsigned int		rehash;
 	u32			hash_rnd;
 	unsigned int		locks_mask;
 	spinlock_t		*locks;
@@ -776,12 +775,6 @@ static inline int rhltable_insert(
  * @obj:	pointer to hash head inside object
  * @params:	hash table parameters
  *
- * Locks down the bucket chain in both the old and new table if a resize
- * is in progress to ensure that writers can't remove from the old table
- * and can't insert to the new table during the atomic operation of search
- * and insertion. Searches for duplicates in both the old and new table if
- * a resize is in progress.
- *
  * This lookup function may only be used for fixed key hash table (key_len
  * parameter set). It will BUG() if used inappropriately.
  *
@@ -837,12 +830,6 @@ static inline void *rhashtable_lookup_get_insert_fast(
  * @obj:	pointer to hash head inside object
  * @params:	hash table parameters
  *
- * Locks down the bucket chain in both the old and new table if a resize
- * is in progress to ensure that writers can't remove from the old table
- * and can't insert to the new table during the atomic operation of search
- * and insertion. Searches for duplicates in both the old and new table if
- * a resize is in progress.
- *
  * Lookups may occur in parallel with hashtable mutations and resizing.
  *
  * Will trigger an automatic deferred table resizing if residency in the
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index c983c0ee15d5..03ba449c6d38 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -199,6 +199,7 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 		return NULL;
 	}
 
+	rcu_head_init(&tbl->rcu);
 	INIT_LIST_HEAD(&tbl->walkers);
 
 	tbl->hash_rnd = get_random_u32();
@@ -282,10 +283,9 @@ static int rhashtable_rehash_chain(struct rhashtable *ht,
 	while (!(err = rhashtable_rehash_one(ht, old_hash)))
 		;
 
-	if (err == -ENOENT) {
-		old_tbl->rehash++;
+	if (err == -ENOENT)
 		err = 0;
-	}
+
 	spin_unlock_bh(old_bucket_lock);
 
 	return err;
@@ -332,13 +332,16 @@ static int rhashtable_rehash_table(struct rhashtable *ht)
 	spin_lock(&ht->lock);
 	list_for_each_entry(walker, &old_tbl->walkers, list)
 		walker->tbl = NULL;
-	spin_unlock(&ht->lock);
 
 	/* Wait for readers. All new readers will see the new
 	 * table, and thus no references to the old table will
 	 * remain.
+	 * We do this inside the locked region so that
+	 * rhashtable_walk_stop() can use rcu_head_after_call_rcu()
+	 * to check if it should not re-link the table.
 	 */
 	call_rcu(&old_tbl->rcu, bucket_table_free_rcu);
+	spin_unlock(&ht->lock);
 
 	return rht_dereference(new_tbl->future_tbl, ht) ? -EAGAIN : 0;
 }
@@ -580,36 +583,14 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
 	struct bucket_table *new_tbl;
 	struct bucket_table *tbl;
 	unsigned int hash;
-	spinlock_t *lock;
 	void *data;
 
-	tbl = rcu_dereference(ht->tbl);
-
-	/* All insertions must grab the oldest table containing
-	 * the hashed bucket that is yet to be rehashed.
-	 */
-	for (;;) {
-		hash = rht_head_hashfn(ht, tbl, obj, ht->p);
-		lock = rht_bucket_lock(tbl, hash);
-		spin_lock_bh(lock);
-
-		if (tbl->rehash <= hash)
-			break;
-
-		spin_unlock_bh(lock);
-		tbl = rht_dereference_rcu(tbl->future_tbl, ht);
-	}
-
-	data = rhashtable_lookup_one(ht, tbl, hash, key, obj);
-	new_tbl = rhashtable_insert_one(ht, tbl, hash, obj, data);
-	if (PTR_ERR(new_tbl) != -EEXIST)
-		data = ERR_CAST(new_tbl);
+	new_tbl = rcu_dereference(ht->tbl);
 
-	while (!IS_ERR_OR_NULL(new_tbl)) {
+	do {
 		tbl = new_tbl;
 		hash = rht_head_hashfn(ht, tbl, obj, ht->p);
-		spin_lock_nested(rht_bucket_lock(tbl, hash),
-				 SINGLE_DEPTH_NESTING);
+		spin_lock(rht_bucket_lock(tbl, hash));
 
 		data = rhashtable_lookup_one(ht, tbl, hash, key, obj);
 		new_tbl = rhashtable_insert_one(ht, tbl, hash, obj, data);
@@ -617,9 +598,7 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
 			data = ERR_CAST(new_tbl);
 
 		spin_unlock(rht_bucket_lock(tbl, hash));
-	}
-
-	spin_unlock_bh(lock);
+	} while (!IS_ERR_OR_NULL(new_tbl));
 
 	if (PTR_ERR(data) == -EAGAIN)
 		data = ERR_PTR(rhashtable_insert_rehash(ht, tbl) ?:
@@ -941,10 +920,11 @@ void rhashtable_walk_stop(struct rhashtable_iter *iter)
 	ht = iter->ht;
 
 	spin_lock(&ht->lock);
-	if (tbl->rehash < tbl->size)
-		list_add(&iter->walker.list, &tbl->walkers);
-	else
+	if (rcu_head_after_call_rcu(&tbl->rcu, bucket_table_free_rcu))
+		/* This bucket table is being freed, don't re-link it. */
 		iter->walker.tbl = NULL;
+	else
+		list_add(&iter->walker.list, &tbl->walkers);
 	spin_unlock(&ht->lock);
 
 out:



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] rhashtable: rename rht_for_each*continue as *from.
  2019-03-14  5:05 [PATCH 0/3] Three rhashtable improvements NeilBrown
@ 2019-03-14  5:05 ` NeilBrown
  2019-03-15  5:49   ` Herbert Xu
  2019-03-17  7:50   ` Miguel Ojeda
  2019-03-14  5:05 ` [PATCH 2/3] rhashtable: don't hold lock on first table throughout insertion NeilBrown
  2019-03-14  5:05 ` [PATCH 1/3] rhashtable: use cmpxchg() in nested_table_alloc() NeilBrown
  2 siblings, 2 replies; 14+ messages in thread
From: NeilBrown @ 2019-03-14  5:05 UTC (permalink / raw)
  To: Thomas Graf, Herbert Xu; +Cc: netdev, Paul E. McKenney, linux-kernel

The pattern set by list.h is that for_each..continue()
iterators start at the next entry after the given one,
while for_each..from() iterators start at the given
entry.

The rht_for_each*continue() iterators are documented as though they
start at the 'next' entry, but actually start at the given entry,
and they are used expecting that behaviour.
So fix the documentation and change the names to *from for consistency
with list.h.
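
As an illustration of the "from" semantics using the new name, here is a
hypothetical helper (bucket_len() is not part of the rhashtable API; the
caller is assumed to hold the bucket lock):

/* Count the entries in one bucket chain, starting from the bucket head
 * itself rather than its successor.
 */
static unsigned int bucket_len(struct bucket_table *tbl, unsigned int hash)
{
        struct rhash_head *pos;
        unsigned int n = 0;

        rht_for_each_from(pos, *rht_bucket(tbl, hash), tbl, hash)
                n++;

        return n;
}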

Signed-off-by: NeilBrown <neilb@suse.com>
---
 .clang-format              |    8 ++++----
 include/linux/rhashtable.h |   40 ++++++++++++++++++++--------------------
 lib/rhashtable.c           |    2 +-
 3 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/.clang-format b/.clang-format
index 201a4f531b90..8257b96ffbaf 100644
--- a/.clang-format
+++ b/.clang-format
@@ -367,14 +367,14 @@ ForEachMacros:
   - 'rhl_for_each_entry_rcu'
   - 'rhl_for_each_rcu'
   - 'rht_for_each'
-  - 'rht_for_each_continue'
+  - 'rht_for_each_from'
   - 'rht_for_each_entry'
-  - 'rht_for_each_entry_continue'
+  - 'rht_for_each_entry_from'
   - 'rht_for_each_entry_rcu'
-  - 'rht_for_each_entry_rcu_continue'
+  - 'rht_for_each_entry_rcu_from'
   - 'rht_for_each_entry_safe'
   - 'rht_for_each_rcu'
-  - 'rht_for_each_rcu_continue'
+  - 'rht_for_each_rcu_from'
   - '__rq_for_each_bio'
   - 'rq_for_each_segment'
   - 'scsi_for_each_prot_sg'
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 3864193d5e2e..86dfa417848d 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -306,13 +306,13 @@ static inline struct rhash_head __rcu **rht_bucket_insert(
 }
 
 /**
- * rht_for_each_continue - continue iterating over hash chain
+ * rht_for_each_from - iterate over hash chain from given head
  * @pos:	the &struct rhash_head to use as a loop cursor.
- * @head:	the previous &struct rhash_head to continue from
+ * @head:	the &struct rhash_head to start from
  * @tbl:	the &struct bucket_table
  * @hash:	the hash value / bucket index
  */
-#define rht_for_each_continue(pos, head, tbl, hash) \
+#define rht_for_each_from(pos, head, tbl, hash) \
 	for (pos = rht_dereference_bucket(head, tbl, hash); \
 	     !rht_is_a_nulls(pos); \
 	     pos = rht_dereference_bucket((pos)->next, tbl, hash))
@@ -324,18 +324,18 @@ static inline struct rhash_head __rcu **rht_bucket_insert(
  * @hash:	the hash value / bucket index
  */
 #define rht_for_each(pos, tbl, hash) \
-	rht_for_each_continue(pos, *rht_bucket(tbl, hash), tbl, hash)
+	rht_for_each_from(pos, *rht_bucket(tbl, hash), tbl, hash)
 
 /**
- * rht_for_each_entry_continue - continue iterating over hash chain
+ * rht_for_each_entry_from - iterate over hash chain from given head
  * @tpos:	the type * to use as a loop cursor.
  * @pos:	the &struct rhash_head to use as a loop cursor.
- * @head:	the previous &struct rhash_head to continue from
+ * @head:	the &struct rhash_head to start from
  * @tbl:	the &struct bucket_table
  * @hash:	the hash value / bucket index
  * @member:	name of the &struct rhash_head within the hashable struct.
  */
-#define rht_for_each_entry_continue(tpos, pos, head, tbl, hash, member)	\
+#define rht_for_each_entry_from(tpos, pos, head, tbl, hash, member)	\
 	for (pos = rht_dereference_bucket(head, tbl, hash);		\
 	     (!rht_is_a_nulls(pos)) && rht_entry(tpos, pos, member);	\
 	     pos = rht_dereference_bucket((pos)->next, tbl, hash))
@@ -349,7 +349,7 @@ static inline struct rhash_head __rcu **rht_bucket_insert(
  * @member:	name of the &struct rhash_head within the hashable struct.
  */
 #define rht_for_each_entry(tpos, pos, tbl, hash, member)		\
-	rht_for_each_entry_continue(tpos, pos, *rht_bucket(tbl, hash),	\
+	rht_for_each_entry_from(tpos, pos, *rht_bucket(tbl, hash),	\
 				    tbl, hash, member)
 
 /**
@@ -374,9 +374,9 @@ static inline struct rhash_head __rcu **rht_bucket_insert(
 		       rht_dereference_bucket(pos->next, tbl, hash) : NULL)
 
 /**
- * rht_for_each_rcu_continue - continue iterating over rcu hash chain
+ * rht_for_each_rcu_from - iterate over rcu hash chain from given head
  * @pos:	the &struct rhash_head to use as a loop cursor.
- * @head:	the previous &struct rhash_head to continue from
+ * @head:	the &struct rhash_head to start from
  * @tbl:	the &struct bucket_table
  * @hash:	the hash value / bucket index
  *
@@ -384,7 +384,7 @@ static inline struct rhash_head __rcu **rht_bucket_insert(
  * the _rcu mutation primitives such as rhashtable_insert() as long as the
  * traversal is guarded by rcu_read_lock().
  */
-#define rht_for_each_rcu_continue(pos, head, tbl, hash)			\
+#define rht_for_each_rcu_from(pos, head, tbl, hash)			\
 	for (({barrier(); }),						\
 	     pos = rht_dereference_bucket_rcu(head, tbl, hash);		\
 	     !rht_is_a_nulls(pos);					\
@@ -401,13 +401,13 @@ static inline struct rhash_head __rcu **rht_bucket_insert(
  * traversal is guarded by rcu_read_lock().
  */
 #define rht_for_each_rcu(pos, tbl, hash)				\
-	rht_for_each_rcu_continue(pos, *rht_bucket(tbl, hash), tbl, hash)
+	rht_for_each_rcu_from(pos, *rht_bucket(tbl, hash), tbl, hash)
 
 /**
- * rht_for_each_entry_rcu_continue - continue iterating over rcu hash chain
+ * rht_for_each_entry_rcu_from - iterate over rcu hash chain from given head
  * @tpos:	the type * to use as a loop cursor.
  * @pos:	the &struct rhash_head to use as a loop cursor.
- * @head:	the previous &struct rhash_head to continue from
+ * @head:	the &struct rhash_head to start from
  * @tbl:	the &struct bucket_table
  * @hash:	the hash value / bucket index
  * @member:	name of the &struct rhash_head within the hashable struct.
@@ -416,7 +416,7 @@ static inline struct rhash_head __rcu **rht_bucket_insert(
  * the _rcu mutation primitives such as rhashtable_insert() as long as the
  * traversal is guarded by rcu_read_lock().
  */
-#define rht_for_each_entry_rcu_continue(tpos, pos, head, tbl, hash, member) \
+#define rht_for_each_entry_rcu_from(tpos, pos, head, tbl, hash, member) \
 	for (({barrier(); }),						    \
 	     pos = rht_dereference_bucket_rcu(head, tbl, hash);		    \
 	     (!rht_is_a_nulls(pos)) && rht_entry(tpos, pos, member);	    \
@@ -435,7 +435,7 @@ static inline struct rhash_head __rcu **rht_bucket_insert(
  * traversal is guarded by rcu_read_lock().
  */
 #define rht_for_each_entry_rcu(tpos, pos, tbl, hash, member)		   \
-	rht_for_each_entry_rcu_continue(tpos, pos, *rht_bucket(tbl, hash), \
+	rht_for_each_entry_rcu_from(tpos, pos, *rht_bucket(tbl, hash), \
 					tbl, hash, member)
 
 /**
@@ -491,7 +491,7 @@ static inline struct rhash_head *__rhashtable_lookup(
 	hash = rht_key_hashfn(ht, tbl, key, params);
 	head = rht_bucket(tbl, hash);
 	do {
-		rht_for_each_rcu_continue(he, *head, tbl, hash) {
+		rht_for_each_rcu_from(he, *head, tbl, hash) {
 			if (params.obj_cmpfn ?
 			    params.obj_cmpfn(&arg, rht_obj(ht, he)) :
 			    rhashtable_compare(&arg, rht_obj(ht, he)))
@@ -625,7 +625,7 @@ static inline void *__rhashtable_insert_fast(
 	if (!pprev)
 		goto out;
 
-	rht_for_each_continue(head, *pprev, tbl, hash) {
+	rht_for_each_from(head, *pprev, tbl, hash) {
 		struct rhlist_head *plist;
 		struct rhlist_head *list;
 
@@ -890,7 +890,7 @@ static inline int __rhashtable_remove_fast_one(
 	spin_lock_bh(lock);
 
 	pprev = rht_bucket_var(tbl, hash);
-	rht_for_each_continue(he, *pprev, tbl, hash) {
+	rht_for_each_from(he, *pprev, tbl, hash) {
 		struct rhlist_head *list;
 
 		list = container_of(he, struct rhlist_head, rhead);
@@ -1042,7 +1042,7 @@ static inline int __rhashtable_replace_fast(
 	spin_lock_bh(lock);
 
 	pprev = rht_bucket_var(tbl, hash);
-	rht_for_each_continue(he, *pprev, tbl, hash) {
+	rht_for_each_from(he, *pprev, tbl, hash) {
 		if (he != obj_old) {
 			pprev = &he->next;
 			continue;
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 03ba449c6d38..983c8a3d7ed5 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -492,7 +492,7 @@ static void *rhashtable_lookup_one(struct rhashtable *ht,
 
 	elasticity = RHT_ELASTICITY;
 	pprev = rht_bucket_var(tbl, hash);
-	rht_for_each_continue(head, *pprev, tbl, hash) {
+	rht_for_each_from(head, *pprev, tbl, hash) {
 		struct rhlist_head *list;
 		struct rhlist_head *plist;
 



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] rhashtable: don't hold lock on first table throughout insertion.
  2019-03-14  5:05 ` [PATCH 2/3] rhashtable: don't hold lock on first table throughout insertion NeilBrown
@ 2019-03-14 14:58   ` Paul E. McKenney
  2019-03-15  5:47   ` Herbert Xu
  1 sibling, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2019-03-14 14:58 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, Herbert Xu, netdev, linux-kernel

On Thu, Mar 14, 2019 at 04:05:28PM +1100, NeilBrown wrote:
> rhashtable_try_insert() currently holds a lock on the bucket in
> the first table, while also locking buckets in subsequent tables.
> This is unnecessary and looks like a hold-over from some earlier
> version of the implementation.
> 
> As insert and remove always lock a bucket in each table in turn, and
> as insert only inserts in the final table, there cannot be any races
> that are not covered by simply locking a bucket in each table in turn.
> 
> When an insert call reaches the last table it can be sure that there
> is no matching entry in any other table as it has searched them all, and
> insertion never happens anywhere but in the last table.  The fact that
> the code tests for the existence of future_tbl while holding a lock on
> the relevant bucket ensures that two threads inserting the same key
> will make compatible decisions about which is the "last" table.
> 
> This simplifies the code and allows the ->rehash field to be
> discarded.
> 
> We still need a way to ensure that a dead bucket_table is never
> re-linked by rhashtable_walk_stop().  This can be achieved by calling
> call_rcu() inside the locked region, and checking with
> rcu_head_after_call_rcu() in rhashtable_walk_stop() to see if the
> bucket table is empty and dead.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>

This looks good to me.  From an rcu_head_init() and
rcu_head_after_call_rcu() viewpoint, and assuming that the value in
rhashtable_walk_stop()'s tbl pointer was fetched using rcu_dereference()
or similar within the same RCU read-side critical section in effect
during the call to rhashtable_walk_stop():

Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com>

Some commentary below outlining my reasoning in more detail.

> ---
>  include/linux/rhashtable.h |   13 -----------
>  lib/rhashtable.c           |   50 +++++++++++++-------------------------------
>  2 files changed, 15 insertions(+), 48 deletions(-)
> 
> diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
> index ae9c0f71f311..3864193d5e2e 100644
> --- a/include/linux/rhashtable.h
> +++ b/include/linux/rhashtable.h
> @@ -63,7 +63,6 @@
>  struct bucket_table {
>  	unsigned int		size;
>  	unsigned int		nest;
> -	unsigned int		rehash;
>  	u32			hash_rnd;
>  	unsigned int		locks_mask;
>  	spinlock_t		*locks;
> @@ -776,12 +775,6 @@ static inline int rhltable_insert(
>   * @obj:	pointer to hash head inside object
>   * @params:	hash table parameters
>   *
> - * Locks down the bucket chain in both the old and new table if a resize
> - * is in progress to ensure that writers can't remove from the old table
> - * and can't insert to the new table during the atomic operation of search
> - * and insertion. Searches for duplicates in both the old and new table if
> - * a resize is in progress.
> - *
>   * This lookup function may only be used for fixed key hash table (key_len
>   * parameter set). It will BUG() if used inappropriately.
>   *
> @@ -837,12 +830,6 @@ static inline void *rhashtable_lookup_get_insert_fast(
>   * @obj:	pointer to hash head inside object
>   * @params:	hash table parameters
>   *
> - * Locks down the bucket chain in both the old and new table if a resize
> - * is in progress to ensure that writers can't remove from the old table
> - * and can't insert to the new table during the atomic operation of search
> - * and insertion. Searches for duplicates in both the old and new table if
> - * a resize is in progress.
> - *
>   * Lookups may occur in parallel with hashtable mutations and resizing.
>   *
>   * Will trigger an automatic deferred table resizing if residency in the
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index c983c0ee15d5..03ba449c6d38 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -199,6 +199,7 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
>  		return NULL;
>  	}
> 
> +	rcu_head_init(&tbl->rcu);

Good, you initialize this while allocating.  Presumably there are not any
other sneak allocations.  ;-)

>  	INIT_LIST_HEAD(&tbl->walkers);
> 
>  	tbl->hash_rnd = get_random_u32();
> @@ -282,10 +283,9 @@ static int rhashtable_rehash_chain(struct rhashtable *ht,
>  	while (!(err = rhashtable_rehash_one(ht, old_hash)))
>  		;
> 
> -	if (err == -ENOENT) {
> -		old_tbl->rehash++;
> +	if (err == -ENOENT)
>  		err = 0;
> -	}
> +
>  	spin_unlock_bh(old_bucket_lock);
> 
>  	return err;
> @@ -332,13 +332,16 @@ static int rhashtable_rehash_table(struct rhashtable *ht)
>  	spin_lock(&ht->lock);
>  	list_for_each_entry(walker, &old_tbl->walkers, list)
>  		walker->tbl = NULL;
> -	spin_unlock(&ht->lock);
> 
>  	/* Wait for readers. All new readers will see the new
>  	 * table, and thus no references to the old table will
>  	 * remain.
> +	 * We do this inside the locked region so that
> +	 * rhashtable_walk_stop() can use rcu_head_after_call_rcu()
> +	 * to check if it should not re-link the table.
>  	 */
>  	call_rcu(&old_tbl->rcu, bucket_table_free_rcu);
> +	spin_unlock(&ht->lock);

My first thought was that moving this spin_unlock() was unnecessary,
but that was due to my focusing solely on avoiding the splat that
rcu_head_after_call_rcu() can generate.  Thinking a bit harder about
it, it looks like the purpose of moving the lock is to make sure
that rhashtable_walk_stop()'s check doesn't happen between the above
list_for_each_entry() and the above call_rcu(), which would result in
rcu_head_after_call_rcu() saying "Nope, no call_rcu() yet!".  And then
rhashtable_walk_stop() would add the about-to-be-call_rcu()ed element
back into the table, which would void all manner of warranties.

So, yes, I finally see why it is absolutely necessary to move this
spin_unlock().  ;-)

>  	return rht_dereference(new_tbl->future_tbl, ht) ? -EAGAIN : 0;
>  }
> @@ -580,36 +583,14 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
>  	struct bucket_table *new_tbl;
>  	struct bucket_table *tbl;
>  	unsigned int hash;
> -	spinlock_t *lock;
>  	void *data;
> 
> -	tbl = rcu_dereference(ht->tbl);
> -
> -	/* All insertions must grab the oldest table containing
> -	 * the hashed bucket that is yet to be rehashed.
> -	 */
> -	for (;;) {
> -		hash = rht_head_hashfn(ht, tbl, obj, ht->p);
> -		lock = rht_bucket_lock(tbl, hash);
> -		spin_lock_bh(lock);
> -
> -		if (tbl->rehash <= hash)
> -			break;
> -
> -		spin_unlock_bh(lock);
> -		tbl = rht_dereference_rcu(tbl->future_tbl, ht);
> -	}
> -
> -	data = rhashtable_lookup_one(ht, tbl, hash, key, obj);
> -	new_tbl = rhashtable_insert_one(ht, tbl, hash, obj, data);
> -	if (PTR_ERR(new_tbl) != -EEXIST)
> -		data = ERR_CAST(new_tbl);
> +	new_tbl = rcu_dereference(ht->tbl);
> 
> -	while (!IS_ERR_OR_NULL(new_tbl)) {
> +	do {
>  		tbl = new_tbl;
>  		hash = rht_head_hashfn(ht, tbl, obj, ht->p);
> -		spin_lock_nested(rht_bucket_lock(tbl, hash),
> -				 SINGLE_DEPTH_NESTING);
> +		spin_lock(rht_bucket_lock(tbl, hash));
> 
>  		data = rhashtable_lookup_one(ht, tbl, hash, key, obj);
>  		new_tbl = rhashtable_insert_one(ht, tbl, hash, obj, data);
> @@ -617,9 +598,7 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
>  			data = ERR_CAST(new_tbl);
> 
>  		spin_unlock(rht_bucket_lock(tbl, hash));
> -	}
> -
> -	spin_unlock_bh(lock);
> +	} while (!IS_ERR_OR_NULL(new_tbl));
> 
>  	if (PTR_ERR(data) == -EAGAIN)
>  		data = ERR_PTR(rhashtable_insert_rehash(ht, tbl) ?:
> @@ -941,10 +920,11 @@ void rhashtable_walk_stop(struct rhashtable_iter *iter)
>  	ht = iter->ht;
> 
>  	spin_lock(&ht->lock);
> -	if (tbl->rehash < tbl->size)
> -		list_add(&iter->walker.list, &tbl->walkers);
> -	else
> +	if (rcu_head_after_call_rcu(&tbl->rcu, bucket_table_free_rcu))

And in v5.0, this is within an RCU read-side critical section, as is
necessary.  (Otherwise, the grace period might end and the thing
pointed to by tbl might be reallocated as something else, in which
case with high probability rcu_head_after_call_rcu() would complain
bitterly.)

This assumes that tbl was also fetched with rcu_dereference() within
this same RCU read-side critical section.  A quick glance at the
rhashtable_walk_start_check() leads me to believe that this is
the case, but I must confess that I did not check all the calls to
rhashtable_walk_stop() and rhashtable_walk_start().  Yes, lazy this
morning, what can I say?
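
For what it's worth, the usage pattern that satisfies this looks roughly
like the sketch below (illustrative only, error handling trimmed):
rhashtable_walk_start() enters the RCU read-side critical section and
rhashtable_walk_stop() leaves it, so the iterator's table pointer is only
ever used under that one critical section.

#include <linux/err.h>
#include <linux/rhashtable.h>

static void walk_all(struct rhashtable *ht)
{
        struct rhashtable_iter iter;
        void *obj;

        rhashtable_walk_enter(ht, &iter);
        rhashtable_walk_start(&iter);           /* takes rcu_read_lock() */

        while ((obj = rhashtable_walk_next(&iter)) != NULL) {
                if (IS_ERR(obj))
                        continue;       /* -EAGAIN: a resize happened, keep going */
                /* ... inspect obj ... */
        }

        rhashtable_walk_stop(&iter);            /* drops rcu_read_lock() */
        rhashtable_walk_exit(&iter);
}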

> +		/* This bucket table is being freed, don't re-link it. */
>  		iter->walker.tbl = NULL;
> +	else
> +		list_add(&iter->walker.list, &tbl->walkers);
>  	spin_unlock(&ht->lock);
> 
>  out:
> 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] rhashtable: use cmpxchg() in nested_table_alloc()
  2019-03-14  5:05 ` [PATCH 1/3] rhashtable: use cmpxchg() in nested_table_alloc() NeilBrown
@ 2019-03-15  5:10   ` Herbert Xu
  2019-03-15  6:51     ` NeilBrown
  0 siblings, 1 reply; 14+ messages in thread
From: Herbert Xu @ 2019-03-15  5:10 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, Paul E. McKenney, linux-kernel

Hi Neil:

On Thu, Mar 14, 2019 at 04:05:28PM +1100, NeilBrown wrote:
> nested_table_alloc() relies on the fact that there is
> at most one spinlock allocated for every slot in the top
> level nested table, so it is not possible for two threads
> to try to allocate the same table at the same time.
> 
> This assumption is a little fragile (it is not explicit) and is
> unnecessary.  We can instead protect against
> a race using cmpxchg() - if the cmpxchg() fails, free the table
> that was just allocated.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  lib/rhashtable.c |    8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)

You probably explained it to me before but it's been long enough
that I no longer remember why we need this change.  So please
explain in the commit log why this change is needed.  Because
> on the face of it you are adding locking/synchronisation and not
taking it away.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] rhashtable: don't hold lock on first table throughout insertion.
  2019-03-14  5:05 ` [PATCH 2/3] rhashtable: don't hold lock on first table throughout insertion NeilBrown
  2019-03-14 14:58   ` Paul E. McKenney
@ 2019-03-15  5:47   ` Herbert Xu
  2019-03-15  6:46     ` NeilBrown
  1 sibling, 1 reply; 14+ messages in thread
From: Herbert Xu @ 2019-03-15  5:47 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, Paul E. McKenney, linux-kernel

On Thu, Mar 14, 2019 at 04:05:28PM +1100, NeilBrown wrote:
> rhashtable_try_insert() currently holds a lock on the bucket in
> the first table, while also locking buckets in subsequent tables.
> This is unnecessary and looks like a hold-over from some earlier
> version of the implementation.
> 
> As insert and remove always lock a bucket in each table in turn, and
> as insert only inserts in the final table, there cannot be any races
> that are not covered by simply locking a bucket in each table in turn.
> 
> When an insert call reaches the last table it can be sure that there
> is no matching entry in any other table as it has searched them all, and
> insertion never happens anywhere but in the last table.  The fact that
> the code tests for the existence of future_tbl while holding a lock on
> the relevant bucket ensures that two threads inserting the same key
> will make compatible decisions about which is the "last" table.
> 
> This simplifies the code and allows the ->rehash field to be
> discarded.
> 
> We still need a way to ensure that a dead bucket_table is never
> re-linked by rhashtable_walk_stop().  This can be achieved by calling
> call_rcu() inside the locked region, and checking with
> rcu_head_after_call_rcu() in rhashtable_walk_stop() to see if the
> bucket table is empty and dead.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  include/linux/rhashtable.h |   13 -----------
>  lib/rhashtable.c           |   50 +++++++++++++-------------------------------
>  2 files changed, 15 insertions(+), 48 deletions(-)

Thanks! This looks very nice.

> @@ -580,36 +583,14 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
>  	struct bucket_table *new_tbl;
>  	struct bucket_table *tbl;
>  	unsigned int hash;
> -	spinlock_t *lock;
>  	void *data;
>  
> -	tbl = rcu_dereference(ht->tbl);
> -
> -	/* All insertions must grab the oldest table containing
> -	 * the hashed bucket that is yet to be rehashed.
> -	 */
> -	for (;;) {
> -		hash = rht_head_hashfn(ht, tbl, obj, ht->p);
> -		lock = rht_bucket_lock(tbl, hash);
> -		spin_lock_bh(lock);
> -
> -		if (tbl->rehash <= hash)
> -			break;
> -
> -		spin_unlock_bh(lock);
> -		tbl = rht_dereference_rcu(tbl->future_tbl, ht);
> -	}
> -
> -	data = rhashtable_lookup_one(ht, tbl, hash, key, obj);
> -	new_tbl = rhashtable_insert_one(ht, tbl, hash, obj, data);
> -	if (PTR_ERR(new_tbl) != -EEXIST)
> -		data = ERR_CAST(new_tbl);
> +	new_tbl = rcu_dereference(ht->tbl);
>  
> -	while (!IS_ERR_OR_NULL(new_tbl)) {
> +	do {
>  		tbl = new_tbl;
>  		hash = rht_head_hashfn(ht, tbl, obj, ht->p);
> -		spin_lock_nested(rht_bucket_lock(tbl, hash),
> -				 SINGLE_DEPTH_NESTING);
> +		spin_lock(rht_bucket_lock(tbl, hash));

One small problem.  I think this needs to be spin_lock_bh.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] rhashtable: rename rht_for_each*continue as *from.
  2019-03-14  5:05 ` [PATCH 3/3] rhashtable: rename rht_for_each*continue as *from NeilBrown
@ 2019-03-15  5:49   ` Herbert Xu
  2019-03-17  7:50   ` Miguel Ojeda
  1 sibling, 0 replies; 14+ messages in thread
From: Herbert Xu @ 2019-03-15  5:49 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, Paul E. McKenney, linux-kernel

On Thu, Mar 14, 2019 at 04:05:28PM +1100, NeilBrown wrote:
> The pattern set by list.h is that for_each..continue()
> iterators start at the next entry after the given one,
> while for_each..from() iterators start at the given
> entry.
> 
> The rht_for_each*continue() iterators are documented as though they
> start at the 'next' entry, but actually start at the given entry,
> and they are used expecting that behaviour.
> So fix the documentation and change the names to *from for consistency
> with list.h.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  .clang-format              |    8 ++++----
>  include/linux/rhashtable.h |   40 ++++++++++++++++++++--------------------
>  lib/rhashtable.c           |    2 +-
>  3 files changed, 25 insertions(+), 25 deletions(-)

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] rhashtable: don't hold lock on first table throughout insertion.
  2019-03-15  5:47   ` Herbert Xu
@ 2019-03-15  6:46     ` NeilBrown
  2019-03-20  5:40       ` Herbert Xu
  0 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2019-03-15  6:46 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Thomas Graf, netdev, Paul E. McKenney, linux-kernel


On Fri, Mar 15 2019, Herbert Xu wrote:

> On Thu, Mar 14, 2019 at 04:05:28PM +1100, NeilBrown wrote:
>> rhashtable_try_insert() currently holds a lock on the bucket in
>> the first table, while also locking buckets in subsequent tables.
>> This is unnecessary and looks like a hold-over from some earlier
>> version of the implementation.
>> 
>> As insert and remove always lock a bucket in each table in turn, and
>> as insert only inserts in the final table, there cannot be any races
>> that are not covered by simply locking a bucket in each table in turn.
>> 
>> When an insert call reaches the last table it can be sure that there
>> is no matching entry in any other table as it has searched them all, and
>> insertion never happens anywhere but in the last table.  The fact that
>> the code tests for the existence of future_tbl while holding a lock on
>> the relevant bucket ensures that two threads inserting the same key
>> will make compatible decisions about which is the "last" table.
>> 
>> This simplifies the code and allows the ->rehash field to be
>> discarded.
>> 
>> We still need a way to ensure that a dead bucket_table is never
>> re-linked by rhashtable_walk_stop().  This can be achieved by calling
>> call_rcu() inside the locked region, and checking with
>> rcu_head_after_call_rcu() in rhashtable_walk_stop() to see if the
>> bucket table is empty and dead.
>> 
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>  include/linux/rhashtable.h |   13 -----------
>>  lib/rhashtable.c           |   50 +++++++++++++-------------------------------
>>  2 files changed, 15 insertions(+), 48 deletions(-)
>
> Thanks! This looks very nice.
>
>> @@ -580,36 +583,14 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
>>  	struct bucket_table *new_tbl;
>>  	struct bucket_table *tbl;
>>  	unsigned int hash;
>> -	spinlock_t *lock;
>>  	void *data;
>>  
>> -	tbl = rcu_dereference(ht->tbl);
>> -
>> -	/* All insertions must grab the oldest table containing
>> -	 * the hashed bucket that is yet to be rehashed.
>> -	 */
>> -	for (;;) {
>> -		hash = rht_head_hashfn(ht, tbl, obj, ht->p);
>> -		lock = rht_bucket_lock(tbl, hash);
>> -		spin_lock_bh(lock);
>> -
>> -		if (tbl->rehash <= hash)
>> -			break;
>> -
>> -		spin_unlock_bh(lock);
>> -		tbl = rht_dereference_rcu(tbl->future_tbl, ht);
>> -	}
>> -
>> -	data = rhashtable_lookup_one(ht, tbl, hash, key, obj);
>> -	new_tbl = rhashtable_insert_one(ht, tbl, hash, obj, data);
>> -	if (PTR_ERR(new_tbl) != -EEXIST)
>> -		data = ERR_CAST(new_tbl);
>> +	new_tbl = rcu_dereference(ht->tbl);
>>  
>> -	while (!IS_ERR_OR_NULL(new_tbl)) {
>> +	do {
>>  		tbl = new_tbl;
>>  		hash = rht_head_hashfn(ht, tbl, obj, ht->p);
>> -		spin_lock_nested(rht_bucket_lock(tbl, hash),
>> -				 SINGLE_DEPTH_NESTING);
>> +		spin_lock(rht_bucket_lock(tbl, hash));
>
> One small problem.  I think this needs to be spin_lock_bh.

Yes, thanks for catching that - and the matching spin_unlock needs
fixing too.  See below.

Thanks,
NeilBrown

From: NeilBrown <neilb@suse.com>
Date: Mon, 25 Jun 2018 08:09:19 +1000
Subject: [PATCH] rhashtable: don't hold lock on first table throughout
 insertion.

rhashtable_try_insert() currently holds a lock on the bucket in
the first table, while also locking buckets in subsequent tables.
This is unnecessary and looks like a hold-over from some earlier
version of the implementation.

As insert and remove always lock a bucket in each table in turn, and
as insert only inserts in the final table, there cannot be any races
that are not covered by simply locking a bucket in each table in turn.

When an insert call reaches the last table it can be sure that there
is no matching entry in any other table as it has searched them all, and
insertion never happens anywhere but in the last table.  The fact that
the code tests for the existence of future_tbl while holding a lock on
the relevant bucket ensures that two threads inserting the same key
will make compatible decisions about which is the "last" table.

This simplifies the code and allows the ->rehash field to be
discarded.

We still need a way to ensure that a dead bucket_table is never
re-linked by rhashtable_walk_stop().  This can be achieved by calling
call_rcu() inside the locked region, and checking with
rcu_head_after_call_rcu() in rhashtable_walk_stop() to see if the
bucket table is empty and dead.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 include/linux/rhashtable.h | 13 ------------
 lib/rhashtable.c           | 52 ++++++++++++++--------------------------------
 2 files changed, 16 insertions(+), 49 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index ae9c0f71f311..3864193d5e2e 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -63,7 +63,6 @@
 struct bucket_table {
 	unsigned int		size;
 	unsigned int		nest;
-	unsigned int		rehash;
 	u32			hash_rnd;
 	unsigned int		locks_mask;
 	spinlock_t		*locks;
@@ -776,12 +775,6 @@ static inline int rhltable_insert(
  * @obj:	pointer to hash head inside object
  * @params:	hash table parameters
  *
- * Locks down the bucket chain in both the old and new table if a resize
- * is in progress to ensure that writers can't remove from the old table
- * and can't insert to the new table during the atomic operation of search
- * and insertion. Searches for duplicates in both the old and new table if
- * a resize is in progress.
- *
  * This lookup function may only be used for fixed key hash table (key_len
  * parameter set). It will BUG() if used inappropriately.
  *
@@ -837,12 +830,6 @@ static inline void *rhashtable_lookup_get_insert_fast(
  * @obj:	pointer to hash head inside object
  * @params:	hash table parameters
  *
- * Locks down the bucket chain in both the old and new table if a resize
- * is in progress to ensure that writers can't remove from the old table
- * and can't insert to the new table during the atomic operation of search
- * and insertion. Searches for duplicates in both the old and new table if
- * a resize is in progress.
- *
  * Lookups may occur in parallel with hashtable mutations and resizing.
  *
  * Will trigger an automatic deferred table resizing if residency in the
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index c983c0ee15d5..8711c694e359 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -199,6 +199,7 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 		return NULL;
 	}
 
+	rcu_head_init(&tbl->rcu);
 	INIT_LIST_HEAD(&tbl->walkers);
 
 	tbl->hash_rnd = get_random_u32();
@@ -282,10 +283,9 @@ static int rhashtable_rehash_chain(struct rhashtable *ht,
 	while (!(err = rhashtable_rehash_one(ht, old_hash)))
 		;
 
-	if (err == -ENOENT) {
-		old_tbl->rehash++;
+	if (err == -ENOENT)
 		err = 0;
-	}
+
 	spin_unlock_bh(old_bucket_lock);
 
 	return err;
@@ -332,13 +332,16 @@ static int rhashtable_rehash_table(struct rhashtable *ht)
 	spin_lock(&ht->lock);
 	list_for_each_entry(walker, &old_tbl->walkers, list)
 		walker->tbl = NULL;
-	spin_unlock(&ht->lock);
 
 	/* Wait for readers. All new readers will see the new
 	 * table, and thus no references to the old table will
 	 * remain.
+	 * We do this inside the locked region so that
+	 * rhashtable_walk_stop() can use rcu_head_after_call_rcu()
+	 * to check if it should not re-link the table.
 	 */
 	call_rcu(&old_tbl->rcu, bucket_table_free_rcu);
+	spin_unlock(&ht->lock);
 
 	return rht_dereference(new_tbl->future_tbl, ht) ? -EAGAIN : 0;
 }
@@ -580,46 +583,22 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
 	struct bucket_table *new_tbl;
 	struct bucket_table *tbl;
 	unsigned int hash;
-	spinlock_t *lock;
 	void *data;
 
-	tbl = rcu_dereference(ht->tbl);
-
-	/* All insertions must grab the oldest table containing
-	 * the hashed bucket that is yet to be rehashed.
-	 */
-	for (;;) {
-		hash = rht_head_hashfn(ht, tbl, obj, ht->p);
-		lock = rht_bucket_lock(tbl, hash);
-		spin_lock_bh(lock);
-
-		if (tbl->rehash <= hash)
-			break;
-
-		spin_unlock_bh(lock);
-		tbl = rht_dereference_rcu(tbl->future_tbl, ht);
-	}
-
-	data = rhashtable_lookup_one(ht, tbl, hash, key, obj);
-	new_tbl = rhashtable_insert_one(ht, tbl, hash, obj, data);
-	if (PTR_ERR(new_tbl) != -EEXIST)
-		data = ERR_CAST(new_tbl);
+	new_tbl = rcu_dereference(ht->tbl);
 
-	while (!IS_ERR_OR_NULL(new_tbl)) {
+	do {
 		tbl = new_tbl;
 		hash = rht_head_hashfn(ht, tbl, obj, ht->p);
-		spin_lock_nested(rht_bucket_lock(tbl, hash),
-				 SINGLE_DEPTH_NESTING);
+		spin_lock_bh(rht_bucket_lock(tbl, hash));
 
 		data = rhashtable_lookup_one(ht, tbl, hash, key, obj);
 		new_tbl = rhashtable_insert_one(ht, tbl, hash, obj, data);
 		if (PTR_ERR(new_tbl) != -EEXIST)
 			data = ERR_CAST(new_tbl);
 
-		spin_unlock(rht_bucket_lock(tbl, hash));
-	}
-
-	spin_unlock_bh(lock);
+		spin_unlock_bh(rht_bucket_lock(tbl, hash));
+	} while (!IS_ERR_OR_NULL(new_tbl));
 
 	if (PTR_ERR(data) == -EAGAIN)
 		data = ERR_PTR(rhashtable_insert_rehash(ht, tbl) ?:
@@ -941,10 +920,11 @@ void rhashtable_walk_stop(struct rhashtable_iter *iter)
 	ht = iter->ht;
 
 	spin_lock(&ht->lock);
-	if (tbl->rehash < tbl->size)
-		list_add(&iter->walker.list, &tbl->walkers);
-	else
+	if (rcu_head_after_call_rcu(&tbl->rcu, bucket_table_free_rcu))
+		/* This bucket table is being freed, don't re-link it. */
 		iter->walker.tbl = NULL;
+	else
+		list_add(&iter->walker.list, &tbl->walkers);
 	spin_unlock(&ht->lock);
 
 out:
-- 
2.14.0.rc0.dirty



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] rhashtable: use cmpxchg() in nested_table_alloc()
  2019-03-15  5:10   ` Herbert Xu
@ 2019-03-15  6:51     ` NeilBrown
  2019-03-20  5:41       ` Herbert Xu
  0 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2019-03-15  6:51 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Thomas Graf, netdev, Paul E. McKenney, linux-kernel


On Fri, Mar 15 2019, Herbert Xu wrote:

> Hi Neil:
>
> On Thu, Mar 14, 2019 at 04:05:28PM +1100, NeilBrown wrote:
>> nested_table_alloc() relies on the fact that there is
>> at most one spinlock allocated for every slot in the top
>> level nested table, so it is not possible for two threads
>> to try to allocate the same table at the same time.
>> 
>> This assumption is a little fragile (it is not explicit) and is
>> unnecessary.  We can instead protect against
>> a race using cmpxchg() - if the cmpxchg() fails, free the table
>> that was just allocated.
>> 
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>  lib/rhashtable.c |    8 +++++---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> You probably explained it to me before but it's been long enough
> that I no longer remember why we need this change.  So please
> explain in the commit log why this change is needed.  Because
> on the face of it you are adding locking/sychronisation and not
> taking it away.
>

I hoped the patch could be justified on the basis that the current
behaviour is fragile - the dependency that a single spin lock covers a
whole slot (and all children) in the top-level nested table is not at
all obvious.

I do have a stronger reason though - I want to replace the spinlocks
with bit-spin-locks.  With those we will only hold a lock for the
particular chain being worked on.  If you need that extra explanation to
justify the patch, I'll hold it over until the other two patches land
and the rest of the bit-spin-lock series is ready.
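
For concreteness, the rough shape I have in mind is something like the
sketch below - purely illustrative, with made-up helper names, not the
eventual series:

#include <linux/bit_spinlock.h>

/* Take a per-chain lock by spinning on bit 0 of the bucket head pointer
 * itself, so no separate array of spinlocks is needed at all.
 */
static void rht_chain_lock(struct rhash_head __rcu **bkt)
{
        bit_spin_lock(0, (unsigned long *)bkt);
}

static void rht_chain_unlock(struct rhash_head __rcu **bkt)
{
        bit_spin_unlock(0, (unsigned long *)bkt);
}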

Thanks,
NeilBrown


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] rhashtable: rename rht_for_each*continue as *from.
  2019-03-14  5:05 ` [PATCH 3/3] rhashtable: rename rht_for_each*continue as *from NeilBrown
  2019-03-15  5:49   ` Herbert Xu
@ 2019-03-17  7:50   ` Miguel Ojeda
  2019-03-18  0:15     ` NeilBrown
  1 sibling, 1 reply; 14+ messages in thread
From: Miguel Ojeda @ 2019-03-17  7:50 UTC (permalink / raw)
  To: NeilBrown
  Cc: Thomas Graf, Herbert Xu, Network Development, Paul E. McKenney,
	linux-kernel

On Thu, Mar 14, 2019 at 6:10 AM NeilBrown <neilb@suse.com> wrote:
>
> The pattern set by list.h is that for_each..continue()
> iterators start at the next entry after the given one,
> while for_each..from() iterators start at the given
> entry.
>
> The rht_for_each*continue() iterators are documented as though they
> start at the 'next' entry, but actually start at the given entry,
> and they are used expecting that behaviour.
> So fix the documentation and change the names to *from for consistency
> with list.h.

Thank you for taking care of the .clang-format changes!

Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] rhashtable: rename rht_for_each*continue as *from.
  2019-03-17  7:50   ` Miguel Ojeda
@ 2019-03-18  0:15     ` NeilBrown
  0 siblings, 0 replies; 14+ messages in thread
From: NeilBrown @ 2019-03-18  0:15 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Thomas Graf, Herbert Xu, Network Development, Paul E. McKenney,
	linux-kernel


On Sun, Mar 17 2019, Miguel Ojeda wrote:

> On Thu, Mar 14, 2019 at 6:10 AM NeilBrown <neilb@suse.com> wrote:
>>
>> The pattern set by list.h is that for_each..continue()
>> iterators start at the next entry after the given one,
>> while for_each..from() iterators start at the given
>> entry.
>>
>> The rht_for_each*continue() iterators are documented as though they
>> start at the 'next' entry, but actually start at the given entry,
>> and they are used expecting that behaviour.
>> So fix the documentation and change the names to *from for consistency
>> with list.h.
>
> Thank you for taking care of the .clang-format changes!

:-) that's what "git grep" is for!

>
> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>

Thanks,
NeilBrown

>
> Cheers,
> Miguel


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] rhashtable: don't hold lock on first table throughout insertion.
  2019-03-15  6:46     ` NeilBrown
@ 2019-03-20  5:40       ` Herbert Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Herbert Xu @ 2019-03-20  5:40 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, Paul E. McKenney, linux-kernel

On Fri, Mar 15, 2019 at 05:46:37PM +1100, NeilBrown wrote:
>
> From: NeilBrown <neilb@suse.com>
> Date: Mon, 25 Jun 2018 08:09:19 +1000
> Subject: [PATCH] rhashtable: don't hold lock on first table throughout
>  insertion.
> 
> rhashtable_try_insert() currently holds a lock on the bucket in
> the first table, while also locking buckets in subsequent tables.
> This is unnecessary and looks like a hold-over from some earlier
> version of the implementation.
> 
> As insert and remove always lock a bucket in each table in turn, and
> as insert only inserts in the final table, there cannot be any races
> that are not covered by simply locking a bucket in each table in turn.
> 
> When an insert call reaches the last table it can be sure that there
> is no matching entry in any other table as it has searched them all, and
> insertion never happens anywhere but in the last table.  The fact that
> the code tests for the existence of future_tbl while holding a lock on
> the relevant bucket ensures that two threads inserting the same key
> will make compatible decisions about which is the "last" table.
> 
> This simplifies the code and allows the ->rehash field to be
> discarded.
> 
> We still need a way to ensure that a dead bucket_table is never
> re-linked by rhashtable_walk_stop().  This can be achieved by calling
> call_rcu() inside the locked region, and checking with
> rcu_head_after_call_rcu() in rhashtable_walk_stop() to see if the
> bucket table is empty and dead.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  include/linux/rhashtable.h | 13 ------------
>  lib/rhashtable.c           | 52 ++++++++++++++--------------------------------
>  2 files changed, 16 insertions(+), 49 deletions(-)

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] rhashtable: use cmpxchg() in nested_table_alloc()
  2019-03-15  6:51     ` NeilBrown
@ 2019-03-20  5:41       ` Herbert Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Herbert Xu @ 2019-03-20  5:41 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, Paul E. McKenney, linux-kernel

On Fri, Mar 15, 2019 at 05:51:55PM +1100, NeilBrown wrote:
>
> I hoped the patch could be justified on the basis that the current
> behaviour is fragile - the dependency that a single spin lock covers a
> whole slot (and all children) in the top-level nested table is not at
> all obvious.
> 
> I do have a stronger reason though - I want to replace the spinlocks
> with bit-spin-locks.  With those we will only hold a lock for the
> particular chain being worked on.  If you need that extra explanation to
> justify the patch, I'll hold it over until the other two patches land
> and the rest of the bit-spin-lock series is ready.

I think it would make more sense to combine this patch with your
bit-spin-lock patch in a series.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-03-20  5:41 UTC | newest]

Thread overview: 14+ messages
2019-03-14  5:05 [PATCH 0/3] Three rhashtable improvements NeilBrown
2019-03-14  5:05 ` [PATCH 3/3] rhashtable: rename rht_for_each*continue as *from NeilBrown
2019-03-15  5:49   ` Herbert Xu
2019-03-17  7:50   ` Miguel Ojeda
2019-03-18  0:15     ` NeilBrown
2019-03-14  5:05 ` [PATCH 2/3] rhashtable: don't hold lock on first table throughout insertion NeilBrown
2019-03-14 14:58   ` Paul E. McKenney
2019-03-15  5:47   ` Herbert Xu
2019-03-15  6:46     ` NeilBrown
2019-03-20  5:40       ` Herbert Xu
2019-03-14  5:05 ` [PATCH 1/3] rhashtable: use cmpxchg() in nested_table_alloc() NeilBrown
2019-03-15  5:10   ` Herbert Xu
2019-03-15  6:51     ` NeilBrown
2019-03-20  5:41       ` Herbert Xu
