linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Herbert Xu <herbert@gondor.apana.org.au>
To: Phil Sutter <phil@nwl.cc>,
	davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, tgraf@suug.ch,
	fengguang.wu@intel.com, wfg@linux.intel.com, lkp@01.org
Subject: rhashtable: Prevent spurious EBUSY errors on insertion
Date: Thu, 3 Dec 2015 20:41:29 +0800	[thread overview]
Message-ID: <20151203124129.GA5505@gondor.apana.org.au> (raw)
In-Reply-To: <20151130101859.GA8378@gondor.apana.org.au>

On Mon, Nov 30, 2015 at 06:18:59PM +0800, Herbert Xu wrote:
> 
> OK that's better.  I think I see the problem.  The test in
> rhashtable_insert_rehash is racy and if two threads both try
> to grow the table one of them may be tricked into doing a rehash
> instead.
> 
> I'm working on a fix.

OK this patch fixes the EBUSY problem as far as I can tell.  Please
let me know if you still observe EBUSY with it.  I'll respond to the
ENOMEM problem in another email.

---8<---
Thomas and Phil observed that under stress rhashtable insertion
sometimes failed with EBUSY, even though this error should only
ever been seen when we're under attack and our hash chain length
has grown to an unacceptable level, even after a rehash.

It turns out that the logic for detecting whether there is an
existing rehash is faulty.  In particular, when two threads both
try to grow the same table at the same time, one of them may see
the newly grown table and thus erroneously conclude that it had
been rehashed.  This is what leads to the EBUSY error.

This patch fixes this by remembering the current last table we
used during insertion so that rhashtable_insert_rehash can detect
when another thread has also done a resize/rehash.  When this is
detected we will give up our resize/rehash and simply retry the
insertion with the new table.

Reported-by: Thomas Graf <tgraf@suug.ch>
Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 843ceca..e50b31d 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -19,6 +19,7 @@
 
 #include <linux/atomic.h>
 #include <linux/compiler.h>
+#include <linux/err.h>
 #include <linux/errno.h>
 #include <linux/jhash.h>
 #include <linux/list_nulls.h>
@@ -339,10 +340,11 @@ static inline int lockdep_rht_bucket_is_held(const struct bucket_table *tbl,
 int rhashtable_init(struct rhashtable *ht,
 		    const struct rhashtable_params *params);
 
-int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
-			   struct rhash_head *obj,
-			   struct bucket_table *old_tbl);
-int rhashtable_insert_rehash(struct rhashtable *ht);
+struct bucket_table *rhashtable_insert_slow(struct rhashtable *ht,
+					    const void *key,
+					    struct rhash_head *obj,
+					    struct bucket_table *old_tbl);
+int rhashtable_insert_rehash(struct rhashtable *ht, struct bucket_table *tbl);
 
 int rhashtable_walk_init(struct rhashtable *ht, struct rhashtable_iter *iter);
 void rhashtable_walk_exit(struct rhashtable_iter *iter);
@@ -598,9 +600,11 @@ restart:
 
 	new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
 	if (unlikely(new_tbl)) {
-		err = rhashtable_insert_slow(ht, key, obj, new_tbl);
-		if (err == -EAGAIN)
+		tbl = rhashtable_insert_slow(ht, key, obj, new_tbl);
+		if (!IS_ERR_OR_NULL(tbl))
 			goto slow_path;
+
+		err = PTR_ERR(tbl);
 		goto out;
 	}
 
@@ -611,7 +615,7 @@ restart:
 	if (unlikely(rht_grow_above_100(ht, tbl))) {
 slow_path:
 		spin_unlock_bh(lock);
-		err = rhashtable_insert_rehash(ht);
+		err = rhashtable_insert_rehash(ht, tbl);
 		rcu_read_unlock();
 		if (err)
 			return err;
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index a54ff89..2ff7ed9 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -389,33 +389,31 @@ static bool rhashtable_check_elasticity(struct rhashtable *ht,
 	return false;
 }
 
-int rhashtable_insert_rehash(struct rhashtable *ht)
+int rhashtable_insert_rehash(struct rhashtable *ht,
+			     struct bucket_table *tbl)
 {
 	struct bucket_table *old_tbl;
 	struct bucket_table *new_tbl;
-	struct bucket_table *tbl;
 	unsigned int size;
 	int err;
 
 	old_tbl = rht_dereference_rcu(ht->tbl, ht);
-	tbl = rhashtable_last_table(ht, old_tbl);
 
 	size = tbl->size;
 
+	err = -EBUSY;
+
 	if (rht_grow_above_75(ht, tbl))
 		size *= 2;
 	/* Do not schedule more than one rehash */
 	else if (old_tbl != tbl)
-		return -EBUSY;
+		goto fail;
+
+	err = -ENOMEM;
 
 	new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC);
-	if (new_tbl == NULL) {
-		/* Schedule async resize/rehash to try allocation
-		 * non-atomic context.
-		 */
-		schedule_work(&ht->run_work);
-		return -ENOMEM;
-	}
+	if (new_tbl == NULL)
+		goto fail;
 
 	err = rhashtable_rehash_attach(ht, tbl, new_tbl);
 	if (err) {
@@ -426,12 +424,24 @@ int rhashtable_insert_rehash(struct rhashtable *ht)
 		schedule_work(&ht->run_work);
 
 	return err;
+
+fail:
+	/* Do not fail the insert if someone else did a rehash. */
+	if (likely(rcu_dereference_raw(tbl->future_tbl)))
+		return 0;
+
+	/* Schedule async rehash to retry allocation in process context. */
+	if (err == -ENOMEM)
+		schedule_work(&ht->run_work);
+
+	return err;
 }
 EXPORT_SYMBOL_GPL(rhashtable_insert_rehash);
 
-int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
-			   struct rhash_head *obj,
-			   struct bucket_table *tbl)
+struct bucket_table *rhashtable_insert_slow(struct rhashtable *ht,
+					    const void *key,
+					    struct rhash_head *obj,
+					    struct bucket_table *tbl)
 {
 	struct rhash_head *head;
 	unsigned int hash;
@@ -467,7 +477,12 @@ int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 exit:
 	spin_unlock(rht_bucket_lock(tbl, hash));
 
-	return err;
+	if (err == 0)
+		return NULL;
+	else if (err == -EAGAIN)
+		return tbl;
+	else
+		return ERR_PTR(err);
 }
 EXPORT_SYMBOL_GPL(rhashtable_insert_slow);
 
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

  reply	other threads:[~2015-12-03 12:41 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-20 17:17 [PATCH v2 0/4] improve fault-tolerance of rhashtable runtime-test Phil Sutter
2015-11-20 17:17 ` [PATCH v2 1/4] rhashtable-test: add cond_resched() to thread test Phil Sutter
2015-11-20 17:17 ` [PATCH v2 2/4] rhashtable-test: retry insert operations Phil Sutter
2015-11-20 17:17 ` [PATCH v2 3/4] rhashtable-test: calculate max_entries value by default Phil Sutter
2015-11-20 17:17 ` [PATCH v2 4/4] rhashtable-test: allow to retry even if -ENOMEM was returned Phil Sutter
2015-11-20 17:28   ` Phil Sutter
2015-11-23 17:38 ` [PATCH v2 0/4] improve fault-tolerance of rhashtable runtime-test David Miller
2015-11-30  9:37 ` Herbert Xu
2015-11-30 10:14   ` Phil Sutter
2015-11-30 10:18     ` Herbert Xu
2015-12-03 12:41       ` Herbert Xu [this message]
2015-12-03 15:38         ` rhashtable: Prevent spurious EBUSY errors on insertion Phil Sutter
2015-12-04 19:38         ` David Miller
2015-12-17  8:46         ` Xin Long
2015-12-17  8:48           ` Herbert Xu
2015-12-17  9:00             ` Xin Long
2015-12-17 16:07               ` Xin Long
2015-12-18  2:26                 ` Herbert Xu
2015-12-18  8:18                   ` Xin Long
2015-12-17 17:00               ` David Miller
2015-12-03 12:51       ` rhashtable: ENOMEM errors when hit with a flood of insertions Herbert Xu
2015-12-03 15:08         ` David Laight
2015-12-03 16:08         ` Eric Dumazet
2015-12-04  0:07           ` Herbert Xu
2015-12-04 14:39           ` rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation Herbert Xu
2015-12-04 17:01             ` Phil Sutter
2015-12-04 17:45               ` Eric Dumazet
2015-12-04 18:15                 ` Phil Sutter
2015-12-05  7:06                   ` Herbert Xu
2015-12-07 15:35                     ` Thomas Graf
2015-12-07 19:29                       ` David Miller
2015-12-09  2:18                     ` Thomas Graf
2015-12-09  2:24                       ` Herbert Xu
2015-12-09  2:36                         ` Thomas Graf
2015-12-09  2:38                           ` Herbert Xu
2015-12-09  2:42                             ` Thomas Graf
2015-12-04 21:53             ` David Miller
2015-12-05  7:03               ` Herbert Xu
2015-12-06  3:48                 ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151203124129.GA5505@gondor.apana.org.au \
    --to=herbert@gondor.apana.org.au \
    --cc=davem@davemloft.net \
    --cc=fengguang.wu@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@01.org \
    --cc=netdev@vger.kernel.org \
    --cc=phil@nwl.cc \
    --cc=tgraf@suug.ch \
    --cc=wfg@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).