linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Herbert Xu <herbert@gondor.apana.org.au>
To: Josh Elsasser <jelsasser@appneta.com>
Cc: "David S . Miller" <davem@davemloft.net>,
	josh@elsasser.ca, Thomas Graf <tgraf@suug.ch>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [v2 PATCH] rhashtable: Still do rehash when we get EEXIST
Date: Thu, 24 Jan 2019 11:08:41 +0800	[thread overview]
Message-ID: <20190124030841.n4jtsqka5zji3e62@gondor.apana.org.au> (raw)
In-Reply-To: <20190123211758.104275-1-jelsasser@appneta.com>

On Wed, Jan 23, 2019 at 01:17:58PM -0800, Josh Elsasser wrote:
> When running workloads with large bursts of fragmented packets, we've seen
> a few machines stuck returning -EEXIST from rht_shrink() and endlessly
> rescheduling their hash table's deferred work, pegging a CPU core.
> 
> Root cause is commit da20420f83ea ("rhashtable: Add nested tables"), which
> stops ignoring the return code of rhashtable_shrink() and the reallocs
> used to grow the hashtable. This uncovers a bug in the shrink logic where
> "needs to shrink" check runs against the last table but the actual shrink
> operation runs on the first bucket_table in the hashtable (see below):
> 
>  +-------+    +--------------+          +---------------+
>  | ht    |    | "first" tbl  |          | "last" tbl    |
>  | - tbl ---> | - future_tbl ---------> |  - future_tbl ---> NULL
>  +-------+    +--------------+          +---------------+
>                ^^^                          ^^^
> 	       used by rhashtable_shrink()  used by rht_shrink_below_30()
> 
> A rehash then stalls out when both the last table needs to shrink, the
> first table has more elements than the target size, but rht_shrink() hits
> a non-NULL future_tbl and returns -EEXIST. This skips the item rehashing
> and kicks off a reschedule loop, as no forward progress can be made while
> the rhashtable needs to shrink.
> 
> Extend rhashtable_shrink() with a "tbl" param to avoid endless exit-and-
> reschedules after hitting the EEXIST, allowing it to check a future_tbl
> pointer that can actually be non-NULL and make forward progress when the
> hashtable needs to shrink.
> 
> Fixes: da20420f83ea ("rhashtable: Add nested tables")
> Signed-off-by: Josh Elsasser <jelsasser@appneta.com>

Thanks for catching this!

Although I think we should fix this in a different way.  The problem
here is that the shrink cannot proceed because there was a previous
rehash that is still incomplete.  We should wait for its completion
and then reattempt a shrinnk should it still be necessary.

So something like this:

---8<---
As it stands if a shrink is delayed because of an outstanding
rehash, we will go into a rescheduling loop without ever doing
the rehash.

This patch fixes this by still carrying out the rehash and then
rescheduling so that we can shrink after the completion of the
rehash should it still be necessary.

The return value of EEXIST captures this case and other cases
(e.g., another thread expanded/rehashed the table at the same
time) where we should still proceed with the rehash.

Fixes: da20420f83ea ("rhashtable: Add nested tables")
Reported-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 852ffa5160f1..4edcf3310513 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -416,8 +416,12 @@ static void rht_deferred_worker(struct work_struct *work)
 	else if (tbl->nest)
 		err = rhashtable_rehash_alloc(ht, tbl, tbl->size);
 
-	if (!err)
-		err = rhashtable_rehash_table(ht);
+	if (!err || err == -EEXIST) {
+		int nerr;
+
+		nerr = rhashtable_rehash_table(ht);
+		err = err ?: nerr;
+	}
 
 	mutex_unlock(&ht->mutex);
 
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

  reply	other threads:[~2019-01-24  3:08 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-23 21:17 [PATCH net] rhashtable: avoid reschedule loop after rapid growth and shrink Josh Elsasser
2019-01-24  3:08 ` Herbert Xu [this message]
2019-01-24  3:40   ` [v2 PATCH] rhashtable: Still do rehash when we get EEXIST Josh Elsasser
2019-01-26 22:02     ` Josh Elsasser
2019-03-20 22:39       ` Josh Hunt
     [not found]       ` <CAKA=qzY4Pzee9BVzRCciW32toeHSz7t0q9LuvQXKLG2fX9fBbg@mail.gmail.com>
2019-03-21  1:39         ` [v3 " Herbert Xu
2019-03-21  1:46           ` Herbert Xu
2019-03-21 20:58             ` David Miller
2019-03-21 20:58           ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190124030841.n4jtsqka5zji3e62@gondor.apana.org.au \
    --to=herbert@gondor.apana.org.au \
    --cc=davem@davemloft.net \
    --cc=jelsasser@appneta.com \
    --cc=josh@elsasser.ca \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=tgraf@suug.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).