All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] rhashtable: avoid reschedule loop after rapid growth and shrink
@ 2019-01-23 21:17 Josh Elsasser
  2019-01-24  3:08 ` [v2 PATCH] rhashtable: Still do rehash when we get EEXIST Herbert Xu
  0 siblings, 1 reply; 9+ messages in thread
From: Josh Elsasser @ 2019-01-23 21:17 UTC (permalink / raw)
  To: David S . Miller
  Cc: josh, Josh Elsasser, Thomas Graf, Herbert Xu, netdev, linux-kernel

When running workloads with large bursts of fragmented packets, we've seen
a few machines stuck returning -EEXIST from rht_shrink() and endlessly
rescheduling their hash table's deferred work, pegging a CPU core.

Root cause is commit da20420f83ea ("rhashtable: Add nested tables"), which
stops ignoring the return code of rhashtable_shrink() and the reallocs
used to grow the hashtable. This uncovers a bug in the shrink logic where
"needs to shrink" check runs against the last table but the actual shrink
operation runs on the first bucket_table in the hashtable (see below):

 +-------+    +--------------+          +---------------+
 | ht    |    | "first" tbl  |          | "last" tbl    |
 | - tbl ---> | - future_tbl ---------> |  - future_tbl ---> NULL
 +-------+    +--------------+          +---------------+
               ^^^                          ^^^
	       used by rhashtable_shrink()  used by rht_shrink_below_30()

A rehash then stalls out when both the last table needs to shrink, the
first table has more elements than the target size, but rht_shrink() hits
a non-NULL future_tbl and returns -EEXIST. This skips the item rehashing
and kicks off a reschedule loop, as no forward progress can be made while
the rhashtable needs to shrink.

Extend rhashtable_shrink() with a "tbl" param to avoid endless exit-and-
reschedules after hitting the EEXIST, allowing it to check a future_tbl
pointer that can actually be non-NULL and make forward progress when the
hashtable needs to shrink.

Fixes: da20420f83ea ("rhashtable: Add nested tables")
Signed-off-by: Josh Elsasser <jelsasser@appneta.com>
---
 lib/rhashtable.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 852ffa5160f1..98e91f9544fa 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -377,9 +377,9 @@ static int rhashtable_rehash_alloc(struct rhashtable *ht,
  * It is valid to have concurrent insertions and deletions protected by per
  * bucket locks or concurrent RCU protected lookups and traversals.
  */
-static int rhashtable_shrink(struct rhashtable *ht)
+static int rhashtable_shrink(struct rhashtable *ht,
+			     struct bucket_table *old_tbl)
 {
-	struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
 	unsigned int nelems = atomic_read(&ht->nelems);
 	unsigned int size = 0;
 
@@ -412,7 +412,7 @@ static void rht_deferred_worker(struct work_struct *work)
 	if (rht_grow_above_75(ht, tbl))
 		err = rhashtable_rehash_alloc(ht, tbl, tbl->size * 2);
 	else if (ht->p.automatic_shrinking && rht_shrink_below_30(ht, tbl))
-		err = rhashtable_shrink(ht);
+		err = rhashtable_shrink(ht, tbl);
 	else if (tbl->nest)
 		err = rhashtable_rehash_alloc(ht, tbl, tbl->size);
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [v2 PATCH] rhashtable: Still do rehash when we get EEXIST
  2019-01-23 21:17 [PATCH net] rhashtable: avoid reschedule loop after rapid growth and shrink Josh Elsasser
@ 2019-01-24  3:08 ` Herbert Xu
  2019-01-24  3:40   ` Josh Elsasser
  0 siblings, 1 reply; 9+ messages in thread
From: Herbert Xu @ 2019-01-24  3:08 UTC (permalink / raw)
  To: Josh Elsasser; +Cc: David S . Miller, josh, Thomas Graf, netdev, linux-kernel

On Wed, Jan 23, 2019 at 01:17:58PM -0800, Josh Elsasser wrote:
> When running workloads with large bursts of fragmented packets, we've seen
> a few machines stuck returning -EEXIST from rht_shrink() and endlessly
> rescheduling their hash table's deferred work, pegging a CPU core.
> 
> Root cause is commit da20420f83ea ("rhashtable: Add nested tables"), which
> stops ignoring the return code of rhashtable_shrink() and the reallocs
> used to grow the hashtable. This uncovers a bug in the shrink logic where
> "needs to shrink" check runs against the last table but the actual shrink
> operation runs on the first bucket_table in the hashtable (see below):
> 
>  +-------+    +--------------+          +---------------+
>  | ht    |    | "first" tbl  |          | "last" tbl    |
>  | - tbl ---> | - future_tbl ---------> |  - future_tbl ---> NULL
>  +-------+    +--------------+          +---------------+
>                ^^^                          ^^^
> 	       used by rhashtable_shrink()  used by rht_shrink_below_30()
> 
> A rehash then stalls out when both the last table needs to shrink, the
> first table has more elements than the target size, but rht_shrink() hits
> a non-NULL future_tbl and returns -EEXIST. This skips the item rehashing
> and kicks off a reschedule loop, as no forward progress can be made while
> the rhashtable needs to shrink.
> 
> Extend rhashtable_shrink() with a "tbl" param to avoid endless exit-and-
> reschedules after hitting the EEXIST, allowing it to check a future_tbl
> pointer that can actually be non-NULL and make forward progress when the
> hashtable needs to shrink.
> 
> Fixes: da20420f83ea ("rhashtable: Add nested tables")
> Signed-off-by: Josh Elsasser <jelsasser@appneta.com>

Thanks for catching this!

Although I think we should fix this in a different way.  The problem
here is that the shrink cannot proceed because there was a previous
rehash that is still incomplete.  We should wait for its completion
and then reattempt a shrinnk should it still be necessary.

So something like this:

---8<---
As it stands if a shrink is delayed because of an outstanding
rehash, we will go into a rescheduling loop without ever doing
the rehash.

This patch fixes this by still carrying out the rehash and then
rescheduling so that we can shrink after the completion of the
rehash should it still be necessary.

The return value of EEXIST captures this case and other cases
(e.g., another thread expanded/rehashed the table at the same
time) where we should still proceed with the rehash.

Fixes: da20420f83ea ("rhashtable: Add nested tables")
Reported-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 852ffa5160f1..4edcf3310513 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -416,8 +416,12 @@ static void rht_deferred_worker(struct work_struct *work)
 	else if (tbl->nest)
 		err = rhashtable_rehash_alloc(ht, tbl, tbl->size);
 
-	if (!err)
-		err = rhashtable_rehash_table(ht);
+	if (!err || err == -EEXIST) {
+		int nerr;
+
+		nerr = rhashtable_rehash_table(ht);
+		err = err ?: nerr;
+	}
 
 	mutex_unlock(&ht->mutex);
 
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [v2 PATCH] rhashtable: Still do rehash when we get EEXIST
  2019-01-24  3:08 ` [v2 PATCH] rhashtable: Still do rehash when we get EEXIST Herbert Xu
@ 2019-01-24  3:40   ` Josh Elsasser
  2019-01-26 22:02     ` Josh Elsasser
  0 siblings, 1 reply; 9+ messages in thread
From: Josh Elsasser @ 2019-01-24  3:40 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S . Miller, Thomas Graf, netdev, linux-kernel

On Jan 23, 2019, at 7:08 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:

> Thanks for catching this!
> 
> Although I think we should fix this in a different way.  The problem
> here is that the shrink cannot proceed because there was a previous
> rehash that is still incomplete.  We should wait for its completion
> and then reattempt a shrinnk should it still be necessary.
> 
> So something like this:

SGTM. 

I can't test this right now because our VM server's down after a power
 outage this evening, but I tried a similar patch that swallowed the
 -EEXIST err and even with that oversight the hashtable dodged the
reschedule loop.

- Josh

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [v2 PATCH] rhashtable: Still do rehash when we get EEXIST
  2019-01-24  3:40   ` Josh Elsasser
@ 2019-01-26 22:02     ` Josh Elsasser
  2019-03-20 22:39       ` Josh Hunt
       [not found]       ` <CAKA=qzY4Pzee9BVzRCciW32toeHSz7t0q9LuvQXKLG2fX9fBbg@mail.gmail.com>
  0 siblings, 2 replies; 9+ messages in thread
From: Josh Elsasser @ 2019-01-26 22:02 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller
  Cc: Thomas Graf, netdev, linux-kernel, Josh Elsasser

On Jan 23, 2019, at 7:40 PM, Josh Elsasser <jelsasser@appneta.com> wrote:
> On Jan 23, 2019, at 7:08 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> 
>> Thanks for catching this!
>> 
>> Although I think we should fix this in a different way.  The problem
>> here is that the shrink cannot proceed because there was a previous
>> rehash that is still incomplete.  We should wait for its completion
>> and then reattempt a shrinnk should it still be necessary.
> 
> I can't test this right now because our VM server's down 

Got one of the poor little reproducer VM's back up and running and loaded
up this patch. Works like a charm. For the v2 PATCH, can add my:

Tested-by: Josh Elsasser <jelsasser@appneta.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [v2 PATCH] rhashtable: Still do rehash when we get EEXIST
  2019-01-26 22:02     ` Josh Elsasser
@ 2019-03-20 22:39       ` Josh Hunt
       [not found]       ` <CAKA=qzY4Pzee9BVzRCciW32toeHSz7t0q9LuvQXKLG2fX9fBbg@mail.gmail.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Josh Hunt @ 2019-03-20 22:39 UTC (permalink / raw)
  To: Josh Elsasser; +Cc: Herbert Xu, David S . Miller, Thomas Graf, netdev, LKML

On Sat, Jan 26, 2019 at 2:03 PM Josh Elsasser <jelsasser@appneta.com> wrote:
>
> On Jan 23, 2019, at 7:40 PM, Josh Elsasser <jelsasser@appneta.com> wrote:
> > On Jan 23, 2019, at 7:08 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> >
> >> Thanks for catching this!
> >>
> >> Although I think we should fix this in a different way.  The problem
> >> here is that the shrink cannot proceed because there was a previous
> >> rehash that is still incomplete.  We should wait for its completion
> >> and then reattempt a shrinnk should it still be necessary.
> >
> > I can't test this right now because our VM server's down
>
> Got one of the poor little reproducer VM's back up and running and loaded
> up this patch. Works like a charm. For the v2 PATCH, can add my:
>
> Tested-by: Josh Elsasser <jelsasser@appneta.com>

Trying again... gmail sent HTML mail first time.

Herbert

We're seeing this pretty regularly on 4.14 LTS kernels. I didn't see
your change in any of the regular trees. Are there plans to submit
this? If so, can it get queued up for 4.14 stable too?

Thanks!
-- 
Josh

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [v3 PATCH] rhashtable: Still do rehash when we get EEXIST
       [not found]       ` <CAKA=qzY4Pzee9BVzRCciW32toeHSz7t0q9LuvQXKLG2fX9fBbg@mail.gmail.com>
@ 2019-03-21  1:39         ` Herbert Xu
  2019-03-21  1:46           ` Herbert Xu
  2019-03-21 20:58           ` David Miller
  0 siblings, 2 replies; 9+ messages in thread
From: Herbert Xu @ 2019-03-21  1:39 UTC (permalink / raw)
  To: Josh Hunt; +Cc: Josh Elsasser, David S . Miller, Thomas Graf, netdev, LKML

On Wed, Mar 20, 2019 at 03:29:17PM -0700, Josh Hunt wrote:
>
> Herbert
> 
> We're seeing this pretty regularly on 4.14 LTS kernels. I didn't see your
> change in any of the regular trees. Are there plans to submit this? If so,
> can it get queued up for 4.14 stable too?

Hi Josh:

Thanks for reminding me.  Looks like this one slipped through the
cracks.

Dave, could you please apply this? Thanks!

---8<---
As it stands if a shrink is delayed because of an outstanding
rehash, we will go into a rescheduling loop without ever doing
the rehash.

This patch fixes this by still carrying out the rehash and then
rescheduling so that we can shrink after the completion of the
rehash should it still be necessary.

The return value of EEXIST captures this case and other cases
(e.g., another thread expanded/rehashed the table at the same
time) where we should still proceed with the rehash.

Fixes: da20420f83ea ("rhashtable: Add nested tables")
Reported-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 852ffa5160f1..4edcf3310513 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -416,8 +416,12 @@ static void rht_deferred_worker(struct work_struct *work)
 	else if (tbl->nest)
 		err = rhashtable_rehash_alloc(ht, tbl, tbl->size);
 
-	if (!err)
-		err = rhashtable_rehash_table(ht);
+	if (!err || err == -EEXIST) {
+		int nerr;
+
+		nerr = rhashtable_rehash_table(ht);
+		err = err ?: nerr;
+	}
 
 	mutex_unlock(&ht->mutex);

-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [v3 PATCH] rhashtable: Still do rehash when we get EEXIST
  2019-03-21  1:39         ` [v3 " Herbert Xu
@ 2019-03-21  1:46           ` Herbert Xu
  2019-03-21 20:58             ` David Miller
  2019-03-21 20:58           ` David Miller
  1 sibling, 1 reply; 9+ messages in thread
From: Herbert Xu @ 2019-03-21  1:46 UTC (permalink / raw)
  To: Josh Hunt; +Cc: Josh Elsasser, David S . Miller, Thomas Graf, netdev, LKML

On Thu, Mar 21, 2019 at 09:39:52AM +0800, Herbert Xu wrote:
>
> Dave, could you please apply this? Thanks!

I forgot to include the tested-by so here it is for patchwork:

Tested-by: Josh Elsasser <jelsasser@appneta.com>
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [v3 PATCH] rhashtable: Still do rehash when we get EEXIST
  2019-03-21  1:39         ` [v3 " Herbert Xu
  2019-03-21  1:46           ` Herbert Xu
@ 2019-03-21 20:58           ` David Miller
  1 sibling, 0 replies; 9+ messages in thread
From: David Miller @ 2019-03-21 20:58 UTC (permalink / raw)
  To: herbert; +Cc: joshhunt00, jelsasser, tgraf, netdev, linux-kernel

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 21 Mar 2019 09:39:52 +0800

> As it stands if a shrink is delayed because of an outstanding
> rehash, we will go into a rescheduling loop without ever doing
> the rehash.
> 
> This patch fixes this by still carrying out the rehash and then
> rescheduling so that we can shrink after the completion of the
> rehash should it still be necessary.
> 
> The return value of EEXIST captures this case and other cases
> (e.g., another thread expanded/rehashed the table at the same
> time) where we should still proceed with the rehash.
> 
> Fixes: da20420f83ea ("rhashtable: Add nested tables")
> Reported-by: Josh Elsasser <jelsasser@appneta.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied and queued up for -stable, thanks Herbert.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [v3 PATCH] rhashtable: Still do rehash when we get EEXIST
  2019-03-21  1:46           ` Herbert Xu
@ 2019-03-21 20:58             ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2019-03-21 20:58 UTC (permalink / raw)
  To: herbert; +Cc: joshhunt00, jelsasser, tgraf, netdev, linux-kernel

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 21 Mar 2019 09:46:15 +0800

> On Thu, Mar 21, 2019 at 09:39:52AM +0800, Herbert Xu wrote:
>>
>> Dave, could you please apply this? Thanks!
> 
> I forgot to include the tested-by so here it is for patchwork:
> 
> Tested-by: Josh Elsasser <jelsasser@appneta.com>

Thanks for providing this.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-03-21 20:58 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-23 21:17 [PATCH net] rhashtable: avoid reschedule loop after rapid growth and shrink Josh Elsasser
2019-01-24  3:08 ` [v2 PATCH] rhashtable: Still do rehash when we get EEXIST Herbert Xu
2019-01-24  3:40   ` Josh Elsasser
2019-01-26 22:02     ` Josh Elsasser
2019-03-20 22:39       ` Josh Hunt
     [not found]       ` <CAKA=qzY4Pzee9BVzRCciW32toeHSz7t0q9LuvQXKLG2fX9fBbg@mail.gmail.com>
2019-03-21  1:39         ` [v3 " Herbert Xu
2019-03-21  1:46           ` Herbert Xu
2019-03-21 20:58             ` David Miller
2019-03-21 20:58           ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.