All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
@ 2015-02-25 22:09 Alexei Starovoitov
  2015-02-26  7:53 ` Patrick McHardy
  0 siblings, 1 reply; 31+ messages in thread
From: Alexei Starovoitov @ 2015-02-25 22:09 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Eric Dumazet, Daniel Borkmann, David Laight, davem, tgraf, pablo,
	johunt, netdev

On Wed, Feb 25, 2015 at 12:10 PM, Patrick McHardy <kaber@trash.net> wrote:
> On 25.02, Eric Dumazet wrote:
>> But if any workload had to grow the table to 2^20 slots, we had to
>> consume GB of memory anyway to hold sockets and everything.
>>
>> Trying to shrink is simply not worth it, unless you expect your host
>> never reboots and you desperately need back these 8 MBytes of memory.
>
> That may be true in the TCP case, but for not for nftables. We might
> have many sets and, especially when used to represent more complicated
> classification algorithms, their size might change by a lot.

sounds like grow/shrink decision cannot be generalized within
rhashtable, but two callbacks are about to be removed and the
are costly. So would it make sense to disable auto-expand/shrink
completely and let nft/tcp call expand/shrink when needed?
nft can potentially do smarter batching this way.
If it sees a lot of entries are about to be inserted, it can call
expand directly to quickly grow sparsely populated table
into large one, and then insert all the entries.
That will mitigate 'slow rcu' issue as well.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-25 22:09 [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions Alexei Starovoitov
@ 2015-02-26  7:53 ` Patrick McHardy
  2015-02-26  8:54   ` Daniel Borkmann
  2015-02-26 14:50   ` Eric Dumazet
  0 siblings, 2 replies; 31+ messages in thread
From: Patrick McHardy @ 2015-02-26  7:53 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, Daniel Borkmann, David Laight, davem, tgraf, pablo,
	johunt, netdev

On 25.02, Alexei Starovoitov wrote:
> On Wed, Feb 25, 2015 at 12:10 PM, Patrick McHardy <kaber@trash.net> wrote:
> > On 25.02, Eric Dumazet wrote:
> >> But if any workload had to grow the table to 2^20 slots, we had to
> >> consume GB of memory anyway to hold sockets and everything.
> >>
> >> Trying to shrink is simply not worth it, unless you expect your host
> >> never reboots and you desperately need back these 8 MBytes of memory.
> >
> > That may be true in the TCP case, but for not for nftables. We might
> > have many sets and, especially when used to represent more complicated
> > classification algorithms, their size might change by a lot.
> 
> sounds like grow/shrink decision cannot be generalized within
> rhashtable, but two callbacks are about to be removed and the
> are costly. So would it make sense to disable auto-expand/shrink
> completely and let nft/tcp call expand/shrink when needed?

My understanding was that Eric was arguing against shrinking in general.
But assuming we have it, what's the downside of also performing
shrinking for TCP?

> nft can potentially do smarter batching this way.
> If it sees a lot of entries are about to be inserted, it can call
> expand directly to quickly grow sparsely populated table
> into large one, and then insert all the entries.
> That will mitigate 'slow rcu' issue as well.

I like that idea.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-26  7:53 ` Patrick McHardy
@ 2015-02-26  8:54   ` Daniel Borkmann
  2015-02-26 14:50   ` Eric Dumazet
  1 sibling, 0 replies; 31+ messages in thread
From: Daniel Borkmann @ 2015-02-26  8:54 UTC (permalink / raw)
  To: Patrick McHardy, Alexei Starovoitov
  Cc: Eric Dumazet, David Laight, davem, tgraf, pablo, johunt, netdev

On 02/26/2015 08:53 AM, Patrick McHardy wrote:
> On 25.02, Alexei Starovoitov wrote:
>> On Wed, Feb 25, 2015 at 12:10 PM, Patrick McHardy <kaber@trash.net> wrote:
>>> On 25.02, Eric Dumazet wrote:
>>>> But if any workload had to grow the table to 2^20 slots, we had to
>>>> consume GB of memory anyway to hold sockets and everything.
>>>>
>>>> Trying to shrink is simply not worth it, unless you expect your host
>>>> never reboots and you desperately need back these 8 MBytes of memory.
>>>
>>> That may be true in the TCP case, but for not for nftables. We might
>>> have many sets and, especially when used to represent more complicated
>>> classification algorithms, their size might change by a lot.
>>
>> sounds like grow/shrink decision cannot be generalized within
>> rhashtable, but two callbacks are about to be removed and the
>> are costly. So would it make sense to disable auto-expand/shrink
>> completely and let nft/tcp call expand/shrink when needed?
>
> My understanding was that Eric was arguing against shrinking in general.
> But assuming we have it, what's the downside of also performing
> shrinking for TCP?
>
>> nft can potentially do smarter batching this way.
>> If it sees a lot of entries are about to be inserted, it can call
>> expand directly to quickly grow sparsely populated table
>> into large one, and then insert all the entries.
>> That will mitigate 'slow rcu' issue as well.
>
> I like that idea.

I think shrinking/expanding could still be configurable when we
get there. Perhaps as a flag parameter, definitely something more
lightweight at least, as both grow/shrink decision functions seem
to be quite reusable and could therefore stay private.

Perhaps those users that want to specifically optimize grow/shrink
could then disallow auto-expand/shrink from within rhashtable (via
initialization parameters) and could use the APIs directly, which
we need to expose then. That way we can keep it simple for netlink,
tipc and what else pops up.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-26  7:53 ` Patrick McHardy
  2015-02-26  8:54   ` Daniel Borkmann
@ 2015-02-26 14:50   ` Eric Dumazet
  2015-02-26 14:54     ` Patrick McHardy
  2015-02-26 15:20     ` [PATCH net] rhashtable: use cond_resched() Eric Dumazet
  1 sibling, 2 replies; 31+ messages in thread
From: Eric Dumazet @ 2015-02-26 14:50 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Alexei Starovoitov, Daniel Borkmann, David Laight, davem, tgraf,
	pablo, johunt, netdev

On Thu, 2015-02-26 at 07:53 +0000, Patrick McHardy wrote:

> My understanding was that Eric was arguing against shrinking in general.
> But assuming we have it, what's the downside of also performing
> shrinking for TCP?

Hash resize is horribly expensive (this is general to all hash
implementations), so once you took the risk to expand the table once,
you do not want to take the risk another time. You are lucky if host was
not already taken out of the server pool.

Fact that cond_resched() is not yet used anywhere in lib/rhashtable.c
reveals part of the problem we have here.

(I'll submit patches for that)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-26 14:50   ` Eric Dumazet
@ 2015-02-26 14:54     ` Patrick McHardy
  2015-02-26 15:20     ` [PATCH net] rhashtable: use cond_resched() Eric Dumazet
  1 sibling, 0 replies; 31+ messages in thread
From: Patrick McHardy @ 2015-02-26 14:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexei Starovoitov, Daniel Borkmann, David Laight, davem, tgraf,
	pablo, johunt, netdev

On 26.02, Eric Dumazet wrote:
> On Thu, 2015-02-26 at 07:53 +0000, Patrick McHardy wrote:
> 
> > My understanding was that Eric was arguing against shrinking in general.
> > But assuming we have it, what's the downside of also performing
> > shrinking for TCP?
> 
> Hash resize is horribly expensive (this is general to all hash
> implementations), so once you took the risk to expand the table once,
> you do not want to take the risk another time. You are lucky if host was
> not already taken out of the server pool.

Well, it's async, so I'd hope it wouldn't take you out of your server
pool. But sure, for TCP your point seems reasonable, as long as we'll
keep it available for nftables I'm happy.

> Fact that cond_resched() is not yet used anywhere in lib/rhashtable.c
> reveals part of the problem we have here.
> 
> (I'll submit patches for that)

Reminds me that I also need this for nft set GC, thanks :)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH net] rhashtable: use cond_resched()
  2015-02-26 14:50   ` Eric Dumazet
  2015-02-26 14:54     ` Patrick McHardy
@ 2015-02-26 15:20     ` Eric Dumazet
  2015-02-26 15:36       ` David Laight
                         ` (2 more replies)
  1 sibling, 3 replies; 31+ messages in thread
From: Eric Dumazet @ 2015-02-26 15:20 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Alexei Starovoitov, Daniel Borkmann, David Laight, davem, tgraf,
	pablo, johunt, netdev

From: Eric Dumazet <edumazet@google.com>

If a hash table has 128 slots and 16384 elems, expand to 256 slots
takes more than one second. For larger sets, a soft lockup is detected.

Holding cpu for that long, even in a work queue is a show stopper
for non preemptable kernels.

cond_resched() at strategic points to allow process scheduler
to reschedule us.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 lib/rhashtable.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index e3a04e4b3ec5..f3073fc600ed 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -17,6 +17,7 @@
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/log2.h>
+#include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/mm.h>
@@ -414,6 +415,7 @@ int rhashtable_expand(struct rhashtable *ht)
 			}
 		}
 		unlock_buckets(new_tbl, old_tbl, new_hash);
+		cond_resched();
 	}
 
 	/* Unzip interleaved hash chains */
@@ -437,6 +439,7 @@ int rhashtable_expand(struct rhashtable *ht)
 				complete = false;
 
 			unlock_buckets(new_tbl, old_tbl, old_hash);
+			cond_resched();
 		}
 	}
 
@@ -495,6 +498,7 @@ int rhashtable_shrink(struct rhashtable *ht)
 				   tbl->buckets[new_hash + new_tbl->size]);
 
 		unlock_buckets(new_tbl, tbl, new_hash);
+		cond_resched();
 	}
 
 	/* Publish the new, valid hash table */

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* RE: [PATCH net] rhashtable: use cond_resched()
  2015-02-26 15:20     ` [PATCH net] rhashtable: use cond_resched() Eric Dumazet
@ 2015-02-26 15:36       ` David Laight
  2015-02-26 15:46         ` Patrick McHardy
  2015-02-26 15:50       ` Daniel Borkmann
  2015-02-27 22:55       ` David Miller
  2 siblings, 1 reply; 31+ messages in thread
From: David Laight @ 2015-02-26 15:36 UTC (permalink / raw)
  To: 'Eric Dumazet', Patrick McHardy
  Cc: Alexei Starovoitov, Daniel Borkmann, davem, tgraf, pablo, johunt, netdev

From: Eric Dumazet 
Sent: 26 February 2015 15:21
> If a hash table has 128 slots and 16384 elems, expand to 256 slots
> takes more than one second. For larger sets, a soft lockup is detected.

What on earth is it doing?
Presumably something to do with the rcu actions needed to allow
lockless lookup during resize.

There has to be a better solution?
Perhaps even two sets of chain pointers down the hash lists.
Then the old hash table can be kept completely valid while the
whole 'unzip' action is done.

	David


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net] rhashtable: use cond_resched()
  2015-02-26 15:36       ` David Laight
@ 2015-02-26 15:46         ` Patrick McHardy
  2015-02-26 16:03           ` David Laight
  0 siblings, 1 reply; 31+ messages in thread
From: Patrick McHardy @ 2015-02-26 15:46 UTC (permalink / raw)
  To: David Laight
  Cc: 'Eric Dumazet',
	Alexei Starovoitov, Daniel Borkmann, davem, tgraf, pablo, johunt,
	netdev

On 26.02, David Laight wrote:
> From: Eric Dumazet 
> Sent: 26 February 2015 15:21
> > If a hash table has 128 slots and 16384 elems, expand to 256 slots
> > takes more than one second. For larger sets, a soft lockup is detected.
> 
> What on earth is it doing?
> Presumably something to do with the rcu actions needed to allow
> lockless lookup during resize.
> 
> There has to be a better solution?
> Perhaps even two sets of chain pointers down the hash lists.
> Then the old hash table can be kept completely valid while the
> whole 'unzip' action is done.

One of the main points of rhashtable is that you don't need those.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net] rhashtable: use cond_resched()
  2015-02-26 15:20     ` [PATCH net] rhashtable: use cond_resched() Eric Dumazet
  2015-02-26 15:36       ` David Laight
@ 2015-02-26 15:50       ` Daniel Borkmann
  2015-02-27 22:55       ` David Miller
  2 siblings, 0 replies; 31+ messages in thread
From: Daniel Borkmann @ 2015-02-26 15:50 UTC (permalink / raw)
  To: Eric Dumazet, Patrick McHardy
  Cc: Alexei Starovoitov, David Laight, davem, tgraf, pablo, johunt, netdev

On 02/26/2015 04:20 PM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> If a hash table has 128 slots and 16384 elems, expand to 256 slots
> takes more than one second. For larger sets, a soft lockup is detected.
>
> Holding cpu for that long, even in a work queue is a show stopper
> for non preemptable kernels.
>
> cond_resched() at strategic points to allow process scheduler
> to reschedule us.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH net] rhashtable: use cond_resched()
  2015-02-26 15:46         ` Patrick McHardy
@ 2015-02-26 16:03           ` David Laight
  0 siblings, 0 replies; 31+ messages in thread
From: David Laight @ 2015-02-26 16:03 UTC (permalink / raw)
  To: 'Patrick McHardy'
  Cc: 'Eric Dumazet',
	Alexei Starovoitov, Daniel Borkmann, davem, tgraf, pablo, johunt,
	netdev

From: Patrick McHardy [mailto:kaber@trash.net]
> On 26.02, David Laight wrote:
> > From: Eric Dumazet
> > Sent: 26 February 2015 15:21
> > > If a hash table has 128 slots and 16384 elems, expand to 256 slots
> > > takes more than one second. For larger sets, a soft lockup is detected.
> >
> > What on earth is it doing?
> > Presumably something to do with the rcu actions needed to allow
> > lockless lookup during resize.
> >
> > There has to be a better solution?
> > Perhaps even two sets of chain pointers down the hash lists.
> > Then the old hash table can be kept completely valid while the
> > whole 'unzip' action is done.
> 
> One of the main points of rhashtable is that you don't need those.

Would it be possible to keep the hash chains sorted into 'raw hash' order?
I think that would mean the 'unzip' only had to change a single linkage.

	David

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net] rhashtable: use cond_resched()
  2015-02-26 15:20     ` [PATCH net] rhashtable: use cond_resched() Eric Dumazet
  2015-02-26 15:36       ` David Laight
  2015-02-26 15:50       ` Daniel Borkmann
@ 2015-02-27 22:55       ` David Miller
  2 siblings, 0 replies; 31+ messages in thread
From: David Miller @ 2015-02-27 22:55 UTC (permalink / raw)
  To: eric.dumazet
  Cc: kaber, alexei.starovoitov, daniel, David.Laight, tgraf, pablo,
	johunt, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 26 Feb 2015 07:20:34 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> If a hash table has 128 slots and 16384 elems, expand to 256 slots
> takes more than one second. For larger sets, a soft lockup is detected.
> 
> Holding cpu for that long, even in a work queue is a show stopper
> for non preemptable kernels.
> 
> cond_resched() at strategic points to allow process scheduler
> to reschedule us.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-03-12 16:57               ` Thomas Graf
@ 2015-03-13  7:06                 ` Herbert Xu
  0 siblings, 0 replies; 31+ messages in thread
From: Herbert Xu @ 2015-03-13  7:06 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Eric Dumazet, daniel, David.Laight, davem, pablo, johunt, kaber, netdev

On Thu, Mar 12, 2015 at 04:57:33PM +0000, Thomas Graf wrote:
>
> I agree that the max elements should be enforced by users as done by
> Netlink right now. I'm also fine with ditching nelems from rhashtable
> as long as shrinking is covered. Patrick mentioned several times by
> now that nft sets want to have shrinking.

My plan is to have expansions triggered solely through insert,
either immediate or delayed, and shrinking will only be done
when explicitly requested by the user.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-03-11  6:42             ` Herbert Xu
@ 2015-03-12 16:57               ` Thomas Graf
  2015-03-13  7:06                 ` Herbert Xu
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Graf @ 2015-03-12 16:57 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Eric Dumazet, daniel, David.Laight, davem, pablo, johunt, kaber, netdev

On 03/11/15 at 05:42pm, Herbert Xu wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > There is no need to track number of elements, using either an atomic_t
> > or percpu_counter. This adds unnecessary burden.
> > 
> > 1) Automatic shrinking is a non issue. This will free very little
> > memory, compared to previous peak usage (including objects put in
> > rhashtable). If hash grown to a certain point, it's likely it will grow
> > again later.
> > 
> > 2) Growing can be triggered when any bucket has more than X elems, and
> > that is given for free at insert time.
> > X could be log2(buckets)/2 I guess. (aka shift/2)
> > 
> > A global limit on number of elements should be controlled by rhashtable
> > users - if needed -, not in the rhashtable itself.
> 
> I agree in the strongest terms :)

I agree that the max elements should be enforced by users as done by
Netlink right now. I'm also fine with ditching nelems from rhashtable
as long as shrinking is covered. Patrick mentioned several times by
now that nft sets want to have shrinking.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-25 17:41           ` Eric Dumazet
                               ` (3 preceding siblings ...)
  2015-02-26 14:18             ` David Laight
@ 2015-03-11  6:42             ` Herbert Xu
  2015-03-12 16:57               ` Thomas Graf
  4 siblings, 1 reply; 31+ messages in thread
From: Herbert Xu @ 2015-03-11  6:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: daniel, David.Laight, davem, tgraf, pablo, johunt, kaber, netdev

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-02-25 at 17:14 +0100, Daniel Borkmann wrote:
> 
>> The nelems can be a percpu_counter where we have a batched sync
>> point and can make that dependent on the current table shift as
>> we don't need to be overly precise, we can just read the sync'ed
>> value. Currently, nelems are also being used by rhashtable users
>> outside of the core code to track if we are still allowed to
>> insert new elements, but I think we might also want to address
>> that at some point.
> 
> There is no need to track number of elements, using either an atomic_t
> or percpu_counter. This adds unnecessary burden.
> 
> 1) Automatic shrinking is a non issue. This will free very little
> memory, compared to previous peak usage (including objects put in
> rhashtable). If hash grown to a certain point, it's likely it will grow
> again later.
> 
> 2) Growing can be triggered when any bucket has more than X elems, and
> that is given for free at insert time.
> X could be log2(buckets)/2 I guess. (aka shift/2)
> 
> A global limit on number of elements should be controlled by rhashtable
> users - if needed -, not in the rhashtable itself.

I agree in the strongest terms :)

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-28  0:48                 ` Patrick McHardy
@ 2015-02-28  1:35                   ` David Miller
  0 siblings, 0 replies; 31+ messages in thread
From: David Miller @ 2015-02-28  1:35 UTC (permalink / raw)
  To: kaber; +Cc: tgraf, eric.dumazet, daniel, David.Laight, pablo, johunt, netdev

From: Patrick McHardy <kaber@trash.net>
Date: Sat, 28 Feb 2015 00:48:40 +0000

> It seems to me that we have enough use cases right now that problems
> have become quite visible, and until we're sure they're actually
> solvable for all these different use cases I'd advocate for not
> moving more subsystems to use this.

No objections.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-27 22:30               ` David Miller
@ 2015-02-28  0:48                 ` Patrick McHardy
  2015-02-28  1:35                   ` David Miller
  0 siblings, 1 reply; 31+ messages in thread
From: Patrick McHardy @ 2015-02-28  0:48 UTC (permalink / raw)
  To: David Miller
  Cc: tgraf, eric.dumazet, daniel, David.Laight, pablo, johunt, netdev

On 27.02, David Miller wrote:
> From: "tgraf@suug.ch" <tgraf@suug.ch>
> Date: Thu, 26 Feb 2015 10:02:35 +0000
> 
> > I just want to point out here that TCP is not the only future
> > use case. The station table for mac80211 has recently been
> > converted. We need to keep all of them on the radar.
> 
> Just wanted to note in passing that I wonder how legal
> net/core/neighbour.c:neigh_hash_grow() is.  It's growing an RCU
> hashtable without using rhashtable. :-)

I would like to note that it seem premature to convert all these
users without having the fundamental problems solved, specifically
the unbounded growing hash chains with async resizing.

It seems to me that we have enough use cases right now that problems
have become quite visible, and until we're sure they're actually
solvable for all these different use cases I'd advocate for not
moving more subsystems to use this.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-26 10:02             ` tgraf
@ 2015-02-27 22:30               ` David Miller
  2015-02-28  0:48                 ` Patrick McHardy
  0 siblings, 1 reply; 31+ messages in thread
From: David Miller @ 2015-02-27 22:30 UTC (permalink / raw)
  To: tgraf; +Cc: eric.dumazet, daniel, David.Laight, pablo, johunt, kaber, netdev

From: "tgraf@suug.ch" <tgraf@suug.ch>
Date: Thu, 26 Feb 2015 10:02:35 +0000

> I just want to point out here that TCP is not the only future
> use case. The station table for mac80211 has recently been
> converted. We need to keep all of them on the radar.

Just wanted to note in passing that I wonder how legal
net/core/neighbour.c:neigh_hash_grow() is.  It's growing an RCU
hashtable without using rhashtable. :-)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-25 17:41           ` Eric Dumazet
                               ` (2 preceding siblings ...)
  2015-02-26 10:02             ` tgraf
@ 2015-02-26 14:18             ` David Laight
  2015-03-11  6:42             ` Herbert Xu
  4 siblings, 0 replies; 31+ messages in thread
From: David Laight @ 2015-02-26 14:18 UTC (permalink / raw)
  To: 'Eric Dumazet', Daniel Borkmann
  Cc: davem, tgraf, pablo, johunt, kaber, netdev

> 1) Automatic shrinking is a non issue. This will free very little
> memory, compared to previous peak usage (including objects put in
> rhashtable). If hash grown to a certain point, it's likely it will grow
> again later.

Shrink could be done 'on demand' from the 'user' when the number of
elements is known to have significantly reduced.

> 2) Growing can be triggered when any bucket has more than X elems, and
> that is given for free at insert time.
> X could be log2(buckets)/2 I guess. (aka shift/2)

If growing is triggered when a bucket has more than X elems, then setting
X very high will stop automatic growing.
Growing would then only be done in response to user requests.

Whether X should be constant, or log2(buckets)/n is another matter.
Or something like N1 + (N2 * log2(buckets))/64 in any case it
only needs recalculating after an auto-grow.

	David


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-25 17:41           ` Eric Dumazet
  2015-02-25 17:49             ` David Laight
  2015-02-25 18:56             ` Daniel Borkmann
@ 2015-02-26 10:02             ` tgraf
  2015-02-27 22:30               ` David Miller
  2015-02-26 14:18             ` David Laight
  2015-03-11  6:42             ` Herbert Xu
  4 siblings, 1 reply; 31+ messages in thread
From: tgraf @ 2015-02-26 10:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Daniel Borkmann, David Laight, davem, pablo, johunt, kaber, netdev

On 02/25/15 at 09:41am, Eric Dumazet wrote:
> There is no need to track number of elements, using either an atomic_t
> or percpu_counter. This adds unnecessary burden.
> 
> 1) Automatic shrinking is a non issue. This will free very little
> memory, compared to previous peak usage (including objects put in
> rhashtable). If hash grown to a certain point, it's likely it will grow
> again later.
> 
> 2) Growing can be triggered when any bucket has more than X elems, and
> that is given for free at insert time.
> X could be log2(buckets)/2 I guess. (aka shift/2)
> 
> A global limit on number of elements should be controlled by rhashtable
> users - if needed -, not in the rhashtable itself.

I don't think we need to kill shrink to get rid of the elements
counters. It looks all users of shrink could trigger it manually
by maintaing their own element counter, we just have to export the
shrink function.

I just want to point out here that TCP is not the only future
use case. The station table for mac80211 has recently been
converted. We need to keep all of them on the radar.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-25 19:52               ` Eric Dumazet
@ 2015-02-25 20:10                 ` Patrick McHardy
  0 siblings, 0 replies; 31+ messages in thread
From: Patrick McHardy @ 2015-02-25 20:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Daniel Borkmann, David Laight, davem, tgraf, pablo, johunt, netdev

On 25.02, Eric Dumazet wrote:
> On Wed, 2015-02-25 at 19:56 +0100, Daniel Borkmann wrote:
> > On 02/25/2015 06:41 PM, Eric Dumazet wrote:
> > ...
> > > There is no need to track number of elements, using either an atomic_t
> > > or percpu_counter. This adds unnecessary burden.
> > >
> > > 1) Automatic shrinking is a non issue. This will free very little
> > > memory, compared to previous peak usage (including objects put in
> > > rhashtable). If hash grown to a certain point, it's likely it will grow
> > > again later.
> > 
> > So you are saying that shrinking is most likely a rather undesirable
> > use-case in rhashtable?
> 
> Main point of rhashtable is to start with a small/reasonable size,
> and only big consumers need to _expand_ the table.
> 
> Like, instead of having 512000 slots in TCP hash table, start with a
> 2^10 size.
> 
> Like my laptop currently has :
> 
> TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
> 
> Which is kind of ridiculous....
> 
> But if any workload had to grow the table to 2^20 slots, we had to
> consume GB of memory anyway to hold sockets and everything.
> 
> Trying to shrink is simply not worth it, unless you expect your host
> never reboots and you desperately need back these 8 MBytes of memory.

That may be true in the TCP case, but for not for nftables. We might
have many sets and, especially when used to represent more complicated
classification algorithms, their size might change by a lot.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-25 18:56             ` Daniel Borkmann
@ 2015-02-25 19:52               ` Eric Dumazet
  2015-02-25 20:10                 ` Patrick McHardy
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2015-02-25 19:52 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Laight, davem, tgraf, pablo, johunt, kaber, netdev

On Wed, 2015-02-25 at 19:56 +0100, Daniel Borkmann wrote:
> On 02/25/2015 06:41 PM, Eric Dumazet wrote:
> ...
> > There is no need to track number of elements, using either an atomic_t
> > or percpu_counter. This adds unnecessary burden.
> >
> > 1) Automatic shrinking is a non issue. This will free very little
> > memory, compared to previous peak usage (including objects put in
> > rhashtable). If hash grown to a certain point, it's likely it will grow
> > again later.
> 
> So you are saying that shrinking is most likely a rather undesirable
> use-case in rhashtable?

Main point of rhashtable is to start with a small/reasonable size,
and only big consumers need to _expand_ the table.

Like, instead of having 512000 slots in TCP hash table, start with a
2^10 size.

Like my laptop currently has :

TCP established hash table entries: 131072 (order: 8, 1048576 bytes)

Which is kind of ridiculous....

But if any workload had to grow the table to 2^20 slots, we had to
consume GB of memory anyway to hold sockets and everything.

Trying to shrink is simply not worth it, unless you expect your host
never reboots and you desperately need back these 8 MBytes of memory.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-25 17:41           ` Eric Dumazet
  2015-02-25 17:49             ` David Laight
@ 2015-02-25 18:56             ` Daniel Borkmann
  2015-02-25 19:52               ` Eric Dumazet
  2015-02-26 10:02             ` tgraf
                               ` (2 subsequent siblings)
  4 siblings, 1 reply; 31+ messages in thread
From: Daniel Borkmann @ 2015-02-25 18:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Laight, davem, tgraf, pablo, johunt, kaber, netdev

On 02/25/2015 06:41 PM, Eric Dumazet wrote:
...
> There is no need to track number of elements, using either an atomic_t
> or percpu_counter. This adds unnecessary burden.
>
> 1) Automatic shrinking is a non issue. This will free very little
> memory, compared to previous peak usage (including objects put in
> rhashtable). If hash grown to a certain point, it's likely it will grow
> again later.

So you are saying that shrinking is most likely a rather undesirable
use-case in rhashtable?

> 2) Growing can be triggered when any bucket has more than X elems, and
> that is given for free at insert time.
> X could be log2(buckets)/2 I guess. (aka shift/2)

Yes, I think such a facility can be added easily.

> A global limit on number of elements should be controlled by rhashtable
> users - if needed -, not in the rhashtable itself.

Makes sense, thanks for the suggestion! I'll experiment with the above
per bucket tracking a bit if you're okay with that, and also unfiddle
the rht counter internals from netlink.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-25 17:49             ` David Laight
@ 2015-02-25 18:15               ` Eric Dumazet
  0 siblings, 0 replies; 31+ messages in thread
From: Eric Dumazet @ 2015-02-25 18:15 UTC (permalink / raw)
  To: David Laight; +Cc: Daniel Borkmann, davem, tgraf, pablo, johunt, kaber, netdev

On Wed, 2015-02-25 at 17:49 +0000, David Laight wrote:

> That won't work if someone is managing to break the hash function.
> You need some approximation to the total count as well.
> OTOH there is no point growing the table if the current list isn't 'full'.

Well, if hash function is broken, having 1,000,000 elements in one
bucket will kill your host, even if hash table has 1 million buckets.

Better find this out sooner than later.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-25 17:41           ` Eric Dumazet
@ 2015-02-25 17:49             ` David Laight
  2015-02-25 18:15               ` Eric Dumazet
  2015-02-25 18:56             ` Daniel Borkmann
                               ` (3 subsequent siblings)
  4 siblings, 1 reply; 31+ messages in thread
From: David Laight @ 2015-02-25 17:49 UTC (permalink / raw)
  To: 'Eric Dumazet', Daniel Borkmann
  Cc: davem, tgraf, pablo, johunt, kaber, netdev

From: Eric Dumazet
...
> 1) Automatic shrinking is a non issue. This will free very little
> memory, compared to previous peak usage (including objects put in
> rhashtable). If hash grown to a certain point, it's likely it will grow
> again later.

FWIW I tend to agree.

> 2) Growing can be triggered when any bucket has more than X elems, and
> that is given for free at insert time.
> X could be log2(buckets)/2 I guess. (aka shift/2)

That won't work if someone is managing to break the hash function.
You need some approximation to the total count as well.
OTOH there is no point growing the table if the current list isn't 'full'.

> A global limit on number of elements should be controlled by rhashtable
> users - if needed -, not in the rhashtable itself.

Possibly you could require the user to give an approximation of the
number of elements.

	David


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-25 16:14         ` Daniel Borkmann
@ 2015-02-25 17:41           ` Eric Dumazet
  2015-02-25 17:49             ` David Laight
                               ` (4 more replies)
  0 siblings, 5 replies; 31+ messages in thread
From: Eric Dumazet @ 2015-02-25 17:41 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Laight, davem, tgraf, pablo, johunt, kaber, netdev

On Wed, 2015-02-25 at 17:14 +0100, Daniel Borkmann wrote:

> The nelems can be a percpu_counter where we have a batched sync
> point and can make that dependent on the current table shift as
> we don't need to be overly precise, we can just read the sync'ed
> value. Currently, nelems are also being used by rhashtable users
> outside of the core code to track if we are still allowed to
> insert new elements, but I think we might also want to address
> that at some point.

There is no need to track number of elements, using either an atomic_t
or percpu_counter. This adds unnecessary burden.

1) Automatic shrinking is a non issue. This will free very little
memory, compared to previous peak usage (including objects put in
rhashtable). If hash grown to a certain point, it's likely it will grow
again later.

2) Growing can be triggered when any bucket has more than X elems, and
that is given for free at insert time.
X could be log2(buckets)/2 I guess. (aka shift/2)

A global limit on number of elements should be controlled by rhashtable
users - if needed -, not in the rhashtable itself.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-25 15:31 ` [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions Daniel Borkmann
  2015-02-25 15:41   ` David Laight
@ 2015-02-25 17:23   ` Thomas Graf
  1 sibling, 0 replies; 31+ messages in thread
From: Thomas Graf @ 2015-02-25 17:23 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, pablo, johunt, kaber, netdev

On 02/25/15 at 04:31pm, Daniel Borkmann wrote:
> Currently, all real users of rhashtable default their grow and shrink
> decision functions to rht_grow_above_75() and rht_shrink_below_30(),
> so that there's currently no need to have this explicitly selectable.
> 
> It can/should be generic and private inside rhashtable until a real
> use case pops up. Since we can make this private, we'll save us this
> additional indirection layer and can improve insertion/deletion time
> as well.
> 
> Reference: http://patchwork.ozlabs.org/patch/443040/
> Suggested-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Nice

Acked-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-25 15:51       ` David Laight
@ 2015-02-25 16:14         ` Daniel Borkmann
  2015-02-25 17:41           ` Eric Dumazet
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Borkmann @ 2015-02-25 16:14 UTC (permalink / raw)
  To: David Laight, davem; +Cc: tgraf, pablo, johunt, kaber, netdev

On 02/25/2015 04:51 PM, David Laight wrote:
> From: Daniel Borkmann [
>> On 02/25/2015 04:41 PM, David Laight wrote:
>> ...
>>> Why not cache the 'number of items before we need to expand' value
>>> after each expansion, setting it to 'infinite' when expansion is disabled.
>>> Then the above check is a simple comparison.
>>> You probably don't even need an atomic_read() - provided something is
>>> double checked once the first test determines that an expansion is needed.
>>
>> Did you read my cover letter? ;)
>
> I probably didn't infer the relevant references :-)
>
> Actually even if some code wanted other rules, provided they
> are based on comparing the 'number of items' to a preset limit
> the limit could be set during expansion/contraction.

The shift itself doesn't need to be atomic, if we store that in
the table directly instead of rhashtable, once on allocation it's
being set and can stay immutable during its table lifetime.

The nelems can be a percpu_counter where we have a batched sync
point and can make that dependent on the current table shift as
we don't need to be overly precise, we can just read the sync'ed
value. Currently, nelems are also being used by rhashtable users
outside of the core code to track if we are still allowed to
insert new elements, but I think we might also want to address
that at some point.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-25 15:46     ` Daniel Borkmann
@ 2015-02-25 15:51       ` David Laight
  2015-02-25 16:14         ` Daniel Borkmann
  0 siblings, 1 reply; 31+ messages in thread
From: David Laight @ 2015-02-25 15:51 UTC (permalink / raw)
  To: 'Daniel Borkmann', davem; +Cc: tgraf, pablo, johunt, kaber, netdev

From: Daniel Borkmann [ 
> On 02/25/2015 04:41 PM, David Laight wrote:
> ...
> > Why not cache the 'number of items before we need to expand' value
> > after each expansion, setting it to 'infinite' when expansion is disabled.
> > Then the above check is a simple comparison.
> > You probably don't even need an atomic_read() - provided something is
> > double checked once the first test determines that an expansion is needed.
> 
> Did you read my cover letter? ;)

I probably didn't infer the relevant references :-)

Actually even if some code wanted other rules, provided they
are based on comparing the 'number of items' to a preset limit
the limit could be set during expansion/contraction.

	David

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-25 15:41   ` David Laight
@ 2015-02-25 15:46     ` Daniel Borkmann
  2015-02-25 15:51       ` David Laight
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Borkmann @ 2015-02-25 15:46 UTC (permalink / raw)
  To: David Laight, davem; +Cc: tgraf, pablo, johunt, kaber, netdev

On 02/25/2015 04:41 PM, David Laight wrote:
...
> Why not cache the 'number of items before we need to expand' value
> after each expansion, setting it to 'infinite' when expansion is disabled.
> Then the above check is a simple comparison.
> You probably don't even need an atomic_read() - provided something is
> double checked once the first test determines that an expansion is needed.

Did you read my cover letter? ;)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-25 15:31 ` [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions Daniel Borkmann
@ 2015-02-25 15:41   ` David Laight
  2015-02-25 15:46     ` Daniel Borkmann
  2015-02-25 17:23   ` Thomas Graf
  1 sibling, 1 reply; 31+ messages in thread
From: David Laight @ 2015-02-25 15:41 UTC (permalink / raw)
  To: 'Daniel Borkmann', davem; +Cc: tgraf, pablo, johunt, kaber, netdev

From: Daniel Borkmann
> Currently, all real users of rhashtable default their grow and shrink
> decision functions to rht_grow_above_75() and rht_shrink_below_30(),
> so that there's currently no need to have this explicitly selectable.
> 
> It can/should be generic and private inside rhashtable until a real
> use case pops up. Since we can make this private, we'll save us this
> additional indirection layer and can improve insertion/deletion time
> as well.
...
> +static bool rht_grow_above_75(const struct rhashtable *ht, size_t new_size)
>  {
>  	/* Expand table when exceeding 75% load */
>  	return atomic_read(&ht->nelems) > (new_size / 4 * 3) &&
>  	       (!ht->p.max_shift || atomic_read(&ht->shift) < ht->p.max_shift);
>  }

Why not cache the 'number of items before we need to expand' value
after each expansion, setting it to 'infinite' when expansion is disabled.
Then the above check is a simple comparison.
You probably don't even need an atomic_read() - provided something is
double checked once the first test determines that an expansion is needed.

	David

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-25 15:31 [PATCH net 0/2] rhashtable updates Daniel Borkmann
@ 2015-02-25 15:31 ` Daniel Borkmann
  2015-02-25 15:41   ` David Laight
  2015-02-25 17:23   ` Thomas Graf
  0 siblings, 2 replies; 31+ messages in thread
From: Daniel Borkmann @ 2015-02-25 15:31 UTC (permalink / raw)
  To: davem; +Cc: tgraf, pablo, johunt, kaber, netdev, Daniel Borkmann

Currently, all real users of rhashtable default their grow and shrink
decision functions to rht_grow_above_75() and rht_shrink_below_30(),
so that there's currently no need to have this explicitly selectable.

It can/should be generic and private inside rhashtable until a real
use case pops up. Since we can make this private, we'll save us this
additional indirection layer and can improve insertion/deletion time
as well.

Reference: http://patchwork.ozlabs.org/patch/443040/
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/rhashtable.h | 13 -----------
 lib/rhashtable.c           | 56 ++++++++++++++--------------------------------
 lib/test_rhashtable.c      |  1 +
 net/netfilter/nft_hash.c   |  2 --
 net/netlink/af_netlink.c   |  2 --
 net/tipc/socket.c          |  2 --
 6 files changed, 18 insertions(+), 58 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index cb2104b..d438eeb 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -79,12 +79,6 @@ struct rhashtable;
  * @locks_mul: Number of bucket locks to allocate per cpu (default: 128)
  * @hashfn: Function to hash key
  * @obj_hashfn: Function to hash object
- * @grow_decision: If defined, may return true if table should expand
- * @shrink_decision: If defined, may return true if table should shrink
- *
- * Note: when implementing the grow and shrink decision function, min/max
- * shift must be enforced, otherwise, resizing watermarks they set may be
- * useless.
  */
 struct rhashtable_params {
 	size_t			nelem_hint;
@@ -98,10 +92,6 @@ struct rhashtable_params {
 	size_t			locks_mul;
 	rht_hashfn_t		hashfn;
 	rht_obj_hashfn_t	obj_hashfn;
-	bool			(*grow_decision)(const struct rhashtable *ht,
-						 size_t new_size);
-	bool			(*shrink_decision)(const struct rhashtable *ht,
-						   size_t new_size);
 };
 
 /**
@@ -193,9 +183,6 @@ int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params);
 void rhashtable_insert(struct rhashtable *ht, struct rhash_head *node);
 bool rhashtable_remove(struct rhashtable *ht, struct rhash_head *node);
 
-bool rht_grow_above_75(const struct rhashtable *ht, size_t new_size);
-bool rht_shrink_below_30(const struct rhashtable *ht, size_t new_size);
-
 int rhashtable_expand(struct rhashtable *ht);
 int rhashtable_shrink(struct rhashtable *ht);
 
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index bcf119b..090641d 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -247,26 +247,24 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
  * @ht:		hash table
  * @new_size:	new table size
  */
-bool rht_grow_above_75(const struct rhashtable *ht, size_t new_size)
+static bool rht_grow_above_75(const struct rhashtable *ht, size_t new_size)
 {
 	/* Expand table when exceeding 75% load */
 	return atomic_read(&ht->nelems) > (new_size / 4 * 3) &&
 	       (!ht->p.max_shift || atomic_read(&ht->shift) < ht->p.max_shift);
 }
-EXPORT_SYMBOL_GPL(rht_grow_above_75);
 
 /**
  * rht_shrink_below_30 - returns true if nelems < 0.3 * table-size
  * @ht:		hash table
  * @new_size:	new table size
  */
-bool rht_shrink_below_30(const struct rhashtable *ht, size_t new_size)
+static bool rht_shrink_below_30(const struct rhashtable *ht, size_t new_size)
 {
 	/* Shrink table beneath 30% load */
 	return atomic_read(&ht->nelems) < (new_size * 3 / 10) &&
 	       (atomic_read(&ht->shift) > ht->p.min_shift);
 }
-EXPORT_SYMBOL_GPL(rht_shrink_below_30);
 
 static void lock_buckets(struct bucket_table *new_tbl,
 			 struct bucket_table *old_tbl, unsigned int hash)
@@ -528,40 +526,19 @@ static void rht_deferred_worker(struct work_struct *work)
 	list_for_each_entry(walker, &ht->walkers, list)
 		walker->resize = true;
 
-	if (ht->p.grow_decision && ht->p.grow_decision(ht, tbl->size))
+	if (rht_grow_above_75(ht, tbl->size))
 		rhashtable_expand(ht);
-	else if (ht->p.shrink_decision && ht->p.shrink_decision(ht, tbl->size))
+	else if (rht_shrink_below_30(ht, tbl->size))
 		rhashtable_shrink(ht);
-
 unlock:
 	mutex_unlock(&ht->mutex);
 }
 
-static void rhashtable_probe_expand(struct rhashtable *ht)
-{
-	const struct bucket_table *new_tbl = rht_dereference_rcu(ht->future_tbl, ht);
-	const struct bucket_table *tbl = rht_dereference_rcu(ht->tbl, ht);
-
-	/* Only adjust the table if no resizing is currently in progress. */
-	if (tbl == new_tbl && ht->p.grow_decision &&
-	    ht->p.grow_decision(ht, tbl->size))
-		schedule_work(&ht->run_work);
-}
-
-static void rhashtable_probe_shrink(struct rhashtable *ht)
-{
-	const struct bucket_table *new_tbl = rht_dereference_rcu(ht->future_tbl, ht);
-	const struct bucket_table *tbl = rht_dereference_rcu(ht->tbl, ht);
-
-	/* Only adjust the table if no resizing is currently in progress. */
-	if (tbl == new_tbl && ht->p.shrink_decision &&
-	    ht->p.shrink_decision(ht, tbl->size))
-		schedule_work(&ht->run_work);
-}
-
 static void __rhashtable_insert(struct rhashtable *ht, struct rhash_head *obj,
-				struct bucket_table *tbl, u32 hash)
+				struct bucket_table *tbl,
+				const struct bucket_table *old_tbl, u32 hash)
 {
+	bool no_resize_running = tbl == old_tbl;
 	struct rhash_head *head;
 
 	hash = rht_bucket_index(tbl, hash);
@@ -577,8 +554,8 @@ static void __rhashtable_insert(struct rhashtable *ht, struct rhash_head *obj,
 	rcu_assign_pointer(tbl->buckets[hash], obj);
 
 	atomic_inc(&ht->nelems);
-
-	rhashtable_probe_expand(ht);
+	if (no_resize_running && rht_grow_above_75(ht, tbl->size))
+		schedule_work(&ht->run_work);
 }
 
 /**
@@ -608,7 +585,7 @@ void rhashtable_insert(struct rhashtable *ht, struct rhash_head *obj)
 	hash = obj_raw_hashfn(ht, rht_obj(ht, obj));
 
 	lock_buckets(tbl, old_tbl, hash);
-	__rhashtable_insert(ht, obj, tbl, hash);
+	__rhashtable_insert(ht, obj, tbl, old_tbl, hash);
 	unlock_buckets(tbl, old_tbl, hash);
 
 	rcu_read_unlock();
@@ -690,8 +667,11 @@ found:
 	unlock_buckets(new_tbl, old_tbl, new_hash);
 
 	if (ret) {
+		bool no_resize_running = new_tbl == old_tbl;
+
 		atomic_dec(&ht->nelems);
-		rhashtable_probe_shrink(ht);
+		if (no_resize_running && rht_shrink_below_30(ht, new_tbl->size))
+			schedule_work(&ht->run_work);
 	}
 
 	rcu_read_unlock();
@@ -861,7 +841,7 @@ bool rhashtable_lookup_compare_insert(struct rhashtable *ht,
 		goto exit;
 	}
 
-	__rhashtable_insert(ht, obj, new_tbl, new_hash);
+	__rhashtable_insert(ht, obj, new_tbl, old_tbl, new_hash);
 
 exit:
 	unlock_buckets(new_tbl, old_tbl, new_hash);
@@ -1123,8 +1103,7 @@ int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params)
 	if (!ht->p.hash_rnd)
 		get_random_bytes(&ht->p.hash_rnd, sizeof(ht->p.hash_rnd));
 
-	if (ht->p.grow_decision || ht->p.shrink_decision)
-		INIT_WORK(&ht->run_work, rht_deferred_worker);
+	INIT_WORK(&ht->run_work, rht_deferred_worker);
 
 	return 0;
 }
@@ -1142,8 +1121,7 @@ void rhashtable_destroy(struct rhashtable *ht)
 {
 	ht->being_destroyed = true;
 
-	if (ht->p.grow_decision || ht->p.shrink_decision)
-		cancel_work_sync(&ht->run_work);
+	cancel_work_sync(&ht->run_work);
 
 	mutex_lock(&ht->mutex);
 	bucket_table_free(rht_dereference(ht->tbl, ht));
diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index f9e9d73..67c7593 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -201,6 +201,7 @@ static int __init test_rht_init(void)
 		.key_offset = offsetof(struct test_obj, value),
 		.key_len = sizeof(int),
 		.hashfn = jhash,
+		.max_shift = 1, /* we expand/shrink manually here */
 		.nulls_base = (3U << RHT_BASE_SHIFT),
 	};
 	int err;
diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index 61e6c40..c82df0a 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -192,8 +192,6 @@ static int nft_hash_init(const struct nft_set *set,
 		.key_offset = offsetof(struct nft_hash_elem, key),
 		.key_len = set->klen,
 		.hashfn = jhash,
-		.grow_decision = rht_grow_above_75,
-		.shrink_decision = rht_shrink_below_30,
 	};
 
 	return rhashtable_init(priv, &params);
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 2702673..05919bf 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -3126,8 +3126,6 @@ static int __init netlink_proto_init(void)
 		.key_len = sizeof(u32), /* portid */
 		.hashfn = jhash,
 		.max_shift = 16, /* 64K */
-		.grow_decision = rht_grow_above_75,
-		.shrink_decision = rht_shrink_below_30,
 	};
 
 	if (err != 0)
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index f73e975..b4d4467 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2364,8 +2364,6 @@ int tipc_sk_rht_init(struct net *net)
 		.hashfn = jhash,
 		.max_shift = 20, /* 1M */
 		.min_shift = 8,  /* 256 */
-		.grow_decision = rht_grow_above_75,
-		.shrink_decision = rht_shrink_below_30,
 	};
 
 	return rhashtable_init(&tn->sk_rht, &rht_params);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2015-03-13  7:07 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-25 22:09 [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions Alexei Starovoitov
2015-02-26  7:53 ` Patrick McHardy
2015-02-26  8:54   ` Daniel Borkmann
2015-02-26 14:50   ` Eric Dumazet
2015-02-26 14:54     ` Patrick McHardy
2015-02-26 15:20     ` [PATCH net] rhashtable: use cond_resched() Eric Dumazet
2015-02-26 15:36       ` David Laight
2015-02-26 15:46         ` Patrick McHardy
2015-02-26 16:03           ` David Laight
2015-02-26 15:50       ` Daniel Borkmann
2015-02-27 22:55       ` David Miller
  -- strict thread matches above, loose matches on Subject: below --
2015-02-25 15:31 [PATCH net 0/2] rhashtable updates Daniel Borkmann
2015-02-25 15:31 ` [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions Daniel Borkmann
2015-02-25 15:41   ` David Laight
2015-02-25 15:46     ` Daniel Borkmann
2015-02-25 15:51       ` David Laight
2015-02-25 16:14         ` Daniel Borkmann
2015-02-25 17:41           ` Eric Dumazet
2015-02-25 17:49             ` David Laight
2015-02-25 18:15               ` Eric Dumazet
2015-02-25 18:56             ` Daniel Borkmann
2015-02-25 19:52               ` Eric Dumazet
2015-02-25 20:10                 ` Patrick McHardy
2015-02-26 10:02             ` tgraf
2015-02-27 22:30               ` David Miller
2015-02-28  0:48                 ` Patrick McHardy
2015-02-28  1:35                   ` David Miller
2015-02-26 14:18             ` David Laight
2015-03-11  6:42             ` Herbert Xu
2015-03-12 16:57               ` Thomas Graf
2015-03-13  7:06                 ` Herbert Xu
2015-02-25 17:23   ` Thomas Graf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.