All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
@ 2015-02-25 22:09 Alexei Starovoitov
  2015-02-26  7:53 ` Patrick McHardy
  0 siblings, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2015-02-25 22:09 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Eric Dumazet, Daniel Borkmann, David Laight, davem, tgraf, pablo,
	johunt, netdev

On Wed, Feb 25, 2015 at 12:10 PM, Patrick McHardy <kaber@trash.net> wrote:
> On 25.02, Eric Dumazet wrote:
>> But if any workload had to grow the table to 2^20 slots, we had to
>> consume GB of memory anyway to hold sockets and everything.
>>
>> Trying to shrink is simply not worth it, unless you expect your host
>> never reboots and you desperately need back these 8 MBytes of memory.
>
> That may be true in the TCP case, but for not for nftables. We might
> have many sets and, especially when used to represent more complicated
> classification algorithms, their size might change by a lot.

sounds like grow/shrink decision cannot be generalized within
rhashtable, but two callbacks are about to be removed and the
are costly. So would it make sense to disable auto-expand/shrink
completely and let nft/tcp call expand/shrink when needed?
nft can potentially do smarter batching this way.
If it sees a lot of entries are about to be inserted, it can call
expand directly to quickly grow sparsely populated table
into large one, and then insert all the entries.
That will mitigate 'slow rcu' issue as well.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-25 22:09 [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions Alexei Starovoitov
@ 2015-02-26  7:53 ` Patrick McHardy
  2015-02-26  8:54   ` Daniel Borkmann
  2015-02-26 14:50   ` Eric Dumazet
  0 siblings, 2 replies; 11+ messages in thread
From: Patrick McHardy @ 2015-02-26  7:53 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, Daniel Borkmann, David Laight, davem, tgraf, pablo,
	johunt, netdev

On 25.02, Alexei Starovoitov wrote:
> On Wed, Feb 25, 2015 at 12:10 PM, Patrick McHardy <kaber@trash.net> wrote:
> > On 25.02, Eric Dumazet wrote:
> >> But if any workload had to grow the table to 2^20 slots, we had to
> >> consume GB of memory anyway to hold sockets and everything.
> >>
> >> Trying to shrink is simply not worth it, unless you expect your host
> >> never reboots and you desperately need back these 8 MBytes of memory.
> >
> > That may be true in the TCP case, but for not for nftables. We might
> > have many sets and, especially when used to represent more complicated
> > classification algorithms, their size might change by a lot.
> 
> sounds like grow/shrink decision cannot be generalized within
> rhashtable, but two callbacks are about to be removed and the
> are costly. So would it make sense to disable auto-expand/shrink
> completely and let nft/tcp call expand/shrink when needed?

My understanding was that Eric was arguing against shrinking in general.
But assuming we have it, what's the downside of also performing
shrinking for TCP?

> nft can potentially do smarter batching this way.
> If it sees a lot of entries are about to be inserted, it can call
> expand directly to quickly grow sparsely populated table
> into large one, and then insert all the entries.
> That will mitigate 'slow rcu' issue as well.

I like that idea.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-26  7:53 ` Patrick McHardy
@ 2015-02-26  8:54   ` Daniel Borkmann
  2015-02-26 14:50   ` Eric Dumazet
  1 sibling, 0 replies; 11+ messages in thread
From: Daniel Borkmann @ 2015-02-26  8:54 UTC (permalink / raw)
  To: Patrick McHardy, Alexei Starovoitov
  Cc: Eric Dumazet, David Laight, davem, tgraf, pablo, johunt, netdev

On 02/26/2015 08:53 AM, Patrick McHardy wrote:
> On 25.02, Alexei Starovoitov wrote:
>> On Wed, Feb 25, 2015 at 12:10 PM, Patrick McHardy <kaber@trash.net> wrote:
>>> On 25.02, Eric Dumazet wrote:
>>>> But if any workload had to grow the table to 2^20 slots, we had to
>>>> consume GB of memory anyway to hold sockets and everything.
>>>>
>>>> Trying to shrink is simply not worth it, unless you expect your host
>>>> never reboots and you desperately need back these 8 MBytes of memory.
>>>
>>> That may be true in the TCP case, but for not for nftables. We might
>>> have many sets and, especially when used to represent more complicated
>>> classification algorithms, their size might change by a lot.
>>
>> sounds like grow/shrink decision cannot be generalized within
>> rhashtable, but two callbacks are about to be removed and the
>> are costly. So would it make sense to disable auto-expand/shrink
>> completely and let nft/tcp call expand/shrink when needed?
>
> My understanding was that Eric was arguing against shrinking in general.
> But assuming we have it, what's the downside of also performing
> shrinking for TCP?
>
>> nft can potentially do smarter batching this way.
>> If it sees a lot of entries are about to be inserted, it can call
>> expand directly to quickly grow sparsely populated table
>> into large one, and then insert all the entries.
>> That will mitigate 'slow rcu' issue as well.
>
> I like that idea.

I think shrinking/expanding could still be configurable when we
get there. Perhaps as a flag parameter, definitely something more
lightweight at least, as both grow/shrink decision functions seem
to be quite reusable and could therefore stay private.

Perhaps those users that want to specifically optimize grow/shrink
could then disallow auto-expand/shrink from within rhashtable (via
initialization parameters) and could use the APIs directly, which
we need to expose then. That way we can keep it simple for netlink,
tipc and what else pops up.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-26  7:53 ` Patrick McHardy
  2015-02-26  8:54   ` Daniel Borkmann
@ 2015-02-26 14:50   ` Eric Dumazet
  2015-02-26 14:54     ` Patrick McHardy
  2015-02-26 15:20     ` [PATCH net] rhashtable: use cond_resched() Eric Dumazet
  1 sibling, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2015-02-26 14:50 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Alexei Starovoitov, Daniel Borkmann, David Laight, davem, tgraf,
	pablo, johunt, netdev

On Thu, 2015-02-26 at 07:53 +0000, Patrick McHardy wrote:

> My understanding was that Eric was arguing against shrinking in general.
> But assuming we have it, what's the downside of also performing
> shrinking for TCP?

Hash resize is horribly expensive (this is general to all hash
implementations), so once you took the risk to expand the table once,
you do not want to take the risk another time. You are lucky if host was
not already taken out of the server pool.

Fact that cond_resched() is not yet used anywhere in lib/rhashtable.c
reveals part of the problem we have here.

(I'll submit patches for that)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions
  2015-02-26 14:50   ` Eric Dumazet
@ 2015-02-26 14:54     ` Patrick McHardy
  2015-02-26 15:20     ` [PATCH net] rhashtable: use cond_resched() Eric Dumazet
  1 sibling, 0 replies; 11+ messages in thread
From: Patrick McHardy @ 2015-02-26 14:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexei Starovoitov, Daniel Borkmann, David Laight, davem, tgraf,
	pablo, johunt, netdev

On 26.02, Eric Dumazet wrote:
> On Thu, 2015-02-26 at 07:53 +0000, Patrick McHardy wrote:
> 
> > My understanding was that Eric was arguing against shrinking in general.
> > But assuming we have it, what's the downside of also performing
> > shrinking for TCP?
> 
> Hash resize is horribly expensive (this is general to all hash
> implementations), so once you took the risk to expand the table once,
> you do not want to take the risk another time. You are lucky if host was
> not already taken out of the server pool.

Well, it's async, so I'd hope it wouldn't take you out of your server
pool. But sure, for TCP your point seems reasonable, as long as we'll
keep it available for nftables I'm happy.

> Fact that cond_resched() is not yet used anywhere in lib/rhashtable.c
> reveals part of the problem we have here.
> 
> (I'll submit patches for that)

Reminds me that I also need this for nft set GC, thanks :)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net] rhashtable: use cond_resched()
  2015-02-26 14:50   ` Eric Dumazet
  2015-02-26 14:54     ` Patrick McHardy
@ 2015-02-26 15:20     ` Eric Dumazet
  2015-02-26 15:36       ` David Laight
                         ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: Eric Dumazet @ 2015-02-26 15:20 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Alexei Starovoitov, Daniel Borkmann, David Laight, davem, tgraf,
	pablo, johunt, netdev

From: Eric Dumazet <edumazet@google.com>

If a hash table has 128 slots and 16384 elems, expand to 256 slots
takes more than one second. For larger sets, a soft lockup is detected.

Holding cpu for that long, even in a work queue is a show stopper
for non preemptable kernels.

cond_resched() at strategic points to allow process scheduler
to reschedule us.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 lib/rhashtable.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index e3a04e4b3ec5..f3073fc600ed 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -17,6 +17,7 @@
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/log2.h>
+#include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/mm.h>
@@ -414,6 +415,7 @@ int rhashtable_expand(struct rhashtable *ht)
 			}
 		}
 		unlock_buckets(new_tbl, old_tbl, new_hash);
+		cond_resched();
 	}
 
 	/* Unzip interleaved hash chains */
@@ -437,6 +439,7 @@ int rhashtable_expand(struct rhashtable *ht)
 				complete = false;
 
 			unlock_buckets(new_tbl, old_tbl, old_hash);
+			cond_resched();
 		}
 	}
 
@@ -495,6 +498,7 @@ int rhashtable_shrink(struct rhashtable *ht)
 				   tbl->buckets[new_hash + new_tbl->size]);
 
 		unlock_buckets(new_tbl, tbl, new_hash);
+		cond_resched();
 	}
 
 	/* Publish the new, valid hash table */

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: [PATCH net] rhashtable: use cond_resched()
  2015-02-26 15:20     ` [PATCH net] rhashtable: use cond_resched() Eric Dumazet
@ 2015-02-26 15:36       ` David Laight
  2015-02-26 15:46         ` Patrick McHardy
  2015-02-26 15:50       ` Daniel Borkmann
  2015-02-27 22:55       ` David Miller
  2 siblings, 1 reply; 11+ messages in thread
From: David Laight @ 2015-02-26 15:36 UTC (permalink / raw)
  To: 'Eric Dumazet', Patrick McHardy
  Cc: Alexei Starovoitov, Daniel Borkmann, davem, tgraf, pablo, johunt, netdev

From: Eric Dumazet 
Sent: 26 February 2015 15:21
> If a hash table has 128 slots and 16384 elems, expand to 256 slots
> takes more than one second. For larger sets, a soft lockup is detected.

What on earth is it doing?
Presumably something to do with the rcu actions needed to allow
lockless lookup during resize.

There has to be a better solution?
Perhaps even two sets of chain pointers down the hash lists.
Then the old hash table can be kept completely valid while the
whole 'unzip' action is done.

	David


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] rhashtable: use cond_resched()
  2015-02-26 15:36       ` David Laight
@ 2015-02-26 15:46         ` Patrick McHardy
  2015-02-26 16:03           ` David Laight
  0 siblings, 1 reply; 11+ messages in thread
From: Patrick McHardy @ 2015-02-26 15:46 UTC (permalink / raw)
  To: David Laight
  Cc: 'Eric Dumazet',
	Alexei Starovoitov, Daniel Borkmann, davem, tgraf, pablo, johunt,
	netdev

On 26.02, David Laight wrote:
> From: Eric Dumazet 
> Sent: 26 February 2015 15:21
> > If a hash table has 128 slots and 16384 elems, expand to 256 slots
> > takes more than one second. For larger sets, a soft lockup is detected.
> 
> What on earth is it doing?
> Presumably something to do with the rcu actions needed to allow
> lockless lookup during resize.
> 
> There has to be a better solution?
> Perhaps even two sets of chain pointers down the hash lists.
> Then the old hash table can be kept completely valid while the
> whole 'unzip' action is done.

One of the main points of rhashtable is that you don't need those.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] rhashtable: use cond_resched()
  2015-02-26 15:20     ` [PATCH net] rhashtable: use cond_resched() Eric Dumazet
  2015-02-26 15:36       ` David Laight
@ 2015-02-26 15:50       ` Daniel Borkmann
  2015-02-27 22:55       ` David Miller
  2 siblings, 0 replies; 11+ messages in thread
From: Daniel Borkmann @ 2015-02-26 15:50 UTC (permalink / raw)
  To: Eric Dumazet, Patrick McHardy
  Cc: Alexei Starovoitov, David Laight, davem, tgraf, pablo, johunt, netdev

On 02/26/2015 04:20 PM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> If a hash table has 128 slots and 16384 elems, expand to 256 slots
> takes more than one second. For larger sets, a soft lockup is detected.
>
> Holding cpu for that long, even in a work queue is a show stopper
> for non preemptable kernels.
>
> cond_resched() at strategic points to allow process scheduler
> to reschedule us.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH net] rhashtable: use cond_resched()
  2015-02-26 15:46         ` Patrick McHardy
@ 2015-02-26 16:03           ` David Laight
  0 siblings, 0 replies; 11+ messages in thread
From: David Laight @ 2015-02-26 16:03 UTC (permalink / raw)
  To: 'Patrick McHardy'
  Cc: 'Eric Dumazet',
	Alexei Starovoitov, Daniel Borkmann, davem, tgraf, pablo, johunt,
	netdev

From: Patrick McHardy [mailto:kaber@trash.net]
> On 26.02, David Laight wrote:
> > From: Eric Dumazet
> > Sent: 26 February 2015 15:21
> > > If a hash table has 128 slots and 16384 elems, expand to 256 slots
> > > takes more than one second. For larger sets, a soft lockup is detected.
> >
> > What on earth is it doing?
> > Presumably something to do with the rcu actions needed to allow
> > lockless lookup during resize.
> >
> > There has to be a better solution?
> > Perhaps even two sets of chain pointers down the hash lists.
> > Then the old hash table can be kept completely valid while the
> > whole 'unzip' action is done.
> 
> One of the main points of rhashtable is that you don't need those.

Would it be possible to keep the hash chains sorted into 'raw hash' order?
I think that would mean the 'unzip' only had to change a single linkage.

	David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] rhashtable: use cond_resched()
  2015-02-26 15:20     ` [PATCH net] rhashtable: use cond_resched() Eric Dumazet
  2015-02-26 15:36       ` David Laight
  2015-02-26 15:50       ` Daniel Borkmann
@ 2015-02-27 22:55       ` David Miller
  2 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2015-02-27 22:55 UTC (permalink / raw)
  To: eric.dumazet
  Cc: kaber, alexei.starovoitov, daniel, David.Laight, tgraf, pablo,
	johunt, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 26 Feb 2015 07:20:34 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> If a hash table has 128 slots and 16384 elems, expand to 256 slots
> takes more than one second. For larger sets, a soft lockup is detected.
> 
> Holding cpu for that long, even in a work queue is a show stopper
> for non preemptable kernels.
> 
> cond_resched() at strategic points to allow process scheduler
> to reschedule us.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-02-27 22:55 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-25 22:09 [PATCH net 2/2] rhashtable: remove indirection for grow/shrink decision functions Alexei Starovoitov
2015-02-26  7:53 ` Patrick McHardy
2015-02-26  8:54   ` Daniel Borkmann
2015-02-26 14:50   ` Eric Dumazet
2015-02-26 14:54     ` Patrick McHardy
2015-02-26 15:20     ` [PATCH net] rhashtable: use cond_resched() Eric Dumazet
2015-02-26 15:36       ` David Laight
2015-02-26 15:46         ` Patrick McHardy
2015-02-26 16:03           ` David Laight
2015-02-26 15:50       ` Daniel Borkmann
2015-02-27 22:55       ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.