All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ipv4: Remove unnecessary code from rt_check_expire().
@ 2012-06-26  7:21 David Miller
  2012-06-26  7:39 ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2012-06-26  7:21 UTC (permalink / raw)
  To: netdev; +Cc: eric.dumazet


IPv4 routing cache entries no longer use dst->expires, because the
metrics, PMTU, and redirect information are stored in the inetpeer
cache.

Signed-off-by: David S. Miller <davem@davemloft.net>
---

Eric, when you did commit 9f28a2fc0bd77511f649c0a788c7bf9a5fd04edb
(ipv4: reintroduce route cache garbage collector) do you remember
if the thing we needed was the real expiry or both the
rt_is_expired() and the rt_may_expire() cases?

I really want to remove rt_may_expire() from this conditional because
it results in absolutely stupid behavior.  If your system is idle
for 5 minutes, all of your input routing cache entries are purged.

 net/ipv4/route.c |   34 +++++++++++-----------------------
 1 file changed, 11 insertions(+), 23 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 8d62d85..846961c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -870,34 +870,22 @@ static void rt_check_expire(void)
 		while ((rth = rcu_dereference_protected(*rthp,
 					lockdep_is_held(rt_hash_lock_addr(i)))) != NULL) {
 			prefetch(rth->dst.rt_next);
-			if (rt_is_expired(rth)) {
+			if (rt_is_expired(rth) ||
+			    rt_may_expire(rth, tmo, ip_rt_gc_timeout)) {
 				*rthp = rth->dst.rt_next;
 				rt_free(rth);
 				continue;
 			}
-			if (rth->dst.expires) {
-				/* Entry is expired even if it is in use */
-				if (time_before_eq(jiffies, rth->dst.expires)) {
-nofree:
-					tmo >>= 1;
-					rthp = &rth->dst.rt_next;
-					/*
-					 * We only count entries on
-					 * a chain with equal hash inputs once
-					 * so that entries for different QOS
-					 * levels, and other non-hash input
-					 * attributes don't unfairly skew
-					 * the length computation
-					 */
-					length += has_noalias(rt_hash_table[i].chain, rth);
-					continue;
-				}
-			} else if (!rt_may_expire(rth, tmo, ip_rt_gc_timeout))
-				goto nofree;
 
-			/* Cleanup aged off entries. */
-			*rthp = rth->dst.rt_next;
-			rt_free(rth);
+			/* We only count entries on a chain with equal
+			 * hash inputs once so that entries for
+			 * different QOS levels, and other non-hash
+			 * input attributes don't unfairly skew the
+			 * length computation
+			 */
+			tmo >>= 1;
+			rthp = &rth->dst.rt_next;
+			length += has_noalias(rt_hash_table[i].chain, rth);
 		}
 		spin_unlock_bh(rt_hash_lock_addr(i));
 		sum += length;
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] ipv4: Remove unnecessary code from rt_check_expire().
  2012-06-26  7:21 [PATCH] ipv4: Remove unnecessary code from rt_check_expire() David Miller
@ 2012-06-26  7:39 ` Eric Dumazet
  2012-06-26  7:46   ` David Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2012-06-26  7:39 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Tue, 2012-06-26 at 00:21 -0700, David Miller wrote:
> IPv4 routing cache entries no longer use dst->expires, because the
> metrics, PMTU, and redirect information are stored in the inetpeer
> cache.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
> 
> Eric, when you did commit 9f28a2fc0bd77511f649c0a788c7bf9a5fd04edb
> (ipv4: reintroduce route cache garbage collector) do you remember
> if the thing we needed was the real expiry or both the
> rt_is_expired() and the rt_may_expire() cases?
> 
> I really want to remove rt_may_expire() from this conditional because
> it results in absolutely stupid behavior.  If your system is idle
> for 5 minutes, all of your input routing cache entries are purged.

Hmm, after a DDOS, purging all those routing cache entries in 5 minutes
is good to recover some Mbytes of kernel memory.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ipv4: Remove unnecessary code from rt_check_expire().
  2012-06-26  7:39 ` Eric Dumazet
@ 2012-06-26  7:46   ` David Miller
  2012-06-26  7:58     ` Eric Dumazet
  2012-06-26  8:23     ` Eric Dumazet
  0 siblings, 2 replies; 10+ messages in thread
From: David Miller @ 2012-06-26  7:46 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 26 Jun 2012 09:39:58 +0200

> On Tue, 2012-06-26 at 00:21 -0700, David Miller wrote:
>> IPv4 routing cache entries no longer use dst->expires, because the
>> metrics, PMTU, and redirect information are stored in the inetpeer
>> cache.
>> 
>> Signed-off-by: David S. Miller <davem@davemloft.net>
>> ---
>> 
>> Eric, when you did commit 9f28a2fc0bd77511f649c0a788c7bf9a5fd04edb
>> (ipv4: reintroduce route cache garbage collector) do you remember
>> if the thing we needed was the real expiry or both the
>> rt_is_expired() and the rt_may_expire() cases?
>> 
>> I really want to remove rt_may_expire() from this conditional because
>> it results in absolutely stupid behavior.  If your system is idle
>> for 5 minutes, all of your input routing cache entries are purged.
> 
> Hmm, after a DDOS, purging all those routing cache entries in 5 minutes
> is good to recover some Mbytes of kernel memory.

And for legitimate traffic it's completely the wrong thing to do.

There is absolutely zero reason to pure valid entries when hash chains
average length of one.

I've been monitoring routing cache activity, and it's the height of
stupidity.  Every 5 minutes we pure, and then they all get regenerated
again.

Routing cache entries are expensive to recreate, it's much easier to
just keep them around then to potentially eat four full trie lookups
because that's what it will cost to reconstitute those guys.

But regardless, could you actually answer the question I asked of you?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ipv4: Remove unnecessary code from rt_check_expire().
  2012-06-26  7:46   ` David Miller
@ 2012-06-26  7:58     ` Eric Dumazet
  2012-06-26  8:23     ` Eric Dumazet
  1 sibling, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2012-06-26  7:58 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Tue, 2012-06-26 at 00:46 -0700, David Miller wrote:

> But regardless, could you actually answer the question I asked of you?

I did a revert, no more no less.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ipv4: Remove unnecessary code from rt_check_expire().
  2012-06-26  7:46   ` David Miller
  2012-06-26  7:58     ` Eric Dumazet
@ 2012-06-26  8:23     ` Eric Dumazet
  2012-06-26  8:37       ` David Miller
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2012-06-26  8:23 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Tue, 2012-06-26 at 00:46 -0700, David Miller wrote:

> And for legitimate traffic it's completely the wrong thing to do.
> 
> There is absolutely zero reason to pure valid entries when hash chains
> average length of one.
> 
> I've been monitoring routing cache activity, and it's the height of
> stupidity.  Every 5 minutes we pure, and then they all get regenerated
> again.
> 

Thats because gc_interval (60) is big compared to ip_rt_gc_timeout
(300)

So each time rt_check_expire() triggers, we handle a big part of the
cache. On big servers I had to lower gc_interval to smooth things.

Garbage collect is needed to not waste kernel memory, even on legitimate
traffic on a typical web server.

Taken from my 8GB machine : 

# cat /proc/sys/net/ipv4/route/gc_thresh
262144

320 bytes per dst : 262144*320 = 83886080 bytes to store one dst per hash chain.

Also, why keeping a dst in cache if no traffic uses it in a 5 minutes period ?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ipv4: Remove unnecessary code from rt_check_expire().
  2012-06-26  8:23     ` Eric Dumazet
@ 2012-06-26  8:37       ` David Miller
  2012-06-26  8:46         ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2012-06-26  8:37 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 26 Jun 2012 10:23:04 +0200

> Thats because gc_interval (60) is big compared to ip_rt_gc_timeout
> (300)
> 
> So each time rt_check_expire() triggers, we handle a big part of the
> cache. On big servers I had to lower gc_interval to smooth things.

I know it's stupid, that's why I want to eventually kill this off
completely.

> Garbage collect is needed to not waste kernel memory, even on legitimate
> traffic on a typical web server.
> 
> Taken from my 8GB machine : 
> 
> # cat /proc/sys/net/ipv4/route/gc_thresh
> 262144
> 
> 320 bytes per dst : 262144*320 = 83886080 bytes to store one dst per hash chain.

83MB on a machine with 8GB of ram that does enough networking to fill
the routing cache.  This sounds absolutely reasonable to me.

> Also, why keeping a dst in cache if no traffic uses it in a 5 minutes period ?

Traffic is bursty and periodic.

And think, we don't do any stupidity like this for the inetpeer cache
and no small cute animals have died as a result.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ipv4: Remove unnecessary code from rt_check_expire().
  2012-06-26  8:37       ` David Miller
@ 2012-06-26  8:46         ` Eric Dumazet
  2012-06-26  8:56           ` David Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2012-06-26  8:46 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Tue, 2012-06-26 at 01:37 -0700, David Miller wrote:

> And think, we don't do any stupidity like this for the inetpeer cache
> and no small cute animals have died as a result.

Thats because inetpeer is kept small.

We do have a smart gc on inetpeer cache (inet_peer_gc()),
and inetpeer threshold is 65536 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ipv4: Remove unnecessary code from rt_check_expire().
  2012-06-26  8:46         ` Eric Dumazet
@ 2012-06-26  8:56           ` David Miller
  2012-06-26  9:11             ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2012-06-26  8:56 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 26 Jun 2012 10:46:48 +0200

> On Tue, 2012-06-26 at 01:37 -0700, David Miller wrote:
> 
>> And think, we don't do any stupidity like this for the inetpeer cache
>> and no small cute animals have died as a result.
> 
> Thats because inetpeer is kept small.
> 
> We do have a smart gc on inetpeer cache (inet_peer_gc()),
> and inetpeer threshold is 65536 

Fair enough.

Keep in mind that rt_check_expire() was written before we had realtime
GC.  It used to be main component which kept hash chains of reasonable
size before real ->gc() triggers.

I have about 15 entries in my routing cache, there is no reason they
should be purged.

And consider TCP before early demux, so a connection doing financial
trades is idle for 5 minutes.  Do you really think the next trade
should have this pointless added latency just because we can't be
bothered to make rt_check_expired() smarter?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ipv4: Remove unnecessary code from rt_check_expire().
  2012-06-26  8:56           ` David Miller
@ 2012-06-26  9:11             ` Eric Dumazet
  2012-06-26  9:43               ` David Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2012-06-26  9:11 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Tue, 2012-06-26 at 01:56 -0700, David Miller wrote:

> Keep in mind that rt_check_expire() was written before we had realtime
> GC.  It used to be main component which kept hash chains of reasonable
> size before real ->gc() triggers.
> 
> I have about 15 entries in my routing cache, there is no reason they
> should be purged.
> 
> And consider TCP before early demux, so a connection doing financial
> trades is idle for 5 minutes.  Do you really think the next trade
> should have this pointless added latency just because we can't be
> bothered to make rt_check_expired() smarter?

We did a lot of work in this area, for example making this gc running in
process context instead of timer, and removing the ip_rt_secret_interval
that was invalidating the whole cache every 10 minutes.

I think I stopped trying to improve route cache because your intention
was to get rid of it.

Now it seems we should keep it for a while, so it makes sense to add
more fuel on it ;)

About financial guys, they probably are smart enough to :

echo bigvalue >/proc/sys/net/ipv4/route/gc_timeout

(although the cache misses on old cached dst might kill their latency
anyway)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ipv4: Remove unnecessary code from rt_check_expire().
  2012-06-26  9:11             ` Eric Dumazet
@ 2012-06-26  9:43               ` David Miller
  0 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2012-06-26  9:43 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 26 Jun 2012 11:11:33 +0200

> I think I stopped trying to improve route cache because your intention
> was to get rid of it.
> 
> Now it seems we should keep it for a while, so it makes sense to add
> more fuel on it ;)

Removal is a long term project, so we can still make some short-
term tweaks :-)

> About financial guys, they probably are smart enough to :
> 
> echo bigvalue >/proc/sys/net/ipv4/route/gc_timeout

Every excess knob adjustment is a failure on our part.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-06-26  9:43 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-26  7:21 [PATCH] ipv4: Remove unnecessary code from rt_check_expire() David Miller
2012-06-26  7:39 ` Eric Dumazet
2012-06-26  7:46   ` David Miller
2012-06-26  7:58     ` Eric Dumazet
2012-06-26  8:23     ` Eric Dumazet
2012-06-26  8:37       ` David Miller
2012-06-26  8:46         ` Eric Dumazet
2012-06-26  8:56           ` David Miller
2012-06-26  9:11             ` Eric Dumazet
2012-06-26  9:43               ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.