From: Eric Dumazet <dada1@cosmosbay.com>
To: paulmck@us.ibm.com
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>,
Linus Torvalds <torvalds@osdl.org>,
linux-kernel@vger.kernel.org,
"David S. Miller" <davem@davemloft.net>,
Dipankar Sarma <dipankar@in.ibm.com>,
Manfred Spraul <manfred@colorfullife.com>,
netdev@vger.kernel.org
Subject: Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
Date: Fri, 06 Jan 2006 18:19:15 +0100
Message-ID: <43BEA693.5010509@cosmosbay.com>
In-Reply-To: <20060106164702.GA5087@us.ibm.com>
Paul E. McKenney wrote:
> On Fri, Jan 06, 2006 at 01:37:12PM +0000, Alan Cox wrote:
>> On Fri, 2006-01-06 at 11:17 +0100, Eric Dumazet wrote:
>>> I assume that if a CPU queued 10,000 items in its RCU queue, then the oldest
>>> entry cannot still be in use by another CPU. This might sound like a violation
>>> of RCU rules (I'm not an RCU expert), but it seems quite reasonable.
>> Fixing the real problem in the routing code would be the real fix.
>>
>> The underlying problem of RCU and memory usage could be solved more
>> safely by making sure that the sleeping memory allocator path always
>> waits until at least one RCU cleanup has occurred after it fails an
>> allocation before it starts trying harder. That ought to also naturally
>> throttle memory consumers more in the situation which is the right
>> behaviour.
>
> A quick look at rt_garbage_collect() leads me to believe that although
> the IP route cache does try to limit its use of memory, it does not
> fully account for memory that it has released to RCU, but that RCU has
> not yet freed due to a grace period not having elapsed.
>
> The following appears to be possible:
>
> 1. rt_garbage_collect() sees that there are too many entries,
> and sets "goal" to the number to free up, based on a
> computed "equilibrium" value.
>
> 2. The number of entries is (correctly) decremented only when
> the corresponding RCU callback is invoked, which actually
> frees the entry.
>
> 3. Between the time that rt_garbage_collect() is invoked the
> first time and when the RCU grace period ends, rt_garbage_collect()
> is invoked again. It still sees too many entries (since
> RCU has not yet freed the ones released by the earlier
> invocation in step (1) above), so frees a bunch more.
>
> 4. Packets routed now miss the route cache, because the corresponding
> entries are waiting for a grace period, slowing the system down.
> Therefore, even more entries are freed to make room for new
> entries corresponding to the new packets.
>
> If my (likely quite naive) reading of the IP route cache code is correct,
> it would be possible to end up in a steady state with most of the entries
> always being in RCU rather than in the route cache.
>
> Eric, could this be what is happening to your system?
>
> If it is, one straightforward fix would be to keep a count of the number
> of route-cache entries waiting on RCU, and for rt_garbage_collect()
> to subtract this number of entries from its goal. Does this make sense?
>
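The accounting issue and the suggested fix quoted above can be sketched as a small user-space model. This is only an illustration under stated assumptions: the struct, its fields, and every function name below are hypothetical and are not taken from the kernel's route cache code; the only idea borrowed from the thread is "subtract the entries waiting on RCU from the goal".

```c
#include <assert.h>

/* Hypothetical, simplified model of the route cache accounting.
 * None of these names come from the kernel source. */
struct rt_cache_model {
    int entries;      /* counted entries (decremented only by the RCU callback) */
    int rcu_pending;  /* entries already handed to RCU but not yet freed */
    int equilibrium;  /* target size used by the garbage collector */
};

/* Naive goal: ignores entries that are already waiting on RCU. */
static int gc_goal_naive(const struct rt_cache_model *c)
{
    int goal = c->entries - c->equilibrium;
    return goal > 0 ? goal : 0;
}

/* Adjusted goal: subtracts the entries already queued to RCU. */
static int gc_goal_adjusted(const struct rt_cache_model *c)
{
    int goal = c->entries - c->rcu_pending - c->equilibrium;
    return goal > 0 ? goal : 0;
}

/* Release n entries to RCU: they become pending, "entries" is unchanged. */
static void gc_release(struct rt_cache_model *c, int n)
{
    assert(n >= 0 && c->rcu_pending + n <= c->entries);
    c->rcu_pending += n;
}

/* Grace period ends: callbacks run and the counts finally drop. */
static void rcu_grace_period_end(struct rt_cache_model *c)
{
    c->entries -= c->rcu_pending;
    c->rcu_pending = 0;
}
```

In this model, with 1000 entries and an equilibrium of 600, a second collection pass before the grace period ends still computes a naive goal of 400, while the adjusted goal is 0.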
Hi Paul
Thanks for reviewing the route code :)
As I said, the problem comes from the route cache flush that is periodically
done by rt_run_flush(), triggered by rt_flush_timer.
The 10% of LOWMEM RAM that was used by route-cache entries is pushed into RCU
queues (with call_rcu_bh()), and the network continues to receive
packets from *many* sources that want their route-cache entry.
Eric
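The flush scenario described in the reply can be illustrated with a toy model of the memory footprint. It is a sketch only: the struct and function names are hypothetical, sizes are in arbitrary units, and the real behaviour depends on when the RCU grace period actually completes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of memory held by route-cache entries.
 * None of these names come from the kernel source. */
struct flush_model {
    size_t live;        /* entries reachable in the route cache */
    size_t rcu_queued;  /* entries passed to RCU, memory not yet freed */
};

/* Flush: every live entry is unlinked and queued for deferred freeing. */
static void model_flush(struct flush_model *m)
{
    m->rcu_queued += m->live;
    m->live = 0;
}

/* Packets from new sources each allocate a fresh route-cache entry. */
static void model_receive(struct flush_model *m, size_t new_sources)
{
    m->live += new_sources;
}

/* Grace period completes: the queued entries are finally freed. */
static void model_grace_period(struct flush_model *m)
{
    m->rcu_queued = 0;
}

/* Total memory currently pinned by route-cache entries. */
static size_t model_footprint(const struct flush_model *m)
{
    assert(m != NULL);
    return m->live + m->rcu_queued;
}
```

If the cache holds 100 units at flush time and 80 units of new entries arrive before the grace period ends, the model's footprint is 180 units until the queued entries are freed, which is the kind of temporary overshoot the message describes.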
Thread overview: 34+ messages
[not found] <20060105235845.967478000@sorel.sous-sol.org>
2006-01-05 21:47 ` [PATCH 0/6] -stable review Chris Wright
2006-01-06 0:45 ` [PATCH 1/6] drivers/net/sungem.c: gem_remove_one mustnt be __devexit Chris Wright
2006-01-06 0:45 ` [PATCH 2/6] ieee80211_crypt_tkip depends on NET_RADIO Chris Wright
2006-01-06 0:45 ` [PATCH 3/6] Insanity avoidance in /proc (CVE-2005-4605) Chris Wright
2006-01-06 0:45 ` [PATCH 4/6] sysctl: dont overflow the user-supplied buffer with 0 Chris Wright
2006-01-06 1:30 ` Linus Torvalds
2006-01-06 3:40 ` Chris Wright
2006-01-06 10:17 ` [PATCH, RFC] RCU : OOM avoidance and lower latency Eric Dumazet
2006-01-06 12:52 ` [PATCH, RFC] RCU : OOM avoidance and lower latency (Version 2), HOTPLUG_CPU fix Eric Dumazet
2006-01-06 12:58 ` [PATCH, RFC] RCU : OOM avoidance and lower latency Andi Kleen
2006-01-06 13:09 ` Eric Dumazet
2006-01-06 19:26 ` Lee Revell
2006-01-06 22:18 ` Andi Kleen
2006-01-06 13:37 ` Alan Cox
2006-01-06 14:00 ` Eric Dumazet
2006-01-06 14:45 ` Alan Cox
2006-01-06 16:47 ` Paul E. McKenney
2006-01-06 17:19 ` Eric Dumazet [this message]
2006-01-06 20:26 ` Paul E. McKenney
2006-01-06 20:33 ` David S. Miller
2006-01-06 20:57 ` Andi Kleen
2006-01-07 0:17 ` David S. Miller
2006-01-07 1:09 ` Andi Kleen
2006-01-07 7:10 ` David S. Miller
2006-01-07 7:34 ` Eric Dumazet
2006-01-07 7:44 ` David S. Miller
2006-01-07 7:53 ` Eric Dumazet
2006-01-07 8:36 ` David S. Miller
2006-01-07 20:30 ` Paul E. McKenney
2006-01-07 8:30 ` Eric Dumazet
2006-01-06 19:24 ` Lee Revell
2006-01-06 0:46 ` [PATCH 5/6] UFS: inode->i_sem is not released in error path Chris Wright
2006-01-06 0:46 ` [PATCH 6/6] [ATYFB]: Fix onboard video on SPARC Blade 100 for 2.6.{13,14,15} Chris Wright
2006-01-06 0:53 ` [PATCH 0/6] -stable review Chris Wright