From: Eric Dumazet <dada1@cosmosbay.com>
To: paulmck@us.ibm.com
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>,
	Linus Torvalds <torvalds@osdl.org>,
	linux-kernel@vger.kernel.org,
	"David S. Miller" <davem@davemloft.net>,
	Dipankar Sarma <dipankar@in.ibm.com>,
	Manfred Spraul <manfred@colorfullife.com>,
	netdev@vger.kernel.org
Subject: Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
Date: Fri, 06 Jan 2006 18:19:15 +0100
Message-ID: <43BEA693.5010509@cosmosbay.com>
In-Reply-To: <20060106164702.GA5087@us.ibm.com>

Paul E. McKenney wrote:
> On Fri, Jan 06, 2006 at 01:37:12PM +0000, Alan Cox wrote:
>> On Fri, 2006-01-06 at 11:17 +0100, Eric Dumazet wrote:
>>> I assume that if a CPU has queued 10,000 items in its RCU queue, then the
>>> oldest entry cannot still be in use by another CPU. This might sound like a
>>> violation of RCU rules (I'm not an RCU expert), but it seems quite reasonable.
>> Fixing the real problem in the routing code would be the real fix. 
>>
>> The underlying problem of RCU and memory usage could be solved more
>> safely by making sure that the sleeping memory allocator path always
>> waits until at least one RCU cleanup has occurred after it fails an
>> allocation before it starts trying harder. That ought to also naturally
>> throttle memory consumers more in the situation which is the right
>> behaviour.
> 
> A quick look at rt_garbage_collect() leads me to believe that although
> the IP route cache does try to limit its use of memory, it does not
> fully account for memory that it has released to RCU, but that RCU has
> not yet freed due to a grace period not having elapsed.
> 
> The following appears to be possible:
> 
> 1.	rt_garbage_collect() sees that there are too many entries,
> 	and sets "goal" to the number to free up, based on a
> 	computed "equilibrium" value.
> 
> 2.	The number of entries is (correctly) decremented only when
> 	the corresponding RCU callback is invoked, which actually
> 	frees the entry.
> 
> 3.	Between the time that rt_garbage_collect() is invoked the
> 	first time and when the RCU grace period ends, rt_garbage_collect()
> 	is invoked again.  It still sees too many entries (since
> 	RCU has not yet freed the ones released by the earlier
> 	invocation in step (1) above), so frees a bunch more.
> 
> 4.	Packets routed now miss the route cache, because the corresponding
> 	entries are waiting for a grace period, slowing the system down.
> 	Therefore, even more entries are freed to make room for new
> 	entries corresponding to the new packets.
> 
> If my (likely quite naive) reading of the IP route cache code is correct,
> it would be possible to end up in a steady state with most of the entries
> always being in RCU rather than in the route cache.
> 
> Eric, could this be what is happening to your system?
> 
> If it is, one straightforward fix would be to keep a count of the number
> of route-cache entries waiting on RCU, and for rt_garbage_collect()
> to subtract this number of entries from its goal.  Does this make sense?
> 
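The four-step feedback loop above, and the proposed fix of subtracting RCU-pending entries from the goal, can be illustrated with a small userspace model. This is a hedged sketch, not the route-cache code: struct rt_model, model_gc_pass(), model_run() and the other model_* names are hypothetical, the goal is simplified to "entries minus equilibrium", and all GC passes are assumed to fall inside a single grace period.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of the route cache: not the real net/ipv4/route.c
 * data structures, just the two counters that matter here. */
struct rt_model {
	int entries;     /* decremented only when the RCU callback runs */
	int rcu_pending; /* released to RCU, grace period not yet over */
};

/* End of a grace period: the queued RCU callbacks free the entries. */
static void model_grace_period(struct rt_model *c)
{
	c->entries -= c->rcu_pending;
	c->rcu_pending = 0;
}

/* One rt_garbage_collect()-like pass (simplified). If account_rcu is set,
 * entries already waiting on RCU are subtracted from the goal. */
static int model_gc_pass(struct rt_model *c, int equilibrium, int account_rcu)
{
	int live = c->entries - c->rcu_pending;
	int goal = c->entries - equilibrium;

	if (account_rcu)
		goal -= c->rcu_pending;
	if (goal > live)
		goal = live;          /* cannot release what is already queued */
	if (goal <= 0)
		return 0;
	c->rcu_pending += goal;   /* still counted in 'entries' (step 2) */
	assert(c->rcu_pending <= c->entries);
	return goal;
}

/* Run 'passes' GC invocations inside one grace period (step 3), then let
 * the grace period end. Returns the number of entries left in the cache. */
static int model_run(int start, int equilibrium, int passes, int account_rcu)
{
	struct rt_model c = { .entries = start, .rcu_pending = 0 };
	int i;

	for (i = 0; i < passes; i++)
		model_gc_pass(&c, equilibrium, account_rcu);
	model_grace_period(&c);
	printf("account_rcu=%d: %d entries left (equilibrium %d)\n",
	       account_rcu, c.entries, equilibrium);
	return c.entries;
}
```

With 1000 entries, an equilibrium of 600 and three passes before the grace period ends, model_run(1000, 600, 3, 0) empties the cache completely, while model_run(1000, 600, 3, 1) stops at the equilibrium of 600.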

Hi Paul

Thanks for reviewing route code :)

As I said, the problem comes from the route cache flush, which is done
periodically by rt_run_flush(), triggered by rt_flush_timer.

The route-cache entries, which were using 10% of LOWMEM RAM, are all pushed
into RCU queues (with call_rcu_bh()) while the network continues to receive
packets from *many* sources, each of which needs its own route-cache entry.
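A rough model of why a full flush hurts: right after the flush, the old entries still occupy memory until the RCU-bh grace period ends, while every packet from a source without a cached route allocates a new entry. This is a hedged sketch with hypothetical names (struct flush_model and the model_* helpers are not kernel symbols); it assumes each new source needs exactly one entry and that nothing is freed before the grace period ends.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model (illustrative names, not kernel symbols) of a full
 * route-cache flush such as rt_run_flush() performs. */
struct flush_model {
	long live;    /* entries visible to route lookups */
	long pending; /* flushed entries whose call_rcu_bh() callback has not run */
};

/* Flush: every live entry is handed to RCU at once; none is freed yet. */
static void model_flush(struct flush_model *m)
{
	m->pending += m->live;
	m->live = 0;
}

/* A packet from a source with no cached route allocates a fresh entry. */
static void model_cache_miss(struct flush_model *m)
{
	m->live++;
}

/* Memory held by the route cache, counting entries still queued in RCU. */
static long model_footprint(const struct flush_model *m)
{
	return m->live + m->pending;
}

/* Flush a cache of 'size' entries, then let 'new_sources' distinct sources
 * arrive before the grace period ends. Returns the peak footprint. */
static long model_peak_after_flush(long size, long new_sources)
{
	struct flush_model m = { .live = size, .pending = 0 };
	long i, peak;

	assert(size >= 0 && new_sources >= 0);
	model_flush(&m);
	peak = model_footprint(&m);
	for (i = 0; i < new_sources; i++) {
		model_cache_miss(&m);
		if (model_footprint(&m) > peak)
			peak = model_footprint(&m);
	}
	m.pending = 0; /* grace period ends: RCU callbacks free the old entries */
	printf("peak %ld entries for a cache of %ld, now %ld live\n",
	       peak, size, m.live);
	return peak;
}
```

In this model, model_peak_after_flush(1000, 1000) returns 2000: if enough distinct sources arrive within one grace period, the cache footprint approaches twice its pre-flush size.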


Eric

