From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>,
	Linus Torvalds <torvalds@osdl.org>,
	linux-kernel@vger.kernel.org,
	"David S. Miller" <davem@davemloft.net>,
	Dipankar Sarma <dipankar@in.ibm.com>,
	Manfred Spraul <manfred@colorfullife.com>,
	netdev@vger.kernel.org
Subject: Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
Date: Fri, 6 Jan 2006 12:26:26 -0800
Message-ID: <20060106202626.GA5677@us.ibm.com>
In-Reply-To: <43BEA693.5010509@cosmosbay.com>

On Fri, Jan 06, 2006 at 06:19:15PM +0100, Eric Dumazet wrote:
> Paul E. McKenney wrote:
> >On Fri, Jan 06, 2006 at 01:37:12PM +0000, Alan Cox wrote:
> >>On Fri, 2006-01-06 at 11:17 +0100, Eric Dumazet wrote:
> >>>I assume that if a CPU queued 10,000 items in its RCU queue, then the
> >>>oldest entry cannot still be in use by another CPU. This might sound like
> >>>a violation of RCU rules (I'm not an RCU expert), but it seems quite
> >>>reasonable.
> >>Fixing the real problem in the routing code would be the real fix. 
> >>
> >>The underlying problem of RCU and memory usage could be solved more
> >>safely by making sure that the sleeping memory allocator path always
> >>waits until at least one RCU cleanup has occurred after it fails an
> >>allocation before it starts trying harder. That ought to also naturally
> >>throttle memory consumers more in that situation, which is the right
> >>behaviour.
> >
> >A quick look at rt_garbage_collect() leads me to believe that although
> >the IP route cache does try to limit its use of memory, it does not
> >fully account for memory that it has released to RCU, but that RCU has
> >not yet freed due to a grace period not having elapsed.
> >
> >The following appears to be possible:
> >
> >1.	rt_garbage_collect() sees that there are too many entries,
> >	and sets "goal" to the number to free up, based on a
> >	computed "equilibrium" value.
> >
> >2.	The number of entries is (correctly) decremented only when
> >	the corresponding RCU callback is invoked, which actually
> >	frees the entry.
> >
> >3.	Between the time that rt_garbage_collect() is invoked the
> >	first time and when the RCU grace period ends, rt_garbage_collect()
> >	is invoked again.  It still sees too many entries (since
> >	RCU has not yet freed the ones released by the earlier
> >	invocation in step (1) above), so frees a bunch more.
> >
> >4.	Packets routed now miss the route cache, because the corresponding
> >	entries are waiting for a grace period, slowing the system down.
> >	Therefore, even more entries are freed to make room for new
> >	entries corresponding to the new packets.
> >
> >If my (likely quite naive) reading of the IP route cache code is correct,
> >it would be possible to end up in a steady state with most of the entries
> >always being in RCU rather than in the route cache.
> >
> >Eric, could this be what is happening to your system?
> >
> >If it is, one straightforward fix would be to keep a count of the number
> >of route-cache entries waiting on RCU, and for rt_garbage_collect()
> >to subtract this number of entries from its goal.  Does this make sense?
> >
> 
> Hi Paul
> 
> Thanks for reviewing route code :)
> 
> As I said, the problem comes from 'route flush cache', which is periodically
> done by rt_run_flush(), triggered by rt_flush_timer.
> 
> The 10% of LOWMEM RAM that was used by route-cache entries is pushed into
> RCU queues (with call_rcu_bh()), and the network continues to receive
> packets from *many* sources that want their route-cache entry.

Hello, Eric,

The rt_run_flush() function could indeed be suffering from the same
problem.  Dipankar's recent patch should help RCU grace periods proceed
more quickly.  Does that help?
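
For reference, the accounting fix suggested in the quoted text above
(count the entries waiting on RCU and subtract them from the goal) could
be sketched roughly as follows.  This is a user-space model only: the
struct and function names (rt_cache_model, rt_gc_goal(), and so on) are
made up for illustration and are not taken from net/ipv4/route.c, and the
real rt_garbage_collect() computes its goal in a more involved way.

```c
#include <assert.h>

/* Hypothetical model of the route cache; not the kernel's data structures. */
struct rt_cache_model {
	int entries;      /* entries not yet freed: cached plus waiting on RCU */
	int rcu_pending;  /* entries handed to RCU whose grace period has not ended */
	int equilibrium;  /* number of entries the collector aims for */
};

/* Number of entries a gc pass should release, discounting those already in RCU. */
static int rt_gc_goal(const struct rt_cache_model *c)
{
	int goal = c->entries - c->equilibrium - c->rcu_pending;

	return goal > 0 ? goal : 0;
}

/* Hand n entries to RCU; they stay counted in "entries" until the callback runs. */
static void rt_release_to_rcu(struct rt_cache_model *c, int n)
{
	assert(n >= 0 && n <= c->entries - c->rcu_pending);
	c->rcu_pending += n;
}

/* Grace period ends: the callbacks free every pending entry and drop the count. */
static void rt_rcu_grace_period_end(struct rt_cache_model *c)
{
	c->entries -= c->rcu_pending;
	c->rcu_pending = 0;
}
```

With this accounting, a second gc pass that runs before the grace period
ends computes a goal of zero instead of releasing another batch.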

If not, it may be worthwhile to limit the number of times that
rt_run_flush() runs per RCU grace period.
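
A minimal sketch of such a limit, again as a user-space model rather than
kernel code.  It assumes the caller can obtain some monotonically
increasing count of completed grace periods (gp_completed below); that
counter, struct flush_limiter, and rt_flush_allowed() are all
hypothetical names, and the limit is arbitrarily set to one flush per
grace period.

```c
#include <stdbool.h>

/* Hypothetical per-cache state remembering when the last flush ran. */
struct flush_limiter {
	unsigned long last_gp;  /* completed-grace-period count at the last flush */
	bool flushed_once;      /* false until the first flush has been allowed */
};

/*
 * Return true if a flush may run now.  A flush is refused when no grace
 * period has completed since the previous flush was allowed.
 */
static bool rt_flush_allowed(struct flush_limiter *l, unsigned long gp_completed)
{
	if (l->flushed_once && gp_completed == l->last_gp)
		return false;

	l->last_gp = gp_completed;
	l->flushed_once = true;
	return true;
}
```

The timer handler would check rt_flush_allowed() before flushing and
simply rearm itself when the answer is no.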

						Thanx, Paul
