From: Andrew Morton <>
To: William Lee Irwin III <>
Cc: Dave Hansen <>,
	"Martin J. Bligh" <>,,
Subject: Re: [PATCH] per-zone kswapd process
Date: Thu, 12 Sep 2002 22:46:49 -0700
Message-ID: <>
In-Reply-To: <>

William Lee Irwin III wrote:
> On Thu, Sep 12, 2002 at 09:06:20PM -0700, Andrew Morton wrote:
> > I still don't see why it's per zone and not per node.  It seems strange
> > that a wee little laptop would be running two kswapds?
> > kswapd can get a ton of work done in the development VM and one per
> > node would, I expect, suffice?
> Machines without observable NUMA effects can benefit from it if it's
> per-zone. It also follows that if there's more than one task doing this,
> page replacement is less likely to block entirely. Last, but not least,
> when I devised it, "per-zone" was the theme.

Maybe, marginally.  You could pass a gfp mask to sys_kswapd to select
the zones if that's really a benefit.  But if this _is_ a benefit then
it's a VM bug.  

Because if a single kswapd cannot service three zones then it cannot
service one zone. (Maybe.  We need to do per-zone throttling soon to
fix your OOM problems properly, but then, that shouldn't throttle

> On Thu, Sep 12, 2002 at 09:06:20PM -0700, Andrew Morton wrote:
> > Also, I'm wondering why the individual kernel threads don't have
> > their affinity masks set to make them run on the CPUs to which the
> > zone (or zones) are local?
> > Isn't it the case that with this code you could end up with a kswapd
> > on node 0 crunching on node 1's pages while a kswapd on node 1 crunches
> > on node 0's pages?
> Without some architecture-neutral method of topology detection, there's
> no way to do this. A follow-up when it's there should fix it.

Sorry, I don't buy that.

a) It does not need to be architecture neutral.  

b) You surely need a way of communicating the discovered topology
   to userspace anyway.

c) $EDITOR /etc/numa-layout.conf

d) $EDITOR /etc/kswapd.conf
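
For (c) and (d), a hand-edited file only has to say which CPUs sit on
which node. As a minimal sketch, assuming a made-up list syntax such as
"0-3,8" (neither the file format nor parse_cpulist() comes from any
existing tool, and the 64-CPU limit is only for brevity), userspace
could turn one such entry into a CPU bitmask roughly like this:

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a CPU list such as "0-3,8" into a bitmask (bit n set = CPU n).
 * The list syntax is an assumption made for illustration.
 * Handles CPUs 0..63 only. Returns 0 on success, -1 on malformed input.
 */
static int parse_cpulist(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	assert(s != NULL && mask != NULL);
	if (*s == '\0')
		return -1;

	while (*s) {
		char *end;
		unsigned long lo, hi;

		/* Each entry starts with a CPU number. */
		if (!isdigit((unsigned char)*s))
			return -1;
		lo = hi = strtoul(s, &end, 10);
		s = end;

		/* Optional "-N" turns the entry into an inclusive range. */
		if (*s == '-') {
			s++;
			if (!isdigit((unsigned char)*s))
				return -1;
			hi = strtoul(s, &end, 10);
			s = end;
		}
		if (lo > hi || hi > 63)
			return -1;

		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			m |= (uint64_t)1 << cpu;

		/* Entries are comma separated; a trailing comma is an error. */
		if (*s == ',') {
			s++;
			if (*s == '\0')
				return -1;
		} else if (*s != '\0') {
			return -1;
		}
	}

	*mask = m;
	return 0;
}
```

With that, parse_cpulist("0-3,8", &mask) sets bits 0 through 3 and bit 8,
while reversed ranges and stray separators are rejected instead of being
guessed at.
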

> On Thu, Sep 12, 2002 at 09:06:20PM -0700, Andrew Morton wrote:
> > If I'm not totally out to lunch on this, I'd have thought that a
> > better approach would be
> >       int sys_kswapd(int nid)
> >       {
> >               return kernel_thread(kswapd, ...);
> >       }
> > Userspace could then set up the CPU affinity based on some topology
> > or config information and would then parent a kswapd instance.  That
> > kswapd instance would then be bound to the CPUs which were on the
> > node identified by `nid'.
> > Or something like that?
> I'm very very scared of handing things like that to userspace, largely
> because I don't trust userspace at all.

Me either.  I've seen workloads in which userspace consumes
over 50% of the CPU resources.  It should be banned!

> At this point, we need to enumerate nodes and provide a cpu to node
> correspondence to userspace, and the kernel can obey, aside from the
> question of "What do we do if we need to scan a node without a kswapd
> started yet?".

kswapd is completely optional.  Put a `do_exit(0)' into the current
one and watch.   You'll get crappy dbench numbers, but it stays up.

> I think mbligh recently got the long-needed arch code in
> for cpu to node... But I'm just not able to make the leap of faith that
> memory detection is something that can ever comfortably be given to
> userspace.

A simple syscall which allows you to launch a kswapd instance against
a group of zones on any group of CPUs provides complete generality 
and flexibility to userspace.  And it is architecture neutral.

If it really is incredibly hard to divine the topology from userspace
then you need to fix that up.  Provide the topology to userspace.
Which has the added benefit of providing, umm, the topology to userspace ;)
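
As a rough sketch of the userspace half of that idea: bind the calling
process to a node's CPUs, then ask the kernel for a kswapd instance,
which per the proposal above would end up bound to the same CPUs. The
sys_kswapd() call is only proposed in this thread and does not exist,
so launch_kswapd_stub() below is a hypothetical stand-in; build_cpu_set()
and start_node_kswapd() are illustrative names as well. Only
sched_setaffinity() and the CPU_* macros are real glibc interfaces.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for cpu_set_t, CPU_* macros, sched_setaffinity() */
#endif
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/*
 * Fill *set from an array of CPU numbers.
 * Returns the number of distinct CPUs set, or -1 on an out-of-range CPU.
 */
static int build_cpu_set(const int *cpus, size_t n, cpu_set_t *set)
{
	assert(set != NULL);
	CPU_ZERO(set);
	for (size_t i = 0; i < n; i++) {
		if (cpus[i] < 0 || cpus[i] >= CPU_SETSIZE)
			return -1;
		CPU_SET(cpus[i], set);
	}
	return CPU_COUNT(set);
}

/*
 * HYPOTHETICAL: stands in for the proposed sys_kswapd(nid).
 * It only validates nid; a real version would enter the kernel.
 */
static int launch_kswapd_stub(int nid)
{
	return nid >= 0 ? 0 : -1;
}

/*
 * Bind the caller to the given CPUs, then request a kswapd for node nid.
 * Returns 0 on success, -1 on any failure.
 */
static int start_node_kswapd(int nid, const int *cpus, size_t n)
{
	cpu_set_t set;

	if (build_cpu_set(cpus, n, &set) <= 0)
		return -1;
	/* pid 0 means "the calling thread". */
	if (sched_setaffinity(0, sizeof(set), &set) != 0)
		return -1;
	return launch_kswapd_stub(nid);
}
```

A small launcher would call start_node_kswapd() once per node, taking
the node-to-CPU mapping from whatever topology source is available.
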
