linux-kernel.vger.kernel.org archive mirror
From: "Martin J. Bligh" <mbligh@aracnet.com>
To: Erich Focht <efocht@hpce.nec.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	LSE <lse-tech@lists.sourceforge.net>
Cc: Andi Kleen <ak@muc.de>, torvalds@osdl.org
Subject: Re: [patch] scheduler fix for 1cpu/node case
Date: Mon, 28 Jul 2003 12:55:54 -0700
Message-ID: <3900670000.1059422153@[10.10.2.4]>
In-Reply-To: <200307280548.53976.efocht@gmx.net>

> after talking to several people at OLS about the current NUMA
> scheduler the conclusion was:
> (1) it sucks (for particular workloads),
> (2) on x86_64 (embarrassingly simple NUMA) it's useless, goto (1).

I really feel there's no point in a NUMA scheduler for Hammer-style
architectures. A config option to turn it off would seem like
a simpler way to go, unless people can see some advantage of the
full NUMA code?

The interesting thing is probably whether we want balance on exec
or not ... but that probably applies to UMA SMP as well ...
 
> Fact is that the current separation of local and global balancing,
> where global balancing is done only in the timer interrupt at a fixed
> rate is way too inflexible. A CPU going idle inside a well balanced
> node will stay idle for a while even if there's a lot of work to
> do. Especially in the corner case of one CPU per node this is
> condemning that CPU to idleness for at least 5 ms. 

Surely it'll hit the idle local balancer and rebalance within the node? 
Or are you thinking of a case with 3 tasks on a 4 cpu/node system?

> So x86_64 platforms
> (but not only those!) suffer and wish to switch off the NUMA
> scheduler while keeping NUMA memory management on.

Right - I have a patch to make it a config option (CONFIG_NUMA_SCHED) 
... I'll feed that upstream this week.
 
> The attached patch is a simple solution which
> - solves the 1 CPU / node problem,
> - lets other systems behave (almost) as before,
> - opens the way to other optimisations like multi-level node
>   hierarchies (by tuning the retry rate)
> - simplifies the NUMA scheduler and deletes more lines of code than it
>   adds.

Looks simple, I'll test it out.

> The timer interrupt based global rebalancing might appear to be a
> simple and good idea but it costs the scheduler a lot of
> flexibility. In the patch the global rebalancing is done after a
> certain number of failed attempts to locally balance. The number of
> attempts is proportional to the number of CPUs in the current
> node.

Seems like a good plan.
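
To make the retry scheme quoted above concrete, here is a minimal sketch
in C. It is not the code from the patch: the struct, the function name
and the RETRIES_PER_CPU constant are all hypothetical, chosen only to
illustrate "go global after a number of failed local attempts that is
proportional to the number of CPUs in the node".

```c
#include <stdbool.h>

/* Hypothetical tunable: failed local attempts tolerated per CPU in the node. */
#define RETRIES_PER_CPU 1

/* Hypothetical per-node bookkeeping; the real scheduler's data layout differs. */
struct node_balance_state {
    int cpus_in_node;     /* number of CPUs in this node */
    int failed_attempts;  /* consecutive failed local balance attempts */
};

/*
 * Called after each local (within-node) balance attempt.
 * Returns true when the caller should try a cross-node rebalance.
 */
static bool should_balance_globally(struct node_balance_state *st,
                                    bool local_balance_succeeded)
{
    if (local_balance_succeeded) {
        /* Local balancing found work, so the failure streak ends. */
        st->failed_attempts = 0;
        return false;
    }

    st->failed_attempts++;

    /* The threshold scales with node size, as described in the quoted text. */
    if (st->failed_attempts >= RETRIES_PER_CPU * st->cpus_in_node) {
        st->failed_attempts = 0;
        return true;
    }
    return false;
}
```

Under these assumptions, a node with one CPU has a threshold of one, so
the first failed local attempt already triggers a cross-node attempt
rather than waiting for the next timer tick. A 4-CPU node would go
global only after four consecutive failures.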

M.


