linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/13] Multiprocessor CPU scheduler patches
@ 2005-02-24  7:14 Nick Piggin
  2005-02-24  7:16 ` [PATCH 1/13] timestamp fixes Nick Piggin
  0 siblings, 1 reply; 38+ messages in thread
From: Nick Piggin @ 2005-02-24  7:14 UTC
  To: Andrew Morton; +Cc: linux-kernel

Hi,

I hope that you can include the following set of CPU scheduler
patches in -mm soon, if you have no other significant performance
work going on.

There are some fairly significant changes, with a few basic aims:
* Improve SMT behaviour
* Improve CMP behaviour, CMP/NUMA scheduling (eg. Opteron)
* Reduce task movement, esp over NUMA nodes.

They are not going to be very well tuned for most usages at the
moment (unfortunately dbt2/3-pgsql on OSDL, which is a good test,
isn't working). So hopefully I can address regressions as they
come up.

There are a few problems with the scheduler currently:

Problem #1:
It has _very_ aggressive idle CPU pulling. Not only does it not
really obey imbalances, it is also wrong for eg. an SMT CPU
whose sibling is not idle. The reason this was done really is to
bring down idle time on some workloads (dbt2-pgsql, other
database stuff).

So I address this in the following ways: reduce special casing
for idle balancing, and revert some of the recent moves toward even
more aggressive balancing.

Then provide a range of averaging levels for CPU "load averages",
and we choose which to use in which situation on a sched-domain
basis. This allows idle balancing to use a more instantaneous value
for calculating load, so idle CPUs need not wait many timer ticks
for the load averages to catch up. This can hopefully solve our
idle time problems.
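
A minimal sketch of the averaging idea, not the code from the patch:
each runqueue keeps several load figures that decay at different
rates, and a balancing path picks one by index. The names rq_load,
NR_LOAD_IDX, update_cpu_load and load_at are assumptions made for
illustration, as is the power-of-two decay.

```c
#include <assert.h>

#define NR_LOAD_IDX 3	/* hypothetical number of averaging levels */

struct rq_load {
	/* cpu_load[0] is instantaneous; higher indexes decay more slowly */
	unsigned long cpu_load[NR_LOAD_IDX];
};

/* Called once per timer tick with the runqueue's current load. */
static void update_cpu_load(struct rq_load *rq, unsigned long now_load)
{
	unsigned long scale = 1;
	int i;

	for (i = 0; i < NR_LOAD_IDX; i++, scale <<= 1) {
		unsigned long old = rq->cpu_load[i];

		/* scale == 1 follows now_load exactly; 2, 4, ... lag behind */
		rq->cpu_load[i] = (old * (scale - 1) + now_load) / scale;
	}
}

/* A domain would choose idx per situation, eg. 0 for idle balancing. */
static unsigned long load_at(const struct rq_load *rq, int idx)
{
	assert(idx >= 0 && idx < NR_LOAD_IDX);
	return rq->cpu_load[idx];
}
```

With index 0 an idle CPU sees a load change on the next tick, while
the higher indexes damp short spikes for the periodic balancer.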

Also, further moderate "affine wakeups", which can tend to move
most tasks to one CPU on some workloads and cause idle problems.

Problem #2:
The second problem is that balance-on-exec is not sched-domains
aware. This means it will tend to (for example) fill up two cores
of a CPU on one socket, then fill up two cores on the next socket,
etc. What we want is to try to spread load evenly across memory
controllers.

So make that sched-domains aware following the same pattern as
find_busiest_group / find_busiest_queue.
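
As a rough illustration of that pattern turned around for placement
(pick the least loaded group first, then the least loaded CPU inside
it), here is a sketch. The cpu_group type and pick_exec_cpu are
hypothetical names, and the code assumes all groups are the same size
so that summed loads are comparable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one group per socket / memory controller. */
struct cpu_group {
	const int *cpus;	/* CPU ids in this group */
	size_t nr_cpus;
};

static unsigned long group_load(const struct cpu_group *g,
				const unsigned long *load)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < g->nr_cpus; i++)
		sum += load[g->cpus[i]];
	return sum;
}

/* Return the CPU an exec'ing task should be placed on. */
static int pick_exec_cpu(const struct cpu_group *groups, size_t nr_groups,
			 const unsigned long *load)
{
	const struct cpu_group *best;
	size_t i;
	int cpu;

	assert(nr_groups > 0 && groups[0].nr_cpus > 0);

	/* Step 1: least loaded group, so sockets fill evenly. */
	best = &groups[0];
	for (i = 1; i < nr_groups; i++)
		if (group_load(&groups[i], load) < group_load(best, load))
			best = &groups[i];

	/* Step 2: least loaded CPU within that group. */
	cpu = best->cpus[0];
	for (i = 1; i < best->nr_cpus; i++)
		if (load[best->cpus[i]] < load[cpu])
			cpu = best->cpus[i];
	return cpu;
}
```

With two dual-core sockets and one task already running on CPU 0, a
flat search for an idle CPU could pick CPU 1 on the same socket; the
two-step search picks a CPU on the other socket instead.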

Problem #3:
Lastly, implement balance-on-fork/clone again. I have come to the
realisation that for NUMA, this is probably the best solution.
Run-cloned-child-last has run out of steam on CMP systems. What
it was supposed to do was provide a period where the child could
be pulled to another CPU before it starts running and allocating
memory. Unfortunately on CMP systems, this tends to just be to the
other sibling.

Also, having such a difference between thread and process creation
was not really ideal, so we balance on all types of fork/clone.
This really helps some things (like STREAM) on CMP Opterons, but
also hurts others, so naturally it is settable per-domain.
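
One way "settable per-domain" could look, as a sketch under assumed
names (struct domain, DOM_BALANCE_FORK, DOM_BALANCE_EXEC and
widest_domain_with are illustrative, not taken from the patches): each
domain carries flag bits, and the fork path balances across the widest
domain above the current CPU that has the fork flag set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-domain policy bits. */
enum {
	DOM_BALANCE_EXEC = 1 << 0,
	DOM_BALANCE_FORK = 1 << 1,
};

struct domain {
	struct domain *parent;	/* next wider domain, NULL at the top */
	unsigned int flags;
};

/*
 * Walk from the narrowest domain upward and return the widest one
 * that enables 'flag', or NULL if no level enables it.
 */
static struct domain *widest_domain_with(struct domain *sd,
					 unsigned int flag)
{
	struct domain *found = NULL;

	assert(flag != 0);
	for (; sd; sd = sd->parent)
		if (sd->flags & flag)
			found = sd;
	return found;
}
```

Clearing the fork bit on the NUMA-level domain would then keep new
children on the parent's node while still spreading them across cores.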

Problem #4:
Sched domains isn't very useful to me in its current form. Bring
it up to date with what I've been using. I don't think anyone other
than myself uses it so that should be OK.

Nick






Thread overview: 38+ messages
2005-02-24  7:14 [PATCH 0/13] Multiprocessor CPU scheduler patches Nick Piggin
2005-02-24  7:16 ` [PATCH 1/13] timestamp fixes Nick Piggin
2005-02-24  7:16   ` [PATCH 2/13] improve pinned task handling Nick Piggin
2005-02-24  7:18     ` [PATCH 3/13] rework schedstats Nick Piggin
2005-02-24  7:19       ` [PATCH 4/13] find_busiest_group fixlets Nick Piggin
2005-02-24  7:20         ` [PATCH 5/13] find_busiest_group cleanup Nick Piggin
2005-02-24  7:21           ` [PATCH 6/13] no aggressive idle balancing Nick Piggin
2005-02-24  7:22             ` [PATCH 7/13] better active balancing heuristic Nick Piggin
2005-02-24  7:24               ` [PATCH 8/13] generalised CPU load averaging Nick Piggin
2005-02-24  7:25                 ` [PATCH 9/13] less affine wakups Nick Piggin
2005-02-24  7:27                   ` [PATCH 10/13] remove aggressive idle balancing Nick Piggin
2005-02-24  7:28                     ` [PATCH 11/13] sched-domains aware balance-on-fork Nick Piggin
2005-02-24  7:28                       ` [PATCH 12/13] schedstats additions for sched-balance-fork Nick Piggin
2005-02-24  7:30                         ` [PATCH 13/13] basic tuning Nick Piggin
2005-02-24  8:46                         ` [PATCH 12/13] schedstats additions for sched-balance-fork Ingo Molnar
2005-02-24 22:13                           ` Nick Piggin
2005-02-25 11:07                           ` Rick Lindsley
2005-02-25 11:21                             ` Nick Piggin
2005-02-24  8:41                     ` [PATCH 10/13] remove aggressive idle balancing Ingo Molnar
2005-02-24 12:13                       ` Nick Piggin
2005-02-24 12:16                         ` Ingo Molnar
2005-03-06  5:43                         ` Siddha, Suresh B
2005-03-07  5:34                           ` Nick Piggin
2005-03-07  8:04                             ` Siddha, Suresh B
2005-03-07  8:28                               ` Nick Piggin
2005-03-08  7:22                                 ` Siddha, Suresh B
2005-03-08  8:17                                   ` Nick Piggin
2005-03-08 19:36                                     ` Siddha, Suresh B
2005-02-24  8:39               ` [PATCH 7/13] better active balancing heuristic Ingo Molnar
2005-02-24  8:36         ` [PATCH 4/13] find_busiest_group fixlets Ingo Molnar
2005-02-24  8:07       ` [PATCH 3/13] rework schedstats Ingo Molnar
2005-02-25 10:50       ` Rick Lindsley
2005-02-25 11:10         ` Nick Piggin
2005-02-25 11:25           ` DHCP on multi homed host! Ravindra Nadgauda
2005-02-24  8:04     ` [PATCH 2/13] improve pinned task handling Ingo Molnar
2005-02-24  7:46   ` [PATCH 1/13] timestamp fixes Ingo Molnar
2005-02-24  7:56     ` Nick Piggin
2005-02-24  8:34       ` Ingo Molnar
