[PATCH V2 0/4] sched: add new 'book' scheduling domain

From: Heiko Carstens @ 2010-08-31  8:28 UTC
  To: Peter Zijlstra, Ingo Molnar
  Cc: Mike Galbraith, Suresh Siddha, Andreas Herrmann, linux-kernel,
	Martin Schwidefsky, Gautham R Shenoy

This patch set adds (yet) another scheduling domain to the scheduler. The
reason for this is that the recent (s390) z196 architecture has four cache
levels and uniform memory access (sort of -- see below).
The cpu/cache/memory hierarchy is as follows:

Each cpu has its private L1 (64KB I-cache + 128KB D-cache) and L2 (1.5MB)
cache.
A core consists of four cpus with a 24MB shared L3 cache.
A book consists of six cores with a 192MB shared L4 cache.
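
As a rough illustration of the numbers above, the hierarchy can be written
down as plain arithmetic. This is only a sketch: it assumes cpus are numbered
contiguously within a core and cores contiguously within a book, which the
machine does not guarantee, and the helper names are hypothetical and not
part of this patch set.

```c
/*
 * Illustrative only: z196 cache hierarchy as arithmetic.
 * Assumption: cpu numbers are contiguous within a core, and core
 * numbers are contiguous within a book. All names are hypothetical.
 */
#define SKETCH_CPUS_PER_CORE	4	/* cpus sharing one L3 cache */
#define SKETCH_CORES_PER_BOOK	6	/* cores sharing one L4 cache */
#define SKETCH_CPUS_PER_BOOK	(SKETCH_CPUS_PER_CORE * SKETCH_CORES_PER_BOOK)

/* Index of the core (L3 sharing group) a cpu belongs to. */
static inline unsigned int sketch_cpu_to_core(unsigned int cpu)
{
	return cpu / SKETCH_CPUS_PER_CORE;
}

/* Index of the book (L4 sharing group) a cpu belongs to. */
static inline unsigned int sketch_cpu_to_book(unsigned int cpu)
{
	return cpu / SKETCH_CPUS_PER_BOOK;
}
```

Under these assumptions a book spans 24 cpus, so cpus 0-23 share one L4
cache and cpu 24 is the first cpu of the next book.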

The z196 architecture has no SMT.
Also the statement that we have uniform memory access is not entirely
correct. Actually the machine uses memory striping, so it "looks" like
we have UMA until the next slice of memory gets accessed.
However, there is no interface which tells us which piece of memory is local
or remote. So we (have to) simplify and assume that the cost of every memory
access that misses the L4 cache is the same.

In order to use the information about the cache hierarchy so that the
scheduler can make decisions that improve cache hits, I added the
'BOOK' scheduling domain between the MC and CPU domains.
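
To show what the extra level buys, here is a minimal sketch that returns the
lowest level at which two cpus still share a cache. It assumes the MC domain
spans the cpus of one core, the BOOK domain spans the cpus of one book, and
cpus are numbered contiguously. The enum and function names are hypothetical
and are not the identifiers used in the patches.

```c
/*
 * Illustrative only: lowest domain level shared by two cpus.
 * Assumption: contiguous cpu numbering, 4 cpus per core, 6 cores
 * (24 cpus) per book. All names are hypothetical.
 */
enum sketch_level {
	SKETCH_LEVEL_MC,	/* same core: shared L3 cache */
	SKETCH_LEVEL_BOOK,	/* same book: shared L4 cache */
	SKETCH_LEVEL_CPU,	/* different books: no shared cache */
};

static inline enum sketch_level
sketch_lowest_shared_level(unsigned int a, unsigned int b)
{
	if (a / 4 == b / 4)
		return SKETCH_LEVEL_MC;
	if (a / 24 == b / 24)
		return SKETCH_LEVEL_BOOK;
	return SKETCH_LEVEL_CPU;
}
```

Without the BOOK level the second case would be indistinguishable from the
third, so the scheduler could not prefer a cpu that still shares the L4
cache.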

Also please note that the s390 arch scheduling domain initializers need
tuning:
The line
#define SD_BOOK_INIT SD_CPU_INIT
within the arch support patch is just there so that it compiles, until we
have something that really works.

Changes since V1:

Removed the powersavings sysfs knob for the new scheduling domain since Peter
objected to it ;)
Actually adding a third sysfs powersavings knob would increase the config
space to 27 possible settings. That's simply too much, and indeed no admin
would care about fine-tuning that.
What is needed is a single knob which configures the scheduler to do the
'right thing'.
It's up to the powersavings guys to come up with a viable solution here ;)
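
For reference, the figure of 27 is consistent with three independent knobs
that each accept three values. That each knob has exactly three values is an
assumption made for this sketch, and the helper name is hypothetical.

```c
/*
 * Illustrative only: size of the config space for `knobs` independent
 * settings with `values` possible values each (values ^ knobs).
 * Assumption: every knob accepts the same number of values.
 */
static inline unsigned int sketch_config_space(unsigned int knobs,
					       unsigned int values)
{
	unsigned int total = 1;

	while (knobs--)
		total *= values;
	return total;
}
```

With two knobs this gives 9 combinations, and a third knob triples it to 27.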
