linux-kernel.vger.kernel.org archive mirror
* [RFC] Comparison of power-efficient scheduling patch sets
@ 2013-05-30 13:47 Morten Rasmussen
  2013-05-31  1:17 ` Alex Shi
  2013-05-31 10:52 ` power-efficient scheduling design Ingo Molnar
  0 siblings, 2 replies; 58+ messages in thread
From: Morten Rasmussen @ 2013-05-30 13:47 UTC (permalink / raw)
  To: alex.shi, peterz, mingo, preeti, vincent.guittot, efault, pjt,
	linux-kernel, linaro-kernel
  Cc: arjan, len.brown, corbet, tglx

Hi,

A number of patch sets related to power-efficient scheduling have been
posted over the last couple of months. Most of them do not have much
data to back them up, so I decided to do some testing.

All but one of the patch sets that I have tested (rlb-v4 is the
exception) attempt to pack tasks on as few cpus as possible to allow the
remaining cpus to enter deeper sleep states - a strategy that should
make sense on most platforms that support per-cpu power gating and on
multi-socket machines.

Kernel: 3.9

Patch sets:
rlb-v4: sched: use runnable load based balance (Alex Shi)
        <https://lkml.org/lkml/2013/4/27/13>
pas-v7: sched: power aware scheduling (Alex Shi)
        <https://lkml.org/lkml/2013/4/3/732>
pst-v3: sched: packing small tasks (Vincent Guittot)
        <https://lkml.org/lkml/2013/3/22/183>
pst-v4: sched: packing small tasks (Vincent Guittot)
        <https://lkml.org/lkml/2013/4/25/396>

Configuration:
pas-v7: Set to "powersaving" mode.
pst-v4: Set to "Full" packing mode.

Platform:
ARM TC2 (test-chip), 2xCortex-A15 + 3xCortex-A7. Cortex-A15s disabled.

Measurement technique:
Time spent non-idle (not in idle state) for each cpu based on cpuidle
ftrace events. TC2 does not have per-core power-gating, so packing
inside the A7 cluster does not lead to any significant power savings.
Note that any product grade hardware (TC2 is a test-chip) will very
likely have per-core power-gating, so in those cases packing will have
an appreciable effect on power savings.
Measuring non-idle time rather than power should give a clearer idea
about the effect of the patch sets given that the idle back-end is
highly implementation specific.
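
As an illustration of the measurement technique above, here is a minimal
sketch of how per-cpu non-idle time could be derived from cpuidle ftrace
events. It is not the script used to produce the results in this mail.
The textual line format, the use of state=4294967295 as the idle-exit
marker, and the function name non_idle_percent are assumptions made for
the example. It also assumes that all events fall inside the measurement
window and treats a cpu as non-idle until its first idle entry is seen.

```python
import re
from collections import defaultdict

# Assumed textual ftrace format for the power:cpu_idle event, e.g.
#   <idle>-0 [001] d..2 100.000000: cpu_idle: state=1 cpu_id=1
EVENT_RE = re.compile(r"\s(\d+\.\d+): cpu_idle: state=(\d+) cpu_id=(\d+)")

# Assumption: leaving idle is reported as (u32)-1 in the state field.
IDLE_EXIT = 4294967295


def non_idle_percent(lines, t_start, t_end):
    """Return {cpu: non-idle %} over the window [t_start, t_end] (seconds)."""
    idle_since = {}                  # cpu -> timestamp of current idle entry
    idle_total = defaultdict(float)  # cpu -> accumulated idle seconds
    cpus = set()

    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue  # not a cpu_idle event
        ts, state, cpu = float(m.group(1)), int(m.group(2)), int(m.group(3))
        cpus.add(cpu)
        if state == IDLE_EXIT:
            entered = idle_since.pop(cpu, None)
            if entered is not None:
                idle_total[cpu] += ts - entered
        else:
            # Keep the first entry timestamp if entries are repeated.
            idle_since.setdefault(cpu, ts)

    # Cpus that are still idle when the window closes.
    for cpu, entered in idle_since.items():
        idle_total[cpu] += t_end - entered

    window = t_end - t_start
    return {cpu: 100.0 * (1.0 - idle_total[cpu] / window)
            for cpu in sorted(cpus)}
```

Feeding it the lines of a trace captured during a benchmark run, together
with the start and end timestamps of the run, would give one non-idle
percentage per cpu, comparable in form to the tables in the results.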

Benchmarks:
audio playback (Android): 30s mp3 file playback on Android.
bbench+audio (Android): Web page rendering while doing mp3 playback.
andebench_native (Android): Android benchmark running in native mode.
cyclictest: Short periodic tasks.

Results:
Two runs for each patch set.

audio playback (Android) SMP
non-idle %  cpu 0  cpu 1  cpu 2
3.9_1       11.96   2.86   2.48
3.9_2       12.64   2.81   1.88
rlb-v4_1    12.61   2.44   1.90
rlb-v4_2    12.45   2.44   1.90
pas-v7_1    16.17   0.03   0.24
pas-v7_2    16.08   0.28   0.07
pst-v3_1    15.18   2.76   1.70
pst-v3_2    15.13   0.80   0.38
pst-v4_1    16.14   0.05   0.00
pst-v4_2    16.34   0.06   0.00

bbench+audio (Android) SMP
non-idle %  cpu 0  cpu 1  cpu 2  render time
3.9_1       25.00  20.73  21.22   812
3.9_2       24.29  19.78  22.34   795
rlb-v4_1    23.84  19.36  22.74   782
rlb-v4_2    24.07  19.36  22.74   797
pas-v7_1    28.29  17.86  16.01   869
pas-v7_2    28.62  18.54  15.05   908
pst-v3_1    29.14  20.59  21.72   830
pst-v3_2    27.69  18.81  20.06   830
pst-v4_1    42.20  13.63   2.29   880
pst-v4_2    41.56  14.40   2.17   935

andebench_native (8 threads) (Android) SMP
non-idle %  cpu 0  cpu 1  cpu 2  Score
3.9_1       99.22  98.88  99.61   4139
3.9_2       99.56  99.31  99.46   4148
rlb-v4_1    99.49  99.61  99.53   4153
rlb-v4_2    99.56  99.61  99.53   4149
pas-v7_1    99.53  99.59  99.29   4149
pas-v7_2    99.42  99.63  99.48   4150
pst-v3_1    97.89  99.33  99.42   4097
pst-v3_2    99.16  99.62  99.42   4097
pst-v4_1    99.34  99.01  99.59   4146
pst-v4_2    99.49  99.52  99.20   4146

cyclictest SMP
non-idle %  cpu 0  cpu 1  cpu 2
3.9_1        9.13   8.88   8.41
3.9_2       10.27   8.02   6.30
rlb-v4_1     8.88   8.09   8.11
rlb-v4_2     8.49   8.09   8.11
pas-v7_1    10.20   0.02  11.50
pas-v7_2     7.86  14.31   0.02
pst-v3_1    20.44   8.68   7.97
pst-v3_2    20.41   0.78   1.00
pst-v4_1    21.32   0.21   0.05
pst-v4_2    21.56   0.21   0.04

Overall, pas-v7 seems to do a fairly good job at packing. The idle time
distribution seems to be somewhere between pst-v3 and the more
aggressive pst-v4 for all the benchmarks. pst-v4 manages to keep two
cpus nearly idle (<0.25% non-idle) for both cyclictest and audio, which
is better than both pst-v3 and pas-v7. pas-v7 fails to pack cyclictest.
Packing does come at a cost which can be seen for bbench+audio, where
pst-v3 and rlb-v4 get better render times than pas-v7 and pst-v4 which
do more aggressive packing. rlb-v4 does not pack; it is only included
for reference.
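
To put a rough number on the cost of packing, the sketch below averages
the two bbench+audio render times per patch set and expresses them
relative to the unpatched 3.9 kernel. The values are copied from the
bbench+audio table. The helper name relative_change is made up for the
example, and the unit of the render time is not stated in this mail, so
only the relative differences are meaningful. With two runs per patch
set this is indicative at best.

```python
from statistics import mean

# Render times copied from the bbench+audio table (two runs each).
# The unit is not stated in the original post; lower is treated as better.
RENDER_TIMES = {
    "3.9":    [812, 795],
    "rlb-v4": [782, 797],
    "pas-v7": [869, 908],
    "pst-v3": [830, 830],
    "pst-v4": [880, 935],
}


def relative_change(times, baseline="3.9"):
    """Mean render time of each patch set relative to the baseline, in %."""
    base = mean(times[baseline])
    return {name: 100.0 * (mean(runs) - base) / base
            for name, runs in times.items()}


if __name__ == "__main__":
    for name, pct in relative_change(RENDER_TIMES).items():
        print(f"{name:8s} {pct:+6.2f}%")
```

A positive value means a longer mean render time than plain 3.9, which
is the direction expected for the more aggressive packers.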

From a packing perspective pst-v4 seems to do the best job for the
workloads that I have tested on ARM TC2. The less aggressive packing in
pst-v3 may be a better choice in terms of performance.
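
One simple way to compare packing across runs is to count, per run, how
many cpus stay below a non-idle threshold. The sketch below does that
for the audio playback and cyclictest tables using the 0.25% figure
mentioned above. The data is copied from those tables; the function name
nearly_idle_cpus and the choice of a single fixed threshold are
assumptions made for the example, not part of any of the patch sets.

```python
# Non-idle % per cpu (cpu 0, cpu 1, cpu 2), copied from the audio playback
# and cyclictest tables.
AUDIO = {
    "3.9_1":    (11.96, 2.86, 2.48), "3.9_2":    (12.64, 2.81, 1.88),
    "rlb-v4_1": (12.61, 2.44, 1.90), "rlb-v4_2": (12.45, 2.44, 1.90),
    "pas-v7_1": (16.17, 0.03, 0.24), "pas-v7_2": (16.08, 0.28, 0.07),
    "pst-v3_1": (15.18, 2.76, 1.70), "pst-v3_2": (15.13, 0.80, 0.38),
    "pst-v4_1": (16.14, 0.05, 0.00), "pst-v4_2": (16.34, 0.06, 0.00),
}
CYCLICTEST = {
    "3.9_1":    (9.13, 8.88, 8.41),   "3.9_2":    (10.27, 8.02, 6.30),
    "rlb-v4_1": (8.88, 8.09, 8.11),   "rlb-v4_2": (8.49, 8.09, 8.11),
    "pas-v7_1": (10.20, 0.02, 11.50), "pas-v7_2": (7.86, 14.31, 0.02),
    "pst-v3_1": (20.44, 8.68, 7.97),  "pst-v3_2": (20.41, 0.78, 1.00),
    "pst-v4_1": (21.32, 0.21, 0.05),  "pst-v4_2": (21.56, 0.21, 0.04),
}


def nearly_idle_cpus(non_idle, threshold=0.25):
    """Count cpus whose non-idle % is below the threshold."""
    return sum(1 for pct in non_idle if pct < threshold)


if __name__ == "__main__":
    for bench, table in (("audio", AUDIO), ("cyclictest", CYCLICTEST)):
        for run, non_idle in table.items():
            print(f"{bench:10s} {run:9s} {nearly_idle_cpus(non_idle)}")
```

On a three-cpu system a count of 2 means that the run was packed onto a
single cpu.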

I'm well aware that these tests are heavily focused on mobile workloads.
I would therefore encourage people to share their test results for their
workloads on their platforms to complete the picture. Comments are also
welcome.

Thanks,
Morten


