* [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection
@ 2016-02-23  1:22 Steve Muckle
  2016-02-23  1:22 ` [RFCv7 PATCH 01/10] sched: Compute cpu capacity available at current frequency Steve Muckle
                   ` (11 more replies)
  0 siblings, 12 replies; 51+ messages in thread
From: Steve Muckle @ 2016-02-23  1:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette

Scheduler-driven CPU frequency selection hopes to exploit both
per-task and global information in the scheduler to improve frequency
selection policy and achieve lower power consumption, improved
responsiveness/performance, and less reliance on heuristics and
tunables. For further discussion of this integration see [0].

This patch series implements a cpufreq governor which collects CPU
capacity requests from the fair, realtime, and deadline scheduling
classes. The fair and realtime scheduling classes are modified to make
these requests. The deadline class is not yet modified to make CPU
capacity requests.

Changes in this series since RFCv6 [1], posted December 9, 2015:
  Patch 3, sched: scheduler-driven cpu frequency selection
  - Added Kconfig dependency on IRQ_WORK.
  - Reworked locking.
  - Make throttling optional - it is not required in order to ensure that
    the previous frequency transition is complete.
  - Some fixes in cpufreq_sched_thread related to the task state.
  - Changes to support mixed fast and slow path operation.
  Patch 7: sched/fair: jump to max OPP when crossing UP threshold
  - move sched_freq_tick() call so rq lock is still held
  Patch 9: sched/deadline: split rt_avg in 2 distinct metrics
  - RFCv6 calculated DL capacity from DL task parameters, RFCv7 restores
    the original method of calculation but keeps DL capacity separate
  Patch 10: sched: rt scheduler sets capacity requirement
  - change #ifdef from CONFIG_SMP, trivial cleanup

Profiling results:
Performance profiling has been done using rt-app [2] to generate
various periodic workloads, each with a particular duty cycle. The time
to complete the busy portion of the duty cycle is measured, and overhead
is calculated as

overhead = (busy_duration_test_gov - busy_duration_perf_gov)/
         (busy_duration_pwrsave_gov - busy_duration_perf_gov)

This shows as a percentage how close the governor is to running the
workload at fmin (100%) or fmax (0%). The number of times the busy
duration exceeds the period of the periodic workload (an "overrun") is
also recorded. In the tables below the performance of the ondemand
(sampling_rate = 20ms), interactive (default tunables), and
scheduler-driven governors is evaluated using these metrics. The test
platform is a Samsung Chromebook 2 ("Peach Pi"). The workload is
affined to CPU0, an A15 with an fmin of 200MHz and an fmax of
1.8GHz. The interactive governor was incorporated/adapted from [3]. A
branch with the interactive governor and a few required dependency
patches for ARM is available at [4].
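
To make the overhead metric concrete, here is a small standalone sketch
using made-up busy durations (the numbers are purely illustrative and are
not taken from the tables below):

	#include <stdio.h>

	int main(void)
	{
		/* Hypothetical busy durations in msec; illustration only */
		double busy_perf    = 10.0;	/* performance governor (fmax) */
		double busy_pwrsave = 90.0;	/* powersave governor (fmin) */
		double busy_test    = 25.0;	/* governor under test */
		double overhead;

		overhead = (busy_test - busy_perf) /
			   (busy_pwrsave - busy_perf) * 100.0;

		/* prints "overhead = 18.75%": much closer to fmax than to fmin */
		printf("overhead = %.2f%%\n", overhead);
		return 0;
	}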

More detailed explanation of the columns below:
run: duration at fmax of the busy portion of the periodic workload in msec
period: duration of the entire period of the periodic workload in msec
loops: number of iterations of the periodic workload tested
OR: number of instances of overrun as described above
OH: overhead as calculated above

SCHED_OTHER workload:
 wload parameters	  ondemand        interactive     sched	
run	period	loops	OR	OH	OR	OH	OR	OH
1	100	100	0	62.07%	0	100.02%	0	78.49%
10	1000	10	0	21.80%	0	22.74%	0	72.56%
1	10	1000	0	21.72%	0	63.08%	0	52.40%
10	100	100	0	8.09%	0	15.53%	0	17.33%
100	1000	10	0	1.83%	0	1.77%	0	0.29%
6	33	300	0	15.32%	0	8.60%	0	17.34%
66	333	30	0	0.79%	0	3.18%	0	12.26%
4	10	1000	0	5.87%	0	10.21%	0	6.15%
40	100	100	0	0.41%	0	0.04%	0	2.68%
400	1000	10	0	0.42%	0	0.50%	0	1.22%
5	9	1000	2	3.82%	1	6.10%	0	2.51%
50	90	100	0	0.19%	0	0.05%	0	1.71%
500	900	10	0	0.37%	0	0.38%	0	1.82%
9	12	1000	6	1.79%	1	0.77%	0	0.26%
90	120	100	0	0.16%	1	0.05%	0	0.49%
900	1200	10	0	0.09%	0	0.26%	0	0.62%

SCHED_FIFO workload:
 wload parameters	  ondemand        interactive     sched	
run	period	loops	OR	OH	OR	OH	OR	OH
1	100	100	0	39.61%	0	100.49%	0	99.57%
10	1000	10	0	73.51%	0	21.09%	0	96.66%
1	10	1000	0	18.01%	0	61.46%	0	67.68%
10	100	100	0	31.31%	0	18.62%	0	77.01%
100	1000	10	0	58.80%	0	1.90%	0	15.40%
6	33	300	251	85.99%	0	9.20%	1	30.09%
66	333	30	24	84.03%	0	3.38%	0	33.23%
4	10	1000	0	6.23%	0	12.21%	10	11.54%
40	100	100	100	62.08%	0	0.11%	1	11.85%
400	1000	10	10	62.09%	0	0.51%	0	7.00%
5	9	1000	999	12.29%	1	6.03%	0	0.04%
50	90	100	99	61.47%	0	0.05%	2	6.53%
500	900	10	10	43.37%	0	0.39%	0	6.30%
9	12	1000	999	9.83%	0	0.01%	14	1.69%
90	120	100	99	61.47%	0	0.01%	28	2.29%
900	1200	10	10	43.31%	0	0.22%	0	2.15%

Note that at this point RT CPU capacity is measured via rt_avg. For
the above results sched_time_avg_ms has been set to 50ms.

Known issues:
 - More testing with real-world workloads, such as UI workloads and
   benchmarks, is required.
 - The power side of the characterization is in progress.
 - The deadline scheduling class does not yet make CPU capacity requests.
 - Not sure what's going on yet with the ondemand numbers above; it seems like
   there may be a regression with ondemand and RT tasks.

Dependencies:
Frequency invariant load tracking is required. For heterogeneous
systems such as big.LITTLE, CPU invariant load tracking is required as
well. The required support for ARM platforms along with a patch
creating tracepoints for cpufreq_sched is located in [5].

References:
[0] http://article.gmane.org/gmane.linux.kernel/1499836
[1] http://thread.gmane.org/gmane.linux.power-management.general/69176
[2] https://git.linaro.org/power/rt-app.git
[3] https://lkml.org/lkml/2015/10/28/782
[4] https://git.linaro.org/people/steve.muckle/kernel.git/shortlog/refs/heads/interactive
[5] https://git.linaro.org/people/steve.muckle/kernel.git/shortlog/refs/heads/sched-freq-rfcv7

Juri Lelli (3):
  sched/fair: add triggers for OPP change requests
  sched/{core,fair}: trigger OPP change request on fork()
  sched/fair: cpufreq_sched triggers for load balancing

Michael Turquette (2):
  cpufreq: introduce cpufreq_driver_is_slow
  sched: scheduler-driven cpu frequency selection

Morten Rasmussen (1):
  sched: Compute cpu capacity available at current frequency

Steve Muckle (1):
  sched/fair: jump to max OPP when crossing UP threshold

Vincent Guittot (3):
  sched: remove call of sched_avg_update from sched_rt_avg_update
  sched/deadline: split rt_avg in 2 distinct metrics
  sched: rt scheduler sets capacity requirement

 drivers/cpufreq/Kconfig      |  21 ++
 drivers/cpufreq/cpufreq.c    |   6 +
 include/linux/cpufreq.h      |  12 ++
 include/linux/sched.h        |   8 +
 kernel/sched/Makefile        |   1 +
 kernel/sched/core.c          |  43 +++-
 kernel/sched/cpufreq_sched.c | 459 +++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/deadline.c      |   2 +-
 kernel/sched/fair.c          | 108 +++++-----
 kernel/sched/rt.c            |  48 ++++-
 kernel/sched/sched.h         | 120 ++++++++++-
 11 files changed, 777 insertions(+), 51 deletions(-)
 create mode 100644 kernel/sched/cpufreq_sched.c

-- 
2.4.10

* [RFCv7 PATCH 01/10] sched: Compute cpu capacity available at current frequency
  2016-02-23  1:22 [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection Steve Muckle
@ 2016-02-23  1:22 ` Steve Muckle
  2016-02-23  1:41   ` Rafael J. Wysocki
  2016-02-23  1:22 ` [RFCv7 PATCH 02/10] cpufreq: introduce cpufreq_driver_is_slow Steve Muckle
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: Steve Muckle @ 2016-02-23  1:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette

From: Morten Rasmussen <morten.rasmussen@arm.com>

capacity_orig_of() returns the max available compute capacity of a cpu.
For scale-invariant utilization tracking and energy-aware scheduling
decisions it is useful to know the compute capacity available at the
current OPP of a cpu.
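
As a hypothetical illustration of what the new helper computes (numbers
made up, not part of the patch): a CPU whose cpu_capacity_orig is 1024
and which is currently running at half of its maximum frequency, i.e.
arch_scale_freq_capacity() returning 512, has a current capacity of

	/* Sketch only: capacity_orig = 1024, freq scale = 512 (fmax/2) */
	unsigned long capacity_curr = 1024UL * 512 >> 10;	/* = 512 */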

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 kernel/sched/fair.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7ce24a4..3437e01 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4821,6 +4821,17 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
 #endif
 
 /*
+ * Returns the current capacity of cpu after applying both
+ * cpu and freq scaling.
+ */
+static unsigned long capacity_curr_of(int cpu)
+{
+	return cpu_rq(cpu)->cpu_capacity_orig *
+	       arch_scale_freq_capacity(NULL, cpu)
+	       >> SCHED_CAPACITY_SHIFT;
+}
+
+/*
  * Detect M:N waker/wakee relationships via a switching-frequency heuristic.
  * A waker of many should wake a different task than the one last awakened
  * at a frequency roughly N times higher than one of its wakees.  In order
-- 
2.4.10

* [RFCv7 PATCH 02/10] cpufreq: introduce cpufreq_driver_is_slow
  2016-02-23  1:22 [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection Steve Muckle
  2016-02-23  1:22 ` [RFCv7 PATCH 01/10] sched: Compute cpu capacity available at current frequency Steve Muckle
@ 2016-02-23  1:22 ` Steve Muckle
  2016-02-23  1:31   ` Rafael J. Wysocki
  2016-02-23  1:22 ` [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection Steve Muckle
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: Steve Muckle @ 2016-02-23  1:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette,
	Viresh Kumar

From: Michael Turquette <mturquette@baylibre.com>

Some architectures and platforms perform CPU frequency transitions
through a non-blocking method, while some might block or sleep. Even
when frequency transitions do not block or sleep they may be very slow.
This distinction is important when trying to change frequency from
a non-interruptible context in a scheduler hot path.

Describe this distinction with a cpufreq driver flag,
CPUFREQ_DRIVER_FAST. The default is to not have this flag set,
thus erring on the side of caution.

cpufreq_driver_is_slow() is also introduced in this patch. Setting
the above flag will allow this function to return false.
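
For reference, a driver whose transitions are fast and safe to issue from
atomic context would opt in roughly like this (hypothetical driver, shown
only as a sketch):

	static struct cpufreq_driver example_fast_driver = {
		.name	= "example-fast",
		/* transitions are safe and fast enough for scheduler hot paths */
		.flags	= CPUFREQ_DRIVER_FAST | CPUFREQ_NEED_INITIAL_FREQ_CHECK,
		/* .init, .verify, .target_index, etc. omitted */
	};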

[smuckle@linaro.org: change flag/API to include drivers that are too
 slow for scheduler hot paths, in addition to those that block/sleep]

Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 drivers/cpufreq/cpufreq.c | 6 ++++++
 include/linux/cpufreq.h   | 9 +++++++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index e979ec7..88e63ca 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -154,6 +154,12 @@ bool have_governor_per_policy(void)
 }
 EXPORT_SYMBOL_GPL(have_governor_per_policy);
 
+bool cpufreq_driver_is_slow(void)
+{
+	return !(cpufreq_driver->flags & CPUFREQ_DRIVER_FAST);
+}
+EXPORT_SYMBOL_GPL(cpufreq_driver_is_slow);
+
 struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy)
 {
 	if (have_governor_per_policy())
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 88a4215..93e1c1c 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -160,6 +160,7 @@ u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy);
 int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu);
 int cpufreq_update_policy(unsigned int cpu);
 bool have_governor_per_policy(void);
+bool cpufreq_driver_is_slow(void);
 struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy);
 #else
 static inline unsigned int cpufreq_get(unsigned int cpu)
@@ -316,6 +317,14 @@ struct cpufreq_driver {
  */
 #define CPUFREQ_NEED_INITIAL_FREQ_CHECK	(1 << 5)
 
+/*
+ * Indicates that it is safe to call cpufreq_driver_target from
+ * non-interruptible context in scheduler hot paths.  Drivers must
+ * opt-in to this flag, as the safe default is that they might sleep
+ * or be too slow for hot path use.
+ */
+#define CPUFREQ_DRIVER_FAST		(1 << 6)
+
 int cpufreq_register_driver(struct cpufreq_driver *driver_data);
 int cpufreq_unregister_driver(struct cpufreq_driver *driver_data);
 
-- 
2.4.10

* [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-23  1:22 [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection Steve Muckle
  2016-02-23  1:22 ` [RFCv7 PATCH 01/10] sched: Compute cpu capacity available at current frequency Steve Muckle
  2016-02-23  1:22 ` [RFCv7 PATCH 02/10] cpufreq: introduce cpufreq_driver_is_slow Steve Muckle
@ 2016-02-23  1:22 ` Steve Muckle
  2016-02-25  3:55   ` Rafael J. Wysocki
  2016-03-03 14:21   ` Ingo Molnar
  2016-02-23  1:22 ` [RFCv7 PATCH 04/10] sched/fair: add triggers for OPP change requests Steve Muckle
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 51+ messages in thread
From: Steve Muckle @ 2016-02-23  1:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette,
	Ricky Liang, Juri Lelli

From: Michael Turquette <mturquette@baylibre.com>

Scheduler-driven CPU frequency selection hopes to exploit both
per-task and global information in the scheduler to improve frequency
selection policy, achieving lower power consumption, improved
responsiveness/performance, and less reliance on heuristics and
tunables. For further discussion on the motivation of this integration
see [0].

This patch implements a shim layer between the Linux scheduler and the
cpufreq subsystem. The interface accepts capacity requests from the
CFS, RT and deadline sched classes. The requests from each sched class
are summed on each CPU with a margin applied to the CFS and RT
capacity requests to provide some headroom. Deadline requests are
expected to be precise enough given their nature to not require
headroom. The maximum total capacity request for a CPU in a frequency
domain drives the requested frequency for that domain.
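
As a rough numeric sketch of the arithmetic described above and
implemented below in update_cpu_capacity_request() and
update_fdomain_capacity_request() (all values are hypothetical):

	/* Sketch only; values chosen purely for illustration. */
	static unsigned long sketch_freq_request(void)
	{
		unsigned long cfs = 300, rt = 100, dl = 50;	/* per-class requests */
		unsigned long margin = 1280;		/* capacity_margin, ~25% headroom */
		unsigned long total = (cfs + rt) * margin / 1024 + dl;	/* = 550 */
		unsigned long fmax = 1800000;		/* policy->max in kHz */

		/* = 966796 kHz; rounded to a real OPP via the frequency table */
		return total * fmax >> 10;
	}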

Policy is determined by both the sched classes and this shim layer.

Note that this algorithm is event-driven. There is no polling loop to
check CPU idle time, nor any other mechanism that operates asynchronously
to the scheduler, aside from an optional throttling mechanism.

Thanks to Juri Lelli <juri.lelli@arm.com> for contributing design ideas,
code and test results, and to Ricky Liang <jcliang@chromium.org>
for initialization and static key inc/dec fixes.

[0] http://article.gmane.org/gmane.linux.kernel/1499836

[smuckle@linaro.org: various additions and fixes, revised commit text]

CC: Ricky Liang <jcliang@chromium.org>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 drivers/cpufreq/Kconfig      |  21 ++
 include/linux/cpufreq.h      |   3 +
 include/linux/sched.h        |   8 +
 kernel/sched/Makefile        |   1 +
 kernel/sched/cpufreq_sched.c | 459 +++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h         |  51 +++++
 6 files changed, 543 insertions(+)
 create mode 100644 kernel/sched/cpufreq_sched.c

diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
index 659879a..82d1548 100644
--- a/drivers/cpufreq/Kconfig
+++ b/drivers/cpufreq/Kconfig
@@ -102,6 +102,14 @@ config CPU_FREQ_DEFAULT_GOV_CONSERVATIVE
 	  Be aware that not all cpufreq drivers support the conservative
 	  governor. If unsure have a look at the help section of the
 	  driver. Fallback governor will be the performance governor.
+
+config CPU_FREQ_DEFAULT_GOV_SCHED
+	bool "sched"
+	select CPU_FREQ_GOV_SCHED
+	help
+	  Use the CPUfreq governor 'sched' as default. This scales
+	  cpu frequency using CPU utilization estimates from the
+	  scheduler.
 endchoice
 
 config CPU_FREQ_GOV_PERFORMANCE
@@ -183,6 +191,19 @@ config CPU_FREQ_GOV_CONSERVATIVE
 
 	  If in doubt, say N.
 
+config CPU_FREQ_GOV_SCHED
+	bool "'sched' cpufreq governor"
+	depends on CPU_FREQ
+	select CPU_FREQ_GOV_COMMON
+	select IRQ_WORK
+	help
+	  'sched' - this governor scales cpu frequency from the
+	  scheduler as a function of cpu capacity utilization. It does
+	  not evaluate utilization on a periodic basis (as ondemand
+	  does) but instead is event-driven by the scheduler.
+
+	  If in doubt, say N.
+
 comment "CPU frequency scaling drivers"
 
 config CPUFREQ_DT
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 93e1c1c..ce8b895 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -495,6 +495,9 @@ extern struct cpufreq_governor cpufreq_gov_ondemand;
 #elif defined(CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE)
 extern struct cpufreq_governor cpufreq_gov_conservative;
 #define CPUFREQ_DEFAULT_GOVERNOR	(&cpufreq_gov_conservative)
+#elif defined(CONFIG_CPU_FREQ_DEFAULT_GOV_SCHED)
+extern struct cpufreq_governor cpufreq_gov_sched;
+#define CPUFREQ_DEFAULT_GOVERNOR	(&cpufreq_gov_sched)
 #endif
 
 /*********************************************************************
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a292c4b..27a6cd8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -937,6 +937,14 @@ enum cpu_idle_type {
 #define SCHED_CAPACITY_SHIFT	10
 #define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
 
+struct sched_capacity_reqs {
+	unsigned long cfs;
+	unsigned long rt;
+	unsigned long dl;
+
+	unsigned long total;
+};
+
 /*
  * Wake-queues are lists of tasks with a pending wakeup, whose
  * callers have already marked the task as woken internally,
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 6768797..90ed832 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -19,3 +19,4 @@ obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
 obj-$(CONFIG_SCHED_DEBUG) += debug.o
 obj-$(CONFIG_CGROUP_CPUACCT) += cpuacct.o
+obj-$(CONFIG_CPU_FREQ_GOV_SCHED) += cpufreq_sched.o
diff --git a/kernel/sched/cpufreq_sched.c b/kernel/sched/cpufreq_sched.c
new file mode 100644
index 0000000..a113e4e
--- /dev/null
+++ b/kernel/sched/cpufreq_sched.c
@@ -0,0 +1,459 @@
+/*
+ *  Copyright (C)  2015 Michael Turquette <mturquette@linaro.org>
+ *  Copyright (C)  2015-2016 Steve Muckle <smuckle@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/cpufreq.h>
+#include <linux/module.h>
+#include <linux/kthread.h>
+#include <linux/percpu.h>
+#include <linux/irq_work.h>
+#include <linux/delay.h>
+#include <linux/string.h>
+
+#include "sched.h"
+
+struct static_key __read_mostly __sched_freq = STATIC_KEY_INIT_FALSE;
+static bool __read_mostly cpufreq_driver_slow;
+
+/*
+ * The number of enabled schedfreq policies is modified during GOV_START/STOP.
+ * It, along with whether the schedfreq static key is enabled, is protected by
+ * the gov_enable_lock.
+ */
+static int enabled_policies;
+static DEFINE_MUTEX(gov_enable_lock);
+
+#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_SCHED
+static struct cpufreq_governor cpufreq_gov_sched;
+#endif
+
+/*
+ * Capacity margin added to CFS and RT capacity requests to provide
+ * some head room if task utilization further increases.
+ */
+unsigned int capacity_margin = 1280;
+
+static DEFINE_PER_CPU(struct gov_data *, cpu_gov_data);
+DEFINE_PER_CPU(struct sched_capacity_reqs, cpu_sched_capacity_reqs);
+
+/**
+ * gov_data - per-policy data internal to the governor
+ * @throttle: next throttling period expiry. Derived from throttle_nsec
+ * @throttle_nsec: throttle period length in nanoseconds
+ * @task: worker thread for dvfs transition that may block/sleep
+ * @irq_work: callback used to wake up worker thread
+ * @policy: pointer to cpufreq policy associated with this governor data
+ * @fastpath_lock: prevents multiple CPUs in a frequency domain from racing
+ * with each other in fast path during calculation of domain frequency
+ * @slowpath_lock: mutex used to synchronize with slow path - ensure policy
+ * remains enabled, and eliminate racing between slow and fast path
+ * @enabled: boolean value indicating that the policy is started, protected
+ * by the slowpath_lock
+ * @requested_freq: last frequency requested by the sched governor
+ *
+ * struct gov_data is the per-policy cpufreq_sched-specific data
+ * structure. A per-policy instance of it is created when the
+ * cpufreq_sched governor receives the CPUFREQ_GOV_POLICY_INIT
+ * condition and a pointer to it exists in the gov_data member of
+ * struct cpufreq_policy.
+ */
+struct gov_data {
+	ktime_t throttle;
+	unsigned int throttle_nsec;
+	struct task_struct *task;
+	struct irq_work irq_work;
+	struct cpufreq_policy *policy;
+	raw_spinlock_t fastpath_lock;
+	struct mutex slowpath_lock;
+	unsigned int enabled;
+	unsigned int requested_freq;
+};
+
+static void cpufreq_sched_try_driver_target(struct cpufreq_policy *policy,
+					    unsigned int freq)
+{
+	struct gov_data *gd = policy->governor_data;
+
+	__cpufreq_driver_target(policy, freq, CPUFREQ_RELATION_L);
+	gd->throttle = ktime_add_ns(ktime_get(), gd->throttle_nsec);
+}
+
+static bool finish_last_request(struct gov_data *gd)
+{
+	ktime_t now = ktime_get();
+
+	if (ktime_after(now, gd->throttle))
+		return false;
+
+	while (1) {
+		int usec_left = ktime_to_ns(ktime_sub(gd->throttle, now));
+
+		usec_left /= NSEC_PER_USEC;
+		usleep_range(usec_left, usec_left + 100);
+		now = ktime_get();
+		if (ktime_after(now, gd->throttle))
+			return true;
+	}
+}
+
+static int cpufreq_sched_thread(void *data)
+{
+	struct sched_param param;
+	struct gov_data *gd = (struct gov_data*) data;
+	struct cpufreq_policy *policy = gd->policy;
+	unsigned int new_request = 0;
+	unsigned int last_request = 0;
+	int ret;
+
+	param.sched_priority = 50;
+	ret = sched_setscheduler_nocheck(gd->task, SCHED_FIFO, &param);
+	if (ret) {
+		pr_warn("%s: failed to set SCHED_FIFO\n", __func__);
+		do_exit(-EINVAL);
+	} else {
+		pr_debug("%s: kthread (%d) set to SCHED_FIFO\n",
+				__func__, gd->task->pid);
+	}
+
+	mutex_lock(&gd->slowpath_lock);
+
+	while (true) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (kthread_should_stop()) {
+			set_current_state(TASK_RUNNING);
+			break;
+		}
+		new_request = gd->requested_freq;
+		if (!gd->enabled || new_request == last_request) {
+			mutex_unlock(&gd->slowpath_lock);
+			schedule();
+			mutex_lock(&gd->slowpath_lock);
+		} else {
+			set_current_state(TASK_RUNNING);
+			/*
+			 * if the frequency thread sleeps while waiting to be
+			 * unthrottled, start over to check for a newer request
+			 */
+			if (finish_last_request(gd))
+				continue;
+			last_request = new_request;
+			cpufreq_sched_try_driver_target(policy, new_request);
+		}
+	}
+
+	mutex_unlock(&gd->slowpath_lock);
+
+	return 0;
+}
+
+static void cpufreq_sched_irq_work(struct irq_work *irq_work)
+{
+	struct gov_data *gd;
+
+	gd = container_of(irq_work, struct gov_data, irq_work);
+	if (!gd)
+		return;
+
+	wake_up_process(gd->task);
+}
+
+static void update_fdomain_capacity_request(int cpu)
+{
+	unsigned int freq_new, index_new, cpu_tmp;
+	struct cpufreq_policy *policy;
+	struct gov_data *gd = per_cpu(cpu_gov_data, cpu);
+	unsigned long capacity = 0;
+
+	if (!gd)
+		return;
+
+	/* interrupts already disabled here via rq locked */
+	raw_spin_lock(&gd->fastpath_lock);
+
+	policy = gd->policy;
+
+	for_each_cpu(cpu_tmp, policy->cpus) {
+		struct sched_capacity_reqs *scr;
+
+		scr = &per_cpu(cpu_sched_capacity_reqs, cpu_tmp);
+		capacity = max(capacity, scr->total);
+	}
+
+	freq_new = capacity * policy->max >> SCHED_CAPACITY_SHIFT;
+
+	/*
+	 * Calling this without locking policy->rwsem means we race
+	 * against changes with policy->min and policy->max. This should
+	 * be okay though.
+	 */
+	if (cpufreq_frequency_table_target(policy, policy->freq_table,
+					   freq_new, CPUFREQ_RELATION_L,
+					   &index_new))
+		goto out;
+	freq_new = policy->freq_table[index_new].frequency;
+
+	if (freq_new == gd->requested_freq)
+		goto out;
+
+	gd->requested_freq = freq_new;
+
+	if (cpufreq_driver_slow || !mutex_trylock(&gd->slowpath_lock)) {
+		irq_work_queue_on(&gd->irq_work, cpu);
+	} else if (policy->transition_ongoing ||
+		   ktime_before(ktime_get(), gd->throttle)) {
+		mutex_unlock(&gd->slowpath_lock);
+		irq_work_queue_on(&gd->irq_work, cpu);
+	} else {
+		cpufreq_sched_try_driver_target(policy, freq_new);
+		mutex_unlock(&gd->slowpath_lock);
+	}
+
+out:
+	raw_spin_unlock(&gd->fastpath_lock);
+}
+
+void update_cpu_capacity_request(int cpu, bool request)
+{
+	unsigned long new_capacity;
+	struct sched_capacity_reqs *scr;
+
+	/* The rq lock serializes access to the CPU's sched_capacity_reqs. */
+	lockdep_assert_held(&cpu_rq(cpu)->lock);
+
+	scr = &per_cpu(cpu_sched_capacity_reqs, cpu);
+
+	new_capacity = scr->cfs + scr->rt;
+	new_capacity = new_capacity * capacity_margin
+		/ SCHED_CAPACITY_SCALE;
+	new_capacity += scr->dl;
+
+	if (new_capacity == scr->total)
+		return;
+
+	scr->total = new_capacity;
+	if (request)
+		update_fdomain_capacity_request(cpu);
+}
+
+static ssize_t show_throttle_nsec(struct cpufreq_policy *policy, char *buf)
+{
+	struct gov_data *gd = policy->governor_data;
+	return sprintf(buf, "%u\n", gd->throttle_nsec);
+}
+
+static ssize_t store_throttle_nsec(struct cpufreq_policy *policy,
+				   const char *buf, size_t count)
+{
+	struct gov_data *gd = policy->governor_data;
+	unsigned int input;
+	int ret;
+
+	ret = sscanf(buf, "%u", &input);
+
+	if (ret != 1)
+		return -EINVAL;
+
+	gd->throttle_nsec = input;
+	return count;
+}
+
+static struct freq_attr sched_freq_throttle_nsec_attr =
+	__ATTR(throttle_nsec, 0644, show_throttle_nsec, store_throttle_nsec);
+
+static struct attribute *sched_freq_sysfs_attribs[] = {
+	&sched_freq_throttle_nsec_attr.attr,
+	NULL
+};
+
+static struct attribute_group sched_freq_sysfs_group = {
+	.attrs = sched_freq_sysfs_attribs,
+	.name = "sched_freq",
+};
+
+static int cpufreq_sched_policy_init(struct cpufreq_policy *policy)
+{
+	struct gov_data *gd;
+	int ret;
+
+	gd = kzalloc(sizeof(*gd), GFP_KERNEL);
+	if (!gd)
+		return -ENOMEM;
+	policy->governor_data = gd;
+	gd->policy = policy;
+	raw_spin_lock_init(&gd->fastpath_lock);
+	mutex_init(&gd->slowpath_lock);
+
+	ret = sysfs_create_group(&policy->kobj, &sched_freq_sysfs_group);
+	if (ret)
+		goto err_mem;
+
+	/*
+	 * Set up schedfreq thread for slow path freq transitions if
+	 * required by the driver.
+	 */
+	if (cpufreq_driver_is_slow()) {
+		cpufreq_driver_slow = true;
+		gd->task = kthread_create(cpufreq_sched_thread, gd,
+					  "kschedfreq:%d",
+					  cpumask_first(policy->related_cpus));
+		if (IS_ERR_OR_NULL(gd->task)) {
+			pr_err("%s: failed to create kschedfreq thread\n",
+			       __func__);
+			goto err_sysfs;
+		}
+		get_task_struct(gd->task);
+		kthread_bind_mask(gd->task, policy->related_cpus);
+		wake_up_process(gd->task);
+		init_irq_work(&gd->irq_work, cpufreq_sched_irq_work);
+	}
+	return 0;
+
+err_sysfs:
+	sysfs_remove_group(&policy->kobj, &sched_freq_sysfs_group);
+err_mem:
+	policy->governor_data = NULL;
+	kfree(gd);
+	return -ENOMEM;
+}
+
+static int cpufreq_sched_policy_exit(struct cpufreq_policy *policy)
+{
+	struct gov_data *gd = policy->governor_data;
+
+	/* Stop the schedfreq thread associated with this policy. */
+	if (cpufreq_driver_slow) {
+		kthread_stop(gd->task);
+		put_task_struct(gd->task);
+	}
+	sysfs_remove_group(&policy->kobj, &sched_freq_sysfs_group);
+	policy->governor_data = NULL;
+	kfree(gd);
+	return 0;
+}
+
+static int cpufreq_sched_start(struct cpufreq_policy *policy)
+{
+	struct gov_data *gd = policy->governor_data;
+	int cpu;
+
+	/*
+	 * The schedfreq static key is managed here so the global schedfreq
+	 * lock must be taken - a per-policy lock such as policy->rwsem is
+	 * not sufficient.
+	 */
+	mutex_lock(&gov_enable_lock);
+
+	gd->enabled = 1;
+
+	/*
+	 * Set up percpu information. Writing the percpu gd pointer will
+	 * enable the fast path if the static key is already enabled.
+	 */
+	for_each_cpu(cpu, policy->cpus) {
+		memset(&per_cpu(cpu_sched_capacity_reqs, cpu), 0,
+		       sizeof(struct sched_capacity_reqs));
+		per_cpu(cpu_gov_data, cpu) = gd;
+	}
+
+	if (enabled_policies == 0)
+		static_key_slow_inc(&__sched_freq);
+	enabled_policies++;
+	mutex_unlock(&gov_enable_lock);
+
+	return 0;
+}
+
+static void dummy(void *info) {}
+
+static int cpufreq_sched_stop(struct cpufreq_policy *policy)
+{
+	struct gov_data *gd = policy->governor_data;
+	int cpu;
+
+	/*
+	 * The schedfreq static key is managed here so the global schedfreq
+	 * lock must be taken - a per-policy lock such as policy->rwsem is
+	 * not sufficient.
+	 */
+	mutex_lock(&gov_enable_lock);
+
+	/*
+	 * The governor stop path may or may not hold policy->rwsem. There
+	 * must be synchronization with the slow path however.
+	 */
+	mutex_lock(&gd->slowpath_lock);
+
+	/*
+	 * Stop new entries into the hot path for all CPUs. This will
+	 * potentially affect other policies which are still running but
+	 * this is an infrequent operation.
+	 */
+	static_key_slow_dec(&__sched_freq);
+	enabled_policies--;
+
+	/*
+	 * Ensure that all CPUs currently part of this policy are out
+	 * of the hot path so that if this policy exits we can free gd.
+	 */
+	preempt_disable();
+	smp_call_function_many(policy->cpus, dummy, NULL, true);
+	preempt_enable();
+
+	/*
+	 * Other CPUs in other policies may still have the schedfreq
+	 * static key enabled. The percpu gd is used to signal which
+	 * CPUs are enabled in the sched gov during the hot path.
+	 */
+	for_each_cpu(cpu, policy->cpus)
+		per_cpu(cpu_gov_data, cpu) = NULL;
+
+	/* Pause the slow path for this policy. */
+	gd->enabled = 0;
+
+	if (enabled_policies)
+		static_key_slow_inc(&__sched_freq);
+	mutex_unlock(&gd->slowpath_lock);
+	mutex_unlock(&gov_enable_lock);
+
+	return 0;
+}
+
+static int cpufreq_sched_setup(struct cpufreq_policy *policy,
+			       unsigned int event)
+{
+	switch (event) {
+	case CPUFREQ_GOV_POLICY_INIT:
+		return cpufreq_sched_policy_init(policy);
+	case CPUFREQ_GOV_POLICY_EXIT:
+		return cpufreq_sched_policy_exit(policy);
+	case CPUFREQ_GOV_START:
+		return cpufreq_sched_start(policy);
+	case CPUFREQ_GOV_STOP:
+		return cpufreq_sched_stop(policy);
+	case CPUFREQ_GOV_LIMITS:
+		break;
+	}
+	return 0;
+}
+
+#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_SCHED
+static
+#endif
+struct cpufreq_governor cpufreq_gov_sched = {
+	.name			= "sched",
+	.governor		= cpufreq_sched_setup,
+	.owner			= THIS_MODULE,
+};
+
+static int __init cpufreq_sched_init(void)
+{
+	return cpufreq_register_governor(&cpufreq_gov_sched);
+}
+
+/* Try to make this the default governor */
+fs_initcall(cpufreq_sched_init);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1d58387..17908dd 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1384,6 +1384,57 @@ unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
 }
 #endif
 
+#ifdef CONFIG_CPU_FREQ_GOV_SCHED
+extern unsigned int capacity_margin;
+extern struct static_key __sched_freq;
+
+static inline bool sched_freq(void)
+{
+	return static_key_false(&__sched_freq);
+}
+
+DECLARE_PER_CPU(struct sched_capacity_reqs, cpu_sched_capacity_reqs);
+void update_cpu_capacity_request(int cpu, bool request);
+
+static inline void set_cfs_cpu_capacity(int cpu, bool request,
+					unsigned long capacity)
+{
+	if (per_cpu(cpu_sched_capacity_reqs, cpu).cfs != capacity) {
+		per_cpu(cpu_sched_capacity_reqs, cpu).cfs = capacity;
+		update_cpu_capacity_request(cpu, request);
+	}
+}
+
+static inline void set_rt_cpu_capacity(int cpu, bool request,
+				       unsigned long capacity)
+{
+	if (per_cpu(cpu_sched_capacity_reqs, cpu).rt != capacity) {
+		per_cpu(cpu_sched_capacity_reqs, cpu).rt = capacity;
+		update_cpu_capacity_request(cpu, request);
+	}
+}
+
+static inline void set_dl_cpu_capacity(int cpu, bool request,
+				       unsigned long capacity)
+{
+	if (per_cpu(cpu_sched_capacity_reqs, cpu).dl != capacity) {
+		per_cpu(cpu_sched_capacity_reqs, cpu).dl = capacity;
+		update_cpu_capacity_request(cpu, request);
+	}
+}
+#else
+static inline bool sched_freq(void) { return false; }
+static inline void set_cfs_cpu_capacity(int cpu, bool request,
+					unsigned long capacity)
+{ }
+static inline void set_rt_cpu_capacity(int cpu, bool request,
+				       unsigned long capacity)
+{ }
+static inline void set_dl_cpu_capacity(int cpu, bool request,
+				       unsigned long capacity)
+{ }
+#endif
+
 static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
 {
 	rq->rt_avg += rt_delta * arch_scale_freq_capacity(NULL, cpu_of(rq));
-- 
2.4.10

* [RFCv7 PATCH 04/10] sched/fair: add triggers for OPP change requests
  2016-02-23  1:22 [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection Steve Muckle
                   ` (2 preceding siblings ...)
  2016-02-23  1:22 ` [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection Steve Muckle
@ 2016-02-23  1:22 ` Steve Muckle
  2016-03-01  6:51   ` Ricky Liang
  2016-02-23  1:22 ` [RFCv7 PATCH 05/10] sched/{core,fair}: trigger OPP change request on fork() Steve Muckle
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: Steve Muckle @ 2016-02-23  1:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette,
	Juri Lelli

From: Juri Lelli <juri.lelli@arm.com>

Each time a task is {en,de}queued we might need to adapt the current
frequency to the new usage. Add triggers on {en,de}queue_task_fair() for
this purpose.  Only trigger a freq request if we are effectively waking up
or going to sleep.  Filter out load balancing related calls to reduce the
number of triggers.
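
For reference, the conversion performed by update_capacity_of() below
normalizes the CPU's utilization to the 0..SCHED_CAPACITY_SCALE range
before handing it to schedfreq; a hypothetical example (numbers made up):

	/* Sketch only: cpu_util() = 400 on a CPU with capacity_orig_of() = 800 */
	unsigned long req_cap = 400UL * 1024 / 800;	/* = 512, half of SCHED_CAPACITY_SCALE */
	/* set_cfs_cpu_capacity(cpu, true, req_cap) then requests ~50% capacity */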

[smuckle@linaro.org: resolve merge conflicts, define task_new,
 use renamed static key sched_freq]

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 kernel/sched/fair.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3437e01..f1f00a4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4283,6 +4283,21 @@ static inline void hrtick_update(struct rq *rq)
 }
 #endif
 
+static unsigned long capacity_orig_of(int cpu);
+static int cpu_util(int cpu);
+
+static void update_capacity_of(int cpu)
+{
+	unsigned long req_cap;
+
+	if (!sched_freq())
+		return;
+
+	/* Convert scale-invariant capacity to cpu. */
+	req_cap = cpu_util(cpu) * SCHED_CAPACITY_SCALE / capacity_orig_of(cpu);
+	set_cfs_cpu_capacity(cpu, true, req_cap);
+}
+
 /*
  * The enqueue_task method is called before nr_running is
  * increased. Here we update the fair scheduling stats and
@@ -4293,6 +4308,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct cfs_rq *cfs_rq;
 	struct sched_entity *se = &p->se;
+	int task_new = !(flags & ENQUEUE_WAKEUP);
 
 	for_each_sched_entity(se) {
 		if (se->on_rq)
@@ -4324,9 +4340,23 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		update_cfs_shares(cfs_rq);
 	}
 
-	if (!se)
+	if (!se) {
 		add_nr_running(rq, 1);
 
+		/*
+		 * We want to potentially trigger a freq switch
+		 * request only for tasks that are waking up; this is
+		 * because we get here also during load balancing, but
+		 * in these cases it seems wise to trigger as single
+		 * request after load balancing is done.
+		 *
+		 * XXX: how about fork()? Do we need a special
+		 *      flag/something to tell if we are here after a
+		 *      fork() (wakeup_task_new)?
+		 */
+		if (!task_new)
+			update_capacity_of(cpu_of(rq));
+	}
 	hrtick_update(rq);
 }
 
@@ -4384,9 +4414,24 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		update_cfs_shares(cfs_rq);
 	}
 
-	if (!se)
+	if (!se) {
 		sub_nr_running(rq, 1);
 
+		/*
+		 * We want to potentially trigger a freq switch
+		 * request only for tasks that are going to sleep;
+		 * this is because we get here also during load
+		 * balancing, but in these cases it seems wise to
+		 * trigger as single request after load balancing is
+		 * done.
+		 */
+		if (task_sleep) {
+			if (rq->cfs.nr_running)
+				update_capacity_of(cpu_of(rq));
+			else if (sched_freq())
+				set_cfs_cpu_capacity(cpu_of(rq), false, 0);
+		}
+	}
 	hrtick_update(rq);
 }
 
-- 
2.4.10

* [RFCv7 PATCH 05/10] sched/{core,fair}: trigger OPP change request on fork()
  2016-02-23  1:22 [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection Steve Muckle
                   ` (3 preceding siblings ...)
  2016-02-23  1:22 ` [RFCv7 PATCH 04/10] sched/fair: add triggers for OPP change requests Steve Muckle
@ 2016-02-23  1:22 ` Steve Muckle
  2016-02-23  1:22 ` [RFCv7 PATCH 06/10] sched/fair: cpufreq_sched triggers for load balancing Steve Muckle
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 51+ messages in thread
From: Steve Muckle @ 2016-02-23  1:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette,
	Juri Lelli

From: Juri Lelli <juri.lelli@arm.com>

Patch "sched/fair: add triggers for OPP change requests" introduced OPP
change triggers for enqueue_task_fair(), but the trigger operated only
for wakeups. It makes sense to consider wakeup_new as well (i.e., fork()):
since we don't know anything about a newly created task, we almost
certainly want to jump to max OPP so as not to harm its performance too
much.

However, it is not currently possible (or at least it wasn't evident to me
how to do so :/) to tell new wakeups apart from other (non-wakeup)
operations.

This patch introduces an additional flag in sched.h that is set only at
fork() time and is then consumed in enqueue_task_fair() for this purpose.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 kernel/sched/core.c  | 2 +-
 kernel/sched/fair.c  | 9 +++------
 kernel/sched/sched.h | 1 +
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 87ca0be..86297a2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2553,7 +2553,7 @@ void wake_up_new_task(struct task_struct *p)
 #endif
 
 	rq = __task_rq_lock(p);
-	activate_task(rq, p, 0);
+	activate_task(rq, p, ENQUEUE_WAKEUP_NEW);
 	p->on_rq = TASK_ON_RQ_QUEUED;
 	trace_sched_wakeup_new(p);
 	check_preempt_curr(rq, p, WF_FORK);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f1f00a4..e7fab8f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4308,7 +4308,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct cfs_rq *cfs_rq;
 	struct sched_entity *se = &p->se;
-	int task_new = !(flags & ENQUEUE_WAKEUP);
+	int task_new = flags & ENQUEUE_WAKEUP_NEW;
+	int task_wakeup = flags & ENQUEUE_WAKEUP;
 
 	for_each_sched_entity(se) {
 		if (se->on_rq)
@@ -4349,12 +4350,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		 * because we get here also during load balancing, but
 		 * in these cases it seems wise to trigger as single
 		 * request after load balancing is done.
-		 *
-		 * XXX: how about fork()? Do we need a special
-		 *      flag/something to tell if we are here after a
-		 *      fork() (wakeup_task_new)?
 		 */
-		if (!task_new)
+		if (task_new || task_wakeup)
 			update_capacity_of(cpu_of(rq));
 	}
 	hrtick_update(rq);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 17908dd..9c26be2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1140,6 +1140,7 @@ extern const u32 sched_prio_to_wmult[40];
 #endif
 #define ENQUEUE_REPLENISH	0x08
 #define ENQUEUE_RESTORE	0x10
+#define ENQUEUE_WAKEUP_NEW	0x20
 
 #define DEQUEUE_SLEEP		0x01
 #define DEQUEUE_SAVE		0x02
-- 
2.4.10

* [RFCv7 PATCH 06/10] sched/fair: cpufreq_sched triggers for load balancing
  2016-02-23  1:22 [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection Steve Muckle
                   ` (4 preceding siblings ...)
  2016-02-23  1:22 ` [RFCv7 PATCH 05/10] sched/{core,fair}: trigger OPP change request on fork() Steve Muckle
@ 2016-02-23  1:22 ` Steve Muckle
  2016-02-23  1:22 ` [RFCv7 PATCH 07/10] sched/fair: jump to max OPP when crossing UP threshold Steve Muckle
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 51+ messages in thread
From: Steve Muckle @ 2016-02-23  1:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette,
	Juri Lelli

From: Juri Lelli <juri.lelli@arm.com>

As we don't trigger freq changes from {en,de}queue_task_fair() during load
balancing, we need to do so explicitly on load balancing paths.

[smuckle@linaro.org: move update_capacity_of calls so rq lock is held]

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 kernel/sched/fair.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e7fab8f..5531513 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6107,6 +6107,10 @@ static void attach_one_task(struct rq *rq, struct task_struct *p)
 {
 	raw_spin_lock(&rq->lock);
 	attach_task(rq, p);
+	/*
+	 * We want to potentially raise target_cpu's OPP.
+	 */
+	update_capacity_of(cpu_of(rq));
 	raw_spin_unlock(&rq->lock);
 }
 
@@ -6128,6 +6132,11 @@ static void attach_tasks(struct lb_env *env)
 		attach_task(env->dst_rq, p);
 	}
 
+	/*
+	 * We want to potentially raise env.dst_cpu's OPP.
+	 */
+	update_capacity_of(env->dst_cpu);
+
 	raw_spin_unlock(&env->dst_rq->lock);
 }
 
@@ -7267,6 +7276,11 @@ more_balance:
 		 * ld_moved     - cumulative load moved across iterations
 		 */
 		cur_ld_moved = detach_tasks(&env);
+		/*
+		 * We want to potentially lower env.src_cpu's OPP.
+		 */
+		if (cur_ld_moved)
+			update_capacity_of(env.src_cpu);
 
 		/*
 		 * We've detached some tasks from busiest_rq. Every
@@ -7631,8 +7645,13 @@ static int active_load_balance_cpu_stop(void *data)
 		schedstat_inc(sd, alb_count);
 
 		p = detach_one_task(&env);
-		if (p)
+		if (p) {
 			schedstat_inc(sd, alb_pushed);
+			/*
+			 * We want to potentially lower env.src_cpu's OPP.
+			 */
+			update_capacity_of(env.src_cpu);
+		}
 		else
 			schedstat_inc(sd, alb_failed);
 	}
-- 
2.4.10

* [RFCv7 PATCH 07/10] sched/fair: jump to max OPP when crossing UP threshold
  2016-02-23  1:22 [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection Steve Muckle
                   ` (5 preceding siblings ...)
  2016-02-23  1:22 ` [RFCv7 PATCH 06/10] sched/fair: cpufreq_sched triggers for load balancing Steve Muckle
@ 2016-02-23  1:22 ` Steve Muckle
  2016-02-23  1:22 ` [RFCv7 PATCH 08/10] sched: remove call of sched_avg_update from sched_rt_avg_update Steve Muckle
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 51+ messages in thread
From: Steve Muckle @ 2016-02-23  1:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette,
	Juri Lelli

Since the true utilization of a long-running task is not detectable
while it is running and might be bigger than the current CPU capacity,
create maximum CPU capacity headroom by requesting the maximum CPU
capacity once the CPU usage plus the capacity margin exceeds the current
capacity. This is also done to harm the performance of such a task as
little as possible.
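
A hypothetical illustration of the tick-time check added in
sched_freq_tick() below (numbers made up, sketch only):

	/* capacity_curr_of(cpu) = 614 (~60% of capacity_orig = 1024) */
	unsigned long capacity_curr = 614;
	/* cpu_util() = 450, scr->rt = 60, scr->dl = 0 */
	unsigned long needed = (450UL + 60) * 1280 / 1024 + 0;	/* = 637 */
	/* needed > capacity_curr, so set_cfs_cpu_capacity(cpu, true, capacity_max) */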

Original fair-class only version authored by Juri Lelli
<juri.lelli@arm.com>.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 kernel/sched/core.c  | 40 +++++++++++++++++++++++++++++++++++
 kernel/sched/fair.c  | 57 --------------------------------------------------
 kernel/sched/sched.h | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 99 insertions(+), 57 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 86297a2..747a7af 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3020,6 +3020,45 @@ unsigned long long task_sched_runtime(struct task_struct *p)
 	return ns;
 }
 
+#ifdef CONFIG_CPU_FREQ_GOV_SCHED
+static unsigned long sum_capacity_reqs(unsigned long cfs_cap,
+				       struct sched_capacity_reqs *scr)
+{
+	unsigned long total = cfs_cap + scr->rt;
+
+	total = total * capacity_margin;
+	total /= SCHED_CAPACITY_SCALE;
+	total += scr->dl;
+	return total;
+}
+
+static void sched_freq_tick(int cpu)
+{
+	struct sched_capacity_reqs *scr;
+	unsigned long capacity_orig, capacity_curr;
+
+	if (!sched_freq())
+		return;
+
+	capacity_orig = capacity_orig_of(cpu);
+	capacity_curr = capacity_curr_of(cpu);
+	if (capacity_curr == capacity_orig)
+		return;
+
+	/*
+	 * To make free room for a task that is building up its "real"
+	 * utilization and to harm its performance the least, request
+	 * a jump to max OPP as soon as the margin of free capacity is
+	 * impacted (specified by capacity_margin).
+	 */
+	scr = &per_cpu(cpu_sched_capacity_reqs, cpu);
+	if (capacity_curr < sum_capacity_reqs(cpu_util(cpu), scr))
+		set_cfs_cpu_capacity(cpu, true, capacity_max);
+}
+#else
+static inline void sched_freq_tick(int cpu) { }
+#endif
+
 /*
  * This function gets called by the timer code, with HZ frequency.
  * We call it with interrupts disabled.
@@ -3037,6 +3076,7 @@ void scheduler_tick(void)
 	curr->sched_class->task_tick(rq, curr, 0);
 	update_cpu_load_active(rq);
 	calc_global_load_tick(rq);
+	sched_freq_tick(cpu);
 	raw_spin_unlock(&rq->lock);
 
 	perf_event_task_tick();
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5531513..cf7ae0a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4283,9 +4283,6 @@ static inline void hrtick_update(struct rq *rq)
 }
 #endif
 
-static unsigned long capacity_orig_of(int cpu);
-static int cpu_util(int cpu);
-
 static void update_capacity_of(int cpu)
 {
 	unsigned long req_cap;
@@ -4685,15 +4682,6 @@ static unsigned long target_load(int cpu, int type)
 	return max(rq->cpu_load[type-1], total);
 }
 
-static unsigned long capacity_of(int cpu)
-{
-	return cpu_rq(cpu)->cpu_capacity;
-}
-
-static unsigned long capacity_orig_of(int cpu)
-{
-	return cpu_rq(cpu)->cpu_capacity_orig;
-}
 
 static unsigned long cpu_avg_load_per_task(int cpu)
 {
@@ -4863,17 +4851,6 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
 #endif
 
 /*
- * Returns the current capacity of cpu after applying both
- * cpu and freq scaling.
- */
-static unsigned long capacity_curr_of(int cpu)
-{
-	return cpu_rq(cpu)->cpu_capacity_orig *
-	       arch_scale_freq_capacity(NULL, cpu)
-	       >> SCHED_CAPACITY_SHIFT;
-}
-
-/*
  * Detect M:N waker/wakee relationships via a switching-frequency heuristic.
  * A waker of many should wake a different task than the one last awakened
  * at a frequency roughly N times higher than one of its wakees.  In order
@@ -5117,40 +5094,6 @@ done:
 }
 
 /*
- * cpu_util returns the amount of capacity of a CPU that is used by CFS
- * tasks. The unit of the return value must be the one of capacity so we can
- * compare the utilization with the capacity of the CPU that is available for
- * CFS task (ie cpu_capacity).
- *
- * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
- * recent utilization of currently non-runnable tasks on a CPU. It represents
- * the amount of utilization of a CPU in the range [0..capacity_orig] where
- * capacity_orig is the cpu_capacity available at the highest frequency
- * (arch_scale_freq_capacity()).
- * The utilization of a CPU converges towards a sum equal to or less than the
- * current capacity (capacity_curr <= capacity_orig) of the CPU because it is
- * the running time on this CPU scaled by capacity_curr.
- *
- * Nevertheless, cfs_rq.avg.util_avg can be higher than capacity_curr or even
- * higher than capacity_orig because of unfortunate rounding in
- * cfs.avg.util_avg or just after migrating tasks and new task wakeups until
- * the average stabilizes with the new running time. We need to check that the
- * utilization stays within the range of [0..capacity_orig] and cap it if
- * necessary. Without utilization capping, a group could be seen as overloaded
- * (CPU0 utilization at 121% + CPU1 utilization at 80%) whereas CPU1 has 20% of
- * available capacity. We allow utilization to overshoot capacity_curr (but not
- * capacity_orig) as it useful for predicting the capacity required after task
- * migrations (scheduler-driven DVFS).
- */
-static int cpu_util(int cpu)
-{
-	unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
-	unsigned long capacity = capacity_orig_of(cpu);
-
-	return (util >= capacity) ? capacity : util;
-}
-
-/*
  * select_task_rq_fair: Select target runqueue for the waking task in domains
  * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
  * SD_BALANCE_FORK, or SD_BALANCE_EXEC.
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9c26be2..59747d8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1385,7 +1385,66 @@ unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
 }
 #endif
 
+#ifdef CONFIG_SMP
+static inline unsigned long capacity_of(int cpu)
+{
+	return cpu_rq(cpu)->cpu_capacity;
+}
+
+static inline unsigned long capacity_orig_of(int cpu)
+{
+	return cpu_rq(cpu)->cpu_capacity_orig;
+}
+
+/*
+ * cpu_util returns the amount of capacity of a CPU that is used by CFS
+ * tasks. The unit of the return value must be the one of capacity so we can
+ * compare the utilization with the capacity of the CPU that is available for
+ * CFS task (ie cpu_capacity).
+ *
+ * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
+ * recent utilization of currently non-runnable tasks on a CPU. It represents
+ * the amount of utilization of a CPU in the range [0..capacity_orig] where
+ * capacity_orig is the cpu_capacity available at the highest frequency
+ * (arch_scale_freq_capacity()).
+ * The utilization of a CPU converges towards a sum equal to or less than the
+ * current capacity (capacity_curr <= capacity_orig) of the CPU because it is
+ * the running time on this CPU scaled by capacity_curr.
+ *
+ * Nevertheless, cfs_rq.avg.util_avg can be higher than capacity_curr or even
+ * higher than capacity_orig because of unfortunate rounding in
+ * cfs.avg.util_avg or just after migrating tasks and new task wakeups until
+ * the average stabilizes with the new running time. We need to check that the
+ * utilization stays within the range of [0..capacity_orig] and cap it if
+ * necessary. Without utilization capping, a group could be seen as overloaded
+ * (CPU0 utilization at 121% + CPU1 utilization at 80%) whereas CPU1 has 20% of
+ * available capacity. We allow utilization to overshoot capacity_curr (but not
+ * capacity_orig) as it useful for predicting the capacity required after task
+ * migrations (scheduler-driven DVFS).
+ */
+static inline int cpu_util(int cpu)
+{
+	unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
+	unsigned long capacity = capacity_orig_of(cpu);
+
+	return (util >= capacity) ? capacity : util;
+}
+
+/*
+ * Returns the current capacity of cpu after applying both
+ * cpu and freq scaling.
+ */
+static inline unsigned long capacity_curr_of(int cpu)
+{
+	return cpu_rq(cpu)->cpu_capacity_orig *
+	       arch_scale_freq_capacity(NULL, cpu)
+	       >> SCHED_CAPACITY_SHIFT;
+}
+
+#endif
+
 #ifdef CONFIG_CPU_FREQ_GOV_SCHED
+#define capacity_max SCHED_CAPACITY_SCALE
 extern unsigned int capacity_margin;
 extern struct static_key __sched_freq;
 
-- 
2.4.10

* [RFCv7 PATCH 08/10] sched: remove call of sched_avg_update from sched_rt_avg_update
  2016-02-23  1:22 [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection Steve Muckle
                   ` (6 preceding siblings ...)
  2016-02-23  1:22 ` [RFCv7 PATCH 07/10] sched/fair: jump to max OPP when crossing UP threshold Steve Muckle
@ 2016-02-23  1:22 ` Steve Muckle
  2016-02-23  1:22 ` [RFCv7 PATCH 09/10] sched/deadline: split rt_avg in 2 distinct metrics Steve Muckle
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 51+ messages in thread
From: Steve Muckle @ 2016-02-23  1:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette

From: Vincent Guittot <vincent.guittot@linaro.org>

rt_avg is only used to scale the CPU's available capacity for CFS
tasks. As the update of this scaling is done during periodic load
balancing, we only have to ensure that sched_avg_update has been called
before any periodic load balancing. This requirement is already
fulfilled by __update_cpu_load, so the call in sched_rt_avg_update,
which is part of the hot path, is useless.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 kernel/sched/sched.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 59747d8..3df21f2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1498,7 +1498,6 @@ static inline void set_dl_cpu_capacity(int cpu, bool request,
 static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
 {
 	rq->rt_avg += rt_delta * arch_scale_freq_capacity(NULL, cpu_of(rq));
-	sched_avg_update(rq);
 }
 #else
 static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta) { }
-- 
2.4.10

* [RFCv7 PATCH 09/10] sched/deadline: split rt_avg in 2 distinct metrics
  2016-02-23  1:22 [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection Steve Muckle
                   ` (7 preceding siblings ...)
  2016-02-23  1:22 ` [RFCv7 PATCH 08/10] sched: remove call of sched_avg_update from sched_rt_avg_update Steve Muckle
@ 2016-02-23  1:22 ` Steve Muckle
  2016-02-23  1:22 ` [RFCv7 PATCH 10/10] sched: rt scheduler sets capacity requirement Steve Muckle
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 51+ messages in thread
From: Steve Muckle @ 2016-02-23  1:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette

From: Vincent Guittot <vincent.guittot@linaro.org>

rt_avg monitors the average load of RT tasks, deadline tasks and, when
enabled, interrupts. It is used to calculate the remaining capacity for
CFS tasks. Split rt_avg into 2 metrics: one for RT tasks and interrupts,
which keeps the name rt_avg, and another for deadline tasks, named
dl_avg.
Both values are still used to calculate the remaining capacity for CFS
tasks, but rt_avg is now also used to request capacity from sched-freq
for RT tasks.
As IRQ time is accounted together with RT tasks, it will be taken into
account in the capacity request.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 kernel/sched/core.c     | 1 +
 kernel/sched/deadline.c | 2 +-
 kernel/sched/fair.c     | 1 +
 kernel/sched/sched.h    | 8 +++++++-
 4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 747a7af..12a4a3a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -759,6 +759,7 @@ void sched_avg_update(struct rq *rq)
 		asm("" : "+rm" (rq->age_stamp));
 		rq->age_stamp += period;
 		rq->rt_avg /= 2;
+		rq->dl_avg /= 2;
 	}
 }
 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index cd64c97..87dcee3 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -747,7 +747,7 @@ static void update_curr_dl(struct rq *rq)
 	curr->se.exec_start = rq_clock_task(rq);
 	cpuacct_charge(curr, delta_exec);
 
-	sched_rt_avg_update(rq, delta_exec);
+	sched_dl_avg_update(rq, delta_exec);
 
 	dl_se->runtime -= dl_se->dl_yielded ? 0 : delta_exec;
 	if (dl_runtime_exceeded(dl_se)) {
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cf7ae0a..3a812fa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6278,6 +6278,7 @@ static unsigned long scale_rt_capacity(int cpu)
 	 */
 	age_stamp = READ_ONCE(rq->age_stamp);
 	avg = READ_ONCE(rq->rt_avg);
+	avg += READ_ONCE(rq->dl_avg);
 	delta = __rq_clock_broken(rq) - age_stamp;
 
 	if (unlikely(delta < 0))
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3df21f2..ad6cc8b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -644,7 +644,7 @@ struct rq {
 
 	struct list_head cfs_tasks;
 
-	u64 rt_avg;
+	u64 rt_avg, dl_avg;
 	u64 age_stamp;
 	u64 idle_stamp;
 	u64 avg_idle;
@@ -1499,8 +1499,14 @@ static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
 {
 	rq->rt_avg += rt_delta * arch_scale_freq_capacity(NULL, cpu_of(rq));
 }
+
+static inline void sched_dl_avg_update(struct rq *rq, u64 dl_delta)
+{
+	rq->dl_avg += dl_delta * arch_scale_freq_capacity(NULL, cpu_of(rq));
+}
 #else
 static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta) { }
+static inline void sched_dl_avg_update(struct rq *rq, u64 dl_delta) { }
 static inline void sched_avg_update(struct rq *rq) { }
 #endif
 
-- 
2.4.10

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFCv7 PATCH 10/10] sched: rt scheduler sets capacity requirement
  2016-02-23  1:22 [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection Steve Muckle
                   ` (8 preceding siblings ...)
  2016-02-23  1:22 ` [RFCv7 PATCH 09/10] sched/deadline: split rt_avg in 2 distincts metrics Steve Muckle
@ 2016-02-23  1:22 ` Steve Muckle
  2016-02-23  1:33 ` [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection Steve Muckle
  2016-03-30  0:45 ` Yuyang Du
  11 siblings, 0 replies; 51+ messages in thread
From: Steve Muckle @ 2016-02-23  1:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette

From: Vincent Guittot <vincent.guittot@linaro.org>

Unlike deadline tasks, RT tasks don't provide any runtime constraints
beyond their priority. The only currently usable input for estimating
the capacity needed by RT tasks is the rt_avg metric, so we use it to
estimate the CPU capacity needed by the RT scheduler class.

In order to monitor the evolution of RT task load, we periodically
check it during the tick.

We then use the capacity consumed by the last activity as an estimate
for the next one. This is not especially accurate, but it is a good
starting point and has no impact on the wake-up path of RT tasks.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 kernel/sched/rt.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 8ec86ab..da9086c 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1426,6 +1426,41 @@ static void check_preempt_curr_rt(struct rq *rq, struct task_struct *p, int flag
 #endif
 }
 
+#ifdef CONFIG_CPU_FREQ_GOV_SCHED
+static void sched_rt_update_capacity_req(struct rq *rq)
+{
+	u64 total, used, age_stamp, avg;
+	s64 delta;
+
+	if (!sched_freq())
+		return;
+
+	sched_avg_update(rq);
+	/*
+	 * Since we're reading these variables without serialization make sure
+	 * we read them once before doing sanity checks on them.
+	 */
+	age_stamp = READ_ONCE(rq->age_stamp);
+	avg = READ_ONCE(rq->rt_avg);
+	delta = rq_clock(rq) - age_stamp;
+
+	if (unlikely(delta < 0))
+		delta = 0;
+
+	total = sched_avg_period() + delta;
+
+	used = div_u64(avg, total);
+	if (unlikely(used > SCHED_CAPACITY_SCALE))
+		used = SCHED_CAPACITY_SCALE;
+
+	set_rt_cpu_capacity(rq->cpu, true, (unsigned long)(used));
+}
+#else
+static inline void sched_rt_update_capacity_req(struct rq *rq)
+{ }
+
+#endif
+
 static struct sched_rt_entity *pick_next_rt_entity(struct rq *rq,
 						   struct rt_rq *rt_rq)
 {
@@ -1494,8 +1529,17 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev)
 	if (prev->sched_class == &rt_sched_class)
 		update_curr_rt(rq);
 
-	if (!rt_rq->rt_queued)
+	if (!rt_rq->rt_queued) {
+		/*
+		 * The next task to be picked on this rq will have a lower
+		 * priority than rt tasks so we can spend some time to update
+		 * the capacity used by rt tasks based on the last activity.
+		 * This value will then be used as an estimate of the next
+		 * activity.
+		 */
+		sched_rt_update_capacity_req(rq);
 		return NULL;
+	}
 
 	put_prev_task(rq, prev);
 
@@ -2212,6 +2256,8 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
 
 	update_curr_rt(rq);
 
+	sched_rt_update_capacity_req(rq);
+
 	watchdog(rq, p);
 
 	/*
-- 
2.4.10

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 02/10] cpufreq: introduce cpufreq_driver_is_slow
  2016-02-23  1:22 ` [RFCv7 PATCH 02/10] cpufreq: introduce cpufreq_driver_is_slow Steve Muckle
@ 2016-02-23  1:31   ` Rafael J. Wysocki
  2016-02-26  0:50       ` Michael Turquette
  0 siblings, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-02-23  1:31 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar

On Tue, Feb 23, 2016 at 2:22 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> From: Michael Turquette <mturquette@baylibre.com>
>
> Some architectures and platforms perform CPU frequency transitions
> through a non-blocking method, while some might block or sleep. Even
> when frequency transitions do not block or sleep they may be very slow.
> This distinction is important when trying to change frequency from
> a non-interruptible context in a scheduler hot path.
>
> Describe this distinction with a cpufreq driver flag,
> CPUFREQ_DRIVER_FAST. The default is to not have this flag set,
> thus erring on the side of caution.
>
> cpufreq_driver_is_slow() is also introduced in this patch. Setting
> the above flag will allow this function to return false.
>
> [smuckle@linaro.org: change flag/API to include drivers that are too
>  slow for scheduler hot paths, in addition to those that block/sleep]
>
> Cc: Rafael J. Wysocki <rafael@kernel.org>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: Michael Turquette <mturquette@baylibre.com>
> Signed-off-by: Steve Muckle <smuckle@linaro.org>

Something more sophisticated than this is needed, because one driver
may actually be able to do "fast" switching in some cases and may not
be able to do that in other cases.

For example, in the acpi-cpufreq case it all depends on what's in
the ACPI tables.
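
Something per-policy would fit that better than a driver-wide flag.  As
a rough, hypothetical sketch (the field and the helper below are not
existing cpufreq API, just an illustration):

	/*
	 * Hypothetical: the driver sets a per-policy flag at init time --
	 * acpi-cpufreq would set it only when the ACPI tables actually
	 * allow fast switching for this policy -- and the governor checks
	 * that instead of a driver-wide capability.
	 */
	static inline bool cpufreq_policy_can_fast_switch(struct cpufreq_policy *policy)
	{
		return policy->fast_switch_possible;
	}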

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection
  2016-02-23  1:22 [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection Steve Muckle
                   ` (9 preceding siblings ...)
  2016-02-23  1:22 ` [RFCv7 PATCH 10/10] sched: rt scheduler sets capacity requirement Steve Muckle
@ 2016-02-23  1:33 ` Steve Muckle
  2016-03-30  0:45 ` Yuyang Du
  11 siblings, 0 replies; 51+ messages in thread
From: Steve Muckle @ 2016-02-23  1:33 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette

On 02/22/2016 05:22 PM, Steve Muckle wrote:
> Scheduler-driven CPU frequency selection hopes to exploit both
> per-task and global information in the scheduler to improve frequency
> selection policy and achieve lower power consumption, improved
> responsiveness/performance, and less reliance on heuristics and
> tunables. For further discussion of this integration see [0].
> 
> This patch series implements a cpufreq governor which collects CPU
> capacity requests from the fair, realtime, and deadline scheduling
> classes. The fair and realtime scheduling classes are modified to make
> these requests. The deadline class is not yet modified to make CPU
> capacity requests.

This RFC series does not attempt to address any of today's feedback
regarding simplifying the hooks in the scheduler - I'd like some more
time to ponder that. But I thought it important to get the latest
version of this out for discussion as soon as possible.

thanks,
Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 01/10] sched: Compute cpu capacity available at current frequency
  2016-02-23  1:22 ` [RFCv7 PATCH 01/10] sched: Compute cpu capacity available at current frequency Steve Muckle
@ 2016-02-23  1:41   ` Rafael J. Wysocki
  2016-02-23  9:19     ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-02-23  1:41 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette

On Tue, Feb 23, 2016 at 2:22 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> From: Morten Rasmussen <morten.rasmussen@arm.com>
>
> capacity_orig_of() returns the max available compute capacity of a cpu.
> For scale-invariant utilization tracking and energy-aware scheduling
> decisions it is useful to know the compute capacity available at the
> current OPP of a cpu.
>
> cc: Ingo Molnar <mingo@redhat.com>
> cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> Signed-off-by: Steve Muckle <smuckle@linaro.org>
> ---
>  kernel/sched/fair.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7ce24a4..3437e01 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4821,6 +4821,17 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
>  #endif
>
>  /*
> + * Returns the current capacity of cpu after applying both
> + * cpu and freq scaling.
> + */
> +static unsigned long capacity_curr_of(int cpu)
> +{
> +       return cpu_rq(cpu)->cpu_capacity_orig *
> +              arch_scale_freq_capacity(NULL, cpu)

What about architectures that don't have this?

Why is that an architecture feature?

I can easily imagine two x86 platforms using different
scale_freq_capacity(), for example.

> +              >> SCHED_CAPACITY_SHIFT;
> +}
> +
> +/*
>   * Detect M:N waker/wakee relationships via a switching-frequency heuristic.
>   * A waker of many should wake a different task than the one last awakened
>   * at a frequency roughly N times higher than one of its wakees.  In order
> --


Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 01/10] sched: Compute cpu capacity available at current frequency
  2016-02-23  1:41   ` Rafael J. Wysocki
@ 2016-02-23  9:19     ` Peter Zijlstra
  2016-02-26  1:37       ` Rafael J. Wysocki
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2016-02-23  9:19 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Ingo Molnar, Linux Kernel Mailing List, linux-pm,
	Vincent Guittot, Morten Rasmussen, Dietmar Eggemann, Juri Lelli,
	Patrick Bellasi, Michael Turquette

On Tue, Feb 23, 2016 at 02:41:20AM +0100, Rafael J. Wysocki wrote:
> >  /*
> > + * Returns the current capacity of cpu after applying both
> > + * cpu and freq scaling.
> > + */
> > +static unsigned long capacity_curr_of(int cpu)
> > +{
> > +       return cpu_rq(cpu)->cpu_capacity_orig *
> > +              arch_scale_freq_capacity(NULL, cpu)
> 
> What about architectures that don't have this?

They get the 'default' which is a constant SCHED_CAPACITY_SCALE unit.
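
For reference, the arch-independent fallback in kernel/sched/sched.h is
just a constant, roughly:

	#ifndef arch_scale_freq_capacity
	static __always_inline
	unsigned long arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
	{
		return SCHED_CAPACITY_SCALE;
	}
	#endif

so on such architectures capacity_curr_of() above simply degenerates to
cpu_capacity_orig.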

> Why is that an architecture feature?

Because not all archs can tell the frequency the same way. Some you
program the DVFS state and they really run at this speed, for those you
can simply report back.

For others, x86 for example, you program a DVFS 'hint' and the hardware
does whatever, we'd have to do APERF/MPERF samples to get an idea of the
actual frequency we ran at.

Also, having this makes the load tracking slightly more expensive:
instead of compile-time constants we get function calls and actual
multiplications. It's not _too_ bad, but still.

> I can easily imagine two x86 platforms using different
> scale_freq_capacity(), for example.

That's up to the arch; if different x86 platforms need different
thingies, the arch implementation needs to offer a selector -- this
isn't 'hard'.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-23  1:22 ` [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection Steve Muckle
@ 2016-02-25  3:55   ` Rafael J. Wysocki
  2016-02-25  9:21     ` Peter Zijlstra
                       ` (3 more replies)
  2016-03-03 14:21   ` Ingo Molnar
  1 sibling, 4 replies; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-02-25  3:55 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette, Ricky Liang,
	Juri Lelli

Hi,

I promised a review and here it goes.

Let me focus on this one as the rest seems to depend on it.

On Monday, February 22, 2016 05:22:43 PM Steve Muckle wrote:
> From: Michael Turquette <mturquette@baylibre.com>
> 
> Scheduler-driven CPU frequency selection hopes to exploit both
> per-task and global information in the scheduler to improve frequency
> selection policy, achieving lower power consumption, improved
> responsiveness/performance, and less reliance on heuristics and
> tunables. For further discussion on the motivation of this integration
> see [0].
> 
> This patch implements a shim layer between the Linux scheduler and the
> cpufreq subsystem. The interface accepts capacity requests from the
> CFS, RT and deadline sched classes. The requests from each sched class
> are summed on each CPU with a margin applied to the CFS and RT
> capacity requests to provide some headroom. Deadline requests are
> expected to be precise enough given their nature to not require
> headroom. The maximum total capacity request for a CPU in a frequency
> domain drives the requested frequency for that domain.
> 
> Policy is determined by both the sched classes and this shim layer.
> 
> Note that this algorithm is event-driven. There is no polling loop to
> check cpu idle time nor any other method which is unsynchronized with
> the scheduler, aside from an optional throttling mechanism.
> 
> Thanks to Juri Lelli <juri.lelli@arm.com> for contributing design ideas,
> code and test results, and to Ricky Liang <jcliang@chromium.org>
> for initialization and static key inc/dec fixes.
> 
> [0] http://article.gmane.org/gmane.linux.kernel/1499836
> 
> [smuckle@linaro.org: various additions and fixes, revised commit text]

Well, the changelog is still a bit terse in my view.  It should at least
describe the design somewhat (mention the static keys and how they are
used etc) and explain why things are done this way.

> CC: Ricky Liang <jcliang@chromium.org>
> Signed-off-by: Michael Turquette <mturquette@baylibre.com>
> Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> Signed-off-by: Steve Muckle <smuckle@linaro.org>
> ---
>  drivers/cpufreq/Kconfig      |  21 ++
>  include/linux/cpufreq.h      |   3 +
>  include/linux/sched.h        |   8 +
>  kernel/sched/Makefile        |   1 +
>  kernel/sched/cpufreq_sched.c | 459 +++++++++++++++++++++++++++++++++++++++++++
>  kernel/sched/sched.h         |  51 +++++
>  6 files changed, 543 insertions(+)
>  create mode 100644 kernel/sched/cpufreq_sched.c
> 

[cut]

> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 93e1c1c..ce8b895 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -495,6 +495,9 @@ extern struct cpufreq_governor cpufreq_gov_ondemand;
>  #elif defined(CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE)
>  extern struct cpufreq_governor cpufreq_gov_conservative;
>  #define CPUFREQ_DEFAULT_GOVERNOR	(&cpufreq_gov_conservative)
> +#elif defined(CONFIG_CPU_FREQ_DEFAULT_GOV_SCHED)
> +extern struct cpufreq_governor cpufreq_gov_sched;
> +#define CPUFREQ_DEFAULT_GOVERNOR	(&cpufreq_gov_sched)
>  #endif
>  
>  /*********************************************************************
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index a292c4b..27a6cd8 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -937,6 +937,14 @@ enum cpu_idle_type {
>  #define SCHED_CAPACITY_SHIFT	10
>  #define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
>  
> +struct sched_capacity_reqs {
> +	unsigned long cfs;
> +	unsigned long rt;
> +	unsigned long dl;
> +
> +	unsigned long total;
> +};

Without a comment explaining what this represents it is quite hard to
decode it.

> +
>  /*
>   * Wake-queues are lists of tasks with a pending wakeup, whose
>   * callers have already marked the task as woken internally,
> diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
> index 6768797..90ed832 100644
> --- a/kernel/sched/Makefile
> +++ b/kernel/sched/Makefile
> @@ -19,3 +19,4 @@ obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
>  obj-$(CONFIG_SCHEDSTATS) += stats.o
>  obj-$(CONFIG_SCHED_DEBUG) += debug.o
>  obj-$(CONFIG_CGROUP_CPUACCT) += cpuacct.o
> +obj-$(CONFIG_CPU_FREQ_GOV_SCHED) += cpufreq_sched.o
> diff --git a/kernel/sched/cpufreq_sched.c b/kernel/sched/cpufreq_sched.c
> new file mode 100644
> index 0000000..a113e4e
> --- /dev/null
> +++ b/kernel/sched/cpufreq_sched.c
> @@ -0,0 +1,459 @@
> +/*

Any chance to add one sentence about what's in the file?

Besides, governors traditionally go to drivers/cpufreq.  Why is this different?

> + *  Copyright (C)  2015 Michael Turquette <mturquette@linaro.org>
> + *  Copyright (C)  2015-2016 Steve Muckle <smuckle@linaro.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/cpufreq.h>
> +#include <linux/module.h>
> +#include <linux/kthread.h>
> +#include <linux/percpu.h>
> +#include <linux/irq_work.h>
> +#include <linux/delay.h>
> +#include <linux/string.h>
> +
> +#include "sched.h"
> +
> +struct static_key __read_mostly __sched_freq = STATIC_KEY_INIT_FALSE;

Well, I'm not familiar with static keys and how they work, so you'll need to
explain this part to me.  I'm assuming that there is some magic related to
the value of the key that changes set_*_cpu_capacity() into no-ops when the
key is 0, presumably by modifying kernel code.

So this is clever but tricky and my question here is why it is necessary.

For example, compared to the RCU-based approach I'm using, how much better
is this?  Yes, there is some tiny overhead related to checking whether callbacks
are present and invoking them, but is it really so bad?  Can you actually
measure the difference in any realistic workload?

One thing I personally like in the RCU-based approach is its universality.  The
callbacks may be installed by different entities in a uniform way: intel_pstate
can do that, the old governors can do that, my experimental schedutil code can
do that and your code could have done that too in principle.  And this is very
nice, because it is a common underlying mechanism that can be used by everybody
regardless of their particular implementations on the other side.
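
For reference, the interface amounts to roughly this (see
cpufreq_set_update_util_data() in linux-next):

	struct update_util_data {
		void (*func)(struct update_util_data *data, u64 time,
			     unsigned long util, unsigned long max);
	};

	void cpufreq_set_update_util_data(int cpu, struct update_util_data *data);

and the scheduler invokes the installed callback, if any, under RCU from
its utilization update paths.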

Why would I want to use something different, then?

> +static bool __read_mostly cpufreq_driver_slow;
> +
> +/*
> + * The number of enabled schedfreq policies is modified during GOV_START/STOP.
> + * It, along with whether the schedfreq static key is enabled, is protected by
> + * the gov_enable_lock.
> + */

Well, it would be good to explain what the role of the number of enabled_policies is
at least briefly.

> +static int enabled_policies;
> +static DEFINE_MUTEX(gov_enable_lock);
> +
> +#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_SCHED
> +static struct cpufreq_governor cpufreq_gov_sched;
> +#endif

I don't think you need the #ifndef any more after recent changes in linux-next.

> +
> +/*
> + * Capacity margin added to CFS and RT capacity requests to provide
> + * some head room if task utilization further increases.
> + */

OK, where does this number come from?

> +unsigned int capacity_margin = 1280;
> +
> +static DEFINE_PER_CPU(struct gov_data *, cpu_gov_data);
> +DEFINE_PER_CPU(struct sched_capacity_reqs, cpu_sched_capacity_reqs);
> +
> +/**
> + * gov_data - per-policy data internal to the governor
> + * @throttle: next throttling period expiry. Derived from throttle_nsec
> + * @throttle_nsec: throttle period length in nanoseconds
> + * @task: worker thread for dvfs transition that may block/sleep
> + * @irq_work: callback used to wake up worker thread
> + * @policy: pointer to cpufreq policy associated with this governor data
> + * @fastpath_lock: prevents multiple CPUs in a frequency domain from racing
> + * with each other in fast path during calculation of domain frequency
> + * @slowpath_lock: mutex used to synchronize with slow path - ensure policy
> + * remains enabled, and eliminate racing between slow and fast path
> + * @enabled: boolean value indicating that the policy is started, protected
> + * by the slowpath_lock
> + * @requested_freq: last frequency requested by the sched governor
> + *
> + * struct gov_data is the per-policy cpufreq_sched-specific data
> + * structure. A per-policy instance of it is created when the
> + * cpufreq_sched governor receives the CPUFREQ_GOV_POLICY_INIT
> + * condition and a pointer to it exists in the gov_data member of
> + * struct cpufreq_policy.
> + */
> +struct gov_data {
> +	ktime_t throttle;
> +	unsigned int throttle_nsec;
> +	struct task_struct *task;
> +	struct irq_work irq_work;
> +	struct cpufreq_policy *policy;
> +	raw_spinlock_t fastpath_lock;
> +	struct mutex slowpath_lock;
> +	unsigned int enabled;
> +	unsigned int requested_freq;
> +};
> +
> +static void cpufreq_sched_try_driver_target(struct cpufreq_policy *policy,
> +					    unsigned int freq)
> +{
> +	struct gov_data *gd = policy->governor_data;
> +
> +	__cpufreq_driver_target(policy, freq, CPUFREQ_RELATION_L);
> +	gd->throttle = ktime_add_ns(ktime_get(), gd->throttle_nsec);
> +}
> +
> +static bool finish_last_request(struct gov_data *gd)
> +{
> +	ktime_t now = ktime_get();
> +
> +	if (ktime_after(now, gd->throttle))
> +		return false;
> +
> +	while (1) {

I would write this as

	do {

> +		int usec_left = ktime_to_ns(ktime_sub(gd->throttle, now));
> +
> +		usec_left /= NSEC_PER_USEC;
> +		usleep_range(usec_left, usec_left + 100);
> +		now = ktime_get();

	} while (ktime_before(now, gd->throttle));

	return true;

But maybe that's just me. :-)

> +		if (ktime_after(now, gd->throttle))
> +			return true;
> +	}
> +}

I'm not a big fan of this throttling mechanism overall, but more about that later.

> +
> +static int cpufreq_sched_thread(void *data)
> +{

Now, what really is the advantage of having those extra threads vs using
workqueues?

I guess the underlying concern is that RT tasks may stall workqueues indefinitely
in theory and then the frequency won't be updated, but there's much more kernel
stuff run from workqueues and if that is starved, you won't get very far anyway.

If you take special measures to prevent frequency change requests from being
stalled by RT tasks, question is why are they so special?  Aren't there any
other kernel activities that also should be protected from that and may be
more important than CPU frequency changes?

Plus if this really is the problem here, then it also affects the other cpufreq
governors, so maybe it should be solved for everybody in some common way?

> +	struct sched_param param;
> +	struct gov_data *gd = (struct gov_data*) data;
> +	struct cpufreq_policy *policy = gd->policy;
> +	unsigned int new_request = 0;
> +	unsigned int last_request = 0;
> +	int ret;
> +
> +	param.sched_priority = 50;
> +	ret = sched_setscheduler_nocheck(gd->task, SCHED_FIFO, &param);
> +	if (ret) {
> +		pr_warn("%s: failed to set SCHED_FIFO\n", __func__);
> +		do_exit(-EINVAL);
> +	} else {
> +		pr_debug("%s: kthread (%d) set to SCHED_FIFO\n",
> +				__func__, gd->task->pid);
> +	}
> +
> +	mutex_lock(&gd->slowpath_lock);
> +
> +	while (true) {
> +		set_current_state(TASK_INTERRUPTIBLE);
> +		if (kthread_should_stop()) {
> +			set_current_state(TASK_RUNNING);
> +			break;
> +		}
> +		new_request = gd->requested_freq;
> +		if (!gd->enabled || new_request == last_request) {

This generally is a mistake.

You can't assume that if you had requested a CPU to enter a specific P-state,
that state was actually entered.  In the platform-coordinated case the hardware
(or firmware) may choose to ignore your request and you won't be told about
that.

For this reason, you generally need to make the request every time even if it
is identical to the previous one.  Even in the one-CPU-per-policy case there
may be HW coordination you don't know about.

> +			mutex_unlock(&gd->slowpath_lock);
> +			schedule();
> +			mutex_lock(&gd->slowpath_lock);
> +		} else {
> +			set_current_state(TASK_RUNNING);
> +			/*
> +			 * if the frequency thread sleeps while waiting to be
> +			 * unthrottled, start over to check for a newer request
> +			 */
> +			if (finish_last_request(gd))
> +				continue;
> +			last_request = new_request;
> +			cpufreq_sched_try_driver_target(policy, new_request);
> +		}
> +	}
> +
> +	mutex_unlock(&gd->slowpath_lock);
> +
> +	return 0;
> +}
> +
> +static void cpufreq_sched_irq_work(struct irq_work *irq_work)
> +{
> +	struct gov_data *gd;
> +
> +	gd = container_of(irq_work, struct gov_data, irq_work);
> +	if (!gd)
> +		return;
> +
> +	wake_up_process(gd->task);

I'm wondering what would be wrong with writing it as

	if (gd)
		wake_up_process(gd->task);

And can gd turn out to be NULL here in any case?

> +}
> +
> +static void update_fdomain_capacity_request(int cpu)
> +{
> +	unsigned int freq_new, index_new, cpu_tmp;
> +	struct cpufreq_policy *policy;
> +	struct gov_data *gd = per_cpu(cpu_gov_data, cpu);
> +	unsigned long capacity = 0;
> +
> +	if (!gd)
> +		return;
> +

Why is this check necessary?

> +	/* interrupts already disabled here via rq locked */
> +	raw_spin_lock(&gd->fastpath_lock);

Well, if you compare this with the one-CPU-per-policy path in my experimental
schedutil governor code with the "fast switch" patch on top, you'll notice that
it doesn't use any locks and/or atomic ops.  That's very much on purpose and
here's where your whole gain from using static keys practically goes away.

> +
> +	policy = gd->policy;
> +
> +	for_each_cpu(cpu_tmp, policy->cpus) {
> +		struct sched_capacity_reqs *scr;
> +
> +		scr = &per_cpu(cpu_sched_capacity_reqs, cpu_tmp);
> +		capacity = max(capacity, scr->total);
> +	}

You could save a few cycles from this in the case when the policy is not
shared.

> +
> +	freq_new = capacity * policy->max >> SCHED_CAPACITY_SHIFT;

Where does this formula come from?

> +
> +	/*
> +	 * Calling this without locking policy->rwsem means we race
> +	 * against changes with policy->min and policy->max. This should
> +	 * be okay though.
> +	 */
> +	if (cpufreq_frequency_table_target(policy, policy->freq_table,
> +					   freq_new, CPUFREQ_RELATION_L,
> +					   &index_new))
> +		goto out;

__cpufreq_driver_target() will call this again, so isn't calling it here
a bit wasteful?

> +	freq_new = policy->freq_table[index_new].frequency;
> +
> +	if (freq_new == gd->requested_freq)
> +		goto out;
> +

Again, the above generally is a mistake for reasons explained earlier.

> +	gd->requested_freq = freq_new;
> +
> +	if (cpufreq_driver_slow || !mutex_trylock(&gd->slowpath_lock)) {

This really doesn't look good to me.

Why is the mutex needed here in the first place?  cpufreq_sched_stop() should
be able to make sure that this function won't be run again for the policy
without using this lock.

> +		irq_work_queue_on(&gd->irq_work, cpu);

I hope that you are aware of the fact that irq_work_queue_on() explodes
on uniprocessor ARM32 if you run an SMP kernel on it?

And what happens in the !cpufreq_driver_is_slow() case when we don't
initialize the irq_work?

> +	} else if (policy->transition_ongoing ||
> +		   ktime_before(ktime_get(), gd->throttle)) {

If this really runs in the scheduler paths, you don't want to have ktime_get()
here.

> +		mutex_unlock(&gd->slowpath_lock);
> +		irq_work_queue_on(&gd->irq_work, cpu);

Allright.

I think I have figured out how this is arranged, but I may be wrong. :-)

Here's my understanding of it.  If we are throttled, we don't just skip the
request.  Instead, we wake up the gd thread kind of in the hope that the
throttling may end when it actually wakes up.  So in fact we poke at the
gd thread on a regular basis asking it "Are you still throttled?" and that
happens on every call from the scheduler until the throttling is over if
I'm not mistaken.  This means that during throttling every call from the
scheduler generates an irq_work that wakes up the gd thread just to make it
check if it still is throttled and go to sleep again.  Please tell me
that I haven't understood this correctly.

The above aside, I personally think that rate limiting should happen at the source
and not at the worker thread level.  So if you're throttled, you should just
return immediately from this function without generating any more work.  That,
BTW, is what the sampling rate in my code is for.
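
Roughly, something near the top of update_fdomain_capacity_request()
along these lines (gd->last_update and its handling are made up here,
they are not fields in the patch):

	/*
	 * Hypothetical source-side throttle: bail out before taking any
	 * locks or queueing any irq_work.  The rq lock is already held
	 * by the caller, so rq_clock() is usable.
	 */
	u64 now = rq_clock(cpu_rq(cpu));

	if (now - gd->last_update < gd->throttle_nsec)
		return;
	gd->last_update = now;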

> +	} else {
> +		cpufreq_sched_try_driver_target(policy, freq_new);

Well, this is supposed to be the fast path AFAICS.

Did you actually look at what __cpufreq_driver_target() does in general?
Including the wait_event() in cpufreq_freq_transition_begin() to mention just
one suspicious thing?  And how much overhead it generates in the most general
case?

No, running *that* from the fast path is not a good idea.  Quite honestly,
you'd need a new driver callback and a new way to run it from the cpufreq core
to implement this in a reasonably efficient way.

> +		mutex_unlock(&gd->slowpath_lock);
> +	}
> +
> +out:
> +	raw_spin_unlock(&gd->fastpath_lock);
> +}
> +
> +void update_cpu_capacity_request(int cpu, bool request)
> +{
> +	unsigned long new_capacity;
> +	struct sched_capacity_reqs *scr;
> +
> +	/* The rq lock serializes access to the CPU's sched_capacity_reqs. */
> +	lockdep_assert_held(&cpu_rq(cpu)->lock);
> +
> +	scr = &per_cpu(cpu_sched_capacity_reqs, cpu);
> +
> +	new_capacity = scr->cfs + scr->rt;
> +	new_capacity = new_capacity * capacity_margin
> +		/ SCHED_CAPACITY_SCALE;
> +	new_capacity += scr->dl;

Can you please explain the formula here?

> +
> +	if (new_capacity == scr->total)
> +		return;
> +

The same mistake as before.

> +	scr->total = new_capacity;
> +	if (request)
> +		update_fdomain_capacity_request(cpu);
> +}
> +
> +static ssize_t show_throttle_nsec(struct cpufreq_policy *policy, char *buf)
> +{
> +	struct gov_data *gd = policy->governor_data;
> +	return sprintf(buf, "%u\n", gd->throttle_nsec);
> +}
> +
> +static ssize_t store_throttle_nsec(struct cpufreq_policy *policy,
> +				   const char *buf, size_t count)
> +{
> +	struct gov_data *gd = policy->governor_data;
> +	unsigned int input;
> +	int ret;
> +
> +	ret = sscanf(buf, "%u", &input);
> +
> +	if (ret != 1)
> +		return -EINVAL;
> +
> +	gd->throttle_nsec = input;
> +	return count;
> +}
> +
> +static struct freq_attr sched_freq_throttle_nsec_attr =
> +	__ATTR(throttle_nsec, 0644, show_throttle_nsec, store_throttle_nsec);
> +
> +static struct attribute *sched_freq_sysfs_attribs[] = {
> +	&sched_freq_throttle_nsec_attr.attr,
> +	NULL
> +};
> +
> +static struct attribute_group sched_freq_sysfs_group = {
> +	.attrs = sched_freq_sysfs_attribs,
> +	.name = "sched_freq",
> +};
> +
> +static int cpufreq_sched_policy_init(struct cpufreq_policy *policy)
> +{
> +	struct gov_data *gd;
> +	int ret;
> +
> +	gd = kzalloc(sizeof(*gd), GFP_KERNEL);
> +	if (!gd)
> +		return -ENOMEM;
> +	policy->governor_data = gd;
> +	gd->policy = policy;
> +	raw_spin_lock_init(&gd->fastpath_lock);
> +	mutex_init(&gd->slowpath_lock);
> +
> +	ret = sysfs_create_group(&policy->kobj, &sched_freq_sysfs_group);
> +	if (ret)
> +		goto err_mem;
> +
> +	/*
> +	 * Set up schedfreq thread for slow path freq transitions if
> +	 * required by the driver.
> +	 */
> +	if (cpufreq_driver_is_slow()) {
> +		cpufreq_driver_slow = true;
> +		gd->task = kthread_create(cpufreq_sched_thread, gd,
> +					  "kschedfreq:%d",
> +					  cpumask_first(policy->related_cpus));
> +		if (IS_ERR_OR_NULL(gd->task)) {
> +			pr_err("%s: failed to create kschedfreq thread\n",
> +			       __func__);
> +			goto err_sysfs;
> +		}
> +		get_task_struct(gd->task);
> +		kthread_bind_mask(gd->task, policy->related_cpus);
> +		wake_up_process(gd->task);
> +		init_irq_work(&gd->irq_work, cpufreq_sched_irq_work);
> +	}
> +	return 0;
> +
> +err_sysfs:
> +	sysfs_remove_group(&policy->kobj, &sched_freq_sysfs_group);
> +err_mem:
> +	policy->governor_data = NULL;
> +	kfree(gd);
> +	return -ENOMEM;
> +}
> +
> +static int cpufreq_sched_policy_exit(struct cpufreq_policy *policy)
> +{
> +	struct gov_data *gd = policy->governor_data;
> +
> +	/* Stop the schedfreq thread associated with this policy. */
> +	if (cpufreq_driver_slow) {
> +		kthread_stop(gd->task);
> +		put_task_struct(gd->task);
> +	}
> +	sysfs_remove_group(&policy->kobj, &sched_freq_sysfs_group);
> +	policy->governor_data = NULL;
> +	kfree(gd);
> +	return 0;
> +}
> +
> +static int cpufreq_sched_start(struct cpufreq_policy *policy)
> +{
> +	struct gov_data *gd = policy->governor_data;
> +	int cpu;
> +
> +	/*
> +	 * The schedfreq static key is managed here so the global schedfreq
> +	 * lock must be taken - a per-policy lock such as policy->rwsem is
> +	 * not sufficient.
> +	 */
> +	mutex_lock(&gov_enable_lock);
> +
> +	gd->enabled = 1;
> +
> +	/*
> +	 * Set up percpu information. Writing the percpu gd pointer will
> +	 * enable the fast path if the static key is already enabled.
> +	 */
> +	for_each_cpu(cpu, policy->cpus) {
> +		memset(&per_cpu(cpu_sched_capacity_reqs, cpu), 0,
> +		       sizeof(struct sched_capacity_reqs));
> +		per_cpu(cpu_gov_data, cpu) = gd;
> +	}
> +
> +	if (enabled_policies == 0)
> +		static_key_slow_inc(&__sched_freq);
> +	enabled_policies++;
> +	mutex_unlock(&gov_enable_lock);
> +
> +	return 0;
> +}
> +
> +static void dummy(void *info) {}
> +
> +static int cpufreq_sched_stop(struct cpufreq_policy *policy)
> +{
> +	struct gov_data *gd = policy->governor_data;
> +	int cpu;
> +
> +	/*
> +	 * The schedfreq static key is managed here so the global schedfreq
> +	 * lock must be taken - a per-policy lock such as policy->rwsem is
> +	 * not sufficient.
> +	 */
> +	mutex_lock(&gov_enable_lock);
> +
> +	/*
> +	 * The governor stop path may or may not hold policy->rwsem. There
> +	 * must be synchronization with the slow path however.
> +	 */
> +	mutex_lock(&gd->slowpath_lock);
> +
> +	/*
> +	 * Stop new entries into the hot path for all CPUs. This will
> +	 * potentially affect other policies which are still running but
> +	 * this is an infrequent operation.
> +	 */
> +	static_key_slow_dec(&__sched_freq);
> +	enabled_policies--;
> +
> +	/*
> +	 * Ensure that all CPUs currently part of this policy are out
> +	 * of the hot path so that if this policy exits we can free gd.
> +	 */
> +	preempt_disable();
> +	smp_call_function_many(policy->cpus, dummy, NULL, true);
> +	preempt_enable();

I'm not sure how this works, can you please tell me?

> +
> +	/*
> +	 * Other CPUs in other policies may still have the schedfreq
> +	 * static key enabled. The percpu gd is used to signal which
> +	 * CPUs are enabled in the sched gov during the hot path.
> +	 */
> +	for_each_cpu(cpu, policy->cpus)
> +		per_cpu(cpu_gov_data, cpu) = NULL;
> +
> +	/* Pause the slow path for this policy. */
> +	gd->enabled = 0;
> +
> +	if (enabled_policies)
> +		static_key_slow_inc(&__sched_freq);
> +	mutex_unlock(&gd->slowpath_lock);
> +	mutex_unlock(&gov_enable_lock);
> +
> +	return 0;
> +}
> +
> +static int cpufreq_sched_setup(struct cpufreq_policy *policy,
> +			       unsigned int event)
> +{
> +	switch (event) {
> +	case CPUFREQ_GOV_POLICY_INIT:
> +		return cpufreq_sched_policy_init(policy);
> +	case CPUFREQ_GOV_POLICY_EXIT:
> +		return cpufreq_sched_policy_exit(policy);
> +	case CPUFREQ_GOV_START:
> +		return cpufreq_sched_start(policy);
> +	case CPUFREQ_GOV_STOP:
> +		return cpufreq_sched_stop(policy);
> +	case CPUFREQ_GOV_LIMITS:
> +		break;
> +	}
> +	return 0;
> +}
> +
> +#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_SCHED
> +static
> +#endif
> +struct cpufreq_governor cpufreq_gov_sched = {
> +	.name			= "sched",
> +	.governor		= cpufreq_sched_setup,
> +	.owner			= THIS_MODULE,
> +};
> +
> +static int __init cpufreq_sched_init(void)
> +{
> +	return cpufreq_register_governor(&cpufreq_gov_sched);
> +}
> +
> +/* Try to make this the default governor */
> +fs_initcall(cpufreq_sched_init);

I have no comments to the rest of the patch.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-25  3:55   ` Rafael J. Wysocki
@ 2016-02-25  9:21     ` Peter Zijlstra
  2016-02-25 21:04       ` Rafael J. Wysocki
  2016-02-25  9:28     ` Peter Zijlstra
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2016-02-25  9:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette, Ricky Liang

On Thu, Feb 25, 2016 at 04:55:57AM +0100, Rafael J. Wysocki wrote:
> Well, I'm not familiar with static keys and how they work, so you'll need to
> explain this part to me. 

See include/linux/jump_label.h, it has lots of text on them. There is
also Documentation/static-keys.txt
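
In this series the hot-path check presumably boils down to something
like:

	static inline bool sched_freq(void)
	{
		return static_key_false(&__sched_freq);
	}

where static_key_false() compiles to a branch that is patched out until
the key is incremented, so the call sites cost (almost) nothing while no
schedfreq policy is active.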

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-25  3:55   ` Rafael J. Wysocki
  2016-02-25  9:21     ` Peter Zijlstra
@ 2016-02-25  9:28     ` Peter Zijlstra
  2016-02-25 21:08       ` Rafael J. Wysocki
  2016-02-25 11:04     ` Rafael J. Wysocki
  2016-02-26  0:34     ` Steve Muckle
  3 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2016-02-25  9:28 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette, Ricky Liang

On Thu, Feb 25, 2016 at 04:55:57AM +0100, Rafael J. Wysocki wrote:
> > +static void dummy(void *info) {}
> > +
> > +static int cpufreq_sched_stop(struct cpufreq_policy *policy)
> > +{
> > +	struct gov_data *gd = policy->governor_data;
> > +	int cpu;
> > +
> > +	/*
> > +	 * The schedfreq static key is managed here so the global schedfreq
> > +	 * lock must be taken - a per-policy lock such as policy->rwsem is
> > +	 * not sufficient.
> > +	 */
> > +	mutex_lock(&gov_enable_lock);
> > +
> > +	/*
> > +	 * The governor stop path may or may not hold policy->rwsem. There
> > +	 * must be synchronization with the slow path however.
> > +	 */
> > +	mutex_lock(&gd->slowpath_lock);
> > +
> > +	/*
> > +	 * Stop new entries into the hot path for all CPUs. This will
> > +	 * potentially affect other policies which are still running but
> > +	 * this is an infrequent operation.
> > +	 */
> > +	static_key_slow_dec(&__sched_freq);
> > +	enabled_policies--;
> > +
> > +	/*
> > +	 * Ensure that all CPUs currently part of this policy are out
> > +	 * of the hot path so that if this policy exits we can free gd.
> > +	 */
> > +	preempt_disable();
> > +	smp_call_function_many(policy->cpus, dummy, NULL, true);
> > +	preempt_enable();
> 
> I'm not sure how this works, can you please tell me?

I think it relies on the fact that rq->lock disables IRQs, so if we've
managed to IPI all relevant CPUs, it means they cannot be inside a
rq->lock section.

Its vile though; one should not spray IPIs if one can avoid it. Such
things are much better done with RCU. Sure sync_sched() takes a little
longer, but this isn't a fast path by any measure.
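
That is, roughly:

	/* stop new entries into the fast path */
	static_key_slow_dec(&__sched_freq);

	for_each_cpu(cpu, policy->cpus)
		per_cpu(cpu_gov_data, cpu) = NULL;

	/*
	 * The fast path runs under the rq lock with IRQs off, i.e. inside
	 * an RCU-sched read-side section, so this is enough to guarantee
	 * nobody is still dereferencing gd:
	 */
	synchronize_sched();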

> > +
> > +	/*
> > +	 * Other CPUs in other policies may still have the schedfreq
> > +	 * static key enabled. The percpu gd is used to signal which
> > +	 * CPUs are enabled in the sched gov during the hot path.
> > +	 */
> > +	for_each_cpu(cpu, policy->cpus)
> > +		per_cpu(cpu_gov_data, cpu) = NULL;
> > +
> > +	/* Pause the slow path for this policy. */
> > +	gd->enabled = 0;
> > +
> > +	if (enabled_policies)
> > +		static_key_slow_inc(&__sched_freq);
> > +	mutex_unlock(&gd->slowpath_lock);
> > +	mutex_unlock(&gov_enable_lock);
> > +
> > +	return 0;
> > +}

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-25  3:55   ` Rafael J. Wysocki
  2016-02-25  9:21     ` Peter Zijlstra
  2016-02-25  9:28     ` Peter Zijlstra
@ 2016-02-25 11:04     ` Rafael J. Wysocki
  2016-02-26  0:34     ` Steve Muckle
  3 siblings, 0 replies; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-02-25 11:04 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette, Ricky Liang,
	Juri Lelli

On Thursday, February 25, 2016 04:55:57 AM Rafael J. Wysocki wrote:
> Hi,
> 

[cut]

> > +	while (true) {
> > +		set_current_state(TASK_INTERRUPTIBLE);
> > +		if (kthread_should_stop()) {
> > +			set_current_state(TASK_RUNNING);
> > +			break;
> > +		}
> > +		new_request = gd->requested_freq;
> > +		if (!gd->enabled || new_request == last_request) {
> 
> This generally is a mistake.
> 
> You can't assume that if you had requested a CPU to enter a specific P-state,
> that state was actually entered.  In the platform-coordinated case the hardware
> (or firmware) may choose to ignore your request and you won't be told about
> that.
> 
> For this reason, you generally need to make the request every time even if it
> is identical to the previous one.  Even in the one-CPU-per-policy case there
> may be HW coordination you don't know about.

Please scratch this bit, actually (and analogous comments below).

It is reasonable to expect that the platform will remember your last request,
so you don't have to repeat it.

Which means that I can optimize the schedutil thing somewhat. :-)

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-25  9:21     ` Peter Zijlstra
@ 2016-02-25 21:04       ` Rafael J. Wysocki
  0 siblings, 0 replies; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-02-25 21:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steve Muckle, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette, Ricky Liang

On Thursday, February 25, 2016 10:21:50 AM Peter Zijlstra wrote:
> On Thu, Feb 25, 2016 at 04:55:57AM +0100, Rafael J. Wysocki wrote:
> > Well, I'm not familiar with static keys and how they work, so you'll need to
> > explain this part to me. 
> 
> See include/linux/jump_label.h, it has lots of text on them. There is
> also Documentation/static-keys.txt

Thanks for the pointers!

It looks like the author of the $subject patch hasn't looked at the latter
document lately.

In any case, IMO this might be used to hide the cpufreq_update_util() call
sites from the scheduler code in case no one has set anything via
cpufreq_set_update_util_data() for any CPUs.

That essentially is when things like the performance governor are in use,
so it may be worth doing, but that's mostly your judgement call. :-)
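
E.g. something along these lines (the key name and the
__cpufreq_update_util() helper are hypothetical):

	struct static_key cpufreq_util_key = STATIC_KEY_INIT_FALSE;

	static inline void cpufreq_update_util(u64 time, unsigned long util,
					       unsigned long max)
	{
		if (static_key_false(&cpufreq_util_key))
			__cpufreq_update_util(time, util, max);
	}

with the key incremented/decremented from cpufreq_set_update_util_data()
as callbacks are installed and removed.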

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-25  9:28     ` Peter Zijlstra
@ 2016-02-25 21:08       ` Rafael J. Wysocki
  2016-02-26  9:18         ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-02-25 21:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steve Muckle, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette, Ricky Liang

On Thursday, February 25, 2016 10:28:37 AM Peter Zijlstra wrote:
> On Thu, Feb 25, 2016 at 04:55:57AM +0100, Rafael J. Wysocki wrote:
> > > +static void dummy(void *info) {}
> > > +
> > > +static int cpufreq_sched_stop(struct cpufreq_policy *policy)
> > > +{
> > > +	struct gov_data *gd = policy->governor_data;
> > > +	int cpu;
> > > +
> > > +	/*
> > > +	 * The schedfreq static key is managed here so the global schedfreq
> > > +	 * lock must be taken - a per-policy lock such as policy->rwsem is
> > > +	 * not sufficient.
> > > +	 */
> > > +	mutex_lock(&gov_enable_lock);
> > > +
> > > +	/*
> > > +	 * The governor stop path may or may not hold policy->rwsem. There
> > > +	 * must be synchronization with the slow path however.
> > > +	 */
> > > +	mutex_lock(&gd->slowpath_lock);
> > > +
> > > +	/*
> > > +	 * Stop new entries into the hot path for all CPUs. This will
> > > +	 * potentially affect other policies which are still running but
> > > +	 * this is an infrequent operation.
> > > +	 */
> > > +	static_key_slow_dec(&__sched_freq);
> > > +	enabled_policies--;
> > > +
> > > +	/*
> > > +	 * Ensure that all CPUs currently part of this policy are out
> > > +	 * of the hot path so that if this policy exits we can free gd.
> > > +	 */
> > > +	preempt_disable();
> > > +	smp_call_function_many(policy->cpus, dummy, NULL, true);
> > > +	preempt_enable();
> > 
> > I'm not sure how this works, can you please tell me?
> 
> I think it relies on the fact that rq->lock disables IRQs, so if we've
> managed to IPI all relevant CPUs, it means they cannot be inside a
> rq->lock section.
> 
> Its vile though; one should not spray IPIs if one can avoid it. Such
> things are much better done with RCU. Sure sync_sched() takes a little
> longer, but this isn't a fast path by any measure.

I see, thanks!

BTW, when cpufreq_update_util() callbacks are removed, I use synchronize_rcu()
to wait for the running ones, but would it be better to use synchronize_sched()
in there instead?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-25  3:55   ` Rafael J. Wysocki
                       ` (2 preceding siblings ...)
  2016-02-25 11:04     ` Rafael J. Wysocki
@ 2016-02-26  0:34     ` Steve Muckle
  2016-02-27  2:39       ` Rafael J. Wysocki
                         ` (2 more replies)
  3 siblings, 3 replies; 51+ messages in thread
From: Steve Muckle @ 2016-02-26  0:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette, Ricky Liang

On 02/24/2016 07:55 PM, Rafael J. Wysocki wrote:
> Hi,
> 
> I promised a review and here it goes.

Thanks Rafael for your detailed review.

> 
> Let me focus on this one as the rest seems to depend on it.
> 
> On Monday, February 22, 2016 05:22:43 PM Steve Muckle wrote:
>> From: Michael Turquette <mturquette@baylibre.com>
>>
>> Scheduler-driven CPU frequency selection hopes to exploit both
>> per-task and global information in the scheduler to improve frequency
>> selection policy, achieving lower power consumption, improved
>> responsiveness/performance, and less reliance on heuristics and
>> tunables. For further discussion on the motivation of this integration
>> see [0].
>>
>> This patch implements a shim layer between the Linux scheduler and the
>> cpufreq subsystem. The interface accepts capacity requests from the
>> CFS, RT and deadline sched classes. The requests from each sched class
>> are summed on each CPU with a margin applied to the CFS and RT
>> capacity requests to provide some headroom. Deadline requests are
>> expected to be precise enough given their nature to not require
>> headroom. The maximum total capacity request for a CPU in a frequency
>> domain drives the requested frequency for that domain.
>>
>> Policy is determined by both the sched classes and this shim layer.
>>
>> Note that this algorithm is event-driven. There is no polling loop to
>> check cpu idle time nor any other method which is unsynchronized with
>> the scheduler, aside from an optional throttling mechanism.
>>
>> Thanks to Juri Lelli <juri.lelli@arm.com> for contributing design ideas,
>> code and test results, and to Ricky Liang <jcliang@chromium.org>
>> for initialization and static key inc/dec fixes.
>>
>> [0] http://article.gmane.org/gmane.linux.kernel/1499836
>>
>> [smuckle@linaro.org: various additions and fixes, revised commit text]
> 
> Well, the changelog is still a bit terse in my view.  It should at least
> describe the design somewhat (mention the static keys and how they are
> used etc) and explain why things are done this way.

Sure. Will add more, including the details you mention.

> 
...
>>  /*********************************************************************
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index a292c4b..27a6cd8 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -937,6 +937,14 @@ enum cpu_idle_type {
>>  #define SCHED_CAPACITY_SHIFT	10
>>  #define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
>>  
>> +struct sched_capacity_reqs {
>> +	unsigned long cfs;
>> +	unsigned long rt;
>> +	unsigned long dl;
>> +
>> +	unsigned long total;
>> +};
> 
> Without a comment explaining what this represents it is quite hard to
> decode it.

This is the per-CPU representation of capacity requests from each sched
class. The cfs, rt, and dl capacity numbers are all on a scale of 0 to
SCHED_CAPACITY_SCALE. I'll add a comment here.
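
Something along the lines of:

	/*
	 * Per-CPU capacity requests from each sched class, all on the
	 * 0..SCHED_CAPACITY_SCALE scale of the CPU at its highest OPP.
	 * 'total' is the sum of the requests, with headroom applied to
	 * the cfs and rt contributions, and is what drives the frequency
	 * selection for the CPU's frequency domain.
	 */
	struct sched_capacity_reqs {
		unsigned long cfs;
		unsigned long rt;
		unsigned long dl;

		unsigned long total;
	};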

> 
>> +
>>  /*
>>   * Wake-queues are lists of tasks with a pending wakeup, whose
>>   * callers have already marked the task as woken internally,
>> diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
>> index 6768797..90ed832 100644
>> --- a/kernel/sched/Makefile
>> +++ b/kernel/sched/Makefile
>> @@ -19,3 +19,4 @@ obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
>>  obj-$(CONFIG_SCHEDSTATS) += stats.o
>>  obj-$(CONFIG_SCHED_DEBUG) += debug.o
>>  obj-$(CONFIG_CGROUP_CPUACCT) += cpuacct.o
>> +obj-$(CONFIG_CPU_FREQ_GOV_SCHED) += cpufreq_sched.o
>> diff --git a/kernel/sched/cpufreq_sched.c b/kernel/sched/cpufreq_sched.c
>> new file mode 100644
>> index 0000000..a113e4e
>> --- /dev/null
>> +++ b/kernel/sched/cpufreq_sched.c
>> @@ -0,0 +1,459 @@
>> +/*
> 
> Any chance to add one sentence about what's in the file?

Sure, will do.

> 
> Besides, governors traditionally go to drivers/cpufreq.  Why is this different?

This predates my involvement, but I'm guessing it was done because the
long-term vision was for cpufreq to eventually be removed from the
picture. It may be quite some time before that happens, though -
communicating directly with drivers may be difficult given the need to
synchronize with thermal or userspace min/max requests etc., plus we've
got a longstanding API exposed to userspace.

I'm happy to move this to drivers/cpufreq.

> 
>> + *  Copyright (C)  2015 Michael Turquette <mturquette@linaro.org>
>> + *  Copyright (C)  2015-2016 Steve Muckle <smuckle@linaro.org>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +#include <linux/cpufreq.h>
>> +#include <linux/module.h>
>> +#include <linux/kthread.h>
>> +#include <linux/percpu.h>
>> +#include <linux/irq_work.h>
>> +#include <linux/delay.h>
>> +#include <linux/string.h>
>> +
>> +#include "sched.h"
>> +
>> +struct static_key __read_mostly __sched_freq = STATIC_KEY_INIT_FALSE;
> 
> Well, I'm not familiar with static keys and how they work, so you'll need to
> explain this part to me.  I'm assuming that there is some magic related to
> the value of the key that changes set_*_cpu_capacity() into no-ops when the
> key is 0, presumably by modifying kernel code.

That's correct.

> 
> So this is clever but tricky and my question here is why it is necessary.
> 
> For example, compared to the RCU-based approach I'm using, how much better
> is this?  Yes, there is some tiny overhead related to checking whether callbacks
> are present and invoking them, but is it really so bad?  Can you actually
> measure the difference in any realistic workload?

I have not tried to measure any difference. We erred on the side of
caution, to avoid impacting the scheduler hot paths at all if possible.

> One thing I personally like in the RCU-based approach is its universality.  The
> callbacks may be installed by different entities in a uniform way: intel_pstate
> can do that, the old governors can do that, my experimental schedutil code can
> do that and your code could have done that too in principle.  And this is very
> nice, because it is a common underlying mechanism that can be used by everybody
> regardless of their particular implementations on the other side.
> 
> Why would I want to use something different, then?

I've got nothing against a callback registration mechanism. As you
mentioned in another mail it could itself use static keys, enabling the
static key when a callback is registered and disabling it otherwise to
avoid calling into cpufreq_update_util().

> 
>> +static bool __read_mostly cpufreq_driver_slow;
>> +
>> +/*
>> + * The number of enabled schedfreq policies is modified during GOV_START/STOP.
>> + * It, along with whether the schedfreq static key is enabled, is protected by
>> + * the gov_enable_lock.
>> + */
> 
> Well, it would be good to explain what the role of the number of enabled_policies is
> at least briefly.

It's just that if the number of enabled policies is > 0, the static key
must be enabled so that the callbacks work. I'll clarify that in the
comment here.

> 
>> +static int enabled_policies;
>> +static DEFINE_MUTEX(gov_enable_lock);
>> +
>> +#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_SCHED
>> +static struct cpufreq_governor cpufreq_gov_sched;
>> +#endif
> 
> I don't think you need the #ifndef any more after recent changes in linux-next.

Ah, I've been basing this on tip/sched/core. I should probably move over
to linux-next. Will do so and check this.

> 
>> +
>> +/*
>> + * Capacity margin added to CFS and RT capacity requests to provide
>> + * some head room if task utilization further increases.
>> + */
> 
> OK, where does this number come from?

Someone's posterior :) .

This really should be a tunable IMO, but there's a fairly strong
anti-tunable sentiment, so it's been left hard-coded in an attempt to
provide something that "just works."

At the least I can add a comment saying that the 20% idle headroom
requirement was an off the cuff estimate and that at this time, we don't
have significant data to suggest it's the best number.

> 
...
>> +static bool finish_last_request(struct gov_data *gd)
>> +{
>> +	ktime_t now = ktime_get();
>> +
>> +	if (ktime_after(now, gd->throttle))
>> +		return false;
>> +
>> +	while (1) {
> 
> I would write this as
> 
> 	do {
> 
>> +		int usec_left = ktime_to_ns(ktime_sub(gd->throttle, now));
>> +
>> +		usec_left /= NSEC_PER_USEC;
>> +		usleep_range(usec_left, usec_left + 100);
>> +		now = ktime_get();
> 
> 	} while (ktime_before(now, gd->throttle));
> 
> 	return true;
> 
> But maybe that's just me. :-)

I agree that's cleaner, will change it.
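
For reference, the reworked helper would then read roughly like this
(just the body quoted above restructured as you suggest, not the final
patch):

	static bool finish_last_request(struct gov_data *gd)
	{
		ktime_t now = ktime_get();

		if (ktime_after(now, gd->throttle))
			return false;

		do {
			int usec_left = ktime_to_ns(ktime_sub(gd->throttle, now));

			usec_left /= NSEC_PER_USEC;
			usleep_range(usec_left, usec_left + 100);
			now = ktime_get();
		} while (ktime_before(now, gd->throttle));

		return true;
	}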

> 
>> +		if (ktime_after(now, gd->throttle))
>> +			return true;
>> +	}
>> +}
> 
> I'm not a big fan of this throttling mechanism overall, but more about that later.
> 
>> +
>> +static int cpufreq_sched_thread(void *data)
>> +{
> 
> Now, what really is the advantage of having those extra threads vs using
> workqueues?
> 
> I guess the underlying concern is that RT tasks may stall workqueues indefinitely
> in theory and then the frequency won't be updated, but there's much more kernel
> stuff run from workqueues and if that is starved, you won't get very far anyway.
> 
> If you take special measures to prevent frequency change requests from being
> stalled by RT tasks, question is why are they so special?  Aren't there any
> other kernel activities that also should be protected from that and may be
> more important than CPU frequency changes?

I think updating the CPU frequency during periods of heavy RT/DL load is
one of the most (if not the most) important things. I can't speak for
other system activities that may get blocked, but there's an opportunity
to protect CPU frequency changes here, and it seems worth taking to me.

> 
> Plus if this really is the problem here, then it also affects the other cpufreq
> governors, so maybe it should be solved for everybody in some common way?

Agreed, I'd think a freq change thread that serves frequency change
requests would be a useful common component. The locking and throttling
(slowpath_lock, finish_last_request()) are somewhat specific to this
implementation, but could probably be done generically and maybe even
used in other governors. If you're okay with it though I'd like to view
that as a slightly longer term effort, as I think it would get unwieldy
trying to do that as part of this initial change.

> 
...
>> +
>> +static void cpufreq_sched_irq_work(struct irq_work *irq_work)
>> +{
>> +	struct gov_data *gd;
>> +
>> +	gd = container_of(irq_work, struct gov_data, irq_work);
>> +	if (!gd)
>> +		return;
>> +
>> +	wake_up_process(gd->task);
> 
> I'm wondering what would be wrong with writing it as
> 
> 	if (gd)
> 		wake_up_process(gd->task);
> 
> And can gd turn out to be NULL here in any case?

In practice I don't think this would ever happen, but there's not
anything that would guarantee the policy can't be stopped and exit while
one of these irq_works is in flight. This would free not only gd but
the irq_work structure itself.

Rather than check if gd is NULL here I think synchronization is required
to flush an in flight irq_work when the policy is being stopped. I will
add an irq_work_sync in the policy stop path and remove the NULL check.
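
Roughly, the intended change (a sketch, not the final patch):

	static void cpufreq_sched_irq_work(struct irq_work *irq_work)
	{
		struct gov_data *gd = container_of(irq_work, struct gov_data,
						   irq_work);

		/* gd cannot be NULL once the stop path syncs the irq_work */
		wake_up_process(gd->task);
	}

	/* in the GOV_STOP path, before gd is freed: */
	irq_work_sync(&gd->irq_work);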

> 
>> +}
>> +
>> +static void update_fdomain_capacity_request(int cpu)
>> +{
>> +	unsigned int freq_new, index_new, cpu_tmp;
>> +	struct cpufreq_policy *policy;
>> +	struct gov_data *gd = per_cpu(cpu_gov_data, cpu);
>> +	unsigned long capacity = 0;
>> +
>> +	if (!gd)
>> +		return;
>> +
> 
> Why is this check necessary?

As soon as one policy in the system uses scheduler-guided frequency the
static key will be enabled and all CPUs will be calling into the
scheduler hooks and can potentially come through this path. But some
CPUs may not be part of a scheduler-guided frequency policy. Those CPUs
will have a NULL gov_data pointer.

That being said, I think this test should be moved up into
update_cpu_capacity_request() to avoid that computation when it is not
required. Better still would be bailing when required in the
set_*_cpu_capacity() macros in kernel/sched/sched.h. I'll see if I can
do that instead.

> 
>> +	/* interrupts already disabled here via rq locked */
>> +	raw_spin_lock(&gd->fastpath_lock);
> 
> Well, if you compare this with the one-CPU-per-policy path in my experimental
> schedutil governor code with the "fast switch" patch on top, you'll notice that
> it doesn't use any locks and/or atomic ops.  That's very much on purpose and
> here's where your whole gain from using static keys practically goes away.

Sure, I like the idea of special fast callbacks for the simple case of
UP frequency domains and will add it to the list of things to do here.

The static key is really a separate issue IMO as it's about minimizing
the impact of this feature on scheduler hot paths when schedfreq is not
enabled.

> 
>> +
>> +	policy = gd->policy;
>> +
>> +	for_each_cpu(cpu_tmp, policy->cpus) {
>> +		struct sched_capacity_reqs *scr;
>> +
>> +		scr = &per_cpu(cpu_sched_capacity_reqs, cpu_tmp);
>> +		capacity = max(capacity, scr->total);
>> +	}
> 
> You could save a few cycles from this in the case when the policy is not
> shared.

Agreed. Will look at making special UP paths.

> 
>> +
>> +	freq_new = capacity * policy->max >> SCHED_CAPACITY_SHIFT;
> 
> Where does this formula come from?

Capacity here is 0 to SCHED_CAPACITY_SCALE, so this is translating the
capacity request to a frequency request via
(capacity/SCHED_CAPACITY_SCALE) * policy->max. I'll add a comment to
this effect.
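
For example (with a hypothetical policy->max of 2000000 kHz), a capacity
request of 512 would translate as

	freq_new = 512 * 2000000 >> SCHED_CAPACITY_SHIFT
		 = 1000000 kHz

i.e. a half-scale capacity request maps to half of the policy's maximum
frequency.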

The race with policy->max potentially changing also deserves a comment.
If policy->max changes say just before we read it here to do this
translation, the scheduler PELT numbers will still largely be based on
the old policy->max value, because they are an exponential moving
average and will take time to re-adjust. So at least for now I'm
ignoring this race as I don't think it's really meaningful to attempt
any synchronization.

> 
>> +
>> +	/*
>> +	 * Calling this without locking policy->rwsem means we race
>> +	 * against changes with policy->min and policy->max. This should
>> +	 * be okay though.
>> +	 */
>> +	if (cpufreq_frequency_table_target(policy, policy->freq_table,
>> +					   freq_new, CPUFREQ_RELATION_L,
>> +					   &index_new))
>> +		goto out;
> 
> __cpufreq_driver_target() will call this again, so isn't calling it here
> a bit wasteful?

I wanted to avoid waking up the frequency change thread (an expensive
operation) whenever possible, or even making an unnecessary fastpath
frequency request, so translating the raw frequency request to a
supported target frequency allows us to bail if the actual requested
target frequency will end up being the same as the current one. I
thought this was more valuable than the extra table lookup here.

Actually, I could make this better by storing both the current
frequency and the next lower table frequency - this would allow me to
do a much simpler test to see if we'd end up requesting the same
frequency or something different.
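
Something along these lines (a sketch - gd->freq_below is a hypothetical
new field caching the table frequency immediately below the one
currently requested):

	/*
	 * With CPUFREQ_RELATION_L, any raw request in the range
	 * (gd->freq_below, gd->requested_freq] resolves to the frequency
	 * we already requested, so bail without a table lookup.
	 */
	if (freq_new <= gd->requested_freq && freq_new > gd->freq_below)
		goto out;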

> 
>> +	freq_new = policy->freq_table[index_new].frequency;
>> +
>> +	if (freq_new == gd->requested_freq)
>> +		goto out;
>> +
> 
> Again, the above generally is a mistake for reasons explained earlier.

(skipping per the other email)

> 
>> +	gd->requested_freq = freq_new;
>> +
>> +	if (cpufreq_driver_slow || !mutex_trylock(&gd->slowpath_lock)) {
> 
> This really doesn't look good to me.
> 
> Why is the mutex needed here in the first place?  cpufreq_sched_stop() should
> be able to make sure that this function won't be run again for the policy
> without using this lock.

The acquisition of the slowpath_lock here isn't for synchronizing with
the policy being stopped - that's handled differently via the smp call
in the stop routine.

Rather it may be the case that schedfreq runs both in the fast and slow
path on a target (maybe because of throttling, or because a previously
started asynchronous transition isn't done yet). If that is so, then
when the slow path is active, I do not want to attempt a transition in
the fast path.

> 
>> +		irq_work_queue_on(&gd->irq_work, cpu);
> 
> I hope that you are aware of the fact that irq_work_queue_on() explodes
> on uniprocessor ARM32 if you run an SMP kernel on it?

No, I wasn't. Fortunately I think switching it to irq_work_queue() to
run the irq_work on the same CPU as the fast path should be fine
(assuming there's no similar landmine there).

> 
> And what happens in the !cpufreq_driver_is_slow() case when we don't
> initialize the irq_work?

That's a bug, the irq_work should always be initialized. Will fix.

> 
>> +	} else if (policy->transition_ongoing ||
>> +		   ktime_before(ktime_get(), gd->throttle)) {
> 
> If this really runs in the scheduler paths, you don't want to have ktime_get()
> here.

I'll switch this to be sched_clock() based.

> 
>> +		mutex_unlock(&gd->slowpath_lock);
>> +		irq_work_queue_on(&gd->irq_work, cpu);
> 
> Allright.
> 
> I think I have figured out how this is arranged, but I may be wrong. :-)
> 
> Here's my understanding of it.  If we are throttled, we don't just skip the
> request.  Instead, we wake up the gd thread kind of in the hope that the
> throttling may end when it actually wakes up.  So in fact we poke at the
> gd thread on a regular basis asking it "Are you still throttled?" and that
> happens on every call from the scheduler until the throttling is over if
> I'm not mistaken.  This means that during throttling every call from the
> scheduler generates an irq_work that wakes up the gd thread just to make it
> check if it still is throttled and go to sleep again.  Please tell me
> that I haven't understood this correctly.

Sorry, yes this should be doing something much more intelligent such as
checking to see if the freq thread is already due to wake up at some
point, and setting a timer for it if not.

> 
> The above aside, I personally think that rate limitting should happen at the source
> and not at the worker thread level.  So if you're throttled, you should just
> return immediately from this function without generating any more work.  That,
> BTW, is what the sampling rate in my code is for.

I will work on modifying the throttling here to be more sane.

> 
>> +	} else {
>> +		cpufreq_sched_try_driver_target(policy, freq_new);
> 
> Well, this is supposed to be the fast path AFAICS.
> 
> Did you actually look at what __cpufreq_driver_target() does in general?
> Including the wait_event() in cpufreq_freq_transition_begin() to mention just
> one suspicious thing?  And how much overhead it generates in the most general
> case?
> 
> No, running *that* from the fast path is not a good idea.  Quite honestly,
> you'd need a new driver callback and a new way to run it from the cpufreq core
> to implement this in a reasonably efficient way.

Yes I'm aware of the wait_event(). I've attempted to work around it by
ensuring schedfreq does not issue a __cpufreq_driver_target attempt
while a transition is in flight (transition_ongoing), and ensuring the
slow and fast paths can't race. I'm not sure yet whether this is enough
when something like thermal or userspace changes min/max - that could
independently start a transition which may cause a fast path request
here to block. This governor does not use the dbs stuff.

While it's not exactly lean, I didn't see anything else in
__cpufreq_driver_target() that looked really terrible.

I've nothing against a new fast switch driver interface. It may be nice
to support unmodified drivers in the fast path as well, if it can be
made to work, even though it may not be optimal.

> 
>> +		mutex_unlock(&gd->slowpath_lock);
>> +	}
>> +
>> +out:
>> +	raw_spin_unlock(&gd->fastpath_lock);
>> +}
>> +
>> +void update_cpu_capacity_request(int cpu, bool request)
>> +{
>> +	unsigned long new_capacity;
>> +	struct sched_capacity_reqs *scr;
>> +
>> +	/* The rq lock serializes access to the CPU's sched_capacity_reqs. */
>> +	lockdep_assert_held(&cpu_rq(cpu)->lock);
>> +
>> +	scr = &per_cpu(cpu_sched_capacity_reqs, cpu);
>> +
>> +	new_capacity = scr->cfs + scr->rt;
>> +	new_capacity = new_capacity * capacity_margin
>> +		/ SCHED_CAPACITY_SCALE;
>> +	new_capacity += scr->dl;
> 
> Can you please explain the formula here?

The deadline class is expected to provide precise enough capacity
requirements (since tasks admitted to it have CPU bandwidth parameters)
such that it does not require additional CPU headroom.

So we're applying the CPU headroom to the CFS and RT capacity requests
only, then adding the DL request.

I'll add a comment here.
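
For example, assuming capacity_margin is 1280 (1.25 * SCHED_CAPACITY_SCALE,
which corresponds to the ~20% headroom mentioned earlier), a request of
scr->cfs + scr->rt = 400 with scr->dl = 100 works out to

	new_capacity = 400 * 1280 / 1024 + 100
		     = 500 + 100 = 600

so only the CFS and RT portions get the headroom boost.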

> 
>> +
>> +	if (new_capacity == scr->total)
>> +		return;
>> +
> 
> The same mistake as before.

(assuming this is part of the same comment in the other email)

> 
...
>> +static int cpufreq_sched_stop(struct cpufreq_policy *policy)
>> +{
>> +	struct gov_data *gd = policy->governor_data;
>> +	int cpu;
>> +
>> +	/*
>> +	 * The schedfreq static key is managed here so the global schedfreq
>> +	 * lock must be taken - a per-policy lock such as policy->rwsem is
>> +	 * not sufficient.
>> +	 */
>> +	mutex_lock(&gov_enable_lock);
>> +
>> +	/*
>> +	 * The governor stop path may or may not hold policy->rwsem. There
>> +	 * must be synchronization with the slow path however.
>> +	 */
>> +	mutex_lock(&gd->slowpath_lock);
>> +
>> +	/*
>> +	 * Stop new entries into the hot path for all CPUs. This will
>> +	 * potentially affect other policies which are still running but
>> +	 * this is an infrequent operation.
>> +	 */
>> +	static_key_slow_dec(&__sched_freq);
>> +	enabled_policies--;
>> +
>> +	/*
>> +	 * Ensure that all CPUs currently part of this policy are out
>> +	 * of the hot path so that if this policy exits we can free gd.
>> +	 */
>> +	preempt_disable();
>> +	smp_call_function_many(policy->cpus, dummy, NULL, true);
>> +	preempt_enable();
> 
> I'm not sure how this works, can you please tell me?

Peter correctly interpreted my intentions.

The RCU possibility also crossed my mind. They both seemed like a bit of
a hack to me - this wouldn't really be doing any RCU per se, rather
relying on its implementation. I'll switch to RCU since that seems to be
preferred though. It's certainly cleaner to write.
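
A minimal sketch of the RCU-based variant of the stop path (assuming, as
discussed elsewhere in the thread, that the hot path only runs with the
rq lock held, i.e. inside an RCU-sched read-side section):

	static_key_slow_dec(&__sched_freq);
	enabled_policies--;

	/*
	 * CPUs already inside the hot path are in an RCU-sched read-side
	 * critical section; once this returns it is safe to free gd.
	 */
	synchronize_sched();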

> 
...
> 
> I have no comments to the rest of the patch.

Thanks again for the review.

thanks,
Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 02/10] cpufreq: introduce cpufreq_driver_is_slow
  2016-02-23  1:31   ` Rafael J. Wysocki
@ 2016-02-26  0:50       ` Michael Turquette
  0 siblings, 0 replies; 51+ messages in thread
From: Michael Turquette @ 2016-02-26  0:50 UTC (permalink / raw)
  To: Rafael J. Wysocki, Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Viresh Kumar

Quoting Rafael J. Wysocki (2016-02-22 17:31:09)
> On Tue, Feb 23, 2016 at 2:22 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> > From: Michael Turquette <mturquette@baylibre.com>
> >
> > Some architectures and platforms perform CPU frequency transitions
> > through a non-blocking method, while some might block or sleep. Even
> > when frequency transitions do not block or sleep they may be very slow.
> > This distinction is important when trying to change frequency from
> > a non-interruptible context in a scheduler hot path.
> >
> > Describe this distinction with a cpufreq driver flag,
> > CPUFREQ_DRIVER_FAST. The default is to not have this flag set,
> > thus erring on the side of caution.
> >
> > cpufreq_driver_is_slow() is also introduced in this patch. Setting
> > the above flag will allow this function to return false.
> >
> > [smuckle@linaro.org: change flag/API to include drivers that are too
> >  slow for scheduler hot paths, in addition to those that block/sleep]
> >
> > Cc: Rafael J. Wysocki <rafael@kernel.org>
> > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > Signed-off-by: Michael Turquette <mturquette@baylibre.com>
> > Signed-off-by: Steve Muckle <smuckle@linaro.org>
> 
> Something more sophisticated than this is needed, because one driver
> may actually be able to do "fast" switching in some cases and may not
> be able to do that in other cases.

Those drivers can set the flag dynamically when they probe based on
their ACPI tables.

> 
> For example, in the acpi-cpufreq case all depends on what's there in
> the ACPI tables.

It's all a moot point until the locking in cpufreq is changed. Until
those changes are made it is a bad idea to call cpufreq_driver_target()
from schedule() context, regardless of the underlying hardware, and all
platforms should kick that work out to the kthread.

Regards,
Mike

> 
> Thanks,
> Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 02/10] cpufreq: introduce cpufreq_driver_is_slow
  2016-02-26  0:50       ` Michael Turquette
  (?)
@ 2016-02-26  1:07       ` Steve Muckle
  -1 siblings, 0 replies; 51+ messages in thread
From: Steve Muckle @ 2016-02-26  1:07 UTC (permalink / raw)
  To: Michael Turquette, Rafael J. Wysocki
  Cc: Peter Zijlstra, Ingo Molnar, Linux Kernel Mailing List, linux-pm,
	Vincent Guittot, Morten Rasmussen, Dietmar Eggemann, Juri Lelli,
	Patrick Bellasi, Viresh Kumar

On 02/25/2016 04:50 PM, Michael Turquette wrote:
>> > Something more sophisticated than this is needed, because one driver
>> > may actually be able to do "fast" switching in some cases and may not
>> > be able to do that in other cases.
>
> Those drivers can set the flag dynamically when they probe based on
> their ACPI tables.

I was thinking that the reference here was to a driver that may be able
to do fast switching for some transitions and not for others, say
perhaps depending on the current and target frequencies, or the state of
the regulators, or other system conditions.

Rafael has proposed a fast_switch() addition to the cpufreq API which
currently returns void. Perhaps that could be extended to return success
or failure from the driver. The driver aborts if it cannot complete the
request atomically and quickly.

The scheduler-driven governor could attempt a fast switch if the
callback is installed (and the other criteria for the fast switch are
met, such as not throttled etc, no request already in flight etc). If
the fast switch aborts, fall back to the slow path.
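
In governor terms that might look something like this sketch (a
cpufreq_driver_fast_switch() that returns an error code is the
hypothetical extension proposed above, and "throttled" /
"transition_in_flight" stand in for the checks already discussed):

	if (!cpufreq_driver_slow && !throttled && !transition_in_flight) {
		/* try the atomic path, fall back if the driver can't do it now */
		if (cpufreq_driver_fast_switch(policy, freq_new) < 0)
			irq_work_queue(&gd->irq_work);
	} else {
		irq_work_queue(&gd->irq_work);
	}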

I suppose the governor could also just see if policy->cur has changed as
opposed to checking cpufreq_driver_fast_switch's return value. But then
we can't tell the difference between the fast transition failing because
it must be re-attempted in the slow path, and the fast transition
failing because of some other more serious reason. In the latter case
the request should probably just be dropped rather than retried in the
slow path.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 02/10] cpufreq: introduce cpufreq_driver_is_slow
  2016-02-26  0:50       ` Michael Turquette
  (?)
  (?)
@ 2016-02-26  1:16       ` Rafael J. Wysocki
       [not found]         ` <20160226185503.2278.20479@quark.deferred.io>
  -1 siblings, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-02-26  1:16 UTC (permalink / raw)
  To: Michael Turquette
  Cc: Rafael J. Wysocki, Steve Muckle, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Viresh Kumar

On Thursday, February 25, 2016 04:50:29 PM Michael Turquette wrote:
> Quoting Rafael J. Wysocki (2016-02-22 17:31:09)
> > On Tue, Feb 23, 2016 at 2:22 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> > > From: Michael Turquette <mturquette@baylibre.com>
> > >
> > > Some architectures and platforms perform CPU frequency transitions
> > > through a non-blocking method, while some might block or sleep. Even
> > > when frequency transitions do not block or sleep they may be very slow.
> > > This distinction is important when trying to change frequency from
> > > a non-interruptible context in a scheduler hot path.
> > >
> > > Describe this distinction with a cpufreq driver flag,
> > > CPUFREQ_DRIVER_FAST. The default is to not have this flag set,
> > > thus erring on the side of caution.
> > >
> > > cpufreq_driver_is_slow() is also introduced in this patch. Setting
> > > the above flag will allow this function to return false.
> > >
> > > [smuckle@linaro.org: change flag/API to include drivers that are too
> > >  slow for scheduler hot paths, in addition to those that block/sleep]
> > >
> > > Cc: Rafael J. Wysocki <rafael@kernel.org>
> > > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > > Signed-off-by: Michael Turquette <mturquette@baylibre.com>
> > > Signed-off-by: Steve Muckle <smuckle@linaro.org>
> > 
> > Something more sophisticated than this is needed, because one driver
> > may actually be able to do "fast" switching in some cases and may not
> > be able to do that in other cases.
> 
> Those drivers can set the flag dynamically when they probe based on
> their ACPI tables.

No, they can't.

Being able to to the "fast" switching is a property of the policy and
the driver together and it may change with CPU going online/offline.

> > 
> > For example, in the acpi-cpufreq case all depends on what's there in
> > the ACPI tables.
> 
> It's all a moot point until the locking in cpufreq is changed.

No, it isn't.  Look at this, for example: https://patchwork.kernel.org/patch/8426741/

> Until those changes are made it is a bad idea to call cpufreq_driver_target()
> from schedule() context, regardless of the underlying hardware, and all
> platforms should kick that work out to the kthread.

Calling cpufreq_driver_target() from the scheduler is a bad idea overall,
not just because of the locking.

But there are other ways to switch frequencies from scheduler paths.  I run
such code on my test box daily without any problems.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 01/10] sched: Compute cpu capacity available at current frequency
  2016-02-23  9:19     ` Peter Zijlstra
@ 2016-02-26  1:37       ` Rafael J. Wysocki
  2016-02-26  9:14         ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-02-26  1:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, Steve Muckle, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette

On Tuesday, February 23, 2016 10:19:16 AM Peter Zijlstra wrote:
> On Tue, Feb 23, 2016 at 02:41:20AM +0100, Rafael J. Wysocki wrote:
> > >  /*
> > > + * Returns the current capacity of cpu after applying both
> > > + * cpu and freq scaling.
> > > + */
> > > +static unsigned long capacity_curr_of(int cpu)
> > > +{
> > > +       return cpu_rq(cpu)->cpu_capacity_orig *
> > > +              arch_scale_freq_capacity(NULL, cpu)
> > 
> > What about architectures that don't have this?
> 
> They get the 'default' which is a constant SCHED_CAPACITY_SCALE unit.
> 
> > Why is that an architecture feature?
> 
> Because not all archs can tell the frequency the same way. Some you
> program the DVFS state and they really run at this speed, for those you
> can simply report back.
> 
> For others, x86 for example, you program a DVFS 'hint' and the hardware
> does whatever, we'd have to do APERF/MPERF samples to get an idea of the
> actual frequency we ran at.
> 
> Also, the having of this makes the load tracking slightly more
> expensive, instead of compile time constants we get function calls and
> actual multiplications. Its not _too_ bad, but still.

That's all correct, but my question should rather be: is arch the right
granularity?

In theory, there may be ARM64-based platforms using ACPI and behaving
like x86 in that respect in the future.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 01/10] sched: Compute cpu capacity available at current frequency
  2016-02-26  1:37       ` Rafael J. Wysocki
@ 2016-02-26  9:14         ` Peter Zijlstra
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2016-02-26  9:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Steve Muckle, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette

On Fri, Feb 26, 2016 at 02:37:19AM +0100, Rafael J. Wysocki wrote:

> That's all correct, but my question should rather be: is arch the right
> granularity?
> 
> In theory, there may be ARM64-based platforms using ACPI and behaving
> like x86 in that respect in the future.

Ah, so I started these hooks way before the cpufreq/cpuidle etc.
integration push.

Maybe we should look at something like that, but performance is really
critical; you most definitely do not want 3 indirections just because of
abstract framework crap - that's measurable overhead on these callsites.

Hence the current inline with constant value or single function call.
And if archs would want a selector, I would recommend boot time call
instruction rewrites a-la alternatives/paravirt.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-25 21:08       ` Rafael J. Wysocki
@ 2016-02-26  9:18         ` Peter Zijlstra
  2016-02-27  0:08           ` Rafael J. Wysocki
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2016-02-26  9:18 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette, Ricky Liang

On Thu, Feb 25, 2016 at 10:08:48PM +0100, Rafael J. Wysocki wrote:
> On Thursday, February 25, 2016 10:28:37 AM Peter Zijlstra wrote:
> > Its vile though; one should not spray IPIs if one can avoid it. Such
> > things are much better done with RCU. Sure sync_sched() takes a little
> > longer, but this isn't a fast path by any measure.
> 
> I see, thanks!
> 
> BTW, when cpufreq_update_util() callbacks are removed, I use synchronize_rcu()
> to wait for the running ones, but would it be better to use synchronize_sched()
> in there instead?

So I think we only call the callback with rq->lock held, in which case
sync_sched() is good enough.

It would allow you to get rid of the rcu_read_{,un}lock() calls as well.

The down-side is that it all makes the code a little harder to get,
because you're relying on caller context to DTRT.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 02/10] cpufreq: introduce cpufreq_driver_is_slow
       [not found]         ` <20160226185503.2278.20479@quark.deferred.io>
@ 2016-02-26 21:00           ` Rafael J. Wysocki
  0 siblings, 0 replies; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-02-26 21:00 UTC (permalink / raw)
  To: Michael Turquette
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Steve Muckle,
	Peter Zijlstra, Ingo Molnar, Linux Kernel Mailing List, linux-pm,
	Vincent Guittot, Morten Rasmussen, Dietmar Eggemann, Juri Lelli,
	Patrick Bellasi, Viresh Kumar

On Fri, Feb 26, 2016 at 7:55 PM, Michael Turquette
<mturquette@baylibre.com> wrote:
> Quoting Rafael J. Wysocki (2016-02-25 17:16:17)
>> On Thursday, February 25, 2016 04:50:29 PM Michael Turquette wrote:
>> > Quoting Rafael J. Wysocki (2016-02-22 17:31:09)
>> > > On Tue, Feb 23, 2016 at 2:22 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
>> > > For example, in the acpi-cpufreq case all depends on what's there in
>> > > the ACPI tables.
>> >
>> > It's all a moot point until the locking in cpufreq is changed.
>>
>> No, it isn't.  Look at this, for example: https://patchwork.kernel.org/patch/8426741/
>
> Thanks for the pointer. Do you mind Cc'ing me on future versions?

I'll do that.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-26  9:18         ` Peter Zijlstra
@ 2016-02-27  0:08           ` Rafael J. Wysocki
  2016-03-01 12:57             ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-02-27  0:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steve Muckle, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette, Ricky Liang

On Friday, February 26, 2016 10:18:43 AM Peter Zijlstra wrote:
> On Thu, Feb 25, 2016 at 10:08:48PM +0100, Rafael J. Wysocki wrote:
> > On Thursday, February 25, 2016 10:28:37 AM Peter Zijlstra wrote:
> > > Its vile though; one should not spray IPIs if one can avoid it. Such
> > > things are much better done with RCU. Sure sync_sched() takes a little
> > > longer, but this isn't a fast path by any measure.
> > 
> > I see, thanks!
> > 
> > BTW, when cpufreq_update_util() callbacks are removed, I use synchronize_rcu()
> > to wait for the running ones, but would it be better to use synchronize_sched()
> > in there instead?
> 
> So I think we only call the callback with rq->lock held, in which case
> sync_sched() is good enough.
> 
> It would allow you to get rid of the rcu_read_{,un}lock() calls as well.
> 
> The down-side is that it all makes the code a little harder to get,
> because you're relying on caller context to DTRT.

OK, so what about the below (on top of linux-next)?

It has passed my cursory testing.

---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: [PATCH] cpufreq: Reduce cpufreq_update_util() overhead a bit

Use the observation that cpufreq_update_util() is only called
by the scheduler with rq->lock held, so the callers of
cpufreq_set_update_util_data() can use synchronize_sched()
instead of synchronize_rcu() to wait for cpufreq_update_util()
to complete.  Moreover, if they are updated to do that,
rcu_read_(un)lock() calls in cpufreq_update_util() might be
replaced with rcu_read_(un)lock_sched(), respectively, but
those aren't really necessary, because the scheduler calls
that function from RCU-sched read-side critical sections
already.

In addition to that, if cpufreq_set_update_util_data() checks
the func field in the struct update_util_data before setting
the per-CPU pointer to it, the data->func check may be dropped
from cpufreq_update_util() as well.

Make the above changes to reduce the overhead from
cpufreq_update_util() in the scheduler paths invoking it
and to make the cleanup after removing its callbacks less
heavy-weight somewhat.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/cpufreq.c          |   21 +++++++++++++--------
 drivers/cpufreq/cpufreq_governor.c |    2 +-
 drivers/cpufreq/intel_pstate.c     |    4 ++--
 3 files changed, 16 insertions(+), 11 deletions(-)

Index: linux-pm/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq.c
+++ linux-pm/drivers/cpufreq/cpufreq.c
@@ -77,12 +77,15 @@ static DEFINE_PER_CPU(struct update_util
  * to call from cpufreq_update_util().  That function will be called from an RCU
  * read-side critical section, so it must not sleep.
  *
- * Callers must use RCU callbacks to free any memory that might be accessed
- * via the old update_util_data pointer or invoke synchronize_rcu() right after
- * this function to avoid use-after-free.
+ * Callers must use RCU-sched callbacks to free any memory that might be
+ * accessed via the old update_util_data pointer or invoke synchronize_sched()
+ * right after this function to avoid use-after-free.
  */
 void cpufreq_set_update_util_data(int cpu, struct update_util_data *data)
 {
+	if (WARN_ON(data && !data->func))
+		return;
+
 	rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), data);
 }
 EXPORT_SYMBOL_GPL(cpufreq_set_update_util_data);
@@ -95,18 +98,20 @@ EXPORT_SYMBOL_GPL(cpufreq_set_update_uti
  *
  * This function is called by the scheduler on every invocation of
  * update_load_avg() on the CPU whose utilization is being updated.
+ *
+ * It can only be called from RCU-sched read-side critical sections.
  */
 void cpufreq_update_util(u64 time, unsigned long util, unsigned long max)
 {
 	struct update_util_data *data;
 
-	rcu_read_lock();
-
 	data = rcu_dereference(*this_cpu_ptr(&cpufreq_update_util_data));
-	if (data && data->func)
+	/*
+	 * If this isn't inside of an RCU-sched read-side critical section, data
+	 * may become NULL after the check below.
+	 */
+	if (data)
 		data->func(data, time, util, max);
-
-	rcu_read_unlock();
 }
 
 /* Flag to suspend/resume CPUFreq governors */
Index: linux-pm/drivers/cpufreq/cpufreq_governor.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c
+++ linux-pm/drivers/cpufreq/cpufreq_governor.c
@@ -280,7 +280,7 @@ static inline void gov_clear_update_util
 	for_each_cpu(i, policy->cpus)
 		cpufreq_set_update_util_data(i, NULL);
 
-	synchronize_rcu();
+	synchronize_sched();
 }
 
 static void gov_cancel_work(struct cpufreq_policy *policy)
Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -1171,7 +1171,7 @@ static void intel_pstate_stop_cpu(struct
 	pr_debug("intel_pstate: CPU %d exiting\n", cpu_num);
 
 	cpufreq_set_update_util_data(cpu_num, NULL);
-	synchronize_rcu();
+	synchronize_sched();
 
 	if (hwp_active)
 		return;
@@ -1429,7 +1429,7 @@ out:
 	for_each_online_cpu(cpu) {
 		if (all_cpu_data[cpu]) {
 			cpufreq_set_update_util_data(cpu, NULL);
-			synchronize_rcu();
+			synchronize_sched();
 			kfree(all_cpu_data[cpu]);
 		}
 	}

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-26  0:34     ` Steve Muckle
@ 2016-02-27  2:39       ` Rafael J. Wysocki
  2016-02-27  4:17         ` Steve Muckle
  2016-03-01 13:17       ` Peter Zijlstra
  2016-03-02  7:49       ` Michael Turquette
  2 siblings, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-02-27  2:39 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette, Ricky Liang

On Thursday, February 25, 2016 04:34:23 PM Steve Muckle wrote:
> On 02/24/2016 07:55 PM, Rafael J. Wysocki wrote:
> > Hi,

[cut]

> > One thing I personally like in the RCU-based approach is its universality.  The
> > callbacks may be installed by different entities in a uniform way: intel_pstate
> > can do that, the old governors can do that, my experimental schedutil code can
> > do that and your code could have done that too in principle.  And this is very
> > nice, because it is a common underlying mechanism that can be used by everybody
> > regardless of their particular implementations on the other side.
> > 
> > Why would I want to use something different, then?
> 
> I've got nothing against a callback registration mechanism. As you
> mentioned in another mail it could itself use static keys, enabling the
> static key when a callback is registered and disabling it otherwise to
> avoid calling into cpufreq_update_util().

But then it would only make a difference if cpufreq_update_util() was not
used at all (ie. no callbacks installed for any policies by anyone).  The
only reason why it may matter is that the total number of systems using
the performance governor is quite large AFAICS and they would benefit from
that.

[cut]

> 
> > 
> >> +
> >> +/*
> >> + * Capacity margin added to CFS and RT capacity requests to provide
> >> + * some head room if task utilization further increases.
> >> + */
> > 
> > OK, where does this number come from?
> 
> Someone's posterior :) .
> 
> This really should be a tunable IMO, but there's a fairly strong
> anti-tunable sentiment, so it's been left hard-coded in an attempt to
> provide something that "just works."

Ouch.

> At the least I can add a comment saying that the 20% idle headroom
> requirement was an off the cuff estimate and that at this time, we don't
> have significant data to suggest it's the best number.

Well, in this area, every number has to be justified.  Otherwise we end
up with things that sort of work, but nobody actually understands why.

[cut]

> > 
> >> +
> >> +static int cpufreq_sched_thread(void *data)
> >> +{
> > 
> > Now, what really is the advantage of having those extra threads vs using
> > workqueues?
> > 
> > I guess the underlying concern is that RT tasks may stall workqueues indefinitely
> > in theory and then the frequency won't be updated, but there's much more kernel
> > stuff run from workqueues and if that is starved, you won't get very far anyway.
> > 
> > If you take special measures to prevent frequency change requests from being
> > stalled by RT tasks, question is why are they so special?  Aren't there any
> > other kernel activities that also should be protected from that and may be
> > more important than CPU frequency changes?
> 
> I think updating the CPU frequency during periods of heavy RT/DL load is
> one of the most (if not the most) important things. I can't speak for
> other system activities that may get blocked, but there's an opportunity
> to protect CPU frequency changes here, and it seems worth taking to me.

So do it in a general way for everybody and not just for one governor
that you happen to be working on.

That said I'm unconvinced about the approach still.

Having more RT threads in a system that already is under RT pressure seems like
a recipe for trouble.  Moreover, it's likely that those new RT threads will
disturb the system's normal operation somehow even without the RT pressure and
have you investigated that?  Also having them per policy may be overkill and
binding them to policy CPUs only is not necessary.

Overall, it looks like a dynamic pool of threads that may run on every CPU
might be a better approach, but that would almost duplicate the workqueues
subsystem, so is it really worth it?

And is the problem actually visible in practice?  I have no record of any reports
mentioning it, although theoretically it's been there forever, so had it been
real, someone would have noticed it and complained about it IMO.

> > 
> > Plus if this really is the problem here, then it also affects the other cpufreq
> > governors, so maybe it should be solved for everybody in some common way?
> 
> Agreed, I'd think a freq change thread that serves frequency change
> requests would be a useful common component. The locking and throttling
> (slowpath_lock, finish_last_request()) are somewhat specific to this
> implementation, but could probably be done generically and maybe even
> used in other governors. If you're okay with it though I'd like to view
> that as a slightly longer term effort, as I think it would get unwieldy
> trying to do that as part of this initial change.

I really am not sure if this is useful at all, so why bother with it initially?

> > 
> ...
> >> +
> >> +static void cpufreq_sched_irq_work(struct irq_work *irq_work)
> >> +{
> >> +	struct gov_data *gd;
> >> +
> >> +	gd = container_of(irq_work, struct gov_data, irq_work);
> >> +	if (!gd)
> >> +		return;
> >> +
> >> +	wake_up_process(gd->task);
> > 
> > I'm wondering what would be wrong with writing it as
> > 
> > 	if (gd)
> > 		wake_up_process(gd->task);
> > 
> > And can gd turn out to be NULL here in any case?
> 
> In practice I don't think this would ever happen, but there's not
> anything that would guarantee the policy can't be stopped and exit while
> one of these irq_works is in flight. This would free not only gd but
> the irq_work structure itself.
> 
> Rather than check if gd is NULL here I think synchronization is required
> to flush an in flight irq_work when the policy is being stopped.

Right.

> I will add an irq_work_sync in the policy stop path and remove the NULL check.
> 
> > 
> >> +}
> >> +
> >> +static void update_fdomain_capacity_request(int cpu)
> >> +{
> >> +	unsigned int freq_new, index_new, cpu_tmp;
> >> +	struct cpufreq_policy *policy;
> >> +	struct gov_data *gd = per_cpu(cpu_gov_data, cpu);
> >> +	unsigned long capacity = 0;
> >> +
> >> +	if (!gd)
> >> +		return;
> >> +
> > 
> > Why is this check necessary?
> 
> As soon as one policy in the system uses scheduler-guided frequency the
> static key will be enabled and all CPUs will be calling into the
> scheduler hooks and can potentially come through this path. But some
> CPUs may not be part of a scheduler-guided frequency policy. Those CPUs
> will have a NULL gov_data pointer.
> 
> That being said, I think this test should be moved up into
> update_cpu_capacity_request() to avoid that computation when it is not
> required. Better still would be bailing when required in the
> set_*_cpu_capacity() macros in kernel/sched/sched.h. I'll see if I can
> do that instead.

If you switch over to using cpufreq_update_util() callbacks like the other
governors, you won't need it any more, because then you'll be guaranteed that
you won't run if the callback hasn't been installed for this CPU.

[cut]

> > 
> >> +
> >> +	freq_new = capacity * policy->max >> SCHED_CAPACITY_SHIFT;
> > 
> > Where does this formula come from?
> 
> Capacity here is 0 to SCHED_CAPACITY_SCALE, so this is translating the
> capacity request to a frequency request via
> (capacity/SCHED_CAPACITY_SCALE) * policy->max. I'll add a comment to
> this effect.
>
> The race with policy->max potentially changing also deserves a comment.
> If policy->max changes say just before we read it here to do this
> translation, the scheduler PELT numbers will still largely be based on
> the old policy->max value, because they are an exponential moving
> average and will take time to re-adjust. So at least for now I'm
> ignoring this race as I don't think it's really meaningful to attempt
> any synchronization.

Agreed.

> > 
> >> +
> >> +	/*
> >> +	 * Calling this without locking policy->rwsem means we race
> >> +	 * against changes with policy->min and policy->max. This should
> >> +	 * be okay though.
> >> +	 */
> >> +	if (cpufreq_frequency_table_target(policy, policy->freq_table,
> >> +					   freq_new, CPUFREQ_RELATION_L,
> >> +					   &index_new))
> >> +		goto out;
> > 
> > __cpufreq_driver_target() will call this again, so isn't calling it here
> > a bit wasteful?
> 
> I wanted to avoid waking up the frequency change thread (an expensive
> operation) whenever possible, or even making an unnecessary fastpath
> frequency request,

In the fastpath case the driver will check if the new freq is equal to the
old one as it generally has no guarantee that the freq it is asked for
matches one it has in the table.  It needs to look it up and check.

And the lookup may actually be the most expensive operation in that case
(the request itself may be a matter of a read from and a write to a fast
register), so by adding it here you pretty much double the overhead.  Not
to mention the fact that the driver's lookup may be optimized quite a bit
compared to what cpufreq_frequency_table_target() does.

> so translating the raw frequency request to a
> supported target frequency allows us to bail if the actual requested
> target frequency will end up being the same as the current one. I
> thought this was more valuable than the extra table lookup here.

You don't need to do a table lookup for that.  You only need to store
the frequency that you have previously requested.

As long as the driver is sane, it should deterministically choose the same
frequency from the table for the same target value every time, so comparing
with what you passed to it before should be sufficient.

> Actually, I could make this better by storing both the current
> frequency and the next lower table frequency - this would allow me to
> do a much simpler test to see if we'd end up requesting the same
> frequency or something different.
> 
> > 
> >> +	freq_new = policy->freq_table[index_new].frequency;
> >> +
> >> +	if (freq_new == gd->requested_freq)
> >> +		goto out;
> >> +
> > 
> > Again, the above generally is a mistake for reasons explained earlier.
> 
> (skipping per the other email)
> 
> > 
> >> +	gd->requested_freq = freq_new;
> >> +
> >> +	if (cpufreq_driver_slow || !mutex_trylock(&gd->slowpath_lock)) {
> > 
> > This really doesn't look good to me.
> > 
> > Why is the mutex needed here in the first place?  cpufreq_sched_stop() should
> > be able to make sure that this function won't be run again for the policy
> > without using this lock.
> 
> The acquisition of the slowpath_lock here isn't for synchronizing with
> the policy being stopped - that's handled differently via the smp call
> in the stop routine.
> 
> Rather it may be the case that schedfreq runs both in the fast and slow
> path on a target (maybe because of throttling, or because a previously
> started asynchronous transition isn't done yet). If that is so, then
> when the slow path is active, I do not want to attempt a transition in
> the fast path.

But you never throttle in the "fast switch" case, do you?  You don't
start the gd thread in that case even.

That aside, playing tricks with mutexes like that is ugly like hell and
doesn't make the code easier to understand in any way.

> 
> > 
> >> +		irq_work_queue_on(&gd->irq_work, cpu);
> > 
> > I hope that you are aware of the fact that irq_work_queue_on() explodes
> > on uniprocessor ARM32 if you run an SMP kernel on it?
> 
> No, I wasn't. Fortunately I think switching it to irq_work_queue() to
> run the irq_work on the same CPU as the fast path should be fine
> (assuming there's no similar landmine there).

You don't have to run it on the same CPU, though.  It doesn't matter what
CPU kicks your worker thread, does it?

[cut]

> >> +	} else {
> >> +		cpufreq_sched_try_driver_target(policy, freq_new);
> > 
> > Well, this is supposed to be the fast path AFAICS.
> > 
> > Did you actually look at what __cpufreq_driver_target() does in general?
> > Including the wait_event() in cpufreq_freq_transition_begin() to mention just
> > one suspicious thing?  And how much overhead it generates in the most general
> > case?
> > 
> > No, running *that* from the fast path is not a good idea.  Quite honestly,
> > you'd need a new driver callback and a new way to run it from the cpufreq core
> > to implement this in a reasonably efficient way.
> 
> Yes I'm aware of the wait_event(). I've attempted to work around it by
> ensuring schedfreq does not issue a __cpufreq_driver_target attempt
> while a transition is in flight (transition_ongoing), and ensuring the
> slow and fast paths can't race. I'm not sure yet whether this is enough
> when something like thermal or userspace changes min/max - that could
> independently start a transition which may cause a fast path request
> here to block. This governor does not use the dbs stuff.
> 
> While it's not exactly lean, I didn't see anything else in
> __cpufreq_driver_target() that looked really terrible.
> 
> I've nothing against a new fast switch driver interface. It may be nice
> to support unmodified drivers in the fast path as well, if it can be
> made to work, even though it may not be optimal.

My bet is that you'd need to modify drivers for that anyway, because none of
them meet the fast path requirements for the very simple reason that they
weren't designed with that in mind.

So if you need to modify them anyway, why not to add a new callback to them
in the process?

> > 
> >> +		mutex_unlock(&gd->slowpath_lock);
> >> +	}
> >> +
> >> +out:
> >> +	raw_spin_unlock(&gd->fastpath_lock);
> >> +}
> >> +
> >> +void update_cpu_capacity_request(int cpu, bool request)
> >> +{
> >> +	unsigned long new_capacity;
> >> +	struct sched_capacity_reqs *scr;
> >> +
> >> +	/* The rq lock serializes access to the CPU's sched_capacity_reqs. */
> >> +	lockdep_assert_held(&cpu_rq(cpu)->lock);
> >> +
> >> +	scr = &per_cpu(cpu_sched_capacity_reqs, cpu);
> >> +
> >> +	new_capacity = scr->cfs + scr->rt;
> >> +	new_capacity = new_capacity * capacity_margin
> >> +		/ SCHED_CAPACITY_SCALE;
> >> +	new_capacity += scr->dl;
> > 
> > Can you please explain the formula here?
> 
> The deadline class is expected to provide precise enough capacity
> requirements (since tasks admitted to it have CPU bandwidth parameters)
> such that it does not require additional CPU headroom.
> 
> So we're applying the CPU headroom to the CFS and RT capacity requests
> only, then adding the DL request.
> 
> I'll add a comment here.

One thing that has escaped me: How do you take policy->min into account?
In particular, what if you end up with a frequency below policy->min?

[cut]

> >> +	/*
> >> +	 * Ensure that all CPUs currently part of this policy are out
> >> +	 * of the hot path so that if this policy exits we can free gd.
> >> +	 */
> >> +	preempt_disable();
> >> +	smp_call_function_many(policy->cpus, dummy, NULL, true);
> >> +	preempt_enable();
> > 
> > I'm not sure how this works, can you please tell me?
> 
> Peter correctly interpreted my intentions.
> 
> The RCU possibility also crossed my mind. They both seemed like a bit of
> a hack to me - this wouldn't really be doing any RCU per se, rather
> relying on its implementation. I'll switch to RCU since that seems to be
> preferred though. It's certainly cleaner to write.

Well, in that case you simply can switch over to using cpufreq_update_util()
callbacks and then you'll get that part for free.

Overall, sorry to say that, I don't quite see much to salvage in this patch.

The throttling implementation is a disaster.  The idea of having RT worker
threads per policy is questionable for a few reasons.  The idea of invoking
the existing driver callbacks from the fast path is not a good one and
trying to use __cpufreq_driver_target() for that only makes it worse.
The mutex manipulations under a raw spinlock are firmly in the "don't do
that" category and the static keys won't be useful any more or need to be
used elsewhere.

The way that you compute the frequency to ask for kind of might be defended,
but it has problems too, like arbitrary factors coming out of thin air.

So what's left then, realistically?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-27  2:39       ` Rafael J. Wysocki
@ 2016-02-27  4:17         ` Steve Muckle
  2016-02-28  2:26           ` Rafael J. Wysocki
  2016-03-01 13:26           ` Peter Zijlstra
  0 siblings, 2 replies; 51+ messages in thread
From: Steve Muckle @ 2016-02-27  4:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette, Ricky Liang

On 02/26/2016 06:39 PM, Rafael J. Wysocki wrote:
>>> One thing I personally like in the RCU-based approach is its universality.  The
>>> callbacks may be installed by different entities in a uniform way: intel_pstate
>>> can do that, the old governors can do that, my experimental schedutil code can
>>> do that and your code could have done that too in principle.  And this is very
>>> nice, because it is a common underlying mechanism that can be used by everybody
>>> regardless of their particular implementations on the other side.
>>>
>>> Why would I want to use something different, then?
>>
>> I've got nothing against a callback registration mechanism. As you
>> mentioned in another mail it could itself use static keys, enabling the
>> static key when a callback is registered and disabling it otherwise to
>> avoid calling into cpufreq_update_util().
> 
> But then it would only make a difference if cpufreq_update_util() was not
> used at all (ie. no callbacks installed for any policies by anyone).  The
> only reason why it may matter is that the total number of systems using
> the performance governor is quite large AFAICS and they would benefit from
> that.

I'd think that's a benefit worth preserving, but I guess that's Peter
and Ingo's call.

> 
...
>>>> +/*
>>>> + * Capacity margin added to CFS and RT capacity requests to provide
>>>> + * some head room if task utilization further increases.
>>>> + */
>>>
>>> OK, where does this number come from?
>>
>> Someone's posterior :) .
>>
>> This really should be a tunable IMO, but there's a fairly strong
>> anti-tunable sentiment, so it's been left hard-coded in an attempt to
>> provide something that "just works."
> 
> Ouch.
> 
>> At the least I can add a comment saying that the 20% idle headroom
>> requirement was an off the cuff estimate and that at this time, we don't
>> have significant data to suggest it's the best number.
> 
> Well, in this area, every number has to be justified.  Otherwise we end
> up with things that sort of work, but nobody actually understands why.

It's just a starting point. There's a lot of profiling and tuning that
has yet to happen. We figured there were larger design issues to discuss
prior to spending a lot of time tweaking the headroom value.

> 
> [cut]
> 
>>>
>>>> +
>>>> +static int cpufreq_sched_thread(void *data)
>>>> +{
>>>
>>> Now, what really is the advantage of having those extra threads vs using
>>> workqueues?
>>>
>>> I guess the underlying concern is that RT tasks may stall workqueues indefinitely
>>> in theory and then the frequency won't be updated, but there's much more kernel
>>> stuff run from workqueues and if that is starved, you won't get very far anyway.
>>>
>>> If you take special measures to prevent frequency change requests from being
>>> stalled by RT tasks, question is why are they so special?  Aren't there any
>>> other kernel activities that also should be protected from that and may be
>>> more important than CPU frequency changes?
>>
>> I think updating the CPU frequency during periods of heavy RT/DL load is
>> one of the most (if not the most) important things. I can't speak for
>> other system activities that may get blocked, but there's an opportunity
>> to protect CPU frequency changes here, and it seems worth taking to me.
> 
> So do it in a general way for everybody and not just for one governor
> that you happen to be working on.
> 
> That said I'm unconvinced about the approach still.
> 
> Having more RT threads in a system that already is under RT pressure seems like
> a recipe for trouble.  Moreover, it's likely that those new RT threads will
> disturb the system's normal operation somehow even without the RT pressure and
> have you investigated that?

Sorry I'm not sure what you mean by disturb normal operation.

Generally speaking, increasing the capacity of a heavily loaded system
seems to me to be something that should run urgently, so that the system
can potentially get itself out of trouble and meet the workload's needs.

> Also having them per policy may be overkill and
> binding them to policy CPUs only is not necessary.
> 
> Overall, it looks like a dynamic pool of threads that may run on every CPU
> might be a better approach, but that would almost duplicate the workqueues
> subsystem, so is it really worth it?
> 
> And is the problem actually visible in practice?  I have no record of any reports
> mentioning it, although theoretically it's been there forever, so had it been
> real, someone would have noticed it and complained about it IMO.

While I don't have a test case drawn up to provide it seems like it'd be
easy to create one. More importantly the interactive governor in Android
uses this same kind of model, starting a frequency change thread and
making it RT. Android is particularly sensitive to latency in frequency
response. So that's likely one big reason why you're not hearing about
this issue - some folks have already worked around it.

> 
>>>
>>> Plus if this really is the problem here, then it also affects the other cpufreq
>>> governors, so maybe it should be solved for everybody in some common way?
>>
>> Agreed, I'd think a freq change thread that serves frequency change
>> requests would be a useful common component. The locking and throttling
>> (slowpath_lock, finish_last_request()) are somewhat specific to this
>> implementation, but could probably be done generically and maybe even
>> used in other governors. If you're okay with it though I'd like to view
>> that as a slightly longer term effort, as I think it would get unwieldy
>> trying to do that as part of this initial change.
> 
> I really am not sure if this is useful at all, so why bother with it initially?

I still think ensuring CPU frequency changes aren't delayed by other
task activity is useful...

> 
...
>>>
>>>> +}
>>>> +
>>>> +static void update_fdomain_capacity_request(int cpu)
>>>> +{
>>>> +	unsigned int freq_new, index_new, cpu_tmp;
>>>> +	struct cpufreq_policy *policy;
>>>> +	struct gov_data *gd = per_cpu(cpu_gov_data, cpu);
>>>> +	unsigned long capacity = 0;
>>>> +
>>>> +	if (!gd)
>>>> +		return;
>>>> +
>>>
>>> Why is this check necessary?
>>
>> As soon as one policy in the system uses scheduler-guided frequency the
>> static key will be enabled and all CPUs will be calling into the
>> scheduler hooks and can potentially come through this path. But some
>> CPUs may not be part of a scheduler-guided frequency policy. Those CPUs
>> will have a NULL gov_data pointer.
>>
>> That being said, I think this test should be moved up into
>> update_cpu_capacity_request() to avoid that computation when it is not
>> required. Better still would be bailing when required in the
>> set_*_cpu_capacity() macros in kernel/sched/sched.h. I'll see if I can
>> do that instead.
> 
> If you switch over to using cpufreq_update_util() callbacks like the other
> governors, you won't need it any more, because then you'll be guaranteed that
> you won't run if the callback hasn't been installed for this CPU.

Sure.

> 
...
>>>> +		irq_work_queue_on(&gd->irq_work, cpu);
>>>
>>> I hope that you are aware of the fact that irq_work_queue_on() explodes
>>> on uniprocessor ARM32 if you run an SMP kernel on it?
>>
>> No, I wasn't. Fortunately I think switching it to irq_work_queue() to
>> run the irq_work on the same CPU as the fast path should be fine
>> (assuming there's no similar landmine there).
> 
> You don't have to run it on the same CPU, though.  It doesn't matter what
> CPU kicks your worker thread, does it?

I don't want to wake a random CPU to potentially just do the kick.

> 
> [cut]
> 
>>>> +	} else {
>>>> +		cpufreq_sched_try_driver_target(policy, freq_new);
>>>
>>> Well, this is supposed to be the fast path AFAICS.
>>>
>>> Did you actually look at what __cpufreq_driver_target() does in general?
>>> Including the wait_event() in cpufreq_freq_transition_begin() to mention just
>>> one suspicious thing?  And how much overhead it generates in the most general
>>> case?
>>>
>>> No, running *that* from the fast path is not a good idea.  Quite honestly,
>>> you'd need a new driver callback and a new way to run it from the cpufreq core
>>> to implement this in a reasonably efficient way.
>>
>> Yes I'm aware of the wait_event(). I've attempted to work around it by
>> ensuring schedfreq does not issue a __cpufreq_driver_target attempt
>> while a transition is in flight (transition_ongoing), and ensuring the
>> slow and fast paths can't race. I'm not sure yet whether this is enough
>> if something like thermal or userspace changes min/max, whether that can
>> independently start a transition that may cause a fast path request here
>> to block. This governor does not use the dbs stuff.
>>
>> While it's not exactly lean, I didn't see anything else in
>> __cpufreq_driver_target() that looked really terrible.
>>
>> I've nothing against a new fast switch driver interface. It may be nice
>> to support unmodified drivers in the fast path as well, if it can be
>> made to work, even though it may not be optimal.
> 
> My bet is that you'd need to modify drivers for that anyway, because none of
> them meet the fast path requirements for the very simple reason that they
> weren't designed with that in mind.
> 
> So if you need to modify them anyway, why not to add a new callback to them
> in the process?

This implies to me that there are drivers which can be made to work in
the fast path (i.e. do not sleep and are reasonably quick) but currently
do not. Are there really such drivers? Why would they have unnecessary
blocking or work in them?

I would have thought that if a driver sleeps or is slow, it is because
of platform limitations (like a blocking call to adjust a regulator)
which are unavoidable.
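
For example, a typical ARM target callback looks roughly like this
(simplified sketch, not any specific driver; cpu_reg, cpu_clk and
freq_table stand for driver-private data):

static int typical_target_index(struct cpufreq_policy *policy,
				unsigned int index)
{
	unsigned long freq_hz = freq_table[index].frequency * 1000UL;
	int volt_uv = 1100000;	/* placeholder: would come from the OPP table */
	int ret;

	/* Talking to a PMIC, typically over I2C: this sleeps. */
	ret = regulator_set_voltage(cpu_reg, volt_uv, volt_uv);
	if (ret)
		return ret;

	/* Reprogramming the PLL can block too (the clk framework takes mutexes). */
	return clk_set_rate(cpu_clk, freq_hz);
}

(Voltage first is the scale-up ordering; it's reversed when scaling down.)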

> 
>>>
>>>> +		mutex_unlock(&gd->slowpath_lock);
>>>> +	}
>>>> +
>>>> +out:
>>>> +	raw_spin_unlock(&gd->fastpath_lock);
>>>> +}
>>>> +
>>>> +void update_cpu_capacity_request(int cpu, bool request)
>>>> +{
>>>> +	unsigned long new_capacity;
>>>> +	struct sched_capacity_reqs *scr;
>>>> +
>>>> +	/* The rq lock serializes access to the CPU's sched_capacity_reqs. */
>>>> +	lockdep_assert_held(&cpu_rq(cpu)->lock);
>>>> +
>>>> +	scr = &per_cpu(cpu_sched_capacity_reqs, cpu);
>>>> +
>>>> +	new_capacity = scr->cfs + scr->rt;
>>>> +	new_capacity = new_capacity * capacity_margin
>>>> +		/ SCHED_CAPACITY_SCALE;
>>>> +	new_capacity += scr->dl;
>>>
>>> Can you please explain the formula here?
>>
>> The deadline class is expected to provide precise enough capacity
>> requirements (since tasks admitted to it have CPU bandwidth parameters)
>> such that it does not require additional CPU headroom.
>>
>> So we're applying the CPU headroom to the CFS and RT capacity requests
>> only, then adding the DL request.
>>
>> I'll add a comment here.
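
To make the numbers concrete (assuming capacity_margin is 1280, i.e.
1.25 * SCHED_CAPACITY_SCALE, which is where the ~20% idle headroom
mentioned above comes from): with scr->cfs = 300, scr->rt = 100 and
scr->dl = 128,

	(300 + 100) * 1280 / 1024 = 500
	500 + 128                 = 628

so CFS and RT get 25% extra capacity while the DL request is added at
face value.
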
> 
> One thing that has escaped me: How do you take policy->min into account?
> In particular, what if you end up with a frequency below policy->min?

__cpufreq_driver_target will floor the request with policy->min.
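
(IIRC the relevant bit near the top of __cpufreq_driver_target() is
roughly

	/* Make sure that target_freq is within supported range */
	target_freq = clamp_val(target_freq, policy->min, policy->max);

so out-of-range requests get clamped before the driver ever sees them.)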

> 
> [cut]
> 
>>>> +	/*
>>>> +	 * Ensure that all CPUs currently part of this policy are out
>>>> +	 * of the hot path so that if this policy exits we can free gd.
>>>> +	 */
>>>> +	preempt_disable();
>>>> +	smp_call_function_many(policy->cpus, dummy, NULL, true);
>>>> +	preempt_enable();
>>>
>>> I'm not sure how this works, can you please tell me?
>>
>> Peter correctly interpreted my intentions.
>>
>> The RCU possibility also crossed my mind. They both seemed like a bit of
>> a hack to me - this wouldn't really be doing any RCU per se, rather
>> relying on its implementation. I'll switch to RCU since that seems to be
>> preferred though. It's certainly cleaner to write.
> 
> Well, in that case you simply can switch over to using cpufreq_update_util()
> callbacks and then you'll get that part for free.

I'm not sure. Removing the callbacks by itself doesn't guarantee all the
CPUs are out of them - don't you still need something to synchronize
with that?

> Overall, sorry to say that, I don't quite see much to salvage in this patch.
> 
> The throttling implementation is a disaster.  The idea of having RT worker
> threads per policy is questionable for a few reasons.  The idea of invoking
> the existing driver callbacks from the fast path is not a good one and
> trying to use __cpufreq_driver_target() for that only makes it worse.
> The mutex manipulations under a raw spinlock are firmly in the "don't do
> that" category and the static keys won't be useful any more or need to be
> used elsewhere.
> 
> The way that you compute the frequency to ask for kind of might be defended,
> but it has problems too, like arbitrary factors coming out of thin air.
> 
> So what's left then, realistically?

Aside from the throttling, which I agree was a last minute and
not-well-thought-out addition and needs to be fixed, I'd still challenge
the statements above. But I don't think there's much point. I'm happy to
work towards enabling ARM in schedutil. I have some review comments
which I am just finishing up and will send shortly.

thanks,
Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-27  4:17         ` Steve Muckle
@ 2016-02-28  2:26           ` Rafael J. Wysocki
  2016-03-01 14:31             ` Peter Zijlstra
  2016-03-01 13:26           ` Peter Zijlstra
  1 sibling, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-02-28  2:26 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette, Ricky Liang

On Friday, February 26, 2016 08:17:46 PM Steve Muckle wrote:
> On 02/26/2016 06:39 PM, Rafael J. Wysocki wrote:
> >>> One thing I personally like in the RCU-based approach is its universality.  The
> >>> callbacks may be installed by different entities in a uniform way: intel_pstate
> >>> can do that, the old governors can do that, my experimental schedutil code can
> >>> do that and your code could have done that too in principle.  And this is very
> >>> nice, because it is a common underlying mechanism that can be used by everybody
> >>> regardless of their particular implementations on the other side.
> >>>
> >>> Why would I want to use something different, then?
> >>
> >> I've got nothing against a callback registration mechanism. As you
> >> mentioned in another mail it could itself use static keys, enabling the
> >> static key when a callback is registered and disabling it otherwise to
> >> avoid calling into cpufreq_update_util().
> > 
> > But then it would only make a difference if cpufreq_update_util() was not
> > used at all (ie. no callbacks installed for any policies by anyone).  The
> > only reason why it may matter is that the total number of systems using
> > the performance governor is quite large AFAICS and they would benefit from
> > that.
> 
> I'd think that's a benefit worth preserving, but I guess that's Peter
> and Ingo's call.

That's exactly what I said to Peter. :-)

[cut]

> > 
> > That said I'm unconvinced about the approach still.
> > 
> > Having more RT threads in a system that already is under RT pressure seems like
> > a recipe for trouble.  Moreover, it's likely that those new RT threads will
> > disturb the system's normal operation somehow even without the RT pressure and
> > have you investigated that?
> 
> Sorry I'm not sure what you mean by disturb normal operation.

That would introduce a number of extra RT threads that would be woken up quite
often and on a regular basis, so there would be some extra RT noise in the
system, especially on systems with one CPU per cpufreq policy and many CPUs.

That's not present ATM and surely need not be completely transparent.

> Generally speaking, increasing the capacity of a heavily loaded system
> seems to me to be something that should run urgently, so that the system
> can potentially get itself out of trouble and meet the workload's needs.
> 
> > Also having them per policy may be overkill and
> > binding them to policy CPUs only is not necessary.
> > 
> > Overall, it looks like a dynamic pool of threads that may run on every CPU
> > might be a better approach, but that would almost duplicate the workqueues
> > subsystem, so is it really worth it?
> > 
> > And is the problem actually visible in practice?  I have no record of any reports
> > mentioning it, although theoretically it's been there forever, so had it been
> > real, someone would have noticed it and complained about it IMO.
> 
> While I don't have a test case drawn up to provide it seems like it'd be
> easy to create one. More importantly the interactive governor in Android
> uses this same kind of model, starting a frequency change thread and
> making it RT. Android is particularly sensitive to latency in frequency
> response. So that's likely one big reason why you're not hearing about
> this issue - some folks have already worked around it.

OK, so Android is the reason. :-)

Fair enough.  I still think that care is needed here, though.

[cut]

> >>
> >> I've nothing against a new fast switch driver interface. It may be nice
> >> to support unmodified drivers in the fast path as well, if it can be
> >> made to work, even though it may not be optimal.
> > 
> > My bet is that you'd need to modify drivers for that anyway, because none of
> > them meet the fast path requirements for the very simple reason that they
> > weren't designed with that in mind.
> > 
> > So if you need to modify them anyway, why not to add a new callback to them
> > in the process?
> 
> This implies to me that there are drivers which can be made to work in
> the fast path (i.e. do not sleep and are reasonably quick) but currently
> do not. Are there really such drivers? Why would they have unnecessary
> blocking or work in them?

The ACPI driver is an example here.  It uses smp_call_function_many() to
run stuff on policy CPUs and that cannot be called from the fast path,
so the driver has to be modified.

> I would have thought that if a driver sleeps or is slow, it is because
> of platform limitations (like a blocking call to adjust a regulator)
> which are unavoidable.

Well, if you don't need to worry about sleeping, it's common to use
things like mutexes for synchronization etc.  Quite simply, if you don't
expect that the given piece of code will ever be run from interrupt
context, you'll not code for that, and the scheduler paths have even more
restrictions than typical interrupt handlers.

> >>>
> >>>> +		mutex_unlock(&gd->slowpath_lock);
> >>>> +	}
> >>>> +
> >>>> +out:
> >>>> +	raw_spin_unlock(&gd->fastpath_lock);
> >>>> +}
> >>>> +
> >>>> +void update_cpu_capacity_request(int cpu, bool request)
> >>>> +{
> >>>> +	unsigned long new_capacity;
> >>>> +	struct sched_capacity_reqs *scr;
> >>>> +
> >>>> +	/* The rq lock serializes access to the CPU's sched_capacity_reqs. */
> >>>> +	lockdep_assert_held(&cpu_rq(cpu)->lock);
> >>>> +
> >>>> +	scr = &per_cpu(cpu_sched_capacity_reqs, cpu);
> >>>> +
> >>>> +	new_capacity = scr->cfs + scr->rt;
> >>>> +	new_capacity = new_capacity * capacity_margin
> >>>> +		/ SCHED_CAPACITY_SCALE;
> >>>> +	new_capacity += scr->dl;
> >>>
> >>> Can you please explain the formula here?
> >>
> >> The deadline class is expected to provide precise enough capacity
> >> requirements (since tasks admitted to it have CPU bandwidth parameters)
> >> such that it does not require additional CPU headroom.
> >>
> >> So we're applying the CPU headroom to the CFS and RT capacity requests
> >> only, then adding the DL request.
> >>
> >> I'll add a comment here.
> > 
> > One thing that has escaped me: How do you take policy->min into account?
> > In particular, what if you end up with a frequency below policy->min?
> 
> __cpufreq_driver_target will floor the request with policy->min.

I see.

> > 
> > [cut]
> > 
> >>>> +	/*
> >>>> +	 * Ensure that all CPUs currently part of this policy are out
> >>>> +	 * of the hot path so that if this policy exits we can free gd.
> >>>> +	 */
> >>>> +	preempt_disable();
> >>>> +	smp_call_function_many(policy->cpus, dummy, NULL, true);
> >>>> +	preempt_enable();
> >>>
> >>> I'm not sure how this works, can you please tell me?
> >>
> >> Peter correctly interpreted my intentions.
> >>
> >> The RCU possibility also crossed my mind. They both seemed like a bit of
> >> a hack to me - this wouldn't really be doing any RCU per se, rather
> >> relying on its implementation. I'll switch to RCU since that seems to be
> >> preferred though. It's certainly cleaner to write.
> > 
> > Well, in that case you simply can switch over to using cpufreq_update_util()
> > callbacks and then you'll get that part for free.
> 
> I'm not sure. Removing the callbacks by itself doesn't guarantee all the
> CPUs are out of them - don't you still need something to synchronize
> with that?

The update_util pointers are dereferenced and checked against NULL and the
callbacks are run (if present) in the same RCU read-side critical section.
synchronize_rcu() ensures that (a) all of the previously running read-side
critical sections complete before it returns and (b) all of the read-side
critical sections that begin after it has returned will see the new value
of the pointer.  This guarantees that the callbacks will not be running
after clearing the update_util pointers and executing synchronize_rcu().
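
Schematically, the governor-side teardown then is just (sketch only,
following the sequence described above; cpufreq_set_update_util_data()
is the setter from my patch and gd stands for whatever per-policy data
the governor wants to free):

	for_each_cpu(cpu, policy->cpus)
		cpufreq_set_update_util_data(cpu, NULL);

	/* wait for all in-flight callback invocations to finish */
	synchronize_rcu();

	/* no CPU can be running the callback now; governor data can go */
	kfree(gd);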

> > Overall, sorry to say that, I don't quite see much to salvage in this patch.
> > 
> > The throttling implementation is a disaster.  The idea of having RT worker
> > threads per policy is questionable for a few reasons.  The idea of invoking
> > the existing driver callbacks from the fast path is not a good one and
> > trying to use __cpufreq_driver_target() for that only makes it worse.
> > The mutex manipulations under a raw spinlock are firmly in the "don't do
> > that" category and the static keys won't be useful any more or need to be
> > used elsewhere.
> > 
> > The way that you compute the frequency to ask for kind of might be defended,
> > but it has problems too, like arbitrary factors coming out of thin air.
> > 
> > So what's left then, realistically?
> 
> Aside from the throttling, which I agree was a last minute and
> not-well-thought-out addition and needs to be fixed, I'd still challenge
> the statements above. But I don't think there's much point. I'm happy to
> work towards enabling ARM in schedutil.

Cool! :-)

> I have some review comments which I am just finishing up and will send shortly.

I've responded to those already I think.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 04/10] sched/fair: add triggers for OPP change requests
  2016-02-23  1:22 ` [RFCv7 PATCH 04/10] sched/fair: add triggers for OPP change requests Steve Muckle
@ 2016-03-01  6:51   ` Ricky Liang
  2016-03-03  3:55     ` Steve Muckle
  0 siblings, 1 reply; 51+ messages in thread
From: Ricky Liang @ 2016-03-01  6:51 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, open list,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette

Hi Steve,

On Tue, Feb 23, 2016 at 9:22 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> From: Juri Lelli <juri.lelli@arm.com>
>
> Each time a task is {en,de}queued we might need to adapt the current
> frequency to the new usage. Add triggers on {en,de}queue_task_fair() for
> this purpose.  Only trigger a freq request if we are effectively waking up
> or going to sleep.  Filter out load balancing related calls to reduce the
> number of triggers.
>
> [smuckle@linaro.org: resolve merge conflicts, define task_new,
>  use renamed static key sched_freq]
>
> cc: Ingo Molnar <mingo@redhat.com>
> cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> Signed-off-by: Steve Muckle <smuckle@linaro.org>
> ---
>  kernel/sched/fair.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 47 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3437e01..f1f00a4 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4283,6 +4283,21 @@ static inline void hrtick_update(struct rq *rq)
>  }
>  #endif
>
> +static unsigned long capacity_orig_of(int cpu);
> +static int cpu_util(int cpu);
> +
> +static void update_capacity_of(int cpu)
> +{
> +       unsigned long req_cap;
> +
> +       if (!sched_freq())
> +               return;
> +
> +       /* Convert scale-invariant capacity to cpu. */
> +       req_cap = cpu_util(cpu) * SCHED_CAPACITY_SCALE / capacity_orig_of(cpu);
> +       set_cfs_cpu_capacity(cpu, true, req_cap);
> +}
> +

The change hunks of this patch should probably all depend on
CONFIG_SMP as capacity_orig_of() and cpu_util() are only available
when CONFIG_SMP is enabled.
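
i.e. roughly (just sketching the shape of it, body taken from the hunk
quoted above):

#ifdef CONFIG_SMP
static void update_capacity_of(int cpu)
{
	unsigned long req_cap;

	if (!sched_freq())
		return;

	/* Convert scale-invariant capacity to cpu. */
	req_cap = cpu_util(cpu) * SCHED_CAPACITY_SCALE / capacity_orig_of(cpu);
	set_cfs_cpu_capacity(cpu, true, req_cap);
}
#else
static inline void update_capacity_of(int cpu) { }
#endif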

[snip...]

Thanks,
Ricky

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-27  0:08           ` Rafael J. Wysocki
@ 2016-03-01 12:57             ` Peter Zijlstra
  2016-03-01 19:44               ` Rafael J. Wysocki
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2016-03-01 12:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette, Ricky Liang

On Sat, Feb 27, 2016 at 01:08:02AM +0100, Rafael J. Wysocki wrote:
> @@ -95,18 +98,20 @@ EXPORT_SYMBOL_GPL(cpufreq_set_update_uti
>   *
>   * This function is called by the scheduler on every invocation of
>   * update_load_avg() on the CPU whose utilization is being updated.
> + *
> + * It can only be called from RCU-sched read-side critical sections.
>   */
>  void cpufreq_update_util(u64 time, unsigned long util, unsigned long max)
>  {
>  	struct update_util_data *data;
>  
> -	rcu_read_lock();
> -

Maybe, just because I'm paranoid, add something like:

#ifdef CONFIG_LOCKDEP
	WARN_ON(debug_locks && !rcu_read_lock_sched_held());
#endif

>  	data = rcu_dereference(*this_cpu_ptr(&cpufreq_update_util_data));
> -	if (data && data->func)
> +	/*
> +	 * If this isn't inside of an RCU-sched read-side critical section, data
> +	 * may become NULL after the check below.
> +	 */
> +	if (data)
>  		data->func(data, time, util, max);
> -
> -	rcu_read_unlock();
>  }

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-26  0:34     ` Steve Muckle
  2016-02-27  2:39       ` Rafael J. Wysocki
@ 2016-03-01 13:17       ` Peter Zijlstra
  2016-03-02  7:49       ` Michael Turquette
  2 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2016-03-01 13:17 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Rafael J. Wysocki, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette, Ricky Liang

On Thu, Feb 25, 2016 at 04:34:23PM -0800, Steve Muckle wrote:
> >> +	/*
> >> +	 * Ensure that all CPUs currently part of this policy are out
> >> +	 * of the hot path so that if this policy exits we can free gd.
> >> +	 */
> >> +	preempt_disable();
> >> +	smp_call_function_many(policy->cpus, dummy, NULL, true);
> >> +	preempt_enable();
> > 
> > I'm not sure how this works, can you please tell me?
> 
> Peter correctly interpreted my intentions.
> 
> The RCU possibility also crossed my mind. They both seemed like a bit of
> a hack to me - this wouldn't really be doing any RCU per se, rather
> relying on its implementation. I'll switch to RCU since that seems to be
> preferred though. It's certainly cleaner to write.

RCU is widely used in this fashion. synchronize_sched() is explicitly
constructed to sync against preempt disable sections. This is not an
implementation detail.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-27  4:17         ` Steve Muckle
  2016-02-28  2:26           ` Rafael J. Wysocki
@ 2016-03-01 13:26           ` Peter Zijlstra
  1 sibling, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2016-03-01 13:26 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Rafael J. Wysocki, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette, Ricky Liang

On Fri, Feb 26, 2016 at 08:17:46PM -0800, Steve Muckle wrote:
> > But then it would only make a difference if cpufreq_update_util() was not
> > used at all (ie. no callbacks installed for any policies by anyone).  The
> > only reason why it may matter is that the total number of systems using
> > the performance governor is quite large AFAICS and they would benefit from
> > that.
> 
> I'd think that's a benefit worth preserving, but I guess that's Peter
> and Ingo's call.

Probably worth it, most of this is on rather fast paths.

See commit 1cde2930e154 ("sched/preempt: Add static_key() to
preempt_notifiers") for example.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-28  2:26           ` Rafael J. Wysocki
@ 2016-03-01 14:31             ` Peter Zijlstra
  2016-03-01 20:32               ` Rafael J. Wysocki
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2016-03-01 14:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette, Ricky Liang

On Sun, Feb 28, 2016 at 03:26:21AM +0100, Rafael J. Wysocki wrote:

> > > That said I'm unconvinced about the approach still.
> > > 
> > > Having more RT threads in a system that already is under RT pressure seems like
> > > a recipe for trouble.  Moreover, it's likely that those new RT threads will
> > > disturb the system's normal operation somehow even without the RT pressure and
> > > have you investigated that?
> > 
> > Sorry I'm not sure what you mean by disturb normal operation.
> 
> That would introduce a number of extra RT threads that would be woken up quite
> often and on a regular basis, so there would be some extra RT noise in the
> system, especially on systems with one CPU per cpufreq policy and many CPUs.
> 
> That's not present ATM and surely need not be completely transparent.

Having RT tasks should not be a problem. You can always set their
priority such that they do not interfere with an actual RT workload.
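
E.g. something like (sketch only; the priority value is arbitrary and
gd->task stands for the per-policy kthread -- the point is just to keep
it below whatever the 'real' RT workload runs at):

	struct sched_param param = { .sched_priority = 1 };

	sched_setscheduler_nocheck(gd->task, SCHED_FIFO, &param);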

> > Generally speaking, increasing the capacity of a heavily loaded system
> > seems to me to be something that should run urgently, so that the system
> > can potentially get itself out of trouble and meet the workload's needs.
> > 
> > > Also having them per policy may be overkill and
> > > binding them to policy CPUs only is not necessary.
> > > 
> > > Overall, it looks like a dynamic pool of threads that may run on every CPU
> > > might be a better approach, but that would almost duplicate the workqueues
> > > subsystem, so is it really worth it?
> > > 
> > > And is the problem actually visible in practice?  I have no record of any reports
> > > mentioning it, although theoretically it's been there forever, so had it been
> > > real, someone would have noticed it and complained about it IMO.
> > 
> > While I don't have a test case drawn up to provide it seems like it'd be
> > easy to create one. More importantly the interactive governor in Android
> > uses this same kind of model, starting a frequency change thread and
> > making it RT. Android is particularly sensitive to latency in frequency
> > response. So that's likely one big reason why you're not hearing about
> > this issue - some folks have already worked around it.
> 
> OK, so Android is the reason. :-)
> 
> Fair enough.  I still think that care is needed here, though.

So I sort of see the point of having per-cpu RT kthread tasks to effect
the OPP change for CFS. And here I would indeed suggest just having a
task per cpu; moving tasks about is too complex and generates even more
noise.

But the problem of having RT tasks is that you should run RT tasks at
max OPP.

Now you can probably fudge things and not account these RT tasks to the
RT 'workload' since you know their characteristics etc. But it's going
to be ugly, I suspect.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-03-01 12:57             ` Peter Zijlstra
@ 2016-03-01 19:44               ` Rafael J. Wysocki
  0 siblings, 0 replies; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-03-01 19:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, Steve Muckle, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Ricky Liang

On Tue, Mar 1, 2016 at 1:57 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Sat, Feb 27, 2016 at 01:08:02AM +0100, Rafael J. Wysocki wrote:
>> @@ -95,18 +98,20 @@ EXPORT_SYMBOL_GPL(cpufreq_set_update_uti
>>   *
>>   * This function is called by the scheduler on every invocation of
>>   * update_load_avg() on the CPU whose utilization is being updated.
>> + *
>> + * It can only be called from RCU-sched read-side critical sections.
>>   */
>>  void cpufreq_update_util(u64 time, unsigned long util, unsigned long max)
>>  {
>>       struct update_util_data *data;
>>
>> -     rcu_read_lock();
>> -
>
> Maybe, just because I'm paranoid, add something like:
>
> #ifdef CONFIG_LOCKDEP
>         WARN_ON(debug_locks && !rcu_read_lock_sched_held());
> #endif

OK

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-03-01 14:31             ` Peter Zijlstra
@ 2016-03-01 20:32               ` Rafael J. Wysocki
  0 siblings, 0 replies; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-03-01 20:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, Steve Muckle, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Ricky Liang

On Tue, Mar 1, 2016 at 3:31 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Sun, Feb 28, 2016 at 03:26:21AM +0100, Rafael J. Wysocki wrote:
>
>> > > That said I'm unconvinced about the approach still.
>> > >
>> > > Having more RT threads in a system that already is under RT pressure seems like
>> > > a recipe for trouble.  Moreover, it's likely that those new RT threads will
>> > > disturb the system's normal operation somehow even without the RT pressure and
>> > > have you investigated that?
>> >
>> > Sorry I'm not sure what you mean by disturb normal operation.
>>
>> That would introduce a number of extra RT threads that would be woken up quite
>> often and on a regular basis, so there would be some extra RT noise in the
>> system, especially on systems with one CPU per cpufreq policy and many CPUs.
>>
>> That's not present ATM and surely need not be completely transparent.
>
> Having RT tasks should not be a problem. You can always set their
> priority such that they do not interfere with an actual RT workload.
>
>> > Generally speaking, increasing the capacity of a heavily loaded system
>> > seems to me to be something that should run urgently, so that the system
>> > can potentially get itself out of trouble and meet the workload's needs.
>> >
>> > > Also having them per policy may be overkill and
>> > > binding them to policy CPUs only is not necessary.
>> > >
>> > > Overall, it looks like a dynamic pool of threads that may run on every CPU
>> > > might be a better approach, but that would almost duplicate the workqueues
>> > > subsystem, so is it really worth it?
>> > >
>> > > And is the problem actually visible in practice?  I have no record of any reports
>> > > mentioning it, although theoretically it's been there forever, so had it been
>> > > real, someone would have noticed it and complained about it IMO.
>> >
>> > While I don't have a test case drawn up to provide it seems like it'd be
>> > easy to create one. More importantly the interactive governor in Android
>> > uses this same kind of model, starting a frequency change thread and
>> > making it RT. Android is particularly sensitive to latency in frequency
>> > response. So that's likely one big reason why you're not hearing about
>> > this issue - some folks have already worked around it.
>>
>> OK, so Android is the reason. :-)
>>
>> Fair enough.  I still think that care is needed here, though.
>
> So I sort of see the point of having per-cpu RT kthread tasks to effect
> the OPP change for CFS. And here I would indeed suggest just having a
> task per cpu; moving tasks about is too complex and generates even more
> noise.
>
> But the problem of having RT tasks is that you should run RT tasks at
> max OPP.
>
> Now you can probably fudge things and not account these RT tasks to the
> RT 'workload' since you know their characteristics etc. But it's going
> to be ugly, I suspect.

Good point.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-26  0:34     ` Steve Muckle
  2016-02-27  2:39       ` Rafael J. Wysocki
  2016-03-01 13:17       ` Peter Zijlstra
@ 2016-03-02  7:49       ` Michael Turquette
  2016-03-03  2:49         ` Rafael J. Wysocki
  2016-03-03 13:03         ` Peter Zijlstra
  2 siblings, 2 replies; 51+ messages in thread
From: Michael Turquette @ 2016-03-02  7:49 UTC (permalink / raw)
  To: Steve Muckle, Rafael J. Wysocki
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Ricky Liang

Hi,

I'm still catching up on the plurality of scheduler/cpufreq threads but
I thought I would chime in with some historical reasons for why
cpufreq_sched.c looks the way it does today.

Quoting Steve Muckle (2016-02-25 16:34:23)
> On 02/24/2016 07:55 PM, Rafael J. Wysocki wrote:
> > On Monday, February 22, 2016 05:22:43 PM Steve Muckle wrote:
> > Besides, governors traditionally go to drivers/cpufreq.  Why is this different?
> 
> This predates my involvement but I'm guessing this was done because the
> long term vision was for cpufreq to eventually be removed from the
> picture. But it may be quite some time before that happens - I think
> communicating directly with drivers may be difficult due to the need to
> synchronize with thermal or userspace min/max requests etc, plus there's
> the fact that we've got a longstanding API exposed to userspace.
> 
> I'm happy to move this to drivers/cpufreq.

Originally it was put in kernel/sched/ because cpufreq_sched.c needed
access to kernel/sched/sched.h. Rafael's implementation flips the
push-pull relationship between the governor and scheduler so that this
is not necessary. If that requirement is removed from Steve's series
then there is no need to put the file outside of drivers/cpufreq.

...
> > One thing I personally like in the RCU-based approach is its universality.  The
> > callbacks may be installed by different entities in a uniform way: intel_pstate
> > can do that, the old governors can do that, my experimental schedutil code can
> > do that and your code could have done that too in principle.  And this is very
> > nice, because it is a common underlying mechanism that can be used by everybody
> > regardless of their particular implementations on the other side.
> > 
> > Why would I want to use something different, then?
> 
> I've got nothing against a callback registration mechanism. As you
> mentioned in another mail it could itself use static keys, enabling the
> static key when a callback is registered and disabling it otherwise to
> avoid calling into cpufreq_update_util().

I'm very happy to see the return of capacity/util ops. I had these in my
initial prototype back in 2014 but Peter shot it down, partially due to
the performance hit of indirect function call overhead in the fast path:

http://permalink.gmane.org/gmane.linux.kernel/1803410

...
> >> +
> >> +/*
> >> + * Capacity margin added to CFS and RT capacity requests to provide
> >> + * some head room if task utilization further increases.
> >> + */
> > 
> > OK, where does this number come from?
> 
> Someone's posterior :) .

Indeed, this margin is based on a posterior-derived value, in particular
the 25% imbalance_pct in struct sched_domain:

include/linux/sched.h:
struct sched_domain {
        /* These fields must be setup */
        ...
        unsigned int imbalance_pct;     /* No balance until over watermark */
...

and,

kernel/sched/core.c:
static struct sched_domain *
sd_init(struct sched_domain_topology_level *tl, int cpu)
{
        struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);
        ...
        sd = (struct sched_domain){
                .min_interval           = sd_weight,
                .max_interval           = 2*sd_weight,
                .busy_factor            = 32,
                .imbalance_pct          = 125,
...

> 
> This really should be a tunable IMO, but there's a fairly strong
> anti-tunable sentiment, so it's been left hard-coded in an attempt to
> provide something that "just works."

It should definitely be tunable.

> 
> At the least I can add a comment saying that the 20% idle headroom
> requirement was an off the cuff estimate and that at this time, we don't
> have significant data to suggest it's the best number.

Yeah, it was just for prototyping.

...
> > Now, what really is the advantage of having those extra threads vs using
> > workqueues?

More historical anecdotes: The RT/DL stuff below is absolutely worth
talking about, but if your question is, "why did the design choose
kthreads over wq's", the answer is legacy.

Originally back in 2014 Morten played with queuing work up via wq within
schedule() context, but hit a problem where that caused re-entering
schedule() from schedule() and entering reentrancy hell and entering
reentrancy hell and entering reentrancy hell and entering reentrancy
hell ... well you get the idea.

I came up with an ugly workaround to wake up a dormant kthread with
wake_up_process() which was called from an arch-specific callback to
change cpu frequency and later Juri suggested to use irq_work to kick
the kthread, which is still the method used in Steve's patch series.

Seems that it is fine to toss work onto a workqueue from irq_work
context however, which is nice, except that the RT thread approach
yields better latency. I'll let you guys sort out the issues around
RT/DL starvation, which seems like a real issue.
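
Roughly this shape, for reference (field names approximate, not copied
from the series):

static void cpufreq_sched_irq_work(struct irq_work *irq_work)
{
	struct gov_data *gd = container_of(irq_work, struct gov_data, irq_work);

	wake_up_process(gd->task);	/* kick the frequency-change kthread */
}

/* at governor init: */
	init_irq_work(&gd->irq_work, cpufreq_sched_irq_work);

/* from the scheduler hook, where we cannot wake a task directly: */
	irq_work_queue(&gd->irq_work);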

> > 
> > I guess the underlying concern is that RT tasks may stall workqueues indefinitely
> > in theory and then the frequency won't be updated, but there's much more kernel
> > stuff run from workqueues and if that is starved, you won't get very far anyway.
> > 
> > If you take special measures to prevent frequency change requests from being
> > stalled by RT tasks, question is why are they so special?  Aren't there any
> > other kernel activities that also should be protected from that and may be
> > more important than CPU frequency changes?
> 
> I think updating the CPU frequency during periods of heavy RT/DL load is
> one of the most (if not the most) important things. I can't speak for
> other system activities that may get blocked, but there's an opportunity
> to protect CPU frequency changes here, and it seems worth taking to me.
> 
> > 
> > Plus if this really is the problem here, then it also affects the other cpufreq
> > governors, so maybe it should be solved for everybody in some common way?
> 
> Agreed, I'd think a freq change thread that serves frequency change
> requests would be a useful common component. The locking and throttling
> (slowpath_lock, finish_last_request()) are somewhat specific to this
> implementation, but could probably be done generically and maybe even
> used in other governors. If you're okay with it though I'd like to view
> that as a slightly longer term effort, as I think it would get unwieldy
> trying to do that as part of this initial change.

I do not have any data to back up a case for stalls caused by RT/DL
starvation, but conceptually I would say that latency is fundamentally
more important in a scheduler-driven cpu frequency selection scenario,
versus the legacy timer-based governors.

In the latter case we get woken up by a timer (prior to Rafael's recent
"cpufreq: governor: Replace timers with utilization update callbacks"
patch), we sample idleness/busyness, and change frequency, all in one go
and all from process context.

In the case of the scheduler selecting frequency in the hot path, with
hardware that takes a long time to transition frequency (and thus must
be done in a slow path), we want to minimize the time delta between the
scheduler picking a frequency and the thread that executes that change
actually being run.

In my over-simplified view of the scheduler, it would be great if we
could have a backdoor mechanism to place the frequency transition
kthread onto a runqueue from within the schedule() context and dispense
with the irq_work stuff in Steve's series altogether.

Regards,
Mike

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-03-02  7:49       ` Michael Turquette
@ 2016-03-03  2:49         ` Rafael J. Wysocki
  2016-03-03  3:50           ` Steve Muckle
  2016-03-03 13:03         ` Peter Zijlstra
  1 sibling, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-03-03  2:49 UTC (permalink / raw)
  To: Michael Turquette
  Cc: Steve Muckle, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar,
	Rafael J. Wysocki, Linux Kernel Mailing List, linux-pm,
	Vincent Guittot, Morten Rasmussen, Dietmar Eggemann, Juri Lelli,
	Patrick Bellasi, Ricky Liang

On Wed, Mar 2, 2016 at 8:49 AM, Michael Turquette
<mturquette@baylibre.com> wrote:
>

[cut]

> I do not have any data to back up a case for stalls caused by RT/DL
> starvation, but conceptually I would say that latency is fundamentally
> more important in a scheduler-driven cpu frequency selection scenario,
> versus the legacy timer-based governors.
>
> In the latter case we get woken up by a timer (prior to Rafael's recent
> "cpufreq: governor: Replace timers with utilization update callbacks"
> patch), we sample idleness/busyness, and change frequency, all in one go
> and all from process context.
>
> In the case of the scheduler selecting frequency in the hot path, with
> hardware that takes a long time to transition frequency (and thus must
> be done in a slow path), we want to minimize the time delta between the
> scheduler picking a frequency and the thread that executes that change
> actually being run.

That is a good point.  However, Peter's point about the RT tasks
having to run at max util and affecting the frequency control this
way is a good one too.

I'm not actually sure if RT is the right answer here.  DL may be a
better choice.  After all, we want the thing to happen shortly, but
not necessarily at full speed.

So something like a DL workqueue would be quite useful here it seems.

> In my over-simplified view of the scheduler, it would be great if we
> could have a backdoor mechanism to place the frequency transition
> kthread onto a runqueue from within the schedule() context and dispense
> with the irq_work stuff in Steve's series altogether.

Well, I use irq_work() now in schedutil and ondemand/conservative too
for queuing up work items and it gets the job done.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-03-03  2:49         ` Rafael J. Wysocki
@ 2016-03-03  3:50           ` Steve Muckle
  2016-03-03  9:34             ` Juri Lelli
  0 siblings, 1 reply; 51+ messages in thread
From: Steve Muckle @ 2016-03-03  3:50 UTC (permalink / raw)
  To: Rafael J. Wysocki, Michael Turquette, Peter Zijlstra, Ingo Molnar
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, linux-pm,
	Vincent Guittot, Morten Rasmussen, Dietmar Eggemann, Juri Lelli,
	Patrick Bellasi, Ricky Liang

On 03/02/2016 06:49 PM, Rafael J. Wysocki wrote:
> I'm not actually sure if RT is the right answer here.  DL may be a
> better choice.  After all, we want the thing to happen shortly, but
> not necessarily at full speed.
> 
> So something like a DL workqueue would be quite useful here it seems.

The DL idea seems like a good one to me.

It would also prevent cpufreq changes from being delayed by other RT or
DL tasks.

thanks,
Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 04/10] sched/fair: add triggers for OPP change requests
  2016-03-01  6:51   ` Ricky Liang
@ 2016-03-03  3:55     ` Steve Muckle
  0 siblings, 0 replies; 51+ messages in thread
From: Steve Muckle @ 2016-03-03  3:55 UTC (permalink / raw)
  To: Ricky Liang
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, open list,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette

Hi Ricky,

On 02/29/2016 10:51 PM, Ricky Liang wrote:
> The change hunks of this patch should probably all depend on
> CONFIG_SMP as capacity_orig_of() and cpu_util() are only available
> when CONFIG_SMP is enabled.

Yeah, I was deferring cleaning that up until there was more buy in on
the overall solution. But it looks like we will be moving forward using
Rafael's schedutil governor. The most recent posting of that is here:

http://thread.gmane.org/gmane.linux.kernel/2166378

thanks,
Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-03-03  3:50           ` Steve Muckle
@ 2016-03-03  9:34             ` Juri Lelli
  0 siblings, 0 replies; 51+ messages in thread
From: Juri Lelli @ 2016-03-03  9:34 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Rafael J. Wysocki, Michael Turquette, Peter Zijlstra,
	Ingo Molnar, Rafael J. Wysocki, Linux Kernel Mailing List,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Patrick Bellasi, Ricky Liang

On 02/03/16 19:50, Steve Muckle wrote:
> On 03/02/2016 06:49 PM, Rafael J. Wysocki wrote:
> > I'm not actually sure if RT is the right answer here.  DL may be a
> > better choice.  After all, we want the thing to happen shortly, but
> > not necessarily at full speed.
> > 
> > So something like a DL workqueue would be quite useful here it seems.
> 
> The DL idea seems like a good one to me.
> 
> It would also prevent cpufreq changes from being delayed by other RT or
> DL tasks.
> 

Dimensioning period and runtime could require some experimenting, but
it's worth a try, I agree.
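
Just to make it concrete, something along these lines, where picking
runtime/period is exactly the open question (the numbers below are pure
placeholders, and "kthread" is the per-policy/per-cpu worker):

	struct sched_attr attr = {
		.sched_policy   = SCHED_DEADLINE,
		.sched_runtime  =  100 * NSEC_PER_USEC,
		.sched_deadline = 1000 * NSEC_PER_USEC,
		.sched_period   = 1000 * NSEC_PER_USEC,
	};

	sched_setattr(kthread, &attr);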

Best,

- Juri

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-03-02  7:49       ` Michael Turquette
  2016-03-03  2:49         ` Rafael J. Wysocki
@ 2016-03-03 13:03         ` Peter Zijlstra
  1 sibling, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2016-03-03 13:03 UTC (permalink / raw)
  To: Michael Turquette
  Cc: Steve Muckle, Rafael J. Wysocki, Ingo Molnar, Rafael J. Wysocki,
	linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Ricky Liang

On Tue, Mar 01, 2016 at 11:49:10PM -0800, Michael Turquette wrote:
> 
> In my over-simplified view of the scheduler, it would be great if we
> could have a backdoor mechanism to place the frequency transition
> kthread onto a runqueue from within the schedule() context and dispense
> with the irq_work stuff in Steve's series altogether.

This is actually very very hard :/

So while there is something similar for workqueues,
try_to_wake_up_local(), that will not work for the cpufreq stuff.

The main problem is that schedule() is done with rq->lock held, but
wakeups need p->pi_lock, but it so happens that rq->lock nests inside of
p->pi_lock.

Now, the workqueue stuff with try_to_wake_up_local() can get away with
dropping rq->lock, because of where it is called, way early in
schedule() before we really muck things up.

The cpufreq hook otoh is called all over the place.

The second problem is that doing a wakeup will in fact also end up
calling the cpufreq hook, so you're back in recursion hell.

The third problem is that cpufreq is called from wakeups, which would
want to do another wakeup (see point 2), but this also means we have to
nest p->pi_lock, and we can't really do that either.
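
To spell out the ordering problem (informal):

	/*
	 *   p->pi_lock
	 *     rq->lock
	 *
	 * try_to_wake_up() takes p->pi_lock first and may then take a
	 * runqueue lock, so doing a wakeup while already holding rq->lock
	 * (which is where the cpufreq hook runs) inverts that order.
	 */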

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
  2016-02-23  1:22 ` [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection Steve Muckle
  2016-02-25  3:55   ` Rafael J. Wysocki
@ 2016-03-03 14:21   ` Ingo Molnar
  1 sibling, 0 replies; 51+ messages in thread
From: Ingo Molnar @ 2016-03-03 14:21 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette, Ricky Liang


* Steve Muckle <steve.muckle@linaro.org> wrote:

> From: Michael Turquette <mturquette@baylibre.com>
> 
> Scheduler-driven CPU frequency selection hopes to exploit both
> per-task and global information in the scheduler to improve frequency
> selection policy, achieving lower power consumption, improved
> responsiveness/performance, and less reliance on heuristics and
> tunables. For further discussion on the motivation of this integration
> see [0].
> 
> This patch implements a shim layer between the Linux scheduler and the
> cpufreq subsystem. The interface accepts capacity requests from the
> CFS, RT and deadline sched classes. The requests from each sched class
> are summed on each CPU with a margin applied to the CFS and RT
> capacity requests to provide some headroom. Deadline requests are
> expected to be precise enough given their nature to not require
> headroom. The maximum total capacity request for a CPU in a frequency
> domain drives the requested frequency for that domain.
> 
> Policy is determined by both the sched classes and this shim layer.
> 
> Note that this algorithm is event-driven. There is no polling loop to
> check cpu idle time nor any other method which is unsynchronized with
> the scheduler, aside from an optional throttling mechanism.
> 
> Thanks to Juri Lelli <juri.lelli@arm.com> for contributing design ideas,
> code and test results, and to Ricky Liang <jcliang@chromium.org>
> for initialization and static key inc/dec fixes.
> 
> [0] http://article.gmane.org/gmane.linux.kernel/1499836
> 
> [smuckle@linaro.org: various additions and fixes, revised commit text]
> 
> CC: Ricky Liang <jcliang@chromium.org>
> Signed-off-by: Michael Turquette <mturquette@baylibre.com>
> Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> Signed-off-by: Steve Muckle <smuckle@linaro.org>
> ---
>  drivers/cpufreq/Kconfig      |  21 ++
>  include/linux/cpufreq.h      |   3 +
>  include/linux/sched.h        |   8 +
>  kernel/sched/Makefile        |   1 +
>  kernel/sched/cpufreq_sched.c | 459 +++++++++++++++++++++++++++++++++++++++++++

Please rename this to kernel/sched/cpufreq.c - no need to say 'sched' twice! :-)

>  kernel/sched/sched.h         |  51 +++++
>  6 files changed, 543 insertions(+)
>  create mode 100644 kernel/sched/cpufreq_sched.c

So I really like how you push all high level code into kernel/sched/cpufreq.c and 
use the cpufreq drivers only for actual low level frequency switching.

It would be nice to converge this code with the code from Rafael:

   [PATCH 0/6] cpufreq: schedutil governor

i.e. use scheduler internal metrics within the scheduler, and create a clear 
interface between low level cpufreq drivers and the cpufreq code living in the 
scheduler.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection
  2016-02-23  1:22 [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection Steve Muckle
                   ` (10 preceding siblings ...)
  2016-02-23  1:33 ` [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection Steve Muckle
@ 2016-03-30  0:45 ` Yuyang Du
  2016-03-31  1:35   ` Steve Muckle
  11 siblings, 1 reply; 51+ messages in thread
From: Yuyang Du @ 2016-03-30  0:45 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette

Hi Steve,

On Mon, Feb 22, 2016 at 05:22:40PM -0800, Steve Muckle wrote:
> The number of times the busy
> duration exceeds the period of the periodic workload (an "overrun") is
> also recorded.

Could you please explain more about overrun?

> SCHED_OTHER workload:
>  wload parameters	  ondemand        interactive     sched	
> run	period	loops	OR	OH	OR	OH	OR	OH
> 1	100	100	0	62.07%	0	100.02%	0	78.49%
> 10	1000	10	0	21.80%	0	22.74%	0	72.56%
> 1	10	1000	0	21.72%	0	63.08%	0	52.40%
> 10	100	100	0	8.09%	0	15.53%	0	17.33%
> 100	1000	10	0	1.83%	0	1.77%	0	0.29%
> 6	33	300	0	15.32%	0	8.60%	0	17.34%
> 66	333	30	0	0.79%	0	3.18%	0	12.26%
> 4	10	1000	0	5.87%	0	10.21%	0	6.15%
> 40	100	100	0	0.41%	0	0.04%	0	2.68%
> 400	1000	10	0	0.42%	0	0.50%	0	1.22%
> 5	9	1000	2	3.82%	1	6.10%	0	2.51%
> 50	90	100	0	0.19%	0	0.05%	0	1.71%
> 500	900	10	0	0.37%	0	0.38%	0	1.82%
> 9	12	1000	6	1.79%	1	0.77%	0	0.26%
> 90	120	100	0	0.16%	1	0.05%	0	0.49%
> 900	1200	10	0	0.09%	0	0.26%	0	0.62%
 
Could you please also explain what we can learn from the wload vs. OH/OR
results?

Or if you have already explained these questions before, could you please
point to the links? Thank you.

Yuyang

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection
  2016-03-31  1:35   ` Steve Muckle
@ 2016-03-30 20:22     ` Yuyang Du
  0 siblings, 0 replies; 51+ messages in thread
From: Yuyang Du @ 2016-03-30 20:22 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette

Hi Steve,

On Wed, Mar 30, 2016 at 06:35:23PM -0700, Steve Muckle wrote:
> This series was dropped in favor of Rafael's schedutil. But on the
> chance that you're still curious about the test setup used to quantify
> the series I'll explain below.
 
I will catch up and learn both.

> These results are meant to show how the governors perform across varying
> workload intensities and periodicities. Higher overhead (OH) numbers
> indicate that the completion times of each period of the workload were
> closer to what they would be when run at fmin (100% overhead would be as
> slow as fmin, 0% overhead would be as fast as fmax). And as described
> above, overruns (OR) indicate that the governor was not responsive
> enough to finish the work in each period of the workload.
> 
> These are just performance metrics so they only tell half the story.
> Power is not factored in at all.
> 
> This provides a quick sanity check that the governor under test (in this
> case, the now defunct schedfreq, or sched for short) performs similarly
> to two of the most commonly used governors, ondemand and interactive, in
> steady-state periodic workloads. In the data above, sched looks good for
> the most part, with the second test case being the biggest exception.
 
Yes, it is indeed a quick sanity check.

Thanks,
Yuyang

* Re: [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection
  2016-03-30  0:45 ` Yuyang Du
@ 2016-03-31  1:35   ` Steve Muckle
  2016-03-30 20:22     ` Yuyang Du
  0 siblings, 1 reply; 51+ messages in thread
From: Steve Muckle @ 2016-03-31  1:35 UTC (permalink / raw)
  To: Yuyang Du
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette

Hi Yuyang,

This series was dropped in favor of Rafael's schedutil. But on the
chance that you're still curious about the test setup used to quantify
the series, I'll explain below.

On 03/29/2016 05:45 PM, Yuyang Du wrote:
> Hi Steve,
> 
> On Mon, Feb 22, 2016 at 05:22:40PM -0800, Steve Muckle wrote:
>> The number of times the busy
>> duration exceeds the period of the periodic workload (an "overrun") is
>> also recorded.
> 
> Could you please explain a bit more about the overrun metric?

Each of the 16 workloads is periodic. The period of the workload may not
be long enough to fit the busy ("run") duration at lower CPU
frequencies. If the governor has not raised the CPU frequency high
enough, the busy duration will exceed the period of the workload. This
is an "overrun" and in this synthetic workload represents a deadline
being missed.
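
To make that concrete, here is a minimal sketch of the overrun
bookkeeping (illustrative code only, not the actual rt-app/test harness;
all names are made up here):

#include <stdio.h>

/*
 * Count an "overrun": a period whose measured busy ("run") duration
 * did not fit within the period itself, i.e. a missed deadline in
 * this synthetic workload.
 */
struct period_sample {
	double busy_ms;		/* measured busy duration of one period */
	double period_ms;	/* nominal period of the workload */
};

static int count_overruns(const struct period_sample *s, int n)
{
	int overruns = 0;

	for (int i = 0; i < n; i++) {
		if (s[i].busy_ms > s[i].period_ms)
			overruns++;
	}
	return overruns;
}

int main(void)
{
	/* e.g. the run=9/period=12 case: the second period overruns */
	struct period_sample samples[] = {
		{ 9.3, 12.0 }, { 12.6, 12.0 }, { 11.1, 12.0 },
	};
	int n = sizeof(samples) / sizeof(samples[0]);

	printf("OR = %d\n", count_overruns(samples, n));
	return 0;
}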

>> SCHED_OTHER workload:
>>  wload parameters	  ondemand        interactive     sched	
>> run	period	loops	OR	OH	OR	OH	OR	OH
>> 1	100	100	0	62.07%	0	100.02%	0	78.49%
>> 10	1000	10	0	21.80%	0	22.74%	0	72.56%
>> 1	10	1000	0	21.72%	0	63.08%	0	52.40%
>> 10	100	100	0	8.09%	0	15.53%	0	17.33%
>> 100	1000	10	0	1.83%	0	1.77%	0	0.29%
>> 6	33	300	0	15.32%	0	8.60%	0	17.34%
>> 66	333	30	0	0.79%	0	3.18%	0	12.26%
>> 4	10	1000	0	5.87%	0	10.21%	0	6.15%
>> 40	100	100	0	0.41%	0	0.04%	0	2.68%
>> 400	1000	10	0	0.42%	0	0.50%	0	1.22%
>> 5	9	1000	2	3.82%	1	6.10%	0	2.51%
>> 50	90	100	0	0.19%	0	0.05%	0	1.71%
>> 500	900	10	0	0.37%	0	0.38%	0	1.82%
>> 9	12	1000	6	1.79%	1	0.77%	0	0.26%
>> 90	120	100	0	0.16%	1	0.05%	0	0.49%
>> 900	1200	10	0	0.09%	0	0.26%	0	0.62%
>  
> Could you please also explain what we can learn from the wload vs. OH/OR
> results?

These results are meant to show how the governors perform across varying
workload intensities and periodicities. Higher overhead (OH) numbers
indicate that the completion times of each period of the workload were
closer to what they would be when run at fmin (100% overhead would be as
slow as fmin, 0% overhead would be as fast as fmax). And as described
above, overruns (OR) indicate that the governor was not responsive
enough to finish the work in each period of the workload.
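
Put differently, OH is the measured completion time normalized linearly
between what it would be at fmax (0%) and at fmin (100%). A minimal
sketch of that calculation (illustrative names only, not the actual
harness code):

#include <stdio.h>

/*
 * Overhead as a percentage: 0% means the busy duration matched the
 * fmax baseline, 100% means it matched the fmin baseline.
 */
static double overhead_pct(double busy_test_ms, double busy_fmax_ms,
			   double busy_fmin_ms)
{
	return 100.0 * (busy_test_ms - busy_fmax_ms) /
		       (busy_fmin_ms - busy_fmax_ms);
}

int main(void)
{
	/* e.g. work taking 10ms at fmax and 90ms at fmin, 12ms under test */
	printf("OH = %.2f%%\n", overhead_pct(12.0, 10.0, 90.0));
	return 0;
}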

These are just performance metrics so they only tell half the story.
Power is not factored in at all.

This provides a quick sanity check that the governor under test (in this
case, the now defunct schedfreq, or sched for short) performs similarly
to two of the most commonly used governors, ondemand and interactive, in
steady-state periodic workloads. In the data above, sched looks good for
the most part, with the second test case being the biggest exception.

thanks,
Steve

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
2016-02-23  1:22 [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection Steve Muckle
2016-02-23  1:22 ` [RFCv7 PATCH 01/10] sched: Compute cpu capacity available at current frequency Steve Muckle
2016-02-23  1:41   ` Rafael J. Wysocki
2016-02-23  9:19     ` Peter Zijlstra
2016-02-26  1:37       ` Rafael J. Wysocki
2016-02-26  9:14         ` Peter Zijlstra
2016-02-23  1:22 ` [RFCv7 PATCH 02/10] cpufreq: introduce cpufreq_driver_is_slow Steve Muckle
2016-02-23  1:31   ` Rafael J. Wysocki
2016-02-26  0:50     ` Michael Turquette
2016-02-26  0:50       ` Michael Turquette
2016-02-26  1:07       ` Steve Muckle
2016-02-26  1:16       ` Rafael J. Wysocki
     [not found]         ` <20160226185503.2278.20479@quark.deferred.io>
2016-02-26 21:00           ` Rafael J. Wysocki
2016-02-23  1:22 ` [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection Steve Muckle
2016-02-25  3:55   ` Rafael J. Wysocki
2016-02-25  9:21     ` Peter Zijlstra
2016-02-25 21:04       ` Rafael J. Wysocki
2016-02-25  9:28     ` Peter Zijlstra
2016-02-25 21:08       ` Rafael J. Wysocki
2016-02-26  9:18         ` Peter Zijlstra
2016-02-27  0:08           ` Rafael J. Wysocki
2016-03-01 12:57             ` Peter Zijlstra
2016-03-01 19:44               ` Rafael J. Wysocki
2016-02-25 11:04     ` Rafael J. Wysocki
2016-02-26  0:34     ` Steve Muckle
2016-02-27  2:39       ` Rafael J. Wysocki
2016-02-27  4:17         ` Steve Muckle
2016-02-28  2:26           ` Rafael J. Wysocki
2016-03-01 14:31             ` Peter Zijlstra
2016-03-01 20:32               ` Rafael J. Wysocki
2016-03-01 13:26           ` Peter Zijlstra
2016-03-01 13:17       ` Peter Zijlstra
2016-03-02  7:49       ` Michael Turquette
2016-03-03  2:49         ` Rafael J. Wysocki
2016-03-03  3:50           ` Steve Muckle
2016-03-03  9:34             ` Juri Lelli
2016-03-03 13:03         ` Peter Zijlstra
2016-03-03 14:21   ` Ingo Molnar
2016-02-23  1:22 ` [RFCv7 PATCH 04/10] sched/fair: add triggers for OPP change requests Steve Muckle
2016-03-01  6:51   ` Ricky Liang
2016-03-03  3:55     ` Steve Muckle
2016-02-23  1:22 ` [RFCv7 PATCH 05/10] sched/{core,fair}: trigger OPP change request on fork() Steve Muckle
2016-02-23  1:22 ` [RFCv7 PATCH 06/10] sched/fair: cpufreq_sched triggers for load balancing Steve Muckle
2016-02-23  1:22 ` [RFCv7 PATCH 07/10] sched/fair: jump to max OPP when crossing UP threshold Steve Muckle
2016-02-23  1:22 ` [RFCv7 PATCH 08/10] sched: remove call of sched_avg_update from sched_rt_avg_update Steve Muckle
2016-02-23  1:22 ` [RFCv7 PATCH 09/10] sched/deadline: split rt_avg in 2 distincts metrics Steve Muckle
2016-02-23  1:22 ` [RFCv7 PATCH 10/10] sched: rt scheduler sets capacity requirement Steve Muckle
2016-02-23  1:33 ` [RFCv7 PATCH 00/10] sched: scheduler-driven CPU frequency selection Steve Muckle
2016-03-30  0:45 ` Yuyang Du
2016-03-31  1:35   ` Steve Muckle
2016-03-30 20:22     ` Yuyang Du
