* [RFC PATCH 0/6] Energy Aware Scheduling
@ 2018-03-20  9:43 Dietmar Eggemann
  2018-03-20  9:43 ` [RFC PATCH 1/6] sched/fair: Create util_fits_capacity() Dietmar Eggemann
                   ` (5 more replies)
  0 siblings, 6 replies; 55+ messages in thread
From: Dietmar Eggemann @ 2018-03-20  9:43 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra, Quentin Perret, Thara Gopinath
  Cc: linux-pm, Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

1. Overview

The Energy Aware Scheduler (EAS), based on Morten Rasmussen's posting
on LKML [1], is currently part of the AOSP Common Kernel and runs on
today's smartphones with Arm's big.LITTLE CPUs.
Based on the experience gained over the last two and a half years in
product development, we propose energy-model-based task placement for
CPUs with asymmetric core capacities (e.g. Arm big.LITTLE or DynamIQ),
to align with the EAS adopted by the AOSP Common Kernel. We have
developed a simplified energy model, based on the physical active
power/performance curve of each core type, using existing SoC
power/performance data already known to the kernel. The energy model
is used to select the most energy-efficient CPU on which to place each
task, taking utilization into account.

1.1 Energy Model

A system with asymmetric core capacities features cores with
significantly different energy and performance characteristics. As the
configurations can vary greatly from one SoC to another, designing an
energy-efficient scheduling heuristic that performs well on a broad
spectrum of platforms appears to be particularly hard.
This proposal attempts to solve this issue by providing the scheduler
with an energy model of the platform, which enables it to estimate the
energy impact of scheduling decisions in a generic way. The energy
model is kept very simple as it represents only the active power of
CPUs at all available P-states and relies on existing data in the
kernel (only used by the thermal subsystem so far).
This proposal does not include the power consumption of C-states and
cluster-level resources, which were originally introduced in [1],
since, firstly, their impact on task placement decisions appears to be
negligible on modern asymmetric platforms and, secondly, they require
additional infrastructure and data (e.g. new DT entries).
The scheduler is also informed of the span of frequency domains, hence
enabling an accurate accounting of the energy costs of frequency
changes. This appears to be especially important for future Arm CPU
topologies (DynamIQ) where the span of scheduling domains can be
different from the span of frequency domains.
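
As an illustration, the energy model of one frequency domain boils down
to an array of (capacity, power) pairs, one per OPP. The following
sketch mirrors the structures introduced in patch 2/6; all cap/power
values are hypothetical, not measured platform data:

struct capacity_state {
	unsigned long cap;	/* compute capacity at this OPP */
	unsigned long power;	/* active power at this OPP */
};

struct sched_energy_model {
	int nr_cap_states;
	struct capacity_state *cap_states;
};

/* Hypothetical little-CPU frequency domain with five OPPs. */
static struct capacity_state little_cap_states[] = {
	{ .cap = 235, .power = 33 },
	{ .cap = 302, .power = 46 },
	{ .cap = 368, .power = 61 },
	{ .cap = 406, .power = 76 },
	{ .cap = 447, .power = 93 },
};

static struct sched_energy_model little_model = {
	.nr_cap_states	= 5,
	.cap_states	= little_cap_states,
};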

1.2 Overutilization/Tipping Point

The primary job of the task scheduler is to deliver the highest
possible throughput with minimal latency. With increasing utilization,
the scheduler's opportunities to save energy become rarer. There must
be spare CPU time available to place tasks based on utilization in an
energy-aware fashion, i.e. to pack tasks on energy-efficient CPUs
without unnecessarily constraining task throughput. This spare CPU
time decreases towards zero as the utilization of the system rises.
To cope with this situation, we introduce the concept of overutilization
in order to enable/disable EAS depending on system utilization.
The point at which a system switches from being not overutilized to
being overutilized, or vice versa, is called the tipping point. A
per-sched-domain tipping point indicator implementation is introduced
here.
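
For illustration, with the existing capacity_margin value of 1280 used
by the fitness check introduced in patch 1/6, a CPU tips over once its
utilization exceeds roughly 80% of its capacity. A standalone sketch of
the arithmetic (userspace, illustration only):

#include <stdio.h>

static unsigned long capacity_margin = 1280;	/* ~1.25 * 1024 */

static int util_fits_capacity(unsigned long util, unsigned long capacity)
{
	return capacity * 1024 > util * capacity_margin;
}

int main(void)
{
	/* For a CPU of capacity 1024 the tipping point sits at util 819. */
	printf("%d\n", util_fits_capacity(819, 1024));	/* 1: fits */
	printf("%d\n", util_fits_capacity(820, 1024));	/* 0: overutilized */
	return 0;
}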

1.3 Wakeup path

On a system which has an energy model, the energy-aware wakeup path
takes precedence over the affine and capacity-based wakeup paths,
provided that the lowest sched domain of the task's previous CPU is
not overutilized. The energy-aware algorithm tries to find a target
CPU among the CPUs of the highest non-overutilized domain which
includes both the previous and the current CPU, such that placing the
task there adds the least energy consumption. The energy model is only
enabled on CPUs with asymmetric core capacities (SD_ASYM_CPUCAPACITY).
Such systems typically have eight or fewer cores.

2. Tests

Two fundamentally different tests were executed. Firstly, the energy
test case shows the impact of this patch-set on energy consumption,
using a synthetic set of tasks. Secondly, the performance test case
provides the conventional hackbench metric numbers.

The tests run on two arm64 big.LITTLE platforms: Hikey960 (4xA73 +
4xA53) and Juno r0 (2xA57 + 4xA53).

The base kernel is tip/sched/core (4.16-rc4), with some Hikey960- and
Juno-specific patches, the SD_ASYM_CPUCAPACITY flag set at the DIE
sched domain level for arm64, and schedutil as cpufreq governor [2].

2.1 Energy test case

10 iterations of between 10 and 50 periodic rt-app tasks (16ms period,
5% duty-cycle) running for 30 seconds, with energy measurement. The
unit is Joules.
The goal is to save energy, so lower is better.

2.1.1 Hikey960

Energy is measured with an ACME Cape on an instrumented board. Numbers
include consumption of big and little CPUs, LPDDR memory, GPU and most
of the other small components on the board. They do not include
consumption of the radio chip (turned off anyway) and external
connectors.

+----------+-----------------+------------------------+
|          | Without patches | With patches           |
+----------+---------+-------+-----------------+------+
| Tasks nb | Mean    | RSD*  | Mean            | RSD* |
+----------+---------+-------+-----------------+------+
|       10 |   41.50 |  1.1% |  37.43 (-9.81%) | 2.0% |
|       20 |   55.51 |  0.7% |  50.74 (-8.59%) | 1.5% |
|       30 |   75.39 |  0.4% |  70.36 (-6.67%) | 7.3% |
|       40 |   95.82 |  0.3% |  89.90 (-6.18%) | 1.5% |
|       50 |  121.53 |  0.9% | 112.61 (-7.34%) | 0.9% |
+----------+---------+-------+-----------------+------+

2.1.2 Juno r0

Energy is measured with the onboard energy meter. Numbers include
consumption of big and little CPUs.

+----------+-----------------+------------------------+
|          | Without patches | With patches           |
+----------+--------+--------+-----------------+------+
| Tasks nb | Mean   | RSD*   | Mean            | RSD* |
+----------+--------+--------+-----------------+------+
|       10 |  11.52 |  1.1%  |  7.67 (-33.42%) | 2.8% |
|       20 |  19.25 |  0.9%  | 13.39 (-30.44%) | 1.8% |
|       30 |  28.73 |  1.3%  | 21.85 (-31.49%) | 0.6% |
|       40 |  37.58 |  0.9%  | 31.40 (-16.44%) | 0.4% |
|       50 |  47.24 |  0.6%  | 45.37 ( -3.96%) | 0.6% |
+----------+--------+--------+-----------------+------+

2.2 Performance test case

30 iterations of perf bench sched messaging --pipe --thread --group G
--loop L with G=[1 2 4 8] and L=50000 (Hikey960)/16000 (Juno r0).

2.2.1 Hikey960

The impact of thermal capping was mitigated thanks to a heatsink, a
fan, and a 10 sec delay between two successive executions.

+----------------+-----------------+------------------------+
|                | Without patches | With patches           |
+--------+-------+---------+-------+----------------+-------+
| Groups | Tasks | Mean    | RSD*  | Mean           | RSD*  |
+--------+-------+---------+-------+----------------+-------+
|      1 |    40 |    8.01 | 1.70% |  8.16 (+1.90%) | 1.79% |
|      2 |    80 |   15.59 | 0.76% | 15.79 (+1.33%) | 0.92% |
|      4 |   160 |   32.23 | 0.70% | 32.46 (+0.72%) | 0.55% |
|      8 |   320 |   66.93 | 0.46% | 67.40 (+0.69%) | 0.37% |
+--------+-------+---------+-------+----------------+-------+

2.2.2 Juno r0

+----------------+-----------------+------------------------+
|                | Without patches | With patches           |
+--------+-------+---------+-------+----------------+-------+
| Groups | Tasks | Mean    | RSD*  | Mean           | RSD*  |
+--------+-------+---------+-------+----------------+-------+
|      1 |    40 |    8.37 | 0.12% |  8.33 ( 0.00%) | 0.08% |
|      2 |    80 |   14.63 | 0.12% | 14.49 (-0.01%) | 0.14% |
|      4 |   160 |   27.17 | 0.14% | 26.80 (-0.01%) | 0.14% |
|      8 |   320 |   52.50 | 0.25% | 51.54 (-0.02%) | 0.23% |
+--------+-------+---------+-------+----------------+-------+

*RSD: Relative Standard Deviation (std dev / mean)

3. Dependencies

This series depends on additional infrastructure being merged in the
OPP core. As this infrastructure can also be useful for other clients,
the related patches have been posted separately [3].

[1] https://lkml.org/lkml/2015/7/7/754
[2] http://www.linux-arm.org/git?p=linux-de.git;a=shortlog;h=refs/heads/upstream/eas_v1_base
[3] https://marc.info/?l=linux-pm&m=151635516419249&w=2

Dietmar Eggemann (1):
  sched/fair: Create util_fits_capacity()

Quentin Perret (4):
  sched: Introduce energy models of CPUs
  sched/fair: Introduce an energy estimation helper function
  sched/fair: Select an energy-efficient CPU on task wake-up
  drivers: base: arch_topology.c: Enable EAS for arm/arm64 platforms

Thara Gopinath (1):
  sched: Add over-utilization/tipping point indicator

 drivers/base/arch_topology.c   |   2 +
 include/linux/sched/energy.h   |  31 ++++++
 include/linux/sched/topology.h |   1 +
 kernel/sched/Makefile          |   2 +-
 kernel/sched/energy.c          | 190 ++++++++++++++++++++++++++++++++++
 kernel/sched/fair.c            | 226 +++++++++++++++++++++++++++++++++++++++--
 kernel/sched/sched.h           |   1 +
 kernel/sched/topology.c        |  12 +--
 8 files changed, 449 insertions(+), 16 deletions(-)
 create mode 100644 include/linux/sched/energy.h
 create mode 100644 kernel/sched/energy.c

-- 
2.11.0

* [RFC PATCH 1/6] sched/fair: Create util_fits_capacity()
  2018-03-20  9:43 [RFC PATCH 0/6] Energy Aware Scheduling Dietmar Eggemann
@ 2018-03-20  9:43 ` Dietmar Eggemann
  2018-03-20  9:43 ` [RFC PATCH 2/6] sched: Introduce energy models of CPUs Dietmar Eggemann
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 55+ messages in thread
From: Dietmar Eggemann @ 2018-03-20  9:43 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra, Quentin Perret, Thara Gopinath
  Cc: linux-pm, Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

The check whether a given utilization fits into a given capacity is
factored out into a separate function.

Currently it is only used in wake_cap(), but it will be re-used to
figure out if a cpu or a scheduler group is over-utilized.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/fair.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3582117e1580..bf7b485ddf60 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6374,6 +6374,11 @@ static unsigned long cpu_util_wake(int cpu, struct task_struct *p)
 	return (util >= capacity) ? capacity : util;
 }
 
+static inline int util_fits_capacity(unsigned long util, unsigned long capacity)
+{
+	return capacity * 1024 > util * capacity_margin;
+}
+
 /*
  * Disable WAKE_AFFINE in the case where task @p doesn't fit in the
  * capacity of either the waking CPU @cpu or the previous CPU @prev_cpu.
@@ -6395,7 +6400,7 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
 	/* Bring task utilization in sync with prev_cpu */
 	sync_entity_load_avg(&p->se);
 
-	return min_cap * 1024 < task_util(p) * capacity_margin;
+	return !util_fits_capacity(task_util(p), min_cap);
 }
 
 /*
-- 
2.11.0

* [RFC PATCH 2/6] sched: Introduce energy models of CPUs
  2018-03-20  9:43 [RFC PATCH 0/6] Energy Aware Scheduling Dietmar Eggemann
  2018-03-20  9:43 ` [RFC PATCH 1/6] sched/fair: Create util_fits_capacity() Dietmar Eggemann
@ 2018-03-20  9:43 ` Dietmar Eggemann
  2018-03-20  9:52   ` Greg Kroah-Hartman
  2018-04-09 12:01   ` Peter Zijlstra
  2018-03-20  9:43 ` [RFC PATCH 3/6] sched: Add over-utilization/tipping point indicator Dietmar Eggemann
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 55+ messages in thread
From: Dietmar Eggemann @ 2018-03-20  9:43 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra, Quentin Perret, Thara Gopinath
  Cc: linux-pm, Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

From: Quentin Perret <quentin.perret@arm.com>

The energy consumption of each CPU in the system is modeled with a list
of values representing its dissipated power and compute capacity at each
available Operating Performance Point (OPP). These values are derived
from existing information in the kernel (currently used by the thermal
subsystem) and don't require the introduction of new platform-specific
tunables. The energy model is also provided with a simple representation
of all frequency domains as cpumasks, hence enabling the scheduler to be
aware of dependencies between CPUs. The data required to build the energy
model is provided by the OPP library which enables an abstract view of
the platform from the scheduler. The new data structures holding these
models and the routines to populate them are stored in
kernel/sched/energy.c.

For the sake of simplicity, it is assumed in the energy model that all
CPUs in a frequency domain share the same micro-architecture. As long as
this assumption is correct, the energy models of different CPUs belonging
to the same frequency domain are equal. Hence, this commit builds only one
energy model per frequency domain, and links all relevant CPUs to it in
order to save time and memory. If needed for future hardware platforms,
relaxing this assumption should imply relatively simple modifications in
the code but a significantly higher algorithmic complexity.

As it appears that energy-aware scheduling really makes a difference on
heterogeneous systems (e.g. big.LITTLE platforms), it is restricted to
systems having:

   1. SD_ASYM_CPUCAPACITY flag set
   2. Dynamic Voltage and Frequency Scaling (DVFS) is enabled
   3. Available power estimates for the OPPs of all possible CPUs

Moreover, the scheduler is notified of the energy model availability
using a static key in order to minimize the overhead on non-energy-aware
systems.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

---
This patch depends on additional infrastructure being merged in the OPP
core. As this infrastructure can also be useful for other clients, the
related patches have been posted separately [1].

[1] https://marc.info/?l=linux-pm&m=151635516419249&w=2
---
 include/linux/sched/energy.h |  31 +++++++
 kernel/sched/Makefile        |   2 +-
 kernel/sched/energy.c        | 190 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 222 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/sched/energy.h
 create mode 100644 kernel/sched/energy.c

diff --git a/include/linux/sched/energy.h b/include/linux/sched/energy.h
new file mode 100644
index 000000000000..b4f43564ffe4
--- /dev/null
+++ b/include/linux/sched/energy.h
@@ -0,0 +1,31 @@
+#ifndef _LINUX_SCHED_ENERGY_H
+#define _LINUX_SCHED_ENERGY_H
+
+#ifdef CONFIG_SMP
+struct capacity_state {
+	unsigned long cap;	/* compute capacity */
+	unsigned long power;	/* power consumption at this compute capacity */
+};
+
+struct sched_energy_model {
+	int nr_cap_states;
+	struct capacity_state *cap_states;
+};
+
+struct freq_domain {
+	struct list_head next;
+	cpumask_t span;
+};
+
+extern struct sched_energy_model ** __percpu energy_model;
+extern struct static_key_false sched_energy_present;
+extern struct list_head freq_domains;
+#define for_each_freq_domain(fdom) \
+			list_for_each_entry(fdom, &freq_domains, next)
+
+void init_sched_energy(void);
+#else
+static inline void init_sched_energy(void) { }
+#endif
+
+#endif
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index d9a02b318108..912972ad4dbc 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -20,7 +20,7 @@ obj-y += core.o loadavg.o clock.o cputime.o
 obj-y += idle.o fair.o rt.o deadline.o
 obj-y += wait.o wait_bit.o swait.o completion.o
 
-obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o
+obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o energy.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
 obj-$(CONFIG_SCHED_DEBUG) += debug.o
diff --git a/kernel/sched/energy.c b/kernel/sched/energy.c
new file mode 100644
index 000000000000..4662c993e096
--- /dev/null
+++ b/kernel/sched/energy.c
@@ -0,0 +1,190 @@
+/*
+ * Released under the GPLv2 only.
+ * SPDX-License-Identifier: GPL-2.0
+ *
+ * Energy-aware scheduling models
+ *
+ * Copyright (C) 2018, Arm Ltd.
+ * Written by: Quentin Perret, Arm Ltd.
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ */
+
+#define pr_fmt(fmt) "sched-energy: " fmt
+
+#include <linux/sched/topology.h>
+#include <linux/sched/energy.h>
+#include <linux/pm_opp.h>
+
+#include "sched.h"
+
+DEFINE_STATIC_KEY_FALSE(sched_energy_present);
+struct sched_energy_model ** __percpu energy_model;
+
+/*
+ * A copy of the cpumasks representing the frequency domains is kept private
+ * to the scheduler. They are stacked in a dynamically allocated linked list
+ * as we don't know how many frequency domains the system has.
+ */
+LIST_HEAD(freq_domains);
+
+#ifdef CONFIG_PM_OPP
+static struct sched_energy_model *build_energy_model(int cpu)
+{
+	unsigned long cap_scale = arch_scale_cpu_capacity(NULL, cpu);
+	unsigned long cap, freq, power, max_freq = ULONG_MAX;
+	unsigned long opp_eff, prev_opp_eff = ULONG_MAX;
+	struct sched_energy_model *em = NULL;
+	struct device *cpu_dev;
+	struct dev_pm_opp *opp;
+	int opp_cnt, i;
+
+	cpu_dev = get_cpu_device(cpu);
+	if (!cpu_dev) {
+		pr_err("CPU%d: Failed to get device\n", cpu);
+		return NULL;
+	}
+
+	opp_cnt = dev_pm_opp_get_opp_count(cpu_dev);
+	if (opp_cnt <= 0) {
+		pr_err("CPU%d: Failed to get # of available OPPs.\n", cpu);
+		return NULL;
+	}
+
+	opp = dev_pm_opp_find_freq_floor(cpu_dev, &max_freq);
+	if (IS_ERR(opp)) {
+		pr_err("CPU%d: Failed to get max frequency.\n", cpu);
+		return NULL;
+	}
+
+	dev_pm_opp_put(opp);
+	if (!max_freq) {
+		pr_err("CPU%d: Found null max frequency.\n", cpu);
+		return NULL;
+	}
+
+	em = kzalloc(sizeof(*em), GFP_KERNEL);
+	if (!em)
+		return NULL;
+
+	em->cap_states = kcalloc(opp_cnt, sizeof(*em->cap_states), GFP_KERNEL);
+	if (!em->cap_states)
+		goto free_em;
+
+	for (i = 0, freq = 0; i < opp_cnt; i++, freq++) {
+		opp = dev_pm_opp_find_freq_ceil(cpu_dev, &freq);
+		if (IS_ERR(opp)) {
+			pr_err("CPU%d: Failed to get OPP %d.\n", cpu, i+1);
+			goto free_cs;
+		}
+
+		power = dev_pm_opp_get_power(opp);
+		dev_pm_opp_put(opp);
+		if (!power || !freq)
+			goto free_cs;
+
+		cap = freq * cap_scale / max_freq;
+		em->cap_states[i].power = power;
+		em->cap_states[i].cap = cap;
+
+		/*
+		 * The capacity/watts efficiency ratio should decrease as the
+		 * frequency grows on sane platforms. If not, warn the user
+		 * that some high OPPs are more power efficient than some
+		 * of the lower ones.
+		 */
+		opp_eff = (cap << 20) / power;
+		if (opp_eff >= prev_opp_eff)
+			pr_warn("CPU%d: cap/pwr: OPP%d > OPP%d\n", cpu, i, i-1);
+		prev_opp_eff = opp_eff;
+	}
+
+	em->nr_cap_states = opp_cnt;
+	return em;
+
+free_cs:
+	kfree(em->cap_states);
+free_em:
+	kfree(em);
+	return NULL;
+}
+
+static void free_energy_model(void)
+{
+	struct sched_energy_model *em;
+	struct freq_domain *tmp, *pos;
+	int cpu;
+
+	list_for_each_entry_safe(pos, tmp, &freq_domains, next) {
+		cpu = cpumask_first(&(pos->span));
+		em = *per_cpu_ptr(energy_model, cpu);
+		if (em) {
+			kfree(em->cap_states);
+			kfree(em);
+		}
+
+		list_del(&(pos->next));
+		kfree(pos);
+	}
+
+	free_percpu(energy_model);
+}
+
+void init_sched_energy(void)
+{
+	struct freq_domain *fdom;
+	struct sched_energy_model *em;
+	struct device *cpu_dev;
+	int cpu, ret, fdom_cpu;
+
+	/* Energy Aware Scheduling is used for asymmetric systems only. */
+	if (!lowest_flag_domain(smp_processor_id(), SD_ASYM_CPUCAPACITY))
+		return;
+
+	energy_model = alloc_percpu(struct sched_energy_model *);
+	if (!energy_model)
+		goto exit_fail;
+
+	for_each_possible_cpu(cpu) {
+		if (*per_cpu_ptr(energy_model, cpu))
+			continue;
+
+		/* Keep a copy of the sharing_cpus mask */
+		fdom = kzalloc(sizeof(struct freq_domain), GFP_KERNEL);
+		if (!fdom)
+			goto free_em;
+
+		cpu_dev = get_cpu_device(cpu);
+		ret = dev_pm_opp_get_sharing_cpus(cpu_dev, &(fdom->span));
+		if (ret)
+			goto free_em;
+		list_add(&(fdom->next), &freq_domains);
+
+		/*
+		 * Build the energy model of one CPU, and link it to all CPUs
+		 * in its frequency domain. This should be correct as long as
+		 * they share the same micro-architecture.
+		 */
+		fdom_cpu = cpumask_first(&(fdom->span));
+		em = build_energy_model(fdom_cpu);
+		if (!em)
+			goto free_em;
+
+		for_each_cpu(fdom_cpu, &(fdom->span))
+			*per_cpu_ptr(energy_model, fdom_cpu) = em;
+	}
+
+	static_branch_enable(&sched_energy_present);
+
+	pr_info("Energy Aware Scheduling started.\n");
+	return;
+free_em:
+	free_energy_model();
+exit_fail:
+	pr_err("Energy Aware Scheduling initialization failed.\n");
+}
+#else
+void init_sched_energy(void) {}
+#endif
-- 
2.11.0

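A brief illustration of the efficiency check in build_energy_model()
above: the capacity/power ratio (opp_eff) is expected to fall as the
OPPs get faster. A standalone userspace sketch with hypothetical
cap/power values:

#include <stdio.h>

int main(void)
{
	unsigned long cap[]   = { 235, 302, 368, 406, 447 };
	unsigned long power[] = {  33,  46,  61,  76,  93 };
	unsigned long opp_eff, prev_opp_eff = ~0UL;
	int i;

	for (i = 0; i < 5; i++) {
		/* Same fixed-point ratio as in build_energy_model(). */
		opp_eff = (cap[i] << 20) / power[i];
		if (opp_eff >= prev_opp_eff)
			printf("cap/pwr: OPP%d > OPP%d\n", i, i - 1);
		prev_opp_eff = opp_eff;
	}
	return 0;
}

With these numbers the ratio decreases monotonically, so nothing is
printed; a platform where a higher OPP is more efficient than a lower
one would trigger the pr_warn() in the patch.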

* [RFC PATCH 3/6] sched: Add over-utilization/tipping point indicator
  2018-03-20  9:43 [RFC PATCH 0/6] Energy Aware Scheduling Dietmar Eggemann
  2018-03-20  9:43 ` [RFC PATCH 1/6] sched/fair: Create util_fits_capacity() Dietmar Eggemann
  2018-03-20  9:43 ` [RFC PATCH 2/6] sched: Introduce energy models of CPUs Dietmar Eggemann
@ 2018-03-20  9:43 ` Dietmar Eggemann
  2018-04-09  9:40   ` Peter Zijlstra
  2018-03-20  9:43 ` [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function Dietmar Eggemann
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 55+ messages in thread
From: Dietmar Eggemann @ 2018-03-20  9:43 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra, Quentin Perret, Thara Gopinath
  Cc: linux-pm, Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

From: Thara Gopinath <thara.gopinath@linaro.org>

Energy-aware scheduling should only operate when the system is not
overutilized. There must be cpu time available to place tasks based on
utilization in an energy-aware fashion, i.e. to pack tasks on
energy-efficient cpus without harming the overall throughput.

In case the system operates above this tipping point, tasks have to be
placed based on task and cpu load in the classical way of spreading
tasks across as many cpus as possible.

The point in which a system switches from being not overutilized to
being overutilized is called the tipping point.

Such a tipping point indicator on a sched domain as the system
boundary is introduced here. As soon as one cpu of a sched domain is
overutilized, the whole sched domain is declared overutilized as well.
A cpu becomes overutilized when its utilization is higher than 80%
(capacity_margin) of its capacity.

The implementation takes advantage of the shared sched domain which is
shared across all per-cpu views of a sched domain level. The new
overutilized flag is placed in this shared sched domain.

Load balancing is skipped in case the energy model is present and the
sched domain is not overutilized because under this condition the
predominantly load-per-capacity driven load-balancer should not
interfere with the energy-aware wakeup placement based on utilization.

In case the total utilization of a sched domain is greater than the
total sched domain capacity, the overutilized flag is set at the
parent sched domain level to let other sched groups help to reduce the
overutilization of cpus.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 include/linux/sched/topology.h |  1 +
 kernel/sched/fair.c            | 64 ++++++++++++++++++++++++++++++++++++++++--
 kernel/sched/sched.h           |  1 +
 kernel/sched/topology.c        | 12 +++-----
 4 files changed, 67 insertions(+), 11 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 26347741ba50..dd001c232646 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -72,6 +72,7 @@ struct sched_domain_shared {
 	atomic_t	ref;
 	atomic_t	nr_busy_cpus;
 	int		has_idle_cores;
+	int		overutilized;
 };
 
 struct sched_domain {
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf7b485ddf60..6c72a5e7b1b0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5197,6 +5197,28 @@ static inline void hrtick_update(struct rq *rq)
 }
 #endif
 
+#ifdef CONFIG_SMP
+static inline int cpu_overutilized(int cpu);
+
+static inline int sd_overutilized(struct sched_domain *sd)
+{
+	return READ_ONCE(sd->shared->overutilized);
+}
+
+static inline void update_overutilized_status(struct rq *rq)
+{
+	struct sched_domain *sd;
+
+	rcu_read_lock();
+	sd = rcu_dereference(rq->sd);
+	if (sd && !sd_overutilized(sd) && cpu_overutilized(rq->cpu))
+		WRITE_ONCE(sd->shared->overutilized, 1);
+	rcu_read_unlock();
+}
+#else
+static inline void update_overutilized_status(struct rq *rq) {}
+#endif /* CONFIG_SMP */
+
 /*
  * The enqueue_task method is called before nr_running is
  * increased. Here we update the fair scheduling stats and
@@ -5246,8 +5268,10 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		update_cfs_group(se);
 	}
 
-	if (!se)
+	if (!se) {
 		add_nr_running(rq, 1);
+		update_overutilized_status(rq);
+	}
 
 	hrtick_update(rq);
 }
@@ -6379,6 +6403,11 @@ static inline int util_fits_capacity(unsigned long util, unsigned long capacity)
 	return capacity * 1024 > util * capacity_margin;
 }
 
+static inline int cpu_overutilized(int cpu)
+{
+	return !util_fits_capacity(cpu_util(cpu), capacity_of(cpu));
+}
+
 /*
  * Disable WAKE_AFFINE in the case where task @p doesn't fit in the
  * capacity of either the waking CPU @cpu or the previous CPU @prev_cpu.
@@ -7617,6 +7646,7 @@ struct sd_lb_stats {
 	unsigned long total_running;
 	unsigned long total_load;	/* Total load of all groups in sd */
 	unsigned long total_capacity;	/* Total capacity of all groups in sd */
+	unsigned long total_util;	/* Total util of all groups in sd */
 	unsigned long avg_load;	/* Average load across all groups in sd */
 
 	struct sg_lb_stats busiest_stat;/* Statistics of the busiest group */
@@ -7637,6 +7667,7 @@ static inline void init_sd_lb_stats(struct sd_lb_stats *sds)
 		.total_running = 0UL,
 		.total_load = 0UL,
 		.total_capacity = 0UL,
+		.total_util = 0UL,
 		.busiest_stat = {
 			.avg_load = 0UL,
 			.sum_nr_running = 0,
@@ -7933,11 +7964,12 @@ static bool update_nohz_stats(struct rq *rq, bool force)
  * @local_group: Does group contain this_cpu.
  * @sgs: variable to hold the statistics for this group.
  * @overload: Indicate more than one runnable task for any CPU.
+ * @overutilized: Indicate overutilization for any CPU.
  */
 static inline void update_sg_lb_stats(struct lb_env *env,
 			struct sched_group *group, int load_idx,
 			int local_group, struct sg_lb_stats *sgs,
-			bool *overload)
+			bool *overload, int *overutilized)
 {
 	unsigned long load;
 	int i, nr_running;
@@ -7974,6 +8006,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 		 */
 		if (!nr_running && idle_cpu(i))
 			sgs->idle_cpus++;
+
+		if (cpu_overutilized(i))
+			*overutilized = 1;
 	}
 
 	/* Adjust by relative CPU capacity of the group */
@@ -8101,6 +8136,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 	struct sg_lb_stats tmp_sgs;
 	int load_idx, prefer_sibling = 0;
 	bool overload = false;
+	int overutilized = 0;
 
 	if (child && child->flags & SD_PREFER_SIBLING)
 		prefer_sibling = 1;
@@ -8127,7 +8163,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 		}
 
 		update_sg_lb_stats(env, sg, load_idx, local_group, sgs,
-						&overload);
+						&overload, &overutilized);
 
 		if (local_group)
 			goto next_group;
@@ -8159,6 +8195,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 		sds->total_running += sgs->sum_nr_running;
 		sds->total_load += sgs->group_load;
 		sds->total_capacity += sgs->group_capacity;
+		sds->total_util += sgs->group_util;
 
 		sg = sg->next;
 	} while (sg != env->sd->groups);
@@ -8180,6 +8217,17 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 		if (env->dst_rq->rd->overload != overload)
 			env->dst_rq->rd->overload = overload;
 	}
+
+	if (READ_ONCE(env->sd->shared->overutilized) != overutilized)
+		WRITE_ONCE(env->sd->shared->overutilized, overutilized);
+
+	/*
+	 * If the domain util is greater than domain capacity, load balancing
+	 * needs to be done at the next sched domain level as well.
+	 */
+	if (env->sd->parent &&
+	    !util_fits_capacity(sds->total_util, sds->total_capacity))
+		WRITE_ONCE(env->sd->parent->shared->overutilized, 1);
 }
 
 /**
@@ -9055,6 +9103,10 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
 		}
 		max_cost += sd->max_newidle_lb_cost;
 
+		if (static_branch_unlikely(&sched_energy_present) &&
+		    !sd_overutilized(sd))
+			continue;
+
 		if (!(sd->flags & SD_LOAD_BALANCE))
 			continue;
 
@@ -9622,6 +9674,10 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
 			break;
 		}
 
+		if (static_branch_unlikely(&sched_energy_present) &&
+		    !sd_overutilized(sd))
+			continue;
+
 		if (sd->flags & SD_BALANCE_NEWIDLE) {
 			t0 = sched_clock_cpu(this_cpu);
 
@@ -9755,6 +9811,8 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
 
 	if (static_branch_unlikely(&sched_numa_balancing))
 		task_tick_numa(rq, curr);
+
+	update_overutilized_status(rq);
 }
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 22909ffc04fb..55ec6a4c4712 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -11,6 +11,7 @@
 #include <linux/sched/cputime.h>
 #include <linux/sched/deadline.h>
 #include <linux/sched/debug.h>
+#include <linux/sched/energy.h>
 #include <linux/sched/hotplug.h>
 #include <linux/sched/idle.h>
 #include <linux/sched/init.h>
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 64cc564f5255..c8b7c7665ab2 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1184,15 +1184,11 @@ sd_init(struct sched_domain_topology_level *tl,
 		sd->idle_idx = 1;
 	}
 
-	/*
-	 * For all levels sharing cache; connect a sched_domain_shared
-	 * instance.
-	 */
-	if (sd->flags & SD_SHARE_PKG_RESOURCES) {
-		sd->shared = *per_cpu_ptr(sdd->sds, sd_id);
-		atomic_inc(&sd->shared->ref);
+	sd->shared = *per_cpu_ptr(sdd->sds, sd_id);
+	atomic_inc(&sd->shared->ref);
+
+	if (sd->flags & SD_SHARE_PKG_RESOURCES)
 		atomic_set(&sd->shared->nr_busy_cpus, sd_weight);
-	}
 
 	sd->private = sdd;
 
-- 
2.11.0

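For illustration, the parent-level propagation above applies the same
fitness check as the per-cpu case, just to the summed values. A
standalone sketch with hypothetical numbers:

#include <stdio.h>

static unsigned long capacity_margin = 1280;

static int util_fits_capacity(unsigned long util, unsigned long capacity)
{
	return capacity * 1024 > util * capacity_margin;
}

int main(void)
{
	/* Hypothetical sched domain: three little CPUs plus one big CPU. */
	unsigned long total_capacity = 3 * 447 + 1024;	/* 2365 */
	unsigned long total_util = 2000;

	/* 2365 * 1024 = 2421760 < 2000 * 1280 = 2560000: does not fit. */
	if (!util_fits_capacity(total_util, total_capacity))
		printf("set overutilized at the parent sched domain\n");
	return 0;
}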

* [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function
  2018-03-20  9:43 [RFC PATCH 0/6] Energy Aware Scheduling Dietmar Eggemann
                   ` (2 preceding siblings ...)
  2018-03-20  9:43 ` [RFC PATCH 3/6] sched: Add over-utilization/tipping point indicator Dietmar Eggemann
@ 2018-03-20  9:43 ` Dietmar Eggemann
  2018-03-21  9:04   ` Juri Lelli
  2018-03-21 12:39   ` Patrick Bellasi
  2018-03-20  9:43 ` [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up Dietmar Eggemann
  2018-03-20  9:43 ` [RFC PATCH 6/6] drivers: base: arch_topology.c: Enable EAS for arm/arm64 platforms Dietmar Eggemann
  5 siblings, 2 replies; 55+ messages in thread
From: Dietmar Eggemann @ 2018-03-20  9:43 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra, Quentin Perret, Thara Gopinath
  Cc: linux-pm, Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

From: Quentin Perret <quentin.perret@arm.com>

In preparation for the definition of an energy-aware wakeup path, a
helper function is provided to estimate the consequence on system
energy when a specific task wakes up on a specific CPU. compute_energy()
estimates the OPPs to be reached by all frequency domains and the
consumption of each online CPU according to its energy model and its
percentage of busy time.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/fair.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6c72a5e7b1b0..76bd46502486 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6409,6 +6409,30 @@ static inline int cpu_overutilized(int cpu)
 }
 
 /*
+ * Returns the util of "cpu" if "p" wakes up on "dst_cpu".
+ */
+static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
+{
+	unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
+	unsigned long capacity = capacity_orig_of(cpu);
+
+	/*
+	 * If p is where it should be, or if it has no impact on cpu, there is
+	 * not much to do.
+	 */
+	if ((task_cpu(p) == dst_cpu) || (cpu != task_cpu(p) && cpu != dst_cpu))
+		goto clamp_util;
+
+	if (dst_cpu == cpu)
+		util += task_util(p);
+	else
+		util = max_t(long, util - task_util(p), 0);
+
+clamp_util:
+	return (util >= capacity) ? capacity : util;
+}
+
+/*
  * Disable WAKE_AFFINE in the case where task @p doesn't fit in the
  * capacity of either the waking CPU @cpu or the previous CPU @prev_cpu.
  *
@@ -6432,6 +6456,63 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
 	return !util_fits_capacity(task_util(p), min_cap);
 }
 
+static struct capacity_state *find_cap_state(int cpu, unsigned long util)
+{
+	struct sched_energy_model *em = *per_cpu_ptr(energy_model, cpu);
+	struct capacity_state *cs = NULL;
+	int i;
+
+	/*
+	 * As the goal is to estimate the OPP reached for a specific util
+	 * value, mimic the behaviour of schedutil with a 1.25 coefficient
+	 */
+	util += util >> 2;
+
+	for (i = 0; i < em->nr_cap_states; i++) {
+		cs = &em->cap_states[i];
+		if (cs->cap >= util)
+			break;
+	}
+
+	return cs;
+}
+
+static unsigned long compute_energy(struct task_struct *p, int dst_cpu)
+{
+	unsigned long util, fdom_max_util;
+	struct capacity_state *cs;
+	unsigned long energy = 0;
+	struct freq_domain *fdom;
+	int cpu;
+
+	for_each_freq_domain(fdom) {
+		fdom_max_util = 0;
+		for_each_cpu_and(cpu, &(fdom->span), cpu_online_mask) {
+			util = cpu_util_next(cpu, p, dst_cpu);
+			fdom_max_util = max(util, fdom_max_util);
+		}
+
+		/*
+		 * Here we assume that the capacity states of CPUs belonging to
+		 * the same frequency domain are shared. Hence, we look at the
+		 * capacity state of the first CPU and re-use it for all.
+		 */
+		cpu = cpumask_first(&(fdom->span));
+		cs = find_cap_state(cpu, fdom_max_util);
+
+		/*
+		 * The energy consumed by each CPU is derived from the power
+		 * it dissipates at the expected OPP and its percentage of
+		 * busy time.
+		 */
+		for_each_cpu_and(cpu, &(fdom->span), cpu_online_mask) {
+			util = cpu_util_next(cpu, p, dst_cpu);
+			energy += cs->power * util / cs->cap;
+		}
+	}
+	return energy;
+}
+
 /*
  * select_task_rq_fair: Select target runqueue for the waking task in domains
  * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
-- 
2.11.0

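A worked example of the estimation above, with hypothetical numbers:
assume a frequency domain of two CPUs whose highest per-CPU util after
the task placement is 300. find_cap_state() inflates that to
300 + 300/4 = 375 (the schedutil-like 1.25 coefficient) and picks the
first capacity state with cap >= 375, say cap=406/power=76 in some
hypothetical model. Each CPU then contributes the OPP's power scaled
by its busy-time ratio:

#include <stdio.h>

int main(void)
{
	unsigned long cap = 406, power = 76;	/* chosen OPP (hypothetical) */
	unsigned long util[] = { 150, 300 };	/* per-CPU util after placement */
	unsigned long energy = 0;
	int i;

	/* Mirrors the inner loop of compute_energy(). */
	for (i = 0; i < 2; i++)
		energy += power * util[i] / cap;

	printf("%lu\n", energy);	/* 28 + 56 = 84 */
	return 0;
}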

* [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up
  2018-03-20  9:43 [RFC PATCH 0/6] Energy Aware Scheduling Dietmar Eggemann
                   ` (3 preceding siblings ...)
  2018-03-20  9:43 ` [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function Dietmar Eggemann
@ 2018-03-20  9:43 ` Dietmar Eggemann
  2018-03-21 15:35   ` Patrick Bellasi
  2018-03-22 16:27   ` Joel Fernandes
  2018-03-20  9:43 ` [RFC PATCH 6/6] drivers: base: arch_topology.c: Enable EAS for arm/arm64 platforms Dietmar Eggemann
  5 siblings, 2 replies; 55+ messages in thread
From: Dietmar Eggemann @ 2018-03-20  9:43 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra, Quentin Perret, Thara Gopinath
  Cc: linux-pm, Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

From: Quentin Perret <quentin.perret@arm.com>

In case an energy model is available, waking tasks are re-routed into a
new energy-aware placement algorithm. The eligible CPUs to be used in the
energy-aware wakeup path are restricted to the highest non-overutilized
sched_domain containing prev_cpu and this_cpu. If no such domain is found,
the tasks go through the usual wake-up path, hence energy-aware placement
happens only in lightly utilized scenarios.

The selection of the most energy-efficient CPU for a task is achieved by
estimating the impact on system-level active energy resulting from the
placement of the task on each candidate CPU. The best CPU energy-wise is
then selected if it saves a large enough amount of energy with respect to
prev_cpu.

Although it has already shown significant benefits on some existing
targets, this brute-force approach clearly cannot scale to platforms
with numerous CPUs. This patch is an attempt to do something useful,
as writing a fast heuristic that performs reasonably well on a broad
spectrum of architectures isn't an easy task. As a consequence, the
scope of usability
of the energy-aware wake-up path is restricted to systems with the
SD_ASYM_CPUCAPACITY flag set. These systems not only show the most
promising opportunities for saving energy but also typically feature a
limited number of logical CPUs.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/fair.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 71 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 76bd46502486..65a1bead0773 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6513,6 +6513,60 @@ static unsigned long compute_energy(struct task_struct *p, int dst_cpu)
 	return energy;
 }
 
+static bool task_fits(struct task_struct *p, int cpu)
+{
+	unsigned long next_util = cpu_util_next(cpu, p, cpu);
+
+	return util_fits_capacity(next_util, capacity_orig_of(cpu));
+}
+
+static int find_energy_efficient_cpu(struct sched_domain *sd,
+					struct task_struct *p, int prev_cpu)
+{
+	unsigned long cur_energy, prev_energy, best_energy;
+	int cpu, best_cpu = prev_cpu;
+
+	if (!task_util(p))
+		return prev_cpu;
+
+	/* Compute the energy impact of leaving the task on prev_cpu. */
+	prev_energy = best_energy = compute_energy(p, prev_cpu);
+
+	/* Look for the CPU that minimizes the energy. */
+	for_each_cpu_and(cpu, &p->cpus_allowed, sched_domain_span(sd)) {
+		if (!task_fits(p, cpu) || cpu == prev_cpu)
+			continue;
+		cur_energy = compute_energy(p, cpu);
+		if (cur_energy < best_energy) {
+			best_energy = cur_energy;
+			best_cpu = cpu;
+		}
+	}
+
+	/*
+	 * We pick the best CPU only if it saves at least 1.5% of the
+	 * energy used by prev_cpu.
+	 */
+	if ((prev_energy - best_energy) > (prev_energy >> 6))
+		return best_cpu;
+
+	return prev_cpu;
+}
+
+static inline bool wake_energy(struct task_struct *p, int prev_cpu)
+{
+	struct sched_domain *sd;
+
+	if (!static_branch_unlikely(&sched_energy_present))
+		return false;
+
+	sd = rcu_dereference_sched(cpu_rq(prev_cpu)->sd);
+	if (!sd || sd_overutilized(sd))
+		return false;
+
+	return true;
+}
+
 /*
  * select_task_rq_fair: Select target runqueue for the waking task in domains
  * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
@@ -6529,18 +6583,22 @@ static int
 select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_flags)
 {
 	struct sched_domain *tmp, *affine_sd = NULL, *sd = NULL;
+	struct sched_domain *energy_sd = NULL;
 	int cpu = smp_processor_id();
 	int new_cpu = prev_cpu;
-	int want_affine = 0;
+	int want_affine = 0, want_energy = 0;
 	int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
 
+	rcu_read_lock();
+
 	if (sd_flag & SD_BALANCE_WAKE) {
 		record_wakee(p);
+		want_energy = wake_energy(p, prev_cpu);
 		want_affine = !wake_wide(p) && !wake_cap(p, cpu, prev_cpu)
-			      && cpumask_test_cpu(cpu, &p->cpus_allowed);
+			      && cpumask_test_cpu(cpu, &p->cpus_allowed)
+			      && !want_energy;
 	}
 
-	rcu_read_lock();
 	for_each_domain(cpu, tmp) {
 		if (!(tmp->flags & SD_LOAD_BALANCE))
 			break;
@@ -6555,6 +6613,14 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 			break;
 		}
 
+		/*
+		 * Energy-aware task placement is performed on the highest
+		 * non-overutilized domain spanning over cpu and prev_cpu.
+		 */
+		if (want_energy && !sd_overutilized(tmp) &&
+		    cpumask_test_cpu(prev_cpu, sched_domain_span(tmp)))
+			energy_sd = tmp;
+
 		if (tmp->flags & sd_flag)
 			sd = tmp;
 		else if (!want_affine)
@@ -6586,6 +6652,8 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 			if (want_affine)
 				current->recent_used_cpu = cpu;
 		}
+	} else if (energy_sd) {
+		new_cpu = find_energy_efficient_cpu(energy_sd, p, prev_cpu);
 	} else {
 		new_cpu = find_idlest_cpu(sd, p, cpu, prev_cpu, sd_flag);
 	}
-- 
2.11.0

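A note on the acceptance threshold in find_energy_efficient_cpu()
above: prev_energy >> 6 is prev_energy/64, i.e. roughly the 1.5%
mentioned in the comment. A standalone sketch with hypothetical energy
values:

#include <stdio.h>

int main(void)
{
	unsigned long prev_energy = 1000, best_energy = 980;

	/* 20 > 1000/64 = 15, so the 2% saving is considered worth it. */
	if ((prev_energy - best_energy) > (prev_energy >> 6))
		printf("move the task to best_cpu\n");
	else
		printf("stick with prev_cpu\n");
	return 0;
}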

* [RFC PATCH 6/6] drivers: base: arch_topology.c: Enable EAS for arm/arm64 platforms
  2018-03-20  9:43 [RFC PATCH 0/6] Energy Aware Scheduling Dietmar Eggemann
                   ` (4 preceding siblings ...)
  2018-03-20  9:43 ` [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up Dietmar Eggemann
@ 2018-03-20  9:43 ` Dietmar Eggemann
  2018-03-20  9:49   ` Greg Kroah-Hartman
  5 siblings, 1 reply; 55+ messages in thread
From: Dietmar Eggemann @ 2018-03-20  9:43 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra, Quentin Perret, Thara Gopinath
  Cc: linux-pm, Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

From: Quentin Perret <quentin.perret@arm.com>

Energy Aware Scheduling (EAS) has to be started from the arch code.
This commit enables it from the arch topology driver for arm/arm64
systems, hence enabling better support for Arm big.LITTLE and future
DynamIQ architectures.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
---
 drivers/base/arch_topology.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 52ec5174bcb1..e2206ea16538 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -15,6 +15,7 @@
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <linux/sched/topology.h>
+#include <linux/sched/energy.h>
 
 DEFINE_PER_CPU(unsigned long, freq_scale) = SCHED_CAPACITY_SCALE;
 
@@ -204,6 +205,7 @@ init_cpu_capacity_callback(struct notifier_block *nb,
 		free_raw_capacity();
 		pr_debug("cpu_capacity: parsing done\n");
 		schedule_work(&parsing_done_work);
+		init_sched_energy();
 	}
 
 	return 0;
-- 
2.11.0

* Re: [RFC PATCH 6/6] drivers: base: arch_topology.c: Enable EAS for arm/arm64 platforms
  2018-03-20  9:43 ` [RFC PATCH 6/6] drivers: base: arch_topology.c: Enable EAS for arm/arm64 platforms Dietmar Eggemann
@ 2018-03-20  9:49   ` Greg Kroah-Hartman
  2018-03-20 15:20     ` Dietmar Eggemann
  0 siblings, 1 reply; 55+ messages in thread
From: Greg Kroah-Hartman @ 2018-03-20  9:49 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, Peter Zijlstra, Quentin Perret, Thara Gopinath,
	linux-pm, Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Vincent Guittot,
	Viresh Kumar, Todd Kjos, Joel Fernandes

On Tue, Mar 20, 2018 at 09:43:12AM +0000, Dietmar Eggemann wrote:
> From: Quentin Perret <quentin.perret@arm.com>
> 
> Energy Aware Scheduling (EAS) has to be started from the arch code.

Ok, but:

> This commit enables it from the arch topology driver for arm/arm64
> systems, hence enabling better support for Arm big.LITTLE and future
> DynamIQ architectures.

Why does this have to be in the driver core code just for those specific
types of cpus?



> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> ---
>  drivers/base/arch_topology.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 52ec5174bcb1..e2206ea16538 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -15,6 +15,7 @@
>  #include <linux/slab.h>
>  #include <linux/string.h>
>  #include <linux/sched/topology.h>
> +#include <linux/sched/energy.h>
>  
>  DEFINE_PER_CPU(unsigned long, freq_scale) = SCHED_CAPACITY_SCALE;
>  
> @@ -204,6 +205,7 @@ init_cpu_capacity_callback(struct notifier_block *nb,
>  		free_raw_capacity();
>  		pr_debug("cpu_capacity: parsing done\n");
>  		schedule_work(&parsing_done_work);
> +		init_sched_energy();

This is not arch-specific code only for arm.

Don't you have ARM cpu bringup code somewhere?  Shouldn't this call be
in there?  It feels odd that this scheduler change is buried way down
here...

thanks,

greg k-h

* Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs
  2018-03-20  9:43 ` [RFC PATCH 2/6] sched: Introduce energy models of CPUs Dietmar Eggemann
@ 2018-03-20  9:52   ` Greg Kroah-Hartman
  2018-03-21  0:45     ` Quentin Perret
  2018-03-25 13:48     ` Quentin Perret
  2018-04-09 12:01   ` Peter Zijlstra
  1 sibling, 2 replies; 55+ messages in thread
From: Greg Kroah-Hartman @ 2018-03-20  9:52 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, Peter Zijlstra, Quentin Perret, Thara Gopinath,
	linux-pm, Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Vincent Guittot,
	Viresh Kumar, Todd Kjos, Joel Fernandes

On Tue, Mar 20, 2018 at 09:43:08AM +0000, Dietmar Eggemann wrote:
> From: Quentin Perret <quentin.perret@arm.com>
> 
> The energy consumption of each CPU in the system is modeled with a list
> of values representing its dissipated power and compute capacity at each
> available Operating Performance Point (OPP). These values are derived
> from existing information in the kernel (currently used by the thermal
> subsystem) and don't require the introduction of new platform-specific
> tunables. The energy model is also provided with a simple representation
> of all frequency domains as cpumasks, hence enabling the scheduler to be
> aware of dependencies between CPUs. The data required to build the energy
> model is provided by the OPP library which enables an abstract view of
> the platform from the scheduler. The new data structures holding these
> models and the routines to populate them are stored in
> kernel/sched/energy.c.
> 
> For the sake of simplicity, it is assumed in the energy model that all
> CPUs in a frequency domain share the same micro-architecture. As long as
> this assumption is correct, the energy models of different CPUs belonging
> to the same frequency domain are equal. Hence, this commit builds only one
> energy model per frequency domain, and links all relevant CPUs to it in
> order to save time and memory. If needed for future hardware platforms,
> relaxing this assumption should imply relatively simple modifications in
> the code but a significantly higher algorithmic complexity.
> 
> As it appears that energy-aware scheduling really makes a difference on
> heterogeneous systems (e.g. big.LITTLE platforms), it is restricted to
> systems having:
> 
>    1. SD_ASYM_CPUCAPACITY flag set
>    2. Dynamic Voltage and Frequency Scaling (DVFS) is enabled
>    3. Available power estimates for the OPPs of all possible CPUs
> 
> Moreover, the scheduler is notified of the energy model availability
> using a static key in order to minimize the overhead on non-energy-aware
> systems.
> 
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> 
> ---
> This patch depends on additional infrastructure being merged in the OPP
> core. As this infrastructure can also be useful for other clients, the
> related patches have been posted separately [1].
> 
> [1] https://marc.info/?l=linux-pm&m=151635516419249&w=2
> ---
>  include/linux/sched/energy.h |  31 +++++++
>  kernel/sched/Makefile        |   2 +-
>  kernel/sched/energy.c        | 190 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 222 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/sched/energy.h
>  create mode 100644 kernel/sched/energy.c
> 
> diff --git a/include/linux/sched/energy.h b/include/linux/sched/energy.h
> new file mode 100644
> index 000000000000..b4f43564ffe4
> --- /dev/null
> +++ b/include/linux/sched/energy.h
> @@ -0,0 +1,31 @@
> +#ifndef _LINUX_SCHED_ENERGY_H
> +#define _LINUX_SCHED_ENERGY_H

No copyright or license info?  Not good :(

> --- /dev/null
> +++ b/kernel/sched/energy.c
> @@ -0,0 +1,190 @@
> +/*
> + * Released under the GPLv2 only.
> + * SPDX-License-Identifier: GPL-2.0

Please read the documentation for the SPDX lines on how to do them
correctly.  Newer versions of checkpatch.pl will catch this, but that is
in linux-next for the moment.

And once you have the SPDX line, the "Released under..." line is not
needed.


> + *
> + * Energy-aware scheduling models
> + *
> + * Copyright (C) 2018, Arm Ltd.
> + * Written by: Quentin Perret, Arm Ltd.
> + *
> + * This file is subject to the terms and conditions of the GNU General Public
> + * License.  See the file "COPYING" in the main directory of this archive
> + * for more details.

This paragraph is not needed at all.

> + */
> +
> +#define pr_fmt(fmt) "sched-energy: " fmt
> +
> +#include <linux/sched/topology.h>
> +#include <linux/sched/energy.h>
> +#include <linux/pm_opp.h>
> +
> +#include "sched.h"
> +
> +DEFINE_STATIC_KEY_FALSE(sched_energy_present);
> +struct sched_energy_model ** __percpu energy_model;
> +
> +/*
> + * A copy of the cpumasks representing the frequency domains is kept private
> + * to the scheduler. They are stacked in a dynamically allocated linked list
> + * as we don't know how many frequency domains the system has.
> + */
> +LIST_HEAD(freq_domains);

global variable?  If so, please prefix it with something more unique
than "freq_".

> +#ifdef CONFIG_PM_OPP

#ifdefs go in .h files, not .c files, right?

thanks,

greg k-h

* Re: [RFC PATCH 6/6] drivers: base: arch_topology.c: Enable EAS for arm/arm64 platforms
  2018-03-20  9:49   ` Greg Kroah-Hartman
@ 2018-03-20 15:20     ` Dietmar Eggemann
  0 siblings, 0 replies; 55+ messages in thread
From: Dietmar Eggemann @ 2018-03-20 15:20 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, Peter Zijlstra, Quentin Perret, Thara Gopinath,
	linux-pm, Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Vincent Guittot,
	Viresh Kumar, Todd Kjos, Joel Fernandes

On 03/20/2018 10:49 AM, Greg Kroah-Hartman wrote:
> On Tue, Mar 20, 2018 at 09:43:12AM +0000, Dietmar Eggemann wrote:
>> From: Quentin Perret <quentin.perret@arm.com>
>>
>> Energy Aware Scheduling (EAS) has to be started from the arch code.
> 
> Ok, but:
> 
>> This commit enables it from the arch topology driver for arm/arm64
>> systems, hence enabling better support for Arm big.LITTLE and future
>> DynamIQ architectures.
> 
> Why does this have to be in the driver core code just for those specific
> types of cpus?

The arch_topology driver is shared functionality between the arm and 
arm64 arches. So far it handles the correct setting of the per-cpu 
cpu_scale values which are then used by the task scheduler to set the 
correct cpu capacity values. This allows providing cpu-invariant load 
tracking for arm and arm64 big.LITTLE systems.

The driver was initially created so that we didn't have to duplicate 
the exact same code in the arm and arm64 arch.

Big.LITTLE platforms have to set appropriate cpu node capacity-dmips-mhz 
properties (e.g. arch/arm64/boot/dts/arm/juno.dts) to enable the 
functionality of the driver. The cpu_scale value depends on the 
capacity-dmips-mhz property as well as on the max frequency of a logical 
cpu (hence the dependency on cpufreq).

A corner case would be a big.little platform purely based on max 
frequency differences (e.g. Google Pixel, Qualcomm MSM8996 Snapdragon 
821, Quad-core (2x2.15 GHz Kryo & 2x1.6 GHz Kryo)). For such a system 
capacity-dmips-mhz should be set to the same value for all logical cpus 
(e.g. 1024).

We would like to use the same code (shared between arm and arm64) to 
initialize the energy model and start EAS for arm and arm64 systems.
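
For illustration, the cpu_scale derivation described above could be
sketched as follows. The dmips and frequency values are hypothetical
(loosely modelled on the MSM8996 example), and the code mirrors the
idea rather than the exact driver implementation:

#include <stdio.h>

int main(void)
{
	/* capacity-dmips-mhz and max frequency (kHz) per CPU type. */
	unsigned long long dmips[2] = { 1024, 1024 };
	unsigned long long freq[2]  = { 2150000, 1600000 };
	unsigned long long raw[2], max_raw = 0;
	int i;

	for (i = 0; i < 2; i++) {
		raw[i] = dmips[i] * freq[i];
		if (raw[i] > max_raw)
			max_raw = raw[i];
	}

	/* Normalize so the fastest CPU gets SCHED_CAPACITY_SCALE (1024). */
	for (i = 0; i < 2; i++)
		printf("cpu type %d: cpu_scale = %llu\n", i,
		       raw[i] * 1024 / max_raw);
	return 0;
}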

>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Signed-off-by: Quentin Perret <quentin.perret@arm.com>
>> ---
>>   drivers/base/arch_topology.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
>> index 52ec5174bcb1..e2206ea16538 100644
>> --- a/drivers/base/arch_topology.c
>> +++ b/drivers/base/arch_topology.c
>> @@ -15,6 +15,7 @@
>>   #include <linux/slab.h>
>>   #include <linux/string.h>
>>   #include <linux/sched/topology.h>
>> +#include <linux/sched/energy.h>
>>   
>>   DEFINE_PER_CPU(unsigned long, freq_scale) = SCHED_CAPACITY_SCALE;
>>   
>> @@ -204,6 +205,7 @@ init_cpu_capacity_callback(struct notifier_block *nb,
>>   		free_raw_capacity();
>>   		pr_debug("cpu_capacity: parsing done\n");
>>   		schedule_work(&parsing_done_work);
>> +		init_sched_energy();
> 
> This is not arch-specific code only for arm.

The driver requires CONFIG_GENERIC_ARCH_TOPOLOGY which is currently only 
set in arch/arm/Kconfig and arch/arm64/Kconfig.

> Don't you have a ARM cpu bringup code somewhere?  Shouldn't this call be
> in there?  It feels odd that this scheduler change is buried way down
> here...

The big benefit is that we don't have to duplicate code for arm and 
arm64. In its current form, EAS should only be started when cpufreq is 
initialized for all possible cpus. And we don't have to signal back from 
the driver to the arch that cpufreq is initialized for all possible cpus.

I agree that none of this is obvious so the patch should explain the 
requirements and design better.

-- Dietmar

[...]

* Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs
  2018-03-20  9:52   ` Greg Kroah-Hartman
@ 2018-03-21  0:45     ` Quentin Perret
  2018-03-25 13:48     ` Quentin Perret
  1 sibling, 0 replies; 55+ messages in thread
From: Quentin Perret @ 2018-03-21  0:45 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Dietmar Eggemann, linux-kernel, Peter Zijlstra, Thara Gopinath,
	linux-pm, Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Vincent Guittot,
	Viresh Kumar, Todd Kjos, Joel Fernandes

On Tuesday 20 Mar 2018 at 10:52:15 (+0100), Greg Kroah-Hartman wrote:
> On Tue, Mar 20, 2018 at 09:43:08AM +0000, Dietmar Eggemann wrote:
> > From: Quentin Perret <quentin.perret@arm.com>
> > 
> > The energy consumption of each CPU in the system is modeled with a list
> > of values representing its dissipated power and compute capacity at each
> > available Operating Performance Point (OPP). These values are derived
> > from existing information in the kernel (currently used by the thermal
> > subsystem) and don't require the introduction of new platform-specific
> > tunables. The energy model is also provided with a simple representation
> > of all frequency domains as cpumasks, hence enabling the scheduler to be
> > aware of dependencies between CPUs. The data required to build the energy
> > model is provided by the OPP library which enables an abstract view of
> > the platform from the scheduler. The new data structures holding these
> > models and the routines to populate them are stored in
> > kernel/sched/energy.c.
> > 
> > For the sake of simplicity, it is assumed in the energy model that all
> > CPUs in a frequency domain share the same micro-architecture. As long as
> > this assumption is correct, the energy models of different CPUs belonging
> > to the same frequency domain are equal. Hence, this commit builds only one
> > energy model per frequency domain, and links all relevant CPUs to it in
> > order to save time and memory. If needed for future hardware platforms,
> > relaxing this assumption should imply relatively simple modifications in
> > the code but a significantly higher algorithmic complexity.
> > 
> > As it appears that energy-aware scheduling really makes a difference on
> > heterogeneous systems (e.g. big.LITTLE platforms), it is restricted to
> > systems having:
> > 
> >    1. SD_ASYM_CPUCAPACITY flag set
> >    2. Dynamic Voltage and Frequency Scaling (DVFS) is enabled
> >    3. Available power estimates for the OPPs of all possible CPUs
> > 
> > Moreover, the scheduler is notified of the energy model availability
> > using a static key in order to minimize the overhead on non-energy-aware
> > systems.
> > 
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> > Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > 
> > ---
> > This patch depends on additional infrastructure being merged in the OPP
> > core. As this infrastructure can also be useful for other clients, the
> > related patches have been posted separately [1].
> > 
> > [1] https://marc.info/?l=linux-pm&m=151635516419249&w=2
> > ---
> >  include/linux/sched/energy.h |  31 +++++++
> >  kernel/sched/Makefile        |   2 +-
> >  kernel/sched/energy.c        | 190 +++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 222 insertions(+), 1 deletion(-)
> >  create mode 100644 include/linux/sched/energy.h
> >  create mode 100644 kernel/sched/energy.c
> > 
> > diff --git a/include/linux/sched/energy.h b/include/linux/sched/energy.h
> > new file mode 100644
> > index 000000000000..b4f43564ffe4
> > --- /dev/null
> > +++ b/include/linux/sched/energy.h
> > @@ -0,0 +1,31 @@
> > +#ifndef _LINUX_SCHED_ENERGY_H
> > +#define _LINUX_SCHED_ENERGY_H
> 
> No copyright or license info?  Not good :(
> 
> > --- /dev/null
> > +++ b/kernel/sched/energy.c
> > @@ -0,0 +1,190 @@
> > +/*
> > + * Released under the GPLv2 only.
> > + * SPDX-License-Identifier: GPL-2.0
> 
> Please read the documentation for the SPDX lines on how to do them
> correctly.  Newer versions of checkpatch.pl will catch this, but that is
> in linux-next for the moment.
> 
> And once you have the SPDX line, the "Released under..." line is not
> needed.
> 
> 
> > + *
> > + * Energy-aware scheduling models
> > + *
> > + * Copyright (C) 2018, Arm Ltd.
> > + * Written by: Quentin Perret, Arm Ltd.
> > + *
> > + * This file is subject to the terms and conditions of the GNU General Public
> > + * License.  See the file "COPYING" in the main directory of this archive
> > + * for more details.
> 
> This paragraph is not needed at all.

Right, I will fix all the licence issues and add licence info to the new
header file. I used existing files as examples a while ago when I first
wrote the patches and forgot to update them later on. Sorry about that.

> 
> > + */
> > +
> > +#define pr_fmt(fmt) "sched-energy: " fmt
> > +
> > +#include <linux/sched/topology.h>
> > +#include <linux/sched/energy.h>
> > +#include <linux/pm_opp.h>
> > +
> > +#include "sched.h"
> > +
> > +DEFINE_STATIC_KEY_FALSE(sched_energy_present);
> > +struct sched_energy_model ** __percpu energy_model;
> > +
> > +/*
> > + * A copy of the cpumasks representing the frequency domains is kept private
> > + * to the scheduler. They are stacked in a dynamically allocated linked list
> > + * as we don't know how many frequency domains the system has.
> > + */
> > +LIST_HEAD(freq_domains);
> 
> global variable?  If so, please prefix it with something more unique
> than "freq_".

Will do.

> 
> > +#ifdef CONFIG_PM_OPP
> 
> #ifdefs go in .h files, not .c files, right?

Yes, good point. Actually, I might be able to tweak only
kernel/sched/Makefile to ensure we have CONFIG_PM_OPP. I will look into it.

> 
> thanks,
> 
> greg k-h

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function
  2018-03-20  9:43 ` [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function Dietmar Eggemann
@ 2018-03-21  9:04   ` Juri Lelli
  2018-03-21 12:26     ` Patrick Bellasi
  2018-03-21 12:39   ` Patrick Bellasi
  1 sibling, 1 reply; 55+ messages in thread
From: Juri Lelli @ 2018-03-21  9:04 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, Peter Zijlstra, Quentin Perret, Thara Gopinath,
	linux-pm, Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

Hi,

On 20/03/18 09:43, Dietmar Eggemann wrote:
> From: Quentin Perret <quentin.perret@arm.com>
> 
> In preparation for the definition of an energy-aware wakeup path, a
> helper function is provided to estimate the consequence on system energy
> when a specific task wakes-up on a specific CPU. compute_energy()
> estimates the OPPs to be reached by all frequency domains and estimates
> the consumption of each online CPU according to its energy model and its
> percentage of busy time.
> 
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
>  kernel/sched/fair.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 81 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6c72a5e7b1b0..76bd46502486 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6409,6 +6409,30 @@ static inline int cpu_overutilized(int cpu)
>  }
>  
>  /*
> + * Returns the util of "cpu" if "p" wakes up on "dst_cpu".
> + */
> +static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
> +{
> +	unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;

What about other classes? Shouldn't we now also take into account
DEADLINE (as schedutil does)?

BTW, we now also have a getter method in sched/sched.h; it takes
UTIL_EST into account, though. Do we need to take that into account when
estimating energy consumption?

> +	unsigned long capacity = capacity_orig_of(cpu);
> +
> +	/*
> +	 * If p is where it should be, or if it has no impact on cpu, there is
> +	 * not much to do.
> +	 */
> +	if ((task_cpu(p) == dst_cpu) || (cpu != task_cpu(p) && cpu != dst_cpu))
> +		goto clamp_util;
> +
> +	if (dst_cpu == cpu)
> +		util += task_util(p);
> +	else
> +		util = max_t(long, util - task_util(p), 0);
> +
> +clamp_util:
> +	return (util >= capacity) ? capacity : util;
> +}
> +
> +/*
>   * Disable WAKE_AFFINE in the case where task @p doesn't fit in the
>   * capacity of either the waking CPU @cpu or the previous CPU @prev_cpu.
>   *
> @@ -6432,6 +6456,63 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
>  	return !util_fits_capacity(task_util(p), min_cap);
>  }
>  
> +static struct capacity_state *find_cap_state(int cpu, unsigned long util)
> +{
> +	struct sched_energy_model *em = *per_cpu_ptr(energy_model, cpu);
> +	struct capacity_state *cs = NULL;
> +	int i;
> +
> +	/*
> +	 * As the goal is to estimate the OPP reached for a specific util
> +	 * value, mimic the behaviour of schedutil with a 1.25 coefficient
> +	 */
> +	util += util >> 2;

What about other governors (ondemand, for example)? Is this supposed to
work only when schedutil is in use (if so, we should probably make it
conditional on that)?

Also, even when schedutil is in use, shouldn't we ask it for a util
"computation" instead of replicating its _current_ heuristic? I fear
the two might diverge in the future.

Best,

- Juri

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function
  2018-03-21  9:04   ` Juri Lelli
@ 2018-03-21 12:26     ` Patrick Bellasi
  2018-03-21 12:59       ` Juri Lelli
  2018-03-21 14:02       ` Quentin Perret
  0 siblings, 2 replies; 55+ messages in thread
From: Patrick Bellasi @ 2018-03-21 12:26 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Dietmar Eggemann, linux-kernel, Peter Zijlstra, Quentin Perret,
	Thara Gopinath, linux-pm, Morten Rasmussen, Chris Redpath,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

On 21-Mar 10:04, Juri Lelli wrote:
> Hi,
> 
> On 20/03/18 09:43, Dietmar Eggemann wrote:
> > From: Quentin Perret <quentin.perret@arm.com>
> > 
> > In preparation for the definition of an energy-aware wakeup path, a
> > helper function is provided to estimate the consequence on system energy
> > when a specific task wakes-up on a specific CPU. compute_energy()
> > estimates the OPPs to be reached by all frequency domains and estimates
> > the consumption of each online CPU according to its energy model and its
> > percentage of busy time.
> > 
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> > Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > ---
> >  kernel/sched/fair.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 81 insertions(+)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 6c72a5e7b1b0..76bd46502486 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6409,6 +6409,30 @@ static inline int cpu_overutilized(int cpu)
> >  }
> >  
> >  /*
> > + * Returns the util of "cpu" if "p" wakes up on "dst_cpu".
> > + */
> > +static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
> > +{
> > +	unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
> 
> What about other classes? Shouldn't we now also take into account
> DEADLINE (as schedutil does)?

Good point, although that would likely require factoring out from
schedutil the utilization aggregation function, wouldn't it?

> BTW, we now also have a getter method in sched/sched.h; it takes
> UTIL_EST into account, though. Do we need to take that into account when
> estimating energy consumption?

Actually I think that this whole function can be written "just" as:

---8<---
   unsigned long util = cpu_util_wake(cpu);

   if (cpu != dst_cpu)
        return util;

   return min(util + task_util(p), capacity_orig_of(cpu));
---8<---

which will reuse existing functions as well as getting for free other
stuff (like the CPU util_est).

Considering your observation above, it also makes it easy to add into
util the DEADLINE and RT utilizations, just before returning the
value.
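
E.g. (sketch only, reusing the snippet above plus the cpu_util_dl()
getter from kernel/sched/sched.h):

---8<---
   unsigned long util = cpu_util_wake(cpu);

   if (cpu == dst_cpu)
        util += task_util(p);

   /* DEADLINE contribution, the same signal schedutil aggregates */
   util += cpu_util_dl(cpu_rq(cpu));

   return min(util, capacity_orig_of(cpu));
---8<---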

> > +	unsigned long capacity = capacity_orig_of(cpu);
> > +
> > +	/*
> > +	 * If p is where it should be, or if it has no impact on cpu, there is
> > +	 * not much to do.
> > +	 */
> > +	if ((task_cpu(p) == dst_cpu) || (cpu != task_cpu(p) && cpu != dst_cpu))
> > +		goto clamp_util;
> > +
> > +	if (dst_cpu == cpu)
> > +		util += task_util(p);
> > +	else
> > +		util = max_t(long, util - task_util(p), 0);
> > +
> > +clamp_util:
> > +	return (util >= capacity) ? capacity : util;
> > +}
> > +
> > +/*
> >   * Disable WAKE_AFFINE in the case where task @p doesn't fit in the
> >   * capacity of either the waking CPU @cpu or the previous CPU @prev_cpu.
> >   *
> > @@ -6432,6 +6456,63 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
> >  	return !util_fits_capacity(task_util(p), min_cap);
> >  }
> >  
> > +static struct capacity_state *find_cap_state(int cpu, unsigned long util)
> > +{
> > +	struct sched_energy_model *em = *per_cpu_ptr(energy_model, cpu);
> > +	struct capacity_state *cs = NULL;
> > +	int i;
> > +
> > +	/*
> > +	 * As the goal is to estimate the OPP reached for a specific util
> > +	 * value, mimic the behaviour of schedutil with a 1.25 coefficient
> > +	 */
> > +	util += util >> 2;
> 
> What about other governors (ondemand for example). Is this supposed to
> work only when schedutil is in use (if so we should probably make it
> conditional on that)?

Yes, I would say that EAS mostly makes sense when you have a "minimum"
control on OPPs... otherwise all the energy estimations are really
fuzzy.

> Also, even when schedutil is in use, shouldn't we ask it for a util
> "computation" instead of replicating its _current_ heuristic?

Are you proposing to have the 1.25 factor only here and remove it from
schedutil?

> I fear  the two might diverge in the future.

That could be avoided by factoring out from schedutil the
"compensation" factor into a proper function to be used by all the
interested players, couldn't it?
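
E.g. something as simple as this in a shared header (name purely
illustrative):

---8<---
/*
 * Map a utilization value to the corresponding frequency request,
 * applying the same 1.25 margin schedutil uses, so that schedutil and
 * the energy model agree on the OPP a given utilization leads to.
 */
static inline unsigned long map_util_to_opp_request(unsigned long util)
{
	return util + (util >> 2);
}
---8<---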

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function
  2018-03-20  9:43 ` [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function Dietmar Eggemann
  2018-03-21  9:04   ` Juri Lelli
@ 2018-03-21 12:39   ` Patrick Bellasi
  2018-03-21 14:26     ` Quentin Perret
  1 sibling, 1 reply; 55+ messages in thread
From: Patrick Bellasi @ 2018-03-21 12:39 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, Peter Zijlstra, Quentin Perret, Thara Gopinath,
	linux-pm, Morten Rasmussen, Chris Redpath, Valentin Schneider,
	Rafael J . Wysocki, Greg Kroah-Hartman, Vincent Guittot,
	Viresh Kumar, Todd Kjos, Joel Fernandes

On 20-Mar 09:43, Dietmar Eggemann wrote:
> From: Quentin Perret <quentin.perret@arm.com>

[...]

> +static unsigned long compute_energy(struct task_struct *p, int dst_cpu)
> +{
> +	unsigned long util, fdom_max_util;
> +	struct capacity_state *cs;
> +	unsigned long energy = 0;
> +	struct freq_domain *fdom;
> +	int cpu;
> +
> +	for_each_freq_domain(fdom) {
> +		fdom_max_util = 0;
> +		for_each_cpu_and(cpu, &(fdom->span), cpu_online_mask) {
> +			util = cpu_util_next(cpu, p, dst_cpu);

Would be nice to find a way to cache all these util and reuse them
below... even just to ensure data consistency between the "cs"
computation and its usage...

> +			fdom_max_util = max(util, fdom_max_util);
> +		}
> +
> +		/*
> +		 * Here we assume that the capacity states of CPUs belonging to
> +		 * the same frequency domains are shared. Hence, we look at the
> +		 * capacity state of the first CPU and re-use it for all.
> +		 */
> +		cpu = cpumask_first(&(fdom->span));
> +		cs = find_cap_state(cpu, fdom_max_util);
                ^^^^

The above code could theoretically return NULL, although likely EAS is
completely disabled if em->nb_cap_states == 0, right?

If that's the case then, in the previous function, you can certainly
avoid the initialization of *cs and maybe also add an explicit:

    BUG_ON(em->nb_cap_states == 0);

which helps even just as "in code documentation".

But, I'm not sure if maintainers like BUG_ON in scheduler code :)


> +
> +		/*
> +		 * The energy consumed by each CPU is derived from the power
                      ^

Should we make more explicit that this is just the "active" energy
consumed?

> +		 * it dissipates at the expected OPP and its percentage of
> +		 * busy time.
> +		 */
> +		for_each_cpu_and(cpu, &(fdom->span), cpu_online_mask) {
> +			util = cpu_util_next(cpu, p, dst_cpu);
> +			energy += cs->power * util / cs->cap;
> +		}
> +	}

nit-pick: empty line before return?

> +	return energy;
> +}
> +
>  /*
>   * select_task_rq_fair: Select target runqueue for the waking task in domains
>   * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
> -- 
> 2.11.0
> 

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function
  2018-03-21 12:26     ` Patrick Bellasi
@ 2018-03-21 12:59       ` Juri Lelli
  2018-03-21 13:55         ` Quentin Perret
  2018-03-21 14:02       ` Quentin Perret
  1 sibling, 1 reply; 55+ messages in thread
From: Juri Lelli @ 2018-03-21 12:59 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Dietmar Eggemann, linux-kernel, Peter Zijlstra, Quentin Perret,
	Thara Gopinath, linux-pm, Morten Rasmussen, Chris Redpath,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

On 21/03/18 12:26, Patrick Bellasi wrote:
> On 21-Mar 10:04, Juri Lelli wrote:
> > Hi,
> > 
> > On 20/03/18 09:43, Dietmar Eggemann wrote:
> > > From: Quentin Perret <quentin.perret@arm.com>
> > > 
> > > In preparation for the definition of an energy-aware wakeup path, a
> > > helper function is provided to estimate the consequence on system energy
> > > when a specific task wakes-up on a specific CPU. compute_energy()
> > > estimates the OPPs to be reached by all frequency domains and estimates
> > > the consumption of each online CPU according to its energy model and its
> > > percentage of busy time.
> > > 
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> > > Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > > ---
> > >  kernel/sched/fair.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 81 insertions(+)
> > > 
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 6c72a5e7b1b0..76bd46502486 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -6409,6 +6409,30 @@ static inline int cpu_overutilized(int cpu)
> > >  }
> > >  
> > >  /*
> > > + * Returns the util of "cpu" if "p" wakes up on "dst_cpu".
> > > + */
> > > +static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
> > > +{
> > > +	unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
> > 
> > What about other classes? Shouldn't we now also take into account
> > DEADLINE (as schedutil does)?
> 
> Good point, although that would likely require factoring out from
> schedutil the utilization aggregation function, wouldn't it?

Maybe, or simply use getter methods and aggregate again here.

> 
> > BTW, we now also have a getter method in sched/sched.h; it takes
> > UTIL_EST into account, though. Do we need to take that into account when
> > estimating energy consumption?
> 
> Actually I think that this whole function can be written "just" as:
> 
> ---8<---
>    unsigned long util = cpu_util_wake(cpu);
> 
>    if (cpu != dst_cpu)
>         return util;
> 
>    return min(util + task_util(p), capacity_orig_of(cpu));
> ---8<---
> 
> which will reuse existing functions as well as getting for free other
> stuff (like the CPU util_est).
> 
> Considering your observation above, it also makes it easy to add into
> util the DEADLINE and RT utilizations, just before returning the
> value.

Well, for RT we should probably consider the fact that schedutil is
going to select max OPP...
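
I.e. something like (sketch; mirroring, IIRC, what schedutil currently
does for RT):

---8<---
   if (cpu_rq(cpu)->rt.rt_nr_running)
        return capacity_orig_of(cpu);
---8<---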

Apart from that I guess it could work like you said.

> 
> > > +	unsigned long capacity = capacity_orig_of(cpu);
> > > +
> > > +	/*
> > > +	 * If p is where it should be, or if it has no impact on cpu, there is
> > > +	 * not much to do.
> > > +	 */
> > > +	if ((task_cpu(p) == dst_cpu) || (cpu != task_cpu(p) && cpu != dst_cpu))
> > > +		goto clamp_util;
> > > +
> > > +	if (dst_cpu == cpu)
> > > +		util += task_util(p);
> > > +	else
> > > +		util = max_t(long, util - task_util(p), 0);
> > > +
> > > +clamp_util:
> > > +	return (util >= capacity) ? capacity : util;
> > > +}
> > > +
> > > +/*
> > >   * Disable WAKE_AFFINE in the case where task @p doesn't fit in the
> > >   * capacity of either the waking CPU @cpu or the previous CPU @prev_cpu.
> > >   *
> > > @@ -6432,6 +6456,63 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
> > >  	return !util_fits_capacity(task_util(p), min_cap);
> > >  }
> > >  
> > > +static struct capacity_state *find_cap_state(int cpu, unsigned long util)
> > > +{
> > > +	struct sched_energy_model *em = *per_cpu_ptr(energy_model, cpu);
> > > +	struct capacity_state *cs = NULL;
> > > +	int i;
> > > +
> > > +	/*
> > > +	 * As the goal is to estimate the OPP reached for a specific util
> > > +	 * value, mimic the behaviour of schedutil with a 1.25 coefficient
> > > +	 */
> > > +	util += util >> 2;
> > 
> > What about other governors (ondemand for example). Is this supposed to
> > work only when schedutil is in use (if so we should probably make it
> > conditional on that)?
> 
> Yes, I would say that EAS mostly makes sense when you have a "minimum"
> control on OPPs... otherwise all the energy estimations are really
> fuzzy.

Makes sense to me. Shouldn't we then make all this conditional on using
schedutil?

> 
> > Also, even when schedutil is in use, shouldn't we ask it for a util
> > "computation" instead of replicating its _current_ heuristic?
> 
> Are you proposing to have the 1.25 factor only here and remove it from
> schedutil?

I'm only saying that we probably shouldn't have two places where we add
this 1.25 factor to utilization before using it, as in the future one of
the two might modify that 1.25 to something else and then we'll have
problems. So, maybe a common wrapper that adds such a factor?

> 
> > I fear  the two might diverge in the future.
> 
> That could be avoided by factoring out from schedutil the
> "compensation" factor into a proper function to be used by all the
> interested players, couldn't it?

And I should have read till the end before writing the above paragraph
it seems. :)

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function
  2018-03-21 12:59       ` Juri Lelli
@ 2018-03-21 13:55         ` Quentin Perret
  2018-03-21 15:15           ` Juri Lelli
  0 siblings, 1 reply; 55+ messages in thread
From: Quentin Perret @ 2018-03-21 13:55 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Patrick Bellasi, Dietmar Eggemann, linux-kernel, Peter Zijlstra,
	Thara Gopinath, linux-pm, Morten Rasmussen, Chris Redpath,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

On Wednesday 21 Mar 2018 at 13:59:25 (+0100), Juri Lelli wrote:
> On 21/03/18 12:26, Patrick Bellasi wrote:
> > On 21-Mar 10:04, Juri Lelli wrote:
> > > Hi,
> > > 
> > > On 20/03/18 09:43, Dietmar Eggemann wrote:
> > > > From: Quentin Perret <quentin.perret@arm.com>
> > > > 
> > > > In preparation for the definition of an energy-aware wakeup path, a
> > > > helper function is provided to estimate the consequence on system energy
> > > > when a specific task wakes-up on a specific CPU. compute_energy()
> > > > estimates the OPPs to be reached by all frequency domains and estimates
> > > > the consumption of each online CPU according to its energy model and its
> > > > percentage of busy time.
> > > > 
> > > > Cc: Ingo Molnar <mingo@redhat.com>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> > > > Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > > > ---
> > > >  kernel/sched/fair.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 81 insertions(+)
> > > > 
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index 6c72a5e7b1b0..76bd46502486 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -6409,6 +6409,30 @@ static inline int cpu_overutilized(int cpu)
> > > >  }
> > > >  
> > > >  /*
> > > > + * Returns the util of "cpu" if "p" wakes up on "dst_cpu".
> > > > + */
> > > > +static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
> > > > +{
> > > > +	unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
> > > 
> > > What about other classes? Shouldn't we now also take into account
> > > DEADLINE (as schedutil does)?
> > 
> > Good point, although that would likely require factoring out from
> > schedutil the utilization aggregation function, wouldn't it?
> 
> Maybe, or simply use getter methods and aggregate again here.

I agree with you both, taking DL into account here is most likely the
right thing to do. Thinking about this, there are other places in those
patches where we should really use capacity_of() instead of
capacity_orig_of() (in task_fits() in patch 5/6 for ex) in order to
avoid CPUs under heavy RT pressure. I'll try to improve the integration
with other scheduling classes in v2.

> 
> > 
> > > BTW, we now also have a getter method in sched/sched.h; it takes
> > > UTIL_EST into account, though. Do we need to take that into account when
> > > estimating energy consumption?

Yes, I think that using UTIL_EST makes a lot of sense for energy
calculation. This is what is used for frequency selection (with
schedutil obviously) and this is also our best guess on how much time
a task will spend on a CPU. V2 will be rebased on the latest
tip/sched/core and I'll make sure to integrate things better with
util_est.

> > 
> > Actually I think that this whole function can be written "just" as:
> > 
> > ---8<---
> >    unsigned long util = cpu_util_wake(cpu);
> > 
> >    if (cpu != dst_cpu)
> >         return util;
> > 
> >    return min(util + task_util(p), capacity_orig_of(cpu));
> > ---8<---
> > 
> > which will reuse existing functions as well as getting for free other
> > stuff (like the CPU util_est).
> > 
> > Considering your observation above, it also makes it easy to add into
> > util the DEADLINE and RT utilizations, just before returning the
> > value.
> 
> Well, for RT we should probably consider the fact that schedutil is
> going to select max OPP...

Right, but I need to think about the right place to put that, and how to
compute the energy accurately in this case. Some modification might also
be required in find_cap_state() (patch 5/6).

> 
> Apart from that I guess it could work like you said.
> 
> > 
> > > > +	unsigned long capacity = capacity_orig_of(cpu);
> > > > +
> > > > +	/*
> > > > +	 * If p is where it should be, or if it has no impact on cpu, there is
> > > > +	 * not much to do.
> > > > +	 */
> > > > +	if ((task_cpu(p) == dst_cpu) || (cpu != task_cpu(p) && cpu != dst_cpu))
> > > > +		goto clamp_util;
> > > > +
> > > > +	if (dst_cpu == cpu)
> > > > +		util += task_util(p);
> > > > +	else
> > > > +		util = max_t(long, util - task_util(p), 0);
> > > > +
> > > > +clamp_util:
> > > > +	return (util >= capacity) ? capacity : util;
> > > > +}
> > > > +
> > > > +/*
> > > >   * Disable WAKE_AFFINE in the case where task @p doesn't fit in the
> > > >   * capacity of either the waking CPU @cpu or the previous CPU @prev_cpu.
> > > >   *
> > > > @@ -6432,6 +6456,63 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
> > > >  	return !util_fits_capacity(task_util(p), min_cap);
> > > >  }
> > > >  
> > > > +static struct capacity_state *find_cap_state(int cpu, unsigned long util)
> > > > +{
> > > > +	struct sched_energy_model *em = *per_cpu_ptr(energy_model, cpu);
> > > > +	struct capacity_state *cs = NULL;
> > > > +	int i;
> > > > +
> > > > +	/*
> > > > +	 * As the goal is to estimate the OPP reached for a specific util
> > > > +	 * value, mimic the behaviour of schedutil with a 1.25 coefficient
> > > > +	 */
> > > > +	util += util >> 2;
> > > 
> > > What about other governors (ondemand for example). Is this supposed to
> > > work only when schedutil is in use (if so we should probably make it
> > > conditional on that)?
> > 
> > Yes, I would say that EAS mostly makes sense when you have a "minimum"
> > control on OPPs... otherwise all the energy estimations are really
> > fuzzy.
> 
> Makes sense to me. Shouldn't we then make all this conditional on using
> schedutil?

So, in theory, EAS could make sense even for other governors than
schedutil. Even with the performance governor it is probably more
energy efficient (although users using "performance" probably don't care
about energy, but that's just an example) to place small tasks onto little
CPUs up to a certain point given by the energy model. The ideal solution
would be to change the behaviour of find_cap_state() depending on the
governor being used, but I don't know if this extra complexity is worth
it really.
I'm happy to make all this conditional on schedutil as a first step and
we can see later if that makes sense to extend EAS to other use-cases.
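
FWIW, the kind of thing I have in mind for that ideal solution would be
(purely hypothetical sketch inside find_cap_state();
governor_is_performance() does not exist, the scheduler would need some
way to query the current cpufreq governor):

---8<---
	/* With the "performance" governor, the highest OPP is always used */
	if (governor_is_performance(cpu))
		return &em->cap_states[em->nb_cap_states - 1];
---8<---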

> 
> > 
> > > Also, even when schedutil is in use, shouldn't we ask it for a util
> > > "computation" instead of replicating its _current_ heuristic?
> > 
> > Are you proposing to have the 1.25 factor only here and remove it from
> > schedutil?
> 
> I'm only saying that we probably shouldn't have two places where we add
> this 1.25 factor to utilization before using it, as in the future one of
> the two might modify that 1.25 to something else and then we'll have
> problems. So, maybe a common wrapper that adds such a factor?

Ok, I can definitely factor this code out so it is shared between
schedutil and EAS. And BTW, would it make sense to make schedutil use
"capacity_margin" instead of an arbitrary value? The semantics feel
pretty close. Out of curiosity, what was the reason to use C=1.25 in
the first place?

> 
> > 
> > > I fear  the two might diverge in the future.
> > 
> > That could be avoided by factoring out from schedutil the
> > "compensation" factor into a proper function to be used by all the
> > interested players, couldn't it?
> 
> And I should have read till the end before writing the above paragraph
> it seems. :)
> 
> Thanks,
> 
> - Juri

Thank you very much for the feedback !

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function
  2018-03-21 12:26     ` Patrick Bellasi
  2018-03-21 12:59       ` Juri Lelli
@ 2018-03-21 14:02       ` Quentin Perret
  2018-03-21 21:15         ` Dietmar Eggemann
  1 sibling, 1 reply; 55+ messages in thread
From: Quentin Perret @ 2018-03-21 14:02 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Juri Lelli, Dietmar Eggemann, linux-kernel, Peter Zijlstra,
	Thara Gopinath, linux-pm, Morten Rasmussen, Chris Redpath,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

On Wednesday 21 Mar 2018 at 12:26:21 (+0000), Patrick Bellasi wrote:
> On 21-Mar 10:04, Juri Lelli wrote:
> > Hi,
> > 
> > On 20/03/18 09:43, Dietmar Eggemann wrote:
> > > From: Quentin Perret <quentin.perret@arm.com>
> > > 
> > > In preparation for the definition of an energy-aware wakeup path, a
> > > helper function is provided to estimate the consequence on system energy
> > > when a specific task wakes-up on a specific CPU. compute_energy()
> > > estimates the OPPs to be reached by all frequency domains and estimates
> > > the consumption of each online CPU according to its energy model and its
> > > percentage of busy time.
> > > 
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> > > Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > > ---
> > >  kernel/sched/fair.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 81 insertions(+)
> > > 
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 6c72a5e7b1b0..76bd46502486 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -6409,6 +6409,30 @@ static inline int cpu_overutilized(int cpu)
> > >  }
> > >  
> > >  /*
> > > + * Returns the util of "cpu" if "p" wakes up on "dst_cpu".
> > > + */
> > > +static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
> > > +{
> > > +	unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
> > 
> > What about other classes? Shouldn't we now also take into account
> > DEADLINE (as schedutil does)?
> 
> Good point, although that would likely require factoring out from
> schedutil the utilization aggregation function, wouldn't it?
> 
> > BTW, we now also have a getter method in sched/sched.h; it takes
> > UTIL_EST into account, though. Do we need to take that into account when
> > estimating energy consumption?
> 
> Actually I think that this whole function can be written "just" as:
> 
> ---8<---
>    unsigned long util = cpu_util_wake(cpu);
> 
>    if (cpu != dst_cpu)
>         return util;
> 
>    return min(util + task_util(p), capacity_orig_of(cpu));
> ---8<---
> 

Yes this should be functionally equivalent. However, with your
suggestion you can potentially remove the task contribution from the
cpu_util in cpu_util_wake() and then add it back right after if
cpu==dst_cpu. This is sub-optimal and that's why I implemented things
slightly differently. But maybe this optimization really is too small to
justify the extra complexity involved ...

> which will reuse existing functions as well as getting for free other
> stuff (like the CPU util_est).
> 
> Considering your observation above, it also makes it easy to add into
> util the DEADLINE and RT utilizations, just before returning the
> value.
> 
> > > +	unsigned long capacity = capacity_orig_of(cpu);
> > > +
> > > +	/*
> > > +	 * If p is where it should be, or if it has no impact on cpu, there is
> > > +	 * not much to do.
> > > +	 */
> > > +	if ((task_cpu(p) == dst_cpu) || (cpu != task_cpu(p) && cpu != dst_cpu))
> > > +		goto clamp_util;
> > > +
> > > +	if (dst_cpu == cpu)
> > > +		util += task_util(p);
> > > +	else
> > > +		util = max_t(long, util - task_util(p), 0);
> > > +
> > > +clamp_util:
> > > +	return (util >= capacity) ? capacity : util;
> > > +}
> > > +
> > > +/*
> > >   * Disable WAKE_AFFINE in the case where task @p doesn't fit in the
> > >   * capacity of either the waking CPU @cpu or the previous CPU @prev_cpu.
> > >   *
> > > @@ -6432,6 +6456,63 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
> > >  	return !util_fits_capacity(task_util(p), min_cap);
> > >  }
> > >  
> > > +static struct capacity_state *find_cap_state(int cpu, unsigned long util)
> > > +{
> > > +	struct sched_energy_model *em = *per_cpu_ptr(energy_model, cpu);
> > > +	struct capacity_state *cs = NULL;
> > > +	int i;
> > > +
> > > +	/*
> > > +	 * As the goal is to estimate the OPP reached for a specific util
> > > +	 * value, mimic the behaviour of schedutil with a 1.25 coefficient
> > > +	 */
> > > +	util += util >> 2;
> > 
> > What about other governors (ondemand for example). Is this supposed to
> > work only when schedutil is in use (if so we should probably make it
> > conditional on that)?
> 
> Yes, I would say that EAS mostly makes sense when you have a "minimum"
> control on OPPs... otherwise all the energy estimations are really
> fuzzy.
> 
> > Also, even when schedutil is in use, shouldn't we ask it for a util
> > "computation" instead of replicating its _current_ heuristic?
> 
> Are you proposing to have the 1.25 factor only here and remove it from
> schedutil?
> 
> > I fear  the two might diverge in the future.
> 
> That could be avoided by factoring out from schedutil the
> "compensation" factor into a proper function to be used by all the
> interested players, couldn't it?
> 
> -- 
> #include <best/regards.h>
> 
> Patrick Bellasi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function
  2018-03-21 12:39   ` Patrick Bellasi
@ 2018-03-21 14:26     ` Quentin Perret
  2018-03-21 14:50       ` Juri Lelli
  2018-03-21 15:54       ` Patrick Bellasi
  0 siblings, 2 replies; 55+ messages in thread
From: Quentin Perret @ 2018-03-21 14:26 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Dietmar Eggemann, linux-kernel, Peter Zijlstra, Thara Gopinath,
	linux-pm, Morten Rasmussen, Chris Redpath, Valentin Schneider,
	Rafael J . Wysocki, Greg Kroah-Hartman, Vincent Guittot,
	Viresh Kumar, Todd Kjos, Joel Fernandes

On Wednesday 21 Mar 2018 at 12:39:21 (+0000), Patrick Bellasi wrote:
> On 20-Mar 09:43, Dietmar Eggemann wrote:
> > From: Quentin Perret <quentin.perret@arm.com>
> 
> [...]
> 
> > +static unsigned long compute_energy(struct task_struct *p, int dst_cpu)
> > +{
> > +	unsigned long util, fdom_max_util;
> > +	struct capacity_state *cs;
> > +	unsigned long energy = 0;
> > +	struct freq_domain *fdom;
> > +	int cpu;
> > +
> > +	for_each_freq_domain(fdom) {
> > +		fdom_max_util = 0;
> > +		for_each_cpu_and(cpu, &(fdom->span), cpu_online_mask) {
> > +			util = cpu_util_next(cpu, p, dst_cpu);
> 
> Would be nice to find a way to cache all these util and reuse them
> below... even just to ensure data consistency between the "cs"
> computation and its usage...

So actually, what I can do is add something like

    fdom_tot_util += util;

to this loop and compute

    energy = cs->power * fdom_tot_util / cs->cap;

only once, instead of having the second loop to compute the energy. We don't
have to scale the util for each and every CPU since they share the same
cap state. That would save some divisions and ensure the consistency
between the selection of the cap state and the associated energy
computation. What do you think ?
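
In code, the reworked loop could look like this (sketch, keeping the
names used in the RFC):

---8<---
	for_each_freq_domain(fdom) {
		fdom_max_util = 0;
		fdom_tot_util = 0;
		for_each_cpu_and(cpu, &(fdom->span), cpu_online_mask) {
			util = cpu_util_next(cpu, p, dst_cpu);
			fdom_max_util = max(util, fdom_max_util);
			fdom_tot_util += util;
		}

		/* One cap state and one scaling per frequency domain */
		cpu = cpumask_first(&(fdom->span));
		cs = find_cap_state(cpu, fdom_max_util);
		energy += cs->power * fdom_tot_util / cs->cap;
	}
---8<---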

Or maybe you were talking about consistency between several consecutive
calls to compute_energy() ?

> 
> > +			fdom_max_util = max(util, fdom_max_util);
> > +		}
> > +
> > +		/*
> > +		 * Here we assume that the capacity states of CPUs belonging to
> > +		 * the same frequency domains are shared. Hence, we look at the
> > +		 * capacity state of the first CPU and re-use it for all.
> > +		 */
> > +		cpu = cpumask_first(&(fdom->span));
> > +		cs = find_cap_state(cpu, fdom_max_util);
>                 ^^^^
> 
> The above code could theoretically return NULL, although likely EAS is
> completely disabled if em->nb_cap_states == 0, right?

That's right. sched_energy_present cannot be enabled with
em->nb_cap_states == 0, and compute_energy() is never called without
sched_energy_present in the proposed implementation.

> 
> If that's the case then, in the previous function, you can certainly
> avoid the initialization of *cs and maybe also add an explicit:
> 
>     BUG_ON(em->nb_cap_states == 0);
> 
> which helps even just as "in code documentation".
> 
> But, I'm not sure if maintainers like BUG_ON in scheduler code :)

Yes, I'm not sure about the BUG_ON either :). I agree that it would be
nice to document somewhere that compute_energy() is unsafe to call
without sched_energy_present. I can simply add a proper doc comment to
this function actually. Would that work?
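
E.g. something along these lines (wording illustrative):

---8<---
/**
 * compute_energy - estimate the energy impact of waking @p up on @dst_cpu
 * @p: the task being woken up
 * @dst_cpu: the CPU @p is expected to run on
 *
 * Must only be called when sched_energy_present is enabled, i.e. when a
 * valid energy model (nb_cap_states > 0) has been built for all online
 * CPUs. Calling it otherwise is a bug.
 */
---8<---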

> 
> 
> > +
> > +		/*
> > +		 * The energy consumed by each CPU is derived from the power
>                       ^
> 
> Should we make more explicit that this is just the "active" energy
> consumed?

Ok, np.

> 
> > +		 * it dissipates at the expected OPP and its percentage of
> > +		 * busy time.
> > +		 */
> > +		for_each_cpu_and(cpu, &(fdom->span), cpu_online_mask) {
> > +			util = cpu_util_next(cpu, p, dst_cpu);
> > +			energy += cs->power * util / cs->cap;
> > +		}
> > +	}
> 
> nit-pick: empty line before return?

Will do.

> 
> > +	return energy;
> > +}
> > +
> >  /*
> >   * select_task_rq_fair: Select target runqueue for the waking task in domains
> >   * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
> > -- 
> > 2.11.0
> > 
> 
> -- 
> #include <best/regards.h>
> 
> Patrick Bellasi

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function
  2018-03-21 14:26     ` Quentin Perret
@ 2018-03-21 14:50       ` Juri Lelli
  2018-03-21 15:54       ` Patrick Bellasi
  1 sibling, 0 replies; 55+ messages in thread
From: Juri Lelli @ 2018-03-21 14:50 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Patrick Bellasi, Dietmar Eggemann, linux-kernel, Peter Zijlstra,
	Thara Gopinath, linux-pm, Morten Rasmussen, Chris Redpath,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

On 21/03/18 14:26, Quentin Perret wrote:
> On Wednesday 21 Mar 2018 at 12:39:21 (+0000), Patrick Bellasi wrote:
> > On 20-Mar 09:43, Dietmar Eggemann wrote:

[...]

> > 
> > If that's the case then, in the previous function, you can certainly
> > avoid the initialization of *cs and maybe also add an explicit:
> > 
> >     BUG_ON(em->nb_cap_states == 0);
> > 
> > which helps even just as "in code documentation".
> > 
> > But, I'm not sure if maintainers like BUG_ON in scheduler code :)
> 
> Yes, I'm not sure about the BUG_ON either :). I agree that it would be
> nice to document somewhere that compute_energy() is unsafe to call
> without sched_energy_present. I can simply add a proper doc comment to
> this function actually. Would that work ?

If it is something that must not happen and it is also non-recoverable
at runtime, then...

$ git grep BUG_ON -- kernel/sched/ | wc -l
50

:)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function
  2018-03-21 13:55         ` Quentin Perret
@ 2018-03-21 15:15           ` Juri Lelli
  2018-03-21 16:26             ` Morten Rasmussen
  0 siblings, 1 reply; 55+ messages in thread
From: Juri Lelli @ 2018-03-21 15:15 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Patrick Bellasi, Dietmar Eggemann, linux-kernel, Peter Zijlstra,
	Thara Gopinath, linux-pm, Morten Rasmussen, Chris Redpath,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

On 21/03/18 13:55, Quentin Perret wrote:
> On Wednesday 21 Mar 2018 at 13:59:25 (+0100), Juri Lelli wrote:
> > On 21/03/18 12:26, Patrick Bellasi wrote:
> > > On 21-Mar 10:04, Juri Lelli wrote:

[...]

> > > > > +	/*
> > > > > +	 * As the goal is to estimate the OPP reached for a specific util
> > > > > +	 * value, mimic the behaviour of schedutil with a 1.25 coefficient
> > > > > +	 */
> > > > > +	util += util >> 2;
> > > > 
> > > > What about other governors (ondemand for example). Is this supposed to
> > > > work only when schedutil is in use (if so we should probably make it
> > > > conditional on that)?
> > > 
> > > Yes, I would say that EAS mostly makes sense when you have a "minimum"
> > > control on OPPs... otherwise all the energy estimations are really
> > > fuzzy.
> > 
> > Makes sense to me. Shouldn't we then make all this conditional on using
> > schedutil?
> 
> So, in theory, EAS could make sense even for other governors than
> schedutil. Even with the performance governor it is probably more
> energy efficient (although users using "performance" probably don't care
> about energy, but that's just an example) to place small tasks onto little
> CPUs up to a certain point given by the energy model. The ideal solution
> would be to change the behaviour of find_cap_state() depending on the
> governor being used, but I don't know if this extra complexity is worth
> it really.
> I'm happy to make all this conditional on schedutil as a first step and
> we can see later if that makes sense to extend EAS to other use-cases.

I agree that EAS still makes sense even for !schedutil cases (your
performance example being one of them, powersave maybe another one?).
Making it work with ondemand is tricky, though.

So, I'm not sure what the best thing to do is, but we should at least
be aware of the limitations.

> 
> > 
> > > 
> > > > Also, even when schedutil is in use, shouldn't we ask it for a util
> > > > "computation" instead of replicating its _current_ heuristic?
> > > 
> > > Are you proposing to have the 1.25 factor only here and remove it from
> > > schedutil?
> > 
> > I'm only saying that we probably shouldn't have two places where we add
> > this 1.25 factor to utilization before using it, as in the future one of
> > the two might modify that 1.25 to something else and then we'll have
> > problems. So, maybe a common wrapper that adds such a factor?
> 
> Ok, I can definitely factorize this code between schedutil and EAS. And
> BTW, would it make sense to make schedutil use "capacity_margin" instead
> of an arbitrary value ? The semantics feels pretty close. Out of curiosity,
> what was the reason to use C=1.25 in the first place ?

I seem to remember it was chosen based on experiments, but I may well
be wrong and Rafael, Viresh, others will correct me. :)

> 
> > 
> > > 
> > > > I fear  the two might diverge in the future.
> > > 
> > > That could be avoided by factoring out from schedutil the
> > > "compensation" factor into a proper function to be used by all the
> > > interested playes, isn't it?
> > 
> > And I should have read till the end before writing the above paragraph
> > it seems. :)
> > 
> > Thanks,
> > 
> > - Juri
> 
> Thank you very much for the feedback !

No problem. I'm of course very interested in this. Could you please
directly Cc me (juri.lelli@redhat.com) in next versions?

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up
  2018-03-20  9:43 ` [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up Dietmar Eggemann
@ 2018-03-21 15:35   ` Patrick Bellasi
  2018-03-22 20:10     ` Joel Fernandes
  2018-03-25  1:52     ` Quentin Perret
  2018-03-22 16:27   ` Joel Fernandes
  1 sibling, 2 replies; 55+ messages in thread
From: Patrick Bellasi @ 2018-03-21 15:35 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, Peter Zijlstra, Quentin Perret, Thara Gopinath,
	linux-pm, Morten Rasmussen, Chris Redpath, Valentin Schneider,
	Rafael J . Wysocki, Greg Kroah-Hartman, Vincent Guittot,
	Viresh Kumar, Todd Kjos, Joel Fernandes

On 20-Mar 09:43, Dietmar Eggemann wrote:
> From: Quentin Perret <quentin.perret@arm.com>

[...]

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 76bd46502486..65a1bead0773 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6513,6 +6513,60 @@ static unsigned long compute_energy(struct task_struct *p, int dst_cpu)
>  	return energy;
>  }
> 
> +static bool task_fits(struct task_struct *p, int cpu)
> +{
> +	unsigned long next_util = cpu_util_next(cpu, p, cpu);
> +
> +	return util_fits_capacity(next_util, capacity_orig_of(cpu));
                                             ^^^^^^^^^^^^^^^^^^^^^

Since here we are scheduling CFS tasks, shouldn't we rather use
capacity_of() to account for RT/IRQ pressure?

> +}
> +
> +static int find_energy_efficient_cpu(struct sched_domain *sd,
> +					struct task_struct *p, int prev_cpu)
> +{
> +	unsigned long cur_energy, prev_energy, best_energy;
> +	int cpu, best_cpu = prev_cpu;
> +
> +	if (!task_util(p))

We are still waking up a task... what if the task was previously
running on a big CPU which is now idle?

I understand that from a _relative_ energy_diff standpoint there is
not much to do for a 0 utilization task. However, for those tasks we
can still try to return the most energy efficient CPU among the ones
in their cpus_allowed mask.

This should have a relatively low overhead (maybe contained in a
fallback most_energy_efficient_cpu() kind of function, see the sketch
below) and would allow, for example on ARM big.LITTLE systems,
consolidating those tasks on LITTLE CPUs instead of, say, keeping them
running on a big CPU.
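
Something like this, maybe (purely illustrative sketch, simply picking
the smallest-capacity allowed CPU):

---8<---
/* Hypothetical fallback for tasks with no utilization */
static int most_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
{
	unsigned long cap, min_cap = ULONG_MAX;
	int cpu, best_cpu = prev_cpu;

	for_each_cpu(cpu, &p->cpus_allowed) {
		cap = capacity_orig_of(cpu);
		if (cap < min_cap) {
			min_cap = cap;
			best_cpu = cpu;
		}
	}

	return best_cpu;
}
---8<---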

> +		return prev_cpu;
> +
> +	/* Compute the energy impact of leaving the task on prev_cpu. */
> +	prev_energy = best_energy = compute_energy(p, prev_cpu);
> +
> +	/* Look for the CPU that minimizes the energy. */
                                           ^^^^^^^^^^
nit-pick: would say explicitly "best_energy" here, just to avoid
confusion about (non) possible overflows in the following if check ;)

> +	for_each_cpu_and(cpu, &p->cpus_allowed, sched_domain_span(sd)) {
> +		if (!task_fits(p, cpu) || cpu == prev_cpu)

nit-pick: to me it would read better as:

                if (cpu == prev_cpu)
                        continue;
                if (!task_fits(p, cpu))
                        continue;

but it's more matter of (personal) taste then efficiency.

> +			continue;
> +		cur_energy = compute_energy(p, cpu);
> +		if (cur_energy < best_energy) {
> +			best_energy = cur_energy;
> +			best_cpu = cpu;
> +		}
> +	}
> +
> +	/*
> +	 * We pick the best CPU only if it saves at least 1.5% of the
> +	 * energy used by prev_cpu.
> +	 */
> +	if ((prev_energy - best_energy) > (prev_energy >> 6))
> +		return best_cpu;
> +
> +	return prev_cpu;
> +}

[...]

> @@ -6555,6 +6613,14 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>  			break;
>  		}
> 
> +		/*
> +		 * Energy-aware task placement is performed on the highest
> +		 * non-overutilized domain spanning over cpu and prev_cpu.
> +		 */
> +		if (want_energy && !sd_overutilized(tmp) &&
> +		    cpumask_test_cpu(prev_cpu, sched_domain_span(tmp)))
> +			energy_sd = tmp;
> +

Not entirely sure, but I was trying to understand if we can avoid
modifying the definition of want_affine (in the previous chunk) and move
this block before the previous "if (want_affine..." (in mainline but
not in this chunk), which would become an else, e.g.

        if (want_energy && !sd_overutilized(tmp) &&
                // ...
        else if (want_energy && !sd_overutilized(tmp) &&
                // ...

Isn't that the same?

Maybe there is a code path I'm missing... but otherwise it seems a
more self-contained modification of select_task_rq_fair...

>  		if (tmp->flags & sd_flag)
>  			sd = tmp;
>  		else if (!want_affine)
> @@ -6586,6 +6652,8 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>  			if (want_affine)
>  				current->recent_used_cpu = cpu;
>  		}
> +	} else if (energy_sd) {
> +		new_cpu = find_energy_efficient_cpu(energy_sd, p, prev_cpu);
>  	} else {
>  		new_cpu = find_idlest_cpu(sd, p, cpu, prev_cpu, sd_flag);
>  	}

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function
  2018-03-21 14:26     ` Quentin Perret
  2018-03-21 14:50       ` Juri Lelli
@ 2018-03-21 15:54       ` Patrick Bellasi
  2018-03-22  5:05         ` Quentin Perret
  1 sibling, 1 reply; 55+ messages in thread
From: Patrick Bellasi @ 2018-03-21 15:54 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Dietmar Eggemann, linux-kernel, Peter Zijlstra, Thara Gopinath,
	linux-pm, Morten Rasmussen, Chris Redpath, Valentin Schneider,
	Rafael J . Wysocki, Greg Kroah-Hartman, Vincent Guittot,
	Viresh Kumar, Todd Kjos, Joel Fernandes

On 21-Mar 14:26, Quentin Perret wrote:
> On Wednesday 21 Mar 2018 at 12:39:21 (+0000), Patrick Bellasi wrote:
> > On 20-Mar 09:43, Dietmar Eggemann wrote:
> > > From: Quentin Perret <quentin.perret@arm.com>
> > 
> > [...]
> > 
> > > +static unsigned long compute_energy(struct task_struct *p, int dst_cpu)
> > > +{
> > > +	unsigned long util, fdom_max_util;
> > > +	struct capacity_state *cs;
> > > +	unsigned long energy = 0;
> > > +	struct freq_domain *fdom;
> > > +	int cpu;
> > > +
> > > +	for_each_freq_domain(fdom) {
> > > +		fdom_max_util = 0;
> > > +		for_each_cpu_and(cpu, &(fdom->span), cpu_online_mask) {
> > > +			util = cpu_util_next(cpu, p, dst_cpu);
> > 
> > Would be nice to find a way to cache all these util and reuse them
> > below... even just to ensure data consistency between the "cs"
> > computation and its usage...
> 
> So actually, what I can do is add something like
> 
>     fdom_tot_util += util;
> 
> to this loop and compute
> 
>     energy = cs->power * fdom_tot_util / cs->cap;
> 
> only once, instead of having the second loop to compute the energy. We don't
> have to scale the util for each and every CPU since they share the same
> cap state. That would save some divisions and ensure the consistency
> between the selection of the cap state and the associated energy
> computation. What do you think ?

Right, I would say that under the hypothesis that we are in the same
frequency domain (and we are, because of fdom->span), that's basically
doing:

   sum_i(P_x * U_i / C_x) => P_x / C_x * sum_i(U_i)

Where (C_x, P_x) are the EM reported capacity and power for the
expected frequency domain OPP.

> Or maybe you were talking about consistency between several consecutive
> calls to compute_energy() ?

Nope, the above +1

> > > +			fdom_max_util = max(util, fdom_max_util);
> > > +		}
> > > +
> > > +		/*
> > > +		 * Here we assume that the capacity states of CPUs belonging to
> > > +		 * the same frequency domains are shared. Hence, we look at the
> > > +		 * capacity state of the first CPU and re-use it for all.
> > > +		 */
> > > +		cpu = cpumask_first(&(fdom->span));
> > > +		cs = find_cap_state(cpu, fdom_max_util);
> >                 ^^^^
> > 
> > The above code could theoretically return NULL, although likely EAS is
> > completely disabled if em->nb_cap_states == 0, right?
> 
> That's right. sched_energy_present cannot be enabled with
> em->nb_cap_states == 0, and compute_energy() is never called without
> sched_energy_present in the proposed implementation.
> 
> > 
> > If that's the case then, in the previous function, you can certainly
> > avoid the initialization of *cs and maybe also add an explicit:
> > 
> >     BUG_ON(em->nb_cap_states == 0);
> > 
> > which helps even just as "in code documentation".
> > 
> > But, I'm not sure if maintainers like BUG_ON in scheduler code :)
> 
> Yes, I'm not sure about the BUG_ON either :).

FWIW, there are already some BUG_ONs in fair.c... thus, if they can
pinpoint a specific bug in case of errors, they should be acceptable?

> I agree that it would be nice to document somewhere that
> compute_energy() is unsafe to call without sched_energy_present.
> I can simply add a proper doc comment to this function actually.
> Would that work ?

Right, it's just that _maybe_ an explicit BUG_ON improves the
documentation by making the failure mode more explicit during testing?

Thus, I would probably add both... but Peter will tell you for sure ;)

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function
  2018-03-21 15:15           ` Juri Lelli
@ 2018-03-21 16:26             ` Morten Rasmussen
  2018-03-21 17:02               ` Juri Lelli
  0 siblings, 1 reply; 55+ messages in thread
From: Morten Rasmussen @ 2018-03-21 16:26 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Quentin Perret, Patrick Bellasi, Dietmar Eggemann, linux-kernel,
	Peter Zijlstra, Thara Gopinath, linux-pm, Chris Redpath,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

On Wed, Mar 21, 2018 at 04:15:13PM +0100, Juri Lelli wrote:
> On 21/03/18 13:55, Quentin Perret wrote:
> > On Wednesday 21 Mar 2018 at 13:59:25 (+0100), Juri Lelli wrote:
> > > On 21/03/18 12:26, Patrick Bellasi wrote:
> > > > On 21-Mar 10:04, Juri Lelli wrote:
> 
> [...]
> 
> > > > > > +	/*
> > > > > > +	 * As the goal is to estimate the OPP reached for a specific util
> > > > > > +	 * value, mimic the behaviour of schedutil with a 1.25 coefficient
> > > > > > +	 */
> > > > > > +	util += util >> 2;
> > > > > 
> > > > > What about other governors (ondemand for example). Is this supposed to
> > > > > work only when schedutil is in use (if so we should probably make it
> > > > > conditional on that)?
> > > > 
> > > > Yes, I would say that EAS mostly makes sense when you have a "minimum"
> > > > control on OPPs... otherwise all the energy estimations are really
> > > > fuzzy.
> > > 
> > > Makes sense to me. Shouldn't we then make all this conditional on using
> > > schedutil?
> > 
> > So, in theory, EAS could make sense even for other governors than
> > schedutil. Even with the performance governor it is probably more
> > energy efficient (although users using "performance" probably don't care
> > about energy, but that's just an example) to place small tasks onto little
> > CPUs up to a certain point given by the energy model. The ideal solution
> > would be to change the behaviour of find_cap_state() depending on the
> > governor being used, but I don't know if this extra complexity is worth
> > it really.
> > I'm happy to make all this conditional on schedutil as a first step and
> > we can see later if that makes sense to extend EAS to other use-cases.
> 
> I agree that EAS still makes sense even for !schedutil cases (your
> performance example being one of them, powersave maybe another one?).
> Making it work with ondemand is tricky, though.
> 
> So, not sure what's the best thing to do, but we should at least be aware
> of the limitations.

I would suggest making as few assumptions about the OPP selection as
possible. Even when we do use schedutil, there could be a number of
reasons why we don't actually get the OPP that schedutil requests
(thermal, hardware-says-no, ...).

In the previous energy model-driven scheduling postings, years back, I
went with the assumption that OPP would follow the utilization. So if we
put more tasks on a cpu, the OPP would increase to match. If cpufreq or
hardware decided to go faster, that is fine but it could lead to
suboptimal decisions.

We could call into schedutil somehow to make sure that we at least
request the same OPP as the energy model assumes, provided that the
overhead is small and we can present schedutil with all the information
it needs to choose the OPP for the proposed task placement. I wonder if
it is worth it, or if we should just settle on a simple assumption about
OPP selection for energy estimation and document it in a comment.

Morten

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function
  2018-03-21 16:26             ` Morten Rasmussen
@ 2018-03-21 17:02               ` Juri Lelli
  0 siblings, 0 replies; 55+ messages in thread
From: Juri Lelli @ 2018-03-21 17:02 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Quentin Perret, Patrick Bellasi, Dietmar Eggemann, linux-kernel,
	Peter Zijlstra, Thara Gopinath, linux-pm, Chris Redpath,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

On 21/03/18 16:26, Morten Rasmussen wrote:
> On Wed, Mar 21, 2018 at 04:15:13PM +0100, Juri Lelli wrote:
> > On 21/03/18 13:55, Quentin Perret wrote:
> > > On Wednesday 21 Mar 2018 at 13:59:25 (+0100), Juri Lelli wrote:
> > > > On 21/03/18 12:26, Patrick Bellasi wrote:
> > > > > On 21-Mar 10:04, Juri Lelli wrote:
> > 
> > [...]
> > 
> > > > > > > +	/*
> > > > > > > +	 * As the goal is to estimate the OPP reached for a specific util
> > > > > > > +	 * value, mimic the behaviour of schedutil with a 1.25 coefficient
> > > > > > > +	 */
> > > > > > > +	util += util >> 2;
> > > > > > 
> > > > > > What about other governors (ondemand for example). Is this supposed to
> > > > > > work only when schedutil is in use (if so we should probably make it
> > > > > > conditional on that)?
> > > > > 
> > > > > Yes, I would say that EAS mostly makes sense when you have a "minimum"
> > > > > of control over OPPs... otherwise all the energy estimations are really
> > > > > fuzzy.
> > > > 
> > > > Makes sense to me. Shouldn't we then make all this conditional on using
> > > > schedutil?
> > > 
> > > So, in theory, EAS could make sense even for other governors than
> > > schedutil. Even with the performance governor it is probably more
> > > energy efficient (although users using "performance" probably don't care
> > > about energy, but that's just an example) to place small tasks onto little
> > > CPUs up to a certain point given by the energy model. The ideal solution
> > > would be to change the behaviour of find_cap_state() depending on the
> > > governor being used, but I don't know if this extra complexity is worth
> > > it really.
> > > I'm happy to make all this conditional on schedutil as a first step and
> > > we can see later if that makes sense to extend EAS to other use-cases.
> > 
> > I agree that EAS still makes sense even for !schedutil cases (your
> > performance example being one of them, powersave maybe another one?).
> > Making it work with ondemand is tricky, though.
> > 
> > So, not sure what's the best thing to do, but we should at least be aware
> > of the limitations.
> 
> I would suggest making as few assumptions about the OPP selection as
> possible. Even when we do use schedutil, there could be a number of
> reasons why we don't actually get the OPP that schedutil requests
> (thermal, hardware-says-no, ...).
> 
> In the previous energy model-driven scheduling postings, years back, I
> went with the assumption that OPP would follow the utilization. So if we
> put more tasks on a cpu, the OPP would increase to match. If cpufreq or
> hardware decided to go faster, that is fine but it could lead to
> suboptimal decisions.
> 
> We could call into schedutil somehow to make sure that we at least
> request the same OPP as the energy model assumes, provided that the
> overhead is small and we can present schedutil with all the information
> it needs to choose the OPP for the proposed task placement. I wonder if
> it is worth it, or if we should just settle on a simple assumption about
> OPP selection for energy estimation and document it in a comment.

Right, I see your point. Refactoring the 1.25 coefficient calculation into
some getter method hopefully shouldn't add much overhead, but yes, it
might not give us much in terms of correctness in certain situations.
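
Something like the below is what I was thinking of -- just a sketch,
with a made-up name, so that the schedutil-mimicking margin lives in a
single place:

---8<---
/* Mirror schedutil's headroom: util * 1.25 == util + util / 4 */
static inline unsigned long map_util_to_capacity(unsigned long util)
{
	return util + (util >> 2);
}
---8<---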

Best,

- Juri

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function
  2018-03-21 14:02       ` Quentin Perret
@ 2018-03-21 21:15         ` Dietmar Eggemann
  0 siblings, 0 replies; 55+ messages in thread
From: Dietmar Eggemann @ 2018-03-21 21:15 UTC (permalink / raw)
  To: Quentin Perret, Patrick Bellasi
  Cc: Juri Lelli, linux-kernel, Peter Zijlstra, Thara Gopinath,
	linux-pm, Morten Rasmussen, Chris Redpath, Valentin Schneider,
	Rafael J . Wysocki, Greg Kroah-Hartman, Vincent Guittot,
	Viresh Kumar, Todd Kjos, Joel Fernandes

On 03/21/2018 03:02 PM, Quentin Perret wrote:
> On Wednesday 21 Mar 2018 at 12:26:21 (+0000), Patrick Bellasi wrote:
>> On 21-Mar 10:04, Juri Lelli wrote:
>>> Hi,
>>>
>>> On 20/03/18 09:43, Dietmar Eggemann wrote:
>>>> From: Quentin Perret <quentin.perret@arm.com>

[...]

>> Actually I think that this whole function can be written "just" as:
>>
>> ---8<---
>>     unsigned long util = cpu_util_wake(cpu);
>>
>>     if (cpu != dst_cpu)
>>          return util;
>>
>>     return min(util + task_util(p), capacity_orig_of(cpu));
>> ---8<---
>>
> 
> Yes, this should be functionally equivalent. However, with your
> suggestion you can potentially remove the task contribution from the
> CPU utilization in cpu_util_wake() and then add it back right after if
> cpu == dst_cpu. This is sub-optimal, and that's why I implemented things
> slightly differently. But maybe this optimization really is too small to
> justify the extra complexity involved ...

How about we merge both functions by adding an additional 'int dst_cpu'
parameter to cpu_util_wake() (only lightly tested, w/o util_est):

--->8---

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 65a1bead0773..4d4f104d5b3d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5860,11 +5860,11 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 }
 
 static inline unsigned long task_util(struct task_struct *p);
-static unsigned long cpu_util_wake(int cpu, struct task_struct *p);
+static unsigned long cpu_util_wake(int cpu, int dst_cpu, struct task_struct *p);
 
 static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
 {
-       return max_t(long, capacity_of(cpu) - cpu_util_wake(cpu, p), 0);
+       return max_t(long, capacity_of(cpu) - cpu_util_wake(cpu, -1, p), 0);
 }
 
 /*
@@ -6384,16 +6384,22 @@ static inline unsigned long task_util(struct task_struct *p)
  * cpu_util_wake: Compute CPU utilization with any contributions from
  * the waking task p removed.
  */
-static unsigned long cpu_util_wake(int cpu, struct task_struct *p)
+static unsigned long cpu_util_wake(int cpu, int dst_cpu, struct task_struct *p)
 {
        unsigned long util, capacity;
 
        /* Task has no contribution or is new */
-       if (cpu != task_cpu(p) || !p->se.avg.last_update_time)
+       if ((cpu != task_cpu(p) && cpu != dst_cpu) ||
+           dst_cpu == task_cpu(p) || !p->se.avg.last_update_time)
                return cpu_util(cpu);
 
        capacity = capacity_orig_of(cpu);
-       util = max_t(long, cpu_rq(cpu)->cfs.avg.util_avg - task_util(p), 0);
+       util = cpu_rq(cpu)->cfs.avg.util_avg;
+
+       if (likely(dst_cpu != cpu))
+               util = max_t(long, util - task_util(p), 0);
+       else
+               util += task_util(p);
 
        return (util >= capacity) ? capacity : util;
 }
@@ -6409,30 +6415,6 @@ static inline int cpu_overutilized(int cpu)
 }
 
 /*
- * Returns the util of "cpu" if "p" wakes up on "dst_cpu".
- */
-static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
-{
-       unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
-       unsigned long capacity = capacity_orig_of(cpu);
-
-       /*
-        * If p is where it should be, or if it has no impact on cpu, there is
-        * not much to do.
-        */
-       if ((task_cpu(p) == dst_cpu) || (cpu != task_cpu(p) && cpu != dst_cpu))
-               goto clamp_util;
-
-       if (dst_cpu == cpu)
-               util += task_util(p);
-       else
-               util = max_t(long, util - task_util(p), 0);
-
-clamp_util:
-       return (util >= capacity) ? capacity : util;
-}
-
-/*
  * Disable WAKE_AFFINE in the case where task @p doesn't fit in the
  * capacity of either the waking CPU @cpu or the previous CPU @prev_cpu.
  *
@@ -6488,7 +6470,7 @@ static unsigned long compute_energy(struct task_struct *p, int dst_cpu)
        for_each_freq_domain(fdom) {
                fdom_max_util = 0;
                for_each_cpu_and(cpu, &(fdom->span), cpu_online_mask) {
-                       util = cpu_util_next(cpu, p, dst_cpu);
+                       util = cpu_util_wake(cpu, dst_cpu, p);
                        fdom_max_util = max(util, fdom_max_util);
                }
 
@@ -6506,7 +6488,7 @@ static unsigned long compute_energy(struct task_struct *p, int dst_cpu)
                 * busy time.
                 */
                for_each_cpu_and(cpu, &(fdom->span), cpu_online_mask) {
-                       util = cpu_util_next(cpu, p, dst_cpu);
+                       util = cpu_util_wake(cpu, dst_cpu, p);
                        energy += cs->power * util / cs->cap;
                }
        }

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function
  2018-03-21 15:54       ` Patrick Bellasi
@ 2018-03-22  5:05         ` Quentin Perret
  0 siblings, 0 replies; 55+ messages in thread
From: Quentin Perret @ 2018-03-22  5:05 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Dietmar Eggemann, linux-kernel, Peter Zijlstra, Thara Gopinath,
	linux-pm, Morten Rasmussen, Chris Redpath, Valentin Schneider,
	Rafael J . Wysocki, Greg Kroah-Hartman, Vincent Guittot,
	Viresh Kumar, Todd Kjos, Joel Fernandes

On Wednesday 21 Mar 2018 at 15:54:58 (+0000), Patrick Bellasi wrote:
> On 21-Mar 14:26, Quentin Perret wrote:
> > On Wednesday 21 Mar 2018 at 12:39:21 (+0000), Patrick Bellasi wrote:
> > > On 20-Mar 09:43, Dietmar Eggemann wrote:
> > > > From: Quentin Perret <quentin.perret@arm.com>

[...]

> > So actually, what I can do is add something like
> > 
> >     fdom_tot_util += util;
> > 
> > to this loop and compute
> > 
> >     energy = cs->power * fdom_tot_util / cs->cap;
> > 
> > only once, instead of having the second loop to compute the energy. We don't
> > have to scale the util for each and every CPU since they share the same
> > cap state. That would save some divisions and ensure the consistency
> > between the selection of the cap state and the associated energy
> > computation. What do you think ?
> 
> Right, I would say that under the hypothesis that we are in the same
> frequency domain (and we are, because of fdom->span), that's basically
> doing:
> 
>    sum_i(P_x * U_i / C_x) => P_x / C_x * sum_i(U_i)
> 
> Where (C_x, P_x) are the EM-reported capacity and power for the
> expected frequency domain OPP.
> 

Yes, that's exactly that. I'll do the change in v2.
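
FWIW, the merged loop would then look roughly like this (sketch only,
reusing the names from the patch):

---8<---
	for_each_freq_domain(fdom) {
		fdom_max_util = fdom_tot_util = 0;
		for_each_cpu_and(cpu, &(fdom->span), cpu_online_mask) {
			util = cpu_util_next(cpu, p, dst_cpu);
			fdom_max_util = max(util, fdom_max_util);
			fdom_tot_util += util;
		}

		/* Capacity states are shared within a frequency domain. */
		cpu = cpumask_first(&(fdom->span));
		cs = find_cap_state(cpu, fdom_max_util);

		/* One scaling per frequency domain instead of per CPU. */
		energy += cs->power * fdom_tot_util / cs->cap;
	}
---8<---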

> > Or maybe you were talking about consistency between several consecutive
> > calls to compute_energy() ?
> 
> Nope, the above +1
> 

[...]

> > I agree that it would be nice to document somewhere that
> > compute_energy() is unsafe to call without sched_energy_present.
> > I can simply add a proper doc comment to this function actually.
> > Would that work ?
> 
> Right, it's just that _maybe_ an explicit BUG_ON improves the
> documentation by making the failure mode more explicit during testing?
> 
> Thus, I would probably add both... but Peter will tell you for sure ;)
> 

Right, but I'm still not sure if the BUG_ON is the right thing to do. I
mean, if we really want to make this check, then we could also try
to recover into a working state ... If we enter compute_energy() without
having an energy model, and if we detect it in time, we could bail out
and disable sched_energy_present ASAP with an error message, for example.
That would let us know if EAS is broken without making the system
unusable.
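
Roughly, something like this at the top of compute_energy() (pure
sketch, assuming the energy model pointer is at hand there; also,
flipping a static key can sleep, so the actual disabling would probably
have to be deferred out of the wakeup path):

---8<---
	if (unlikely(em->nb_cap_states == 0)) {
		WARN_ONCE(1, "EAS: no energy model, disabling\n");
		/* defer static_branch_disable(&sched_energy_present) */
		return ULONG_MAX;
	}
---8<---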

Anyways, if there is a general agreement, or if the maintainers think
that the BUG_ON is the right thing to do here, I'm happy to change that
in future versions :)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up
  2018-03-20  9:43 ` [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up Dietmar Eggemann
  2018-03-21 15:35   ` Patrick Bellasi
@ 2018-03-22 16:27   ` Joel Fernandes
  2018-03-22 18:06     ` Patrick Bellasi
  2018-03-23 16:00     ` Morten Rasmussen
  1 sibling, 2 replies; 55+ messages in thread
From: Joel Fernandes @ 2018-03-22 16:27 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: LKML, Peter Zijlstra, Quentin Perret, Thara Gopinath, Linux PM,
	Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos

Hi,

On Tue, Mar 20, 2018 at 2:43 AM, Dietmar Eggemann
<dietmar.eggemann@arm.com> wrote:
>
> From: Quentin Perret <quentin.perret@arm.com>
>
> In case an energy model is available, waking tasks are re-routed into a
> new energy-aware placement algorithm. The eligible CPUs to be used in the
> energy-aware wakeup path are restricted to the highest non-overutilized
> sched_domain containing prev_cpu and this_cpu. If no such domain is found,
> the tasks go through the usual wake-up path, hence energy-aware placement
> happens only in lightly utilized scenarios.
>
> The selection of the most energy-efficient CPU for a task is achieved by
> estimating the impact on system-level active energy resulting from the
> placement of the task on each candidate CPU. The best CPU energy-wise is
> then selected if it saves a large enough amount of energy with respect to
> prev_cpu.
>
> Although it has already shown significant benefits on some existing
> targets, this brute force approach clearly cannot scale to platforms with
> numerous CPUs. This patch is an attempt to do something useful as writing
> a fast heuristic that performs reasonably well on a broad spectrum of
> architectures isn't an easy task. As a consequence, the scope of usability
> of the energy-aware wake-up path is restricted to systems with the
> SD_ASYM_CPUCAPACITY flag set. These systems not only show the most
> promising opportunities for saving energy but also typically feature a
> limited number of logical CPUs.
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
>  kernel/sched/fair.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 71 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 76bd46502486..65a1bead0773 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6513,6 +6513,60 @@ static unsigned long compute_energy(struct task_struct *p, int dst_cpu)
>         return energy;
>  }
>
> +static bool task_fits(struct task_struct *p, int cpu)
> +{
> +       unsigned long next_util = cpu_util_next(cpu, p, cpu);
> +
> +       return util_fits_capacity(next_util, capacity_orig_of(cpu));
> +}
> +
> +static int find_energy_efficient_cpu(struct sched_domain *sd,
> +                                       struct task_struct *p, int prev_cpu)
> +{
> +       unsigned long cur_energy, prev_energy, best_energy;
> +       int cpu, best_cpu = prev_cpu;
> +
> +       if (!task_util(p))
> +               return prev_cpu;
> +
> +       /* Compute the energy impact of leaving the task on prev_cpu. */
> +       prev_energy = best_energy = compute_energy(p, prev_cpu);

Is it possible that before the wakeup, the task's affinity is changed
so that p->cpus_allowed no longer contains prev_cpu ? In that case
prev_energy wouldn't matter since previous CPU is no longer an option?

> +
> +       /* Look for the CPU that minimizes the energy. */
> +       for_each_cpu_and(cpu, &p->cpus_allowed, sched_domain_span(sd)) {
> +               if (!task_fits(p, cpu) || cpu == prev_cpu)
> +                       continue;
> +               cur_energy = compute_energy(p, cpu);
> +               if (cur_energy < best_energy) {
> +                       best_energy = cur_energy;
> +                       best_cpu = cpu;
> +               }
> +       }
> +
> +       /*
> +        * We pick the best CPU only if it saves at least 1.5% of the
> +        * energy used by prev_cpu.
> +        */
> +       if ((prev_energy - best_energy) > (prev_energy >> 6))
> +               return best_cpu;
> +
> +       return prev_cpu;
> +}
> +
> +static inline bool wake_energy(struct task_struct *p, int prev_cpu)
> +{
> +       struct sched_domain *sd;
> +
> +       if (!static_branch_unlikely(&sched_energy_present))
> +               return false;
> +
> +       sd = rcu_dereference_sched(cpu_rq(prev_cpu)->sd);
> +       if (!sd || sd_overutilized(sd))

Shouldn't you check for the SD_ASYM_CPUCAPACITY flag here?

> +               return false;
> +
> +       return true;
> +}
> +
>  /*
>   * select_task_rq_fair: Select target runqueue for the waking task in domains
>   * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
> @@ -6529,18 +6583,22 @@ static int
>  select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_flags)
>  {
>         struct sched_domain *tmp, *affine_sd = NULL, *sd = NULL;
> +       struct sched_domain *energy_sd = NULL;
>         int cpu = smp_processor_id();
>         int new_cpu = prev_cpu;
> -       int want_affine = 0;
> +       int want_affine = 0, want_energy = 0;
>         int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
>
> +       rcu_read_lock();
> +
>         if (sd_flag & SD_BALANCE_WAKE) {
>                 record_wakee(p);
> +               want_energy = wake_energy(p, prev_cpu);
>                 want_affine = !wake_wide(p) && !wake_cap(p, cpu, prev_cpu)
> -                             && cpumask_test_cpu(cpu, &p->cpus_allowed);
> +                             && cpumask_test_cpu(cpu, &p->cpus_allowed)
> +                             && !want_energy;
>         }
>
> -       rcu_read_lock();
>         for_each_domain(cpu, tmp) {
>                 if (!(tmp->flags & SD_LOAD_BALANCE))
>                         break;
> @@ -6555,6 +6613,14 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>                         break;
>                 }
>
> +               /*
> +                * Energy-aware task placement is performed on the highest
> +                * non-overutilized domain spanning over cpu and prev_cpu.
> +                */
> +               if (want_energy && !sd_overutilized(tmp) &&
> +                   cpumask_test_cpu(prev_cpu, sched_domain_span(tmp)))

Shouldn't you check for the SD_ASYM_CPUCAPACITY flag here for tmp level?


> +                       energy_sd = tmp;
> +
>                 if (tmp->flags & sd_flag)
>                         sd = tmp;
>                 else if (!want_affine)
> @@ -6586,6 +6652,8 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>                         if (want_affine)
>                                 current->recent_used_cpu = cpu;
>                 }
> +       } else if (energy_sd) {
> +               new_cpu = find_energy_efficient_cpu(energy_sd, p, prev_cpu);

Even if want_affine = 0 (want_energy = 1), we can have sd = NULL if
sd_flag and tmp->flags don't match. In this case we won't enter the EAS
selection path because sd will be NULL. So should you be setting sd
= NULL explicitly if energy_sd != NULL, or rather do the if
(energy_sd) before doing the if (!sd)?

If you still want to keep the logic this way, then you should probably
also check that (tmp->flags & sd_flag) is true in the loop. That way
energy_sd won't be set at all, since we're basically saying we don't
want to do an energy-aware wake-up across this sd if the domain flags
don't match the wake-up sd_flag.

thanks,

- Joel

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up
  2018-03-22 16:27   ` Joel Fernandes
@ 2018-03-22 18:06     ` Patrick Bellasi
  2018-03-22 20:19       ` Joel Fernandes
  2018-03-23 16:00     ` Morten Rasmussen
  1 sibling, 1 reply; 55+ messages in thread
From: Patrick Bellasi @ 2018-03-22 18:06 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Dietmar Eggemann, LKML, Peter Zijlstra, Quentin Perret,
	Thara Gopinath, Linux PM, Morten Rasmussen, Chris Redpath,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos

On 22-Mar 09:27, Joel Fernandes wrote:
> Hi,
> 
> On Tue, Mar 20, 2018 at 2:43 AM, Dietmar Eggemann
> <dietmar.eggemann@arm.com> wrote:
> >
> > From: Quentin Perret <quentin.perret@arm.com>

[...]

> > +static inline bool wake_energy(struct task_struct *p, int prev_cpu)
> > +{
> > +       struct sched_domain *sd;
> > +
> > +       if (!static_branch_unlikely(&sched_energy_present))
> > +               return false;
> > +
> > +       sd = rcu_dereference_sched(cpu_rq(prev_cpu)->sd);
> > +       if (!sd || sd_overutilized(sd))
> 
> Shouldn't you check for the SD_ASYM_CPUCAPACITY flag here?

I think that should already be covered by the static key check
above...

> 
> > +               return false;
> > +
> > +       return true;
> > +}
> > +
> >  /*
> >   * select_task_rq_fair: Select target runqueue for the waking task in domains
> >   * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
> > @@ -6529,18 +6583,22 @@ static int
> >  select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_flags)
> >  {
> >         struct sched_domain *tmp, *affine_sd = NULL, *sd = NULL;
> > +       struct sched_domain *energy_sd = NULL;
> >         int cpu = smp_processor_id();
> >         int new_cpu = prev_cpu;
> > -       int want_affine = 0;
> > +       int want_affine = 0, want_energy = 0;
> >         int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
> >
> > +       rcu_read_lock();
> > +
> >         if (sd_flag & SD_BALANCE_WAKE) {
> >                 record_wakee(p);
> > +               want_energy = wake_energy(p, prev_cpu);
> >                 want_affine = !wake_wide(p) && !wake_cap(p, cpu, prev_cpu)
> > -                             && cpumask_test_cpu(cpu, &p->cpus_allowed);
> > +                             && cpumask_test_cpu(cpu, &p->cpus_allowed)
> > +                             && !want_energy;
> >         }
> >
> > -       rcu_read_lock();
> >         for_each_domain(cpu, tmp) {
> >                 if (!(tmp->flags & SD_LOAD_BALANCE))
> >                         break;
> > @@ -6555,6 +6613,14 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
> >                         break;
> >                 }
> >
> > +               /*
> > +                * Energy-aware task placement is performed on the highest
> > +                * non-overutilized domain spanning over cpu and prev_cpu.
> > +                */
> > +               if (want_energy && !sd_overutilized(tmp) &&
> > +                   cpumask_test_cpu(prev_cpu, sched_domain_span(tmp)))
> 
> Shouldn't you check for the SD_ASYM_CPUCAPACITY flag here for tmp level?

... and this should then be covered by the previous check in
wake_energy(), which sets want_energy.

> 
> > +                       energy_sd = tmp;
> > +
> >                 if (tmp->flags & sd_flag)
> >                         sd = tmp;
> >                 else if (!want_affine)
> > @@ -6586,6 +6652,8 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
> >                         if (want_affine)
> >                                 current->recent_used_cpu = cpu;
> >                 }
> > +       } else if (energy_sd) {
> > +               new_cpu = find_energy_efficient_cpu(energy_sd, p, prev_cpu);
> 
> Even if want_affine = 0 (want_energy = 1), we can have sd = NULL if
> sd_flag and tmp->flags don't match. In this case we won't enter the EAS
> selection path because sd will be NULL. So should you be setting sd
> = NULL explicitly if energy_sd != NULL, or rather do the if
> (energy_sd) before doing the if (!sd)?

That's the same thing I was also proposing in my reply to this patch.
But in my case the point was mainly to make the code easier to
follow... which in the end also avoids all the considerations on
dependencies you describe above.

Joel, can you have a look at what I proposed... I was not entirely
sure whether we'd miss some code paths doing it that way.

> If you still want to keep the logic this way, then you should probably
> also check that (tmp->flags & sd_flag) is true in the loop. That way
> energy_sd won't be set at all, since we're basically saying we don't
> want to do an energy-aware wake-up across this sd if the domain flags
> don't match the wake-up sd_flag.
> 
> thanks,
> 
> - Joel

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up
  2018-03-21 15:35   ` Patrick Bellasi
@ 2018-03-22 20:10     ` Joel Fernandes
  2018-03-23 15:47       ` Morten Rasmussen
  2018-03-25  1:52     ` Quentin Perret
  1 sibling, 1 reply; 55+ messages in thread
From: Joel Fernandes @ 2018-03-22 20:10 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Dietmar Eggemann, LKML, Peter Zijlstra, Quentin Perret,
	Thara Gopinath, Linux PM, Morten Rasmussen, Chris Redpath,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos

On Wed, Mar 21, 2018 at 8:35 AM, Patrick Bellasi
<patrick.bellasi@arm.com> wrote:
> [...]
>
>> @@ -6555,6 +6613,14 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>>                       break;
>>               }
>>
>> +             /*
>> +              * Energy-aware task placement is performed on the highest
>> +              * non-overutilized domain spanning over cpu and prev_cpu.
>> +              */
>> +             if (want_energy && !sd_overutilized(tmp) &&
>> +                 cpumask_test_cpu(prev_cpu, sched_domain_span(tmp)))
>> +                     energy_sd = tmp;
>> +
>
> Not entirely sure, but I was trying to understand if we can avoid
> modifying the definition of want_affine (in the previous chunk) and move
> this block before the previous "if (want_affine..." (in mainline but
> not in this chunk), which will became an else, e.g.
>
>         if (want_energy && !sd_overutilized(tmp) &&
>                 // ...
>         else if (want_energy && !sd_overutilized(tmp) &&
>                 // ...
>
> Isn't that the same?
>
> Maybe there is a code path I'm missing... but otherwise it seems a
> more self-contained modification of select_task_rq_fair...

Just replying to this here, Patrick, instead of the other thread.

I think this is the right place for the block from Quentin quoted
above because we want to search for the highest domain that is
!overutilized and look among those for the candidates. So from that
perspective, we can't move the block to the beginning and it seems to
be in the right place. My main concern on the other thread was
different, I was talking about the cases where sd_flag & tmp->flags
don't match. In that case, sd = NULL would trump EAS and I was
wondering if that's the right thing to do...

thanks,

- Joel

[...]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up
  2018-03-22 18:06     ` Patrick Bellasi
@ 2018-03-22 20:19       ` Joel Fernandes
  2018-03-24  1:47         ` Quentin Perret
  0 siblings, 1 reply; 55+ messages in thread
From: Joel Fernandes @ 2018-03-22 20:19 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Dietmar Eggemann, LKML, Peter Zijlstra, Quentin Perret,
	Thara Gopinath, Linux PM, Morten Rasmussen, Chris Redpath,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos

On Thu, Mar 22, 2018 at 11:06 AM, Patrick Bellasi
<patrick.bellasi@arm.com> wrote:
[..]
>> > +static inline bool wake_energy(struct task_struct *p, int prev_cpu)
>> > +{
>> > +       struct sched_domain *sd;
>> > +
>> > +       if (!static_branch_unlikely(&sched_energy_present))
>> > +               return false;
>> > +
>> > +       sd = rcu_dereference_sched(cpu_rq(prev_cpu)->sd);
>> > +       if (!sd || sd_overutilized(sd))
>>
>> Shouldn't you check for the SD_ASYM_CPUCAPACITY flag here?
>
> I think that should already be covered by the static key check
> above...
>

I understand there is an assumption like that, but I was wondering if
it's more future-proof to check it explicitly. I am OK if everyone
thinks it's a valid assumption...

>>
>> > +               return false;
>> > +
>> > +       return true;
>> > +}
>> > +
>> >  /*
>> >   * select_task_rq_fair: Select target runqueue for the waking task in domains
>> >   * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
>> > @@ -6529,18 +6583,22 @@ static int
>> >  select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_flags)
>> >  {
>> >         struct sched_domain *tmp, *affine_sd = NULL, *sd = NULL;
>> > +       struct sched_domain *energy_sd = NULL;
>> >         int cpu = smp_processor_id();
>> >         int new_cpu = prev_cpu;
>> > -       int want_affine = 0;
>> > +       int want_affine = 0, want_energy = 0;
>> >         int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
>> >
>> > +       rcu_read_lock();
>> > +
>> >         if (sd_flag & SD_BALANCE_WAKE) {
>> >                 record_wakee(p);
>> > +               want_energy = wake_energy(p, prev_cpu);
>> >                 want_affine = !wake_wide(p) && !wake_cap(p, cpu, prev_cpu)
>> > -                             && cpumask_test_cpu(cpu, &p->cpus_allowed);
>> > +                             && cpumask_test_cpu(cpu, &p->cpus_allowed)
>> > +                             && !want_energy;
>> >         }
>> >
>> > -       rcu_read_lock();
>> >         for_each_domain(cpu, tmp) {
>> >                 if (!(tmp->flags & SD_LOAD_BALANCE))
>> >                         break;
>> > @@ -6555,6 +6613,14 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>> >                         break;
>> >                 }
>> >
>> > +               /*
>> > +                * Energy-aware task placement is performed on the highest
>> > +                * non-overutilized domain spanning over cpu and prev_cpu.
>> > +                */
>> > +               if (want_energy && !sd_overutilized(tmp) &&
>> > +                   cpumask_test_cpu(prev_cpu, sched_domain_span(tmp)))
>>
>> Shouldn't you check for the SD_ASYM_CPUCAPACITY flag here for tmp level?
>
> ... and this should then be covered by the previous check in
> wake_energy(), which sets want_energy.

Right, but in a scenario which probably doesn't exist today, where we
have both SD_ASYM_CPUCAPACITY and !SD_ASYM_CPUCAPACITY domains in the
hierarchy for which want_energy = 1, I was wondering if it's more
future-proof to check it and not make assumptions...

>>
>> > +                       energy_sd = tmp;
>> > +
>> >                 if (tmp->flags & sd_flag)
>> >                         sd = tmp;
>> >                 else if (!want_affine)
>> > @@ -6586,6 +6652,8 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>> >                         if (want_affine)
>> >                                 current->recent_used_cpu = cpu;
>> >                 }
>> > +       } else if (energy_sd) {
>> > +               new_cpu = find_energy_efficient_cpu(energy_sd, p, prev_cpu);
>>
>> Even if want_affine = 0 (want_energy = 1), we can have sd = NULL if
>> sd_flag and tmp->flags don't match. In this case we won't enter the EAS
>> selection path because sd will be NULL. So should you be setting sd
>> = NULL explicitly if energy_sd != NULL, or rather do the if
>> (energy_sd) before doing the if (!sd)?
>
> That's the same thing I was also proposing in my reply to this patch.
> But in my case the point was mainly to make the code easier to
> follow... which in the end also avoids all the considerations on
> dependencies you describe above.
>
> Joel, can you have a look at what I proposed... I was not entirely
> sure whether we'd miss some code paths doing it that way.

Replied to this in the other thread.

thanks,

- Joel

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up
  2018-03-22 20:10     ` Joel Fernandes
@ 2018-03-23 15:47       ` Morten Rasmussen
  2018-03-24  1:13         ` Joel Fernandes
  2018-03-24  1:22         ` Quentin Perret
  0 siblings, 2 replies; 55+ messages in thread
From: Morten Rasmussen @ 2018-03-23 15:47 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Patrick Bellasi, Dietmar Eggemann, LKML, Peter Zijlstra,
	Quentin Perret, Thara Gopinath, Linux PM, Chris Redpath,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos

On Thu, Mar 22, 2018 at 01:10:22PM -0700, Joel Fernandes wrote:
> On Wed, Mar 21, 2018 at 8:35 AM, Patrick Bellasi
> <patrick.bellasi@arm.com> wrote:
> > [...]
> >
> >> @@ -6555,6 +6613,14 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
> >>                       break;
> >>               }
> >>
> >> +             /*
> >> +              * Energy-aware task placement is performed on the highest
> >> +              * non-overutilized domain spanning over cpu and prev_cpu.
> >> +              */
> >> +             if (want_energy && !sd_overutilized(tmp) &&
> >> +                 cpumask_test_cpu(prev_cpu, sched_domain_span(tmp)))
> >> +                     energy_sd = tmp;
> >> +
> >
> > Not entirely sure, but I was trying to understand if we can avoid
> > modifying the definition of want_affine (in the previous chunk) and move
> > this block before the previous "if (want_affine..." (in mainline but
> > not in this chunk), which will became an else, e.g.
> >
> >         if (want_energy && !sd_overutilized(tmp) &&
> >                 // ...
> >         else if (want_energy && !sd_overutilized(tmp) &&
> >                 // ...
> >
> > Isn't that the same?
> >
> > Maybe there is a code path I'm missing... but otherwise it seems a
> > more self-contained modification of select_task_rq_fair...
> 
> Just replying to this here, Patrick, instead of the other thread.
> 
> I think this is the right place for the block from Quentin quoted
> above because we want to search for the highest domain that is
> !overutilized and look among those for the candidates. So from that
> perspective, we can't move the block to the beginning and it seems to
> be in the right place. My main concern on the other thread was
> different, I was talking about the cases where sd_flag & tmp->flags
> don't match. In that case, sd = NULL would trump EAS and I was
> wondering if that's the right thing to do...

You mean if SD_BALANCE_WAKE isn't set on sched_domains?

The current code seems to rely on that flag to be set to work correctly.
Otherwise, the loop might bail out on !want_affine and we end up doing
the find_energy_efficient_cpu() on the lowest level sched_domain even if
there is a higher-level one which isn't over-utilized.

However, SD_BALANCE_WAKE should be set if SD_ASYM_CPUCAPACITY is set so
sd == NULL shouldn't be possible? This only holds as long as we only
want EAS for asymmetric systems.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up
  2018-03-22 16:27   ` Joel Fernandes
  2018-03-22 18:06     ` Patrick Bellasi
@ 2018-03-23 16:00     ` Morten Rasmussen
  2018-03-24  0:36       ` Joel Fernandes
  2018-03-25  1:38       ` Quentin Perret
  1 sibling, 2 replies; 55+ messages in thread
From: Morten Rasmussen @ 2018-03-23 16:00 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Dietmar Eggemann, LKML, Peter Zijlstra, Quentin Perret,
	Thara Gopinath, Linux PM, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos

On Thu, Mar 22, 2018 at 09:27:43AM -0700, Joel Fernandes wrote:
> Hi,
> 
> On Tue, Mar 20, 2018 at 2:43 AM, Dietmar Eggemann
> <dietmar.eggemann@arm.com> wrote:
> >
> > From: Quentin Perret <quentin.perret@arm.com>
> >
> > In case an energy model is available, waking tasks are re-routed into a
> > new energy-aware placement algorithm. The eligible CPUs to be used in the
> > energy-aware wakeup path are restricted to the highest non-overutilized
> > sched_domain containing prev_cpu and this_cpu. If no such domain is found,
> > the tasks go through the usual wake-up path, hence energy-aware placement
> > happens only in lightly utilized scenarios.
> >
> > The selection of the most energy-efficient CPU for a task is achieved by
> > estimating the impact on system-level active energy resulting from the
> > placement of the task on each candidate CPU. The best CPU energy-wise is
> > then selected if it saves a large enough amount of energy with respect to
> > prev_cpu.
> >
> > Although it has already shown significant benefits on some existing
> > targets, this brute force approach clearly cannot scale to platforms with
> > numerous CPUs. This patch is an attempt to do something useful as writing
> > a fast heuristic that performs reasonably well on a broad spectrum of
> > architectures isn't an easy task. As a consequence, the scope of usability
> > of the energy-aware wake-up path is restricted to systems with the
> > SD_ASYM_CPUCAPACITY flag set. These systems not only show the most
> > promising opportunities for saving energy but also typically feature a
> > limited number of logical CPUs.
> >
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> > Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > ---
> >  kernel/sched/fair.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 71 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 76bd46502486..65a1bead0773 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6513,6 +6513,60 @@ static unsigned long compute_energy(struct task_struct *p, int dst_cpu)
> >         return energy;
> >  }
> >
> > +static bool task_fits(struct task_struct *p, int cpu)
> > +{
> > +       unsigned long next_util = cpu_util_next(cpu, p, cpu);
> > +
> > +       return util_fits_capacity(next_util, capacity_orig_of(cpu));
> > +}
> > +
> > +static int find_energy_efficient_cpu(struct sched_domain *sd,
> > +                                       struct task_struct *p, int prev_cpu)
> > +{
> > +       unsigned long cur_energy, prev_energy, best_energy;
> > +       int cpu, best_cpu = prev_cpu;
> > +
> > +       if (!task_util(p))
> > +               return prev_cpu;
> > +
> > +       /* Compute the energy impact of leaving the task on prev_cpu. */
> > +       prev_energy = best_energy = compute_energy(p, prev_cpu);
> 
> Is it possible that before the wakeup, the task's affinity is changed
> so that p->cpus_allowed no longer contains prev_cpu ? In that case
> prev_energy wouldn't matter since previous CPU is no longer an option?

It is possible to wake up with a disallowed prev_cpu. In fact,
select_idle_sibling() may happily return a disallowed cpu in that case.
The mistake gets fixed in select_task_rq(), which uses
select_fallback_rq() to find an allowed cpu instead.

Could we fix the issue in find_energy_efficient_cpu() by a simple test
like below

if (cpumask_test_cpu(prev_cpu, &p->cpus_allowed))
	prev_energy = best_energy = compute_energy(p, prev_cpu);
else
	prev_energy = best_energy = ULONG_MAX;

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up
  2018-03-23 16:00     ` Morten Rasmussen
@ 2018-03-24  0:36       ` Joel Fernandes
  2018-03-25  1:38       ` Quentin Perret
  1 sibling, 0 replies; 55+ messages in thread
From: Joel Fernandes @ 2018-03-24  0:36 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Dietmar Eggemann, LKML, Peter Zijlstra, Quentin Perret,
	Thara Gopinath, Linux PM, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos

On Fri, Mar 23, 2018 at 9:00 AM, Morten Rasmussen
<morten.rasmussen@arm.com> wrote:
> On Thu, Mar 22, 2018 at 09:27:43AM -0700, Joel Fernandes wrote:
>> >
>> > In case an energy model is available, waking tasks are re-routed into a
>> > new energy-aware placement algorithm. The eligible CPUs to be used in the
>> > energy-aware wakeup path are restricted to the highest non-overutilized
>> > sched_domain containing prev_cpu and this_cpu. If no such domain is found,
>> > the tasks go through the usual wake-up path, hence energy-aware placement
>> > happens only in lightly utilized scenarios.
>> >
>> > The selection of the most energy-efficient CPU for a task is achieved by
>> > estimating the impact on system-level active energy resulting from the
>> > placement of the task on each candidate CPU. The best CPU energy-wise is
>> > then selected if it saves a large enough amount of energy with respect to
>> > prev_cpu.
>> >
>> > Although it has already shown significant benefits on some existing
>> > targets, this brute force approach clearly cannot scale to platforms with
>> > numerous CPUs. This patch is an attempt to do something useful as writing
>> > a fast heuristic that performs reasonably well on a broad spectrum of
>> > architectures isn't an easy task. As a consequence, the scope of usability
>> > of the energy-aware wake-up path is restricted to systems with the
>> > SD_ASYM_CPUCAPACITY flag set. These systems not only show the most
>> > promising opportunities for saving energy but also typically feature a
>> > limited number of logical CPUs.
>> >
>> > Cc: Ingo Molnar <mingo@redhat.com>
>> > Cc: Peter Zijlstra <peterz@infradead.org>
>> > Signed-off-by: Quentin Perret <quentin.perret@arm.com>
>> > Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
>> > ---
>> >  kernel/sched/fair.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>> >  1 file changed, 71 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index 76bd46502486..65a1bead0773 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -6513,6 +6513,60 @@ static unsigned long compute_energy(struct task_struct *p, int dst_cpu)
>> >         return energy;
>> >  }
>> >
>> > +static bool task_fits(struct task_struct *p, int cpu)
>> > +{
>> > +       unsigned long next_util = cpu_util_next(cpu, p, cpu);
>> > +
>> > +       return util_fits_capacity(next_util, capacity_orig_of(cpu));
>> > +}
>> > +
>> > +static int find_energy_efficient_cpu(struct sched_domain *sd,
>> > +                                       struct task_struct *p, int prev_cpu)
>> > +{
>> > +       unsigned long cur_energy, prev_energy, best_energy;
>> > +       int cpu, best_cpu = prev_cpu;
>> > +
>> > +       if (!task_util(p))
>> > +               return prev_cpu;
>> > +
>> > +       /* Compute the energy impact of leaving the task on prev_cpu. */
>> > +       prev_energy = best_energy = compute_energy(p, prev_cpu);
>>
>> Is it possible that before the wakeup, the task's affinity is changed
>> so that p->cpus_allowed no longer contains prev_cpu ? In that case
>> prev_energy wouldn't matter since previous CPU is no longer an option?
>
> It is possible to wake up with a disallowed prev_cpu. In fact,
> select_idle_sibling() may happily return a disallowed cpu in that case.
> The mistake gets fixed in select_task_rq(), which uses
> select_fallback_rq() to find an allowed cpu instead.
>
> Could we fix the issue in find_energy_efficient_cpu() by a simple test
> like below
>
> if (cpumask_test_cpu(prev_cpu, &p->cpus_allowed))
>         prev_energy = best_energy = compute_energy(p, prev_cpu);
> else
>         prev_energy = best_energy = ULONG_MAX;

Yes, I think setting it to ULONG_MAX in this case is OK with me.

thanks,

- Joel

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up
  2018-03-23 15:47       ` Morten Rasmussen
@ 2018-03-24  1:13         ` Joel Fernandes
  2018-03-24  1:34           ` Quentin Perret
  2018-03-24  1:22         ` Quentin Perret
  1 sibling, 1 reply; 55+ messages in thread
From: Joel Fernandes @ 2018-03-24  1:13 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Patrick Bellasi, Dietmar Eggemann, LKML, Peter Zijlstra,
	Quentin Perret, Thara Gopinath, Linux PM, Chris Redpath,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos

Hi Morten,

On Fri, Mar 23, 2018 at 8:47 AM, Morten Rasmussen
<morten.rasmussen@arm.com> wrote:
> On Thu, Mar 22, 2018 at 01:10:22PM -0700, Joel Fernandes wrote:
>> On Wed, Mar 21, 2018 at 8:35 AM, Patrick Bellasi
>> <patrick.bellasi@arm.com> wrote:
>> > [...]
>> >
>> >> @@ -6555,6 +6613,14 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>> >>                       break;
>> >>               }
>> >>
>> >> +             /*
>> >> +              * Energy-aware task placement is performed on the highest
>> >> +              * non-overutilized domain spanning over cpu and prev_cpu.
>> >> +              */
>> >> +             if (want_energy && !sd_overutilized(tmp) &&
>> >> +                 cpumask_test_cpu(prev_cpu, sched_domain_span(tmp)))
>> >> +                     energy_sd = tmp;
>> >> +
>> >
>> > Not entirely sure, but I was trying to understand if we can avoid
>> > modifying the definition of want_affine (in the previous chunk) and move
>> > this block before the previous "if (want_affine..." (in mainline but
>> > not in this chunk), which will became an else, e.g.
>> >
>> >         if (want_energy && !sd_overutilized(tmp) &&
>> >                 // ...
>> >         else if (want_energy && !sd_overutilized(tmp) &&
>> >                 // ...
>> >
>> > Isn't that the same?
>> >
>> > Maybe there is a code path I'm missing... but otherwise it seems a
>> > more self-contained modification of select_task_rq_fair...
>>
>> Just replying to this here Patrick instead of the other thread.
>>
>> I think this is the right place for the block from Quentin quoted
>> above because we want to search for the highest domain that is
>> !overutilized and look among those for the candidates. So from that
>> perspective, we can't move the block to the beginning and it seems to
>> be in the right place. My main concern on the other thread was
>> different, I was talking about the cases where sd_flag & tmp->flags
>> don't match. In that case, sd = NULL would trump EAS and I was
>> wondering if that's the right thing to do...
>
> You mean if SD_BALANCE_WAKE isn't set on sched_domains?

Yes.

> The current code seems to rely on that flag to be set to work correctly.
> Otherwise, the loop might bail out on !want_affine and we end up doing
> the find_energy_efficient_cpu() on the lowest level sched_domain even if
> there is a higher-level one which isn't over-utilized.
>
> However, SD_BALANCE_WAKE should be set if SD_ASYM_CPUCAPACITY is set so
> sd == NULL shouldn't be possible? This only holds as long as we only
> want EAS for asymmetric systems.

Yes, I see you had topology code that set SD_BALANCE_WAKE for ASYM. It
makes sense to me then, thanks for the clarification.

Still, I feel it is a bit tedious/confusing when reading the code to draw
the conclusion about why sd is checked first before doing
find_energy_efficient_cpu() (and that sd will be != NULL for ASYM systems).
If energy_sd is set, then we can just proceed with EAS without
checking that sd != NULL. This function in mainline is already pretty
confusing as it is :-(

Regards,

- Joel

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up
  2018-03-23 15:47       ` Morten Rasmussen
  2018-03-24  1:13         ` Joel Fernandes
@ 2018-03-24  1:22         ` Quentin Perret
  1 sibling, 0 replies; 55+ messages in thread
From: Quentin Perret @ 2018-03-24  1:22 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Joel Fernandes, Patrick Bellasi, Dietmar Eggemann, LKML,
	Peter Zijlstra, Thara Gopinath, Linux PM, Chris Redpath,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos

On Friday 23 Mar 2018 at 15:47:45 (+0000), Morten Rasmussen wrote:
> On Thu, Mar 22, 2018 at 01:10:22PM -0700, Joel Fernandes wrote:
> > On Wed, Mar 21, 2018 at 8:35 AM, Patrick Bellasi
> > <patrick.bellasi@arm.com> wrote:
> > > [...]
> > >
> > >> @@ -6555,6 +6613,14 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
> > >>                       break;
> > >>               }
> > >>
> > >> +             /*
> > >> +              * Energy-aware task placement is performed on the highest
> > >> +              * non-overutilized domain spanning over cpu and prev_cpu.
> > >> +              */
> > >> +             if (want_energy && !sd_overutilized(tmp) &&
> > >> +                 cpumask_test_cpu(prev_cpu, sched_domain_span(tmp)))
> > >> +                     energy_sd = tmp;
> > >> +
> > >
> > > Not entirely sure, but I was trying to understand if we can avoid
> > > modifying the definition of want_affine (in the previous chunk) and move
> > > this block before the previous "if (want_affine..." (in mainline but
> > > not in this chunk), which will became an else, e.g.
> > >
> > >         if (want_energy && !sd_overutilized(tmp) &&
> > >                 // ...
> > >         else if (want_energy && !sd_overutilized(tmp) &&
> > >                 // ...
> > >
> > > Isn't that the same?
> > >
> > > Maybe there is a code path I'm missing... but otherwise it seems a
> > > more self-contained modification of select_task_rq_fair...
> > 
> > Just replying to this here, Patrick, instead of the other thread.
> > 
> > I think this is the right place for the block from Quentin quoted
> > above because we want to search for the highest domain that is
> > !overutilized and look among those for the candidates. So from that
> > perspective, we can't move the block to the beginning and it seems to
> > be in the right place. My main concern on the other thread was
> > different, I was talking about the cases where sd_flag & tmp->flags
> > don't match. In that case, sd = NULL would trump EAS and I was
> > wondering if that's the right thing to do...
> 
> You mean if SD_BALANCE_WAKE isn't set on sched_domains?
> 
> The current code seems to rely on that flag to be set to work correctly.
> Otherwise, the loop might bail out on !want_affine and we end up doing
> the find_energy_efficient_cpu() on the lowest level sched_domain even if
> there is a higher-level one which isn't over-utilized.
> 
> However, SD_BALANCE_WAKE should be set if SD_ASYM_CPUCAPACITY is set so
> sd == NULL shouldn't be possible? This only holds as long as we only
> want EAS for asymmetric systems.

That's correct, we are under the assumption that the SD_ASYM_CPUCAPACITY
flag is set somewhere in the hierarchy here. If a sched domain has this
flag set, SD_BALANCE_WAKE is propagated to all lower sched domains
(see sd_init() in kernel/sched/topology.c) so we should be fine.
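
For reference, the propagation in sd_init() is essentially this (quoting
from memory, so the exact hunk may differ):

---8<---
	if (sd->flags & SD_ASYM_CPUCAPACITY) {
		struct sched_domain *t = sd;

		for_each_lower_domain(t)
			t->flags |= SD_BALANCE_WAKE;
	}
---8<---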

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up
  2018-03-24  1:13         ` Joel Fernandes
@ 2018-03-24  1:34           ` Quentin Perret
  2018-03-24  6:06               ` Joel Fernandes
  0 siblings, 1 reply; 55+ messages in thread
From: Quentin Perret @ 2018-03-24  1:34 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Morten Rasmussen, Patrick Bellasi, Dietmar Eggemann, LKML,
	Peter Zijlstra, Thara Gopinath, Linux PM, Chris Redpath,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos

On Friday 23 Mar 2018 at 18:13:56 (-0700), Joel Fernandes wrote:
> Hi Morten,
> 
> On Fri, Mar 23, 2018 at 8:47 AM, Morten Rasmussen
> <morten.rasmussen@arm.com> wrote:
> > On Thu, Mar 22, 2018 at 01:10:22PM -0700, Joel Fernandes wrote:

[...]

> > You mean if SD_BALANCE_WAKE isn't set on sched_domains?
> 
> Yes.
> 
> > The current code seems to rely on that flag to be set to work correctly.
> > Otherwise, the loop might bail out on !want_affine and we end up doing
> > the find_energy_efficient_cpu() on the lowest level sched_domain even if
> > there is a higher-level one which isn't over-utilized.
> >
> > However, SD_BALANCE_WAKE should be set if SD_ASYM_CPUCAPACITY is set so
> > sd == NULL shouldn't be possible? This only holds as long as we only
> > want EAS for asymmetric systems.
> 
> Yes, I see you had topology code that set SD_BALANCE_WAKE for ASYM. It
> makes sense to me then, thanks for the clarification.
> 
> Still, I feel it is a bit tedious/confusing when reading the code to draw
> the conclusion about why sd is checked first before doing
> find_energy_efficient_cpu() (and that sd will be != NULL for ASYM systems).
> If energy_sd is set, then we can just proceed with EAS without
> checking that sd != NULL. This function in mainline is already pretty
> confusing as it is :-(

Right, I see your point. The code is correct as is, but I agree that having
the code structured as

	if (energy_sd) {
		new_cpu = find_energy_efficient_cpu(energy_sd, p, prev_cpu);
	} else if (!sd) {
		...

might be easier to understand and functionally equivalent. What do you
think?

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up
  2018-03-22 20:19       ` Joel Fernandes
@ 2018-03-24  1:47         ` Quentin Perret
  2018-03-25  0:12           ` Joel Fernandes
  0 siblings, 1 reply; 55+ messages in thread
From: Quentin Perret @ 2018-03-24  1:47 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Patrick Bellasi, Dietmar Eggemann, LKML, Peter Zijlstra,
	Thara Gopinath, Linux PM, Morten Rasmussen, Chris Redpath,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos

On Thursday 22 Mar 2018 at 13:19:03 (-0700), Joel Fernandes wrote:
> On Thu, Mar 22, 2018 at 11:06 AM, Patrick Bellasi
> <patrick.bellasi@arm.com> wrote:

[...]

> >> > @@ -6555,6 +6613,14 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
> >> >                         break;
> >> >                 }
> >> >
> >> > +               /*
> >> > +                * Energy-aware task placement is performed on the highest
> >> > +                * non-overutilized domain spanning over cpu and prev_cpu.
> >> > +                */
> >> > +               if (want_energy && !sd_overutilized(tmp) &&
> >> > +                   cpumask_test_cpu(prev_cpu, sched_domain_span(tmp)))
> >>
> >> Shouldn't you check for the SD_ASYM_CPUCAPACITY flag here for tmp level?
> >
> > ... and this then should be covered by the previous check in
> > wake_energy(), which sets want_energy.
> 
> Right, but in a scenario which probably doesn't exist today where we
> have both SD_ASYM_CPUCAPACITY and !SD_ASYM_CPUCAPACITY domains in the
> hierarchy for which want_energy = 1, I was wondering if it's more
> future-proof to check it and not make assumptions...

So we can definitely have cases where SD_ASYM_CPUCAPACITY is not set at all
sd levels. Today, on mobile systems, this flag is typically set only at DIE
level for big.LITTLE platforms, and not at MC level.
We enable EAS if we find _at least_ one domain that has this flag in the
hierarchy, just to make sure we don't enable EAS for symmetric platforms.
It's just a way to check a property about the topology when EAS starts, not
really a way to actually select the sd at which we do scheduling at
runtime.
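
To illustrate, the start-up check amounts to something like the following
hypothetical helper (the RFC does this inline when the energy model is
loaded, so take this as a sketch, not the actual code):

	/* Must be called with the RCU read lock held. */
	static bool cpu_sd_hierarchy_is_asym(int cpu)
	{
		struct sched_domain *sd;

		/* Walk from the lowest level up to the root domain. */
		for_each_domain(cpu, sd) {
			if (sd->flags & SD_ASYM_CPUCAPACITY)
				return true;
		}

		return false;
	}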

I hope that helps !

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up
  2018-03-24  1:34           ` Quentin Perret
@ 2018-03-24  6:06               ` Joel Fernandes
  0 siblings, 0 replies; 55+ messages in thread
From: Joel Fernandes @ 2018-03-24  6:06 UTC (permalink / raw)
  To: Quentin Perret, Joel Fernandes
  Cc: Morten Rasmussen, Patrick Bellasi, Dietmar Eggemann, LKML,
	Peter Zijlstra, Thara Gopinath, Linux PM, Chris Redpath,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos



On March 23, 2018 6:34:22 PM PDT, Quentin Perret <quentin.perret@arm.com> wrote:
>On Friday 23 Mar 2018 at 18:13:56 (-0700), Joel Fernandes wrote:
>> Hi Morten,
>> 
>> On Fri, Mar 23, 2018 at 8:47 AM, Morten Rasmussen
>> <morten.rasmussen@arm.com> wrote:
>> > On Thu, Mar 22, 2018 at 01:10:22PM -0700, Joel Fernandes wrote:
>
>[...]
>
>> > You mean if SD_BALANCE_WAKE isn't set on sched_domains?
>> 
>> Yes.
>> 
>> > The current code seems to rely on that flag to be set to work
>correctly.
>> > Otherwise, the loop might bail out on !want_affine and we end up
>doing
>> > the find_energy_efficient_cpu() on the lowest level sched_domain
>even if
>> > there is a higher level one which isn't over-utilized.
>> >
>> > However, SD_BALANCE_WAKE should be set if SD_ASYM_CPUCAPACITY is
>set so
>> > sd == NULL shouldn't be possible? This only holds as long as we
>only
>> > want EAS for asymmetric systems.
>> 
>> Yes, I see you had topology code that set SD_BALANCE_WAKE for ASYM.
>It
>> makes sense to me then, thanks for the clarification.
>> 
>> Still I feel it is a bit tedious/confusing when reading code to draw
>> the conclusion about why sd is checked first before doing
>> find_energy_efficient_cpu (and that sd will != NULL for ASYM
>systems).
>> If energy_sd is set, then we can just proceed with EAS without
>> checking that sd != NULL. This function in mainline is already pretty
>> confusing as it is :-(
>
>Right, I see your point. The code is correct as is, but I agree that
>having
>the code structured as
>
>	if (energy_sd) {
>		new_cpu = find_energy_efficient_cpu(energy_sd, p, prev_cpu);
>	} else if (!sd) {
>		...
>
>might be easier to understand and functionally equivalent. What do you
>think ?

Yeah definitely. Go for it.

- Joel


-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up
  2018-03-24  1:47         ` Quentin Perret
@ 2018-03-25  0:12           ` Joel Fernandes
  0 siblings, 0 replies; 55+ messages in thread
From: Joel Fernandes @ 2018-03-25  0:12 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Patrick Bellasi, Dietmar Eggemann, LKML, Peter Zijlstra,
	Thara Gopinath, Linux PM, Morten Rasmussen, Chris Redpath,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos

On Fri, Mar 23, 2018 at 6:47 PM, Quentin Perret <quentin.perret@arm.com> wrote:
> On Thursday 22 Mar 2018 at 13:19:03 (-0700), Joel Fernandes wrote:
>> On Thu, Mar 22, 2018 at 11:06 AM, Patrick Bellasi
>> <patrick.bellasi@arm.com> wrote:
>
> [...]
>
>> >> > @@ -6555,6 +6613,14 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>> >> >                         break;
>> >> >                 }
>> >> >
>> >> > +               /*
>> >> > +                * Energy-aware task placement is performed on the highest
>> >> > +                * non-overutilized domain spanning over cpu and prev_cpu.
>> >> > +                */
>> >> > +               if (want_energy && !sd_overutilized(tmp) &&
>> >> > +                   cpumask_test_cpu(prev_cpu, sched_domain_span(tmp)))
>> >>
>> >> Shouldn't you check for the SD_ASYM_CPUCAPACITY flag here for tmp level?
>> >
>> > ... and this then should be covered by the previous check in
>> > wake_energy(), which sets want_energy.
>>
>> Right, but in a scenario which probably doesn't exist today where we
>> have both SD_ASYM_CPUCAPACITY and !SD_ASYM_CPUCAPACITY domains in the
>> hierarchy for which want_energy = 1, I was wondering if it's more
>> future-proof to check it and not make assumptions...
>
> So we can definitely have cases where SD_ASYM_CPUCAPACITY is not set at all
> sd levels. Today, on mobile systems, this flag is typically set only at DIE
> level for big.LITTLE platforms, and not at MC level.
> We enable EAS if we find _at least_ one domain that has this flag in the
> hierarchy, just to make sure we don't enable EAS for symmetric platforms.
> It's just a way to check a property about the topology when EAS starts, not
> really a way to actually select the sd at which we do scheduling at
> runtime.

Yes, OK, you're right, we do have the ASYM flag set at some sd levels but
not at others at the moment. Sorry about the hasty comment. I understand
what you're doing now; I am OK with that.

thanks,

- Joel

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up
  2018-03-23 16:00     ` Morten Rasmussen
  2018-03-24  0:36       ` Joel Fernandes
@ 2018-03-25  1:38       ` Quentin Perret
  1 sibling, 0 replies; 55+ messages in thread
From: Quentin Perret @ 2018-03-25  1:38 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Joel Fernandes, Dietmar Eggemann, LKML, Peter Zijlstra,
	Thara Gopinath, Linux PM, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos

On Friday 23 Mar 2018 at 16:00:59 (+0000), Morten Rasmussen wrote:
> On Thu, Mar 22, 2018 at 09:27:43AM -0700, Joel Fernandes wrote:
> > Hi,
> > 
> > On Tue, Mar 20, 2018 at 2:43 AM, Dietmar Eggemann
> > <dietmar.eggemann@arm.com> wrote:

[...]

> > Is it possible that before the wakeup, the task's affinity is changed
> > so that p->cpus_allowed no longer contains prev_cpu ? In that case
> > prev_energy wouldn't matter since previous CPU is no longer an option?
> 
> It is possible to wake-up with a disallowed prev_cpu. In fact
> select_idle_sibling() may happily return a disallowed cpu in that case.
> The mistake gets fixed in select_task_rq() which uses
> select_fallback_rq() to find an allowed cpu instead.
> 
> Could we fix the issue in find_energy_efficient_cpu() by a simple test
> like below
> 
> if (cpumask_test_cpu(prev_cpu, &p->cpus_allowed))
> 	prev_energy = best_energy = compute_energy(p, prev_cpu);
> else
> 	prev_energy = best_energy = ULONG_MAX;

Right, that should work. I'll change this in v2.
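
For reference, the guard would sit at the top of find_energy_efficient_cpu(),
roughly as below (a sketch against the RFC code quoted earlier in the
thread, with the scan over candidate CPUs elided):

	static int find_energy_efficient_cpu(struct sched_domain *sd,
					struct task_struct *p, int prev_cpu)
	{
		unsigned long cur_energy, prev_energy, best_energy;
		int cpu, best_cpu = prev_cpu;

		/* Only treat prev_cpu as a candidate if p may run there. */
		if (cpumask_test_cpu(prev_cpu, &p->cpus_allowed))
			prev_energy = best_energy = compute_energy(p, prev_cpu);
		else
			prev_energy = best_energy = ULONG_MAX;

		/* ... scan the allowed CPUs in sd as before ... */
	}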

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up
  2018-03-21 15:35   ` Patrick Bellasi
  2018-03-22 20:10     ` Joel Fernandes
@ 2018-03-25  1:52     ` Quentin Perret
  1 sibling, 0 replies; 55+ messages in thread
From: Quentin Perret @ 2018-03-25  1:52 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Dietmar Eggemann, linux-kernel, Peter Zijlstra, Thara Gopinath,
	linux-pm, Morten Rasmussen, Chris Redpath, Valentin Schneider,
	Rafael J . Wysocki, Greg Kroah-Hartman, Vincent Guittot,
	Viresh Kumar, Todd Kjos, Joel Fernandes

On Wednesday 21 Mar 2018 at 15:35:18 (+0000), Patrick Bellasi wrote:
> On 20-Mar 09:43, Dietmar Eggemann wrote:
> > From: Quentin Perret <quentin.perret@arm.com>
> 
> [...]
> 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 76bd46502486..65a1bead0773 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6513,6 +6513,60 @@ static unsigned long compute_energy(struct task_struct *p, int dst_cpu)
> >  	return energy;
> >  }
> > 
> > +static bool task_fits(struct task_struct *p, int cpu)
> > +{
> > +	unsigned long next_util = cpu_util_next(cpu, p, cpu);
> > +
> > +	return util_fits_capacity(next_util, capacity_orig_of(cpu));
>                                              ^^^^^^^^^^^^^^^^^^^^^
> 
> Since here we are at scheduling CFS tasks, should we not better use
> capacity_of() to account for RT/IRQ pressure ?

Yes, definitely. I'll change this in v2.
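
For reference, the v2 direction discussed here would look roughly like
this (sketch only):

	static bool task_fits(struct task_struct *p, int cpu)
	{
		unsigned long next_util = cpu_util_next(cpu, p, cpu);

		/*
		 * capacity_of() subtracts RT/IRQ pressure from the
		 * original capacity, unlike capacity_orig_of().
		 */
		return util_fits_capacity(next_util, capacity_of(cpu));
	}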

> 
> > +}
> > +
> > +static int find_energy_efficient_cpu(struct sched_domain *sd,
> > +					struct task_struct *p, int prev_cpu)
> > +{
> > +	unsigned long cur_energy, prev_energy, best_energy;
> > +	int cpu, best_cpu = prev_cpu;
> > +
> > +	if (!task_util(p))
> 
> We are still waking up a task... what if the task was previously
> running on a big CPU which is now idle?
> 
> I understand that from a _relative_ energy_diff standpoint there is
> not much to do for a 0 utilization task. However, for those tasks we
> can still try to return the most energy efficient CPU among the ones
> in their cpus_allowed mask.
> 
> It should be a relatively low overhead (maybe contained in a fallback
> most_energy_efficient_cpu() kind of function) which allows, for
> example on ARM big.LITTLE systems, to consolidate those tasks on
> LITTLE CPUs instead for example keep running them on a big CPU.

Hmmmm so the difficult thing about a task with 0 util is that you don't
know if this is really a small task, or a big task with a very long
period. The only useful thing you know for sure about the task is where
it ran last time, so I guess it makes sense to use that information
rather than make assumptions. There is no perfect solution using the
util_avg of the task.

Now, UTIL_EST is changing the game here. If we use it for task placement
(which I think is the right thing to do), this issue should be a lot
easier to solve. What do you think ?
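
For context, UTIL_EST gives each task an estimated utilization which is
roughly of this shape in mainline (a sketch; field names may differ):

	static inline unsigned long task_util_est(struct task_struct *p)
	{
		struct util_est ue = READ_ONCE(p->se.avg.util_est);

		/* The estimate is never lower than the current util_avg. */
		return max_t(unsigned long, task_util(p),
			     max(ue.ewma, ue.enqueued));
	}

A task waking up with util_avg == 0 would still carry a non-zero estimate
from its previous activations, which is what would help placement here.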

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs
  2018-03-20  9:52   ` Greg Kroah-Hartman
  2018-03-21  0:45     ` Quentin Perret
@ 2018-03-25 13:48     ` Quentin Perret
  2018-03-26 22:26       ` Dietmar Eggemann
  1 sibling, 1 reply; 55+ messages in thread
From: Quentin Perret @ 2018-03-25 13:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Dietmar Eggemann, linux-kernel, Peter Zijlstra, Thara Gopinath,
	linux-pm, Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Vincent Guittot,
	Viresh Kumar, Todd Kjos, Joel Fernandes

On Tuesday 20 Mar 2018 at 10:52:15 (+0100), Greg Kroah-Hartman wrote:
> On Tue, Mar 20, 2018 at 09:43:08AM +0000, Dietmar Eggemann wrote:
> > From: Quentin Perret <quentin.perret@arm.com>

[...]

> > +#ifdef CONFIG_PM_OPP
> 
> #ifdefs go in .h files, not .c files, right?
> 

So, after looking into this, my suggestion would be to: 1) remove the
#ifdef CONFIG_PM_OPP from energy.c entirely; 2) make sure
init_sched_energy() is stubbed properly for !CONFIG_SMP and
!CONFIG_PM_OPP in include/linux/sched/energy.h; 3) relocate the global
variables (energy_model, freq_domains, ...) to fair.c; and 4) modify
kernel/sched/Makefile with something like:

ifeq ($(CONFIG_PM_OPP),y)
obj-$(CONFIG_SMP) += energy.o
endif

That way, energy.c is not compiled if not needed by the arch, and the
ifdefs are kept within header files and Makefiles.

Would that work ?

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs
  2018-03-25 13:48     ` Quentin Perret
@ 2018-03-26 22:26       ` Dietmar Eggemann
  0 siblings, 0 replies; 55+ messages in thread
From: Dietmar Eggemann @ 2018-03-26 22:26 UTC (permalink / raw)
  To: Quentin Perret, Greg Kroah-Hartman
  Cc: linux-kernel, Peter Zijlstra, Thara Gopinath, linux-pm,
	Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Vincent Guittot,
	Viresh Kumar, Todd Kjos, Joel Fernandes

On 03/25/2018 03:48 PM, Quentin Perret wrote:
> On Tuesday 20 Mar 2018 at 10:52:15 (+0100), Greg Kroah-Hartman wrote:
>> On Tue, Mar 20, 2018 at 09:43:08AM +0000, Dietmar Eggemann wrote:
>>> From: Quentin Perret <quentin.perret@arm.com>
> 
> [...]
> 
>>> +#ifdef CONFIG_PM_OPP
>>
>> #ifdefs go in .h files, not .c files, right?
>>
> 
> So, after looking into this, my suggestion would be to: 1) remove the
> #ifdef CONFIG_PM_OPP from energy.c entirely; 2) make sure
> init_sched_energy() is stubbed properly for !CONFIG_SMP and
> !CONFIG_PM_OPP in include/linux/sched/energy.h; 3) relocate the global
> variables (energy_model, freq_domains, ...) to fair.c; and 4) modify
> kernel/sched/Makefile with something like:
> 
> ifeq ($(CONFIG_PM_OPP),y)
> obj-$(CONFIG_SMP) += energy.o
> endif
> 
> That way, energy.c is not compiled if not needed by the arch, and the
> ifdefs are kept within header files and Makefiles.
> 
> Would that work ?

Could we extend this idea a little bit further and leave the global
variables in energy.c? That way, all energy interfaces could be
declared or stubbed in energy.h.

We could hide the access to energy_model, sched_energy_present and
freq_domains behind functions.

It would be nice to also provide find_energy_efficient_cpu() in energy.c,
but it itself uses a lot of fair.c stuff, and we do want to avoid function
calls in the wakeup path, right?

Boot-tested on juno r0 (CONFIG_SMP=y and CONFIG_PM_OPP=y). Build-tested on
i386 (CONFIG_SMP and CONFIG_PM_OPP not set), arm multi_v5_defconfig (CONFIG_SMP
not set and CONFIG_PM_OPP=y).

in energy.h:

#ifdef CONFIG_SMP
#ifdef CONFIG_PM_OPP
static inline struct capacity_state *find_cap_state(int cpu, unsigned long util)
{
    ...
}
static inline bool sched_energy_enabled(void)
{
   ...
}
static inline struct list_head *get_freq_domains(void)
{
   ...
}
#endif
#endif

#if !defined(CONFIG_SMP) || !defined(CONFIG_PM_OPP)
static inline bool sched_energy_enabled(void) { return false; }
static inline struct list_head *get_freq_domains(void) { return NULL; }
static inline struct capacity_state *
find_cap_state(int cpu, unsigned long util) { return NULL; }
static inline void init_sched_energy(void) { }
#endif

#define for_each_freq_domain(fdom) \
		list_for_each_entry(fdom, get_freq_domains(), next)

--->8---

diff --git a/include/linux/sched/energy.h b/include/linux/sched/energy.h
index b4f43564ffe4..5aad03ec5b30 100644
--- a/include/linux/sched/energy.h
+++ b/include/linux/sched/energy.h
@@ -20,12 +20,49 @@ struct freq_domain {
 extern struct sched_energy_model ** __percpu energy_model;
 extern struct static_key_false sched_energy_present;
 extern struct list_head freq_domains;
-#define for_each_freq_domain(fdom) \
-                       list_for_each_entry(fdom, &freq_domains, next)
 
-void init_sched_energy(void);
-#else
-static inline void init_sched_energy(void) { }
+#ifdef CONFIG_PM_OPP
+static inline bool sched_energy_enabled(void)
+{
+       return static_branch_unlikely(&sched_energy_present);
+}
+
+static inline struct list_head *get_freq_domains(void)
+{
+       return &freq_domains;
+}
+
+static inline
+struct capacity_state *find_cap_state(int cpu, unsigned long util)
+{
+       struct sched_energy_model *em = *per_cpu_ptr(energy_model, cpu);
+       struct capacity_state *cs = NULL;
+       int i;
+
+       util += util >> 2;
+
+       for (i = 0; i < em->nr_cap_states; i++) {
+               cs = &em->cap_states[i];
+               if (cs->cap >= util)
+                       break;
+       }
+
+       return cs;
+}
+
+extern void init_sched_energy(void);
 #endif
+#endif /* CONFIG_SMP */
 
+#if !defined(CONFIG_SMP) || !defined(CONFIG_PM_OPP)
+static inline bool sched_energy_enabled(void) { return false; }
+static inline struct list_head *get_freq_domains(void) { return NULL; }
+static inline struct capacity_state *
+find_cap_state(int cpu, unsigned long util) { return NULL; }
+static inline void init_sched_energy(void) { }
 #endif
+
+#define for_each_freq_domain(fdom) \
+                       list_for_each_entry(fdom, get_freq_domains(), next)
+
+#endif /* _LINUX_SCHED_ENERGY_H */
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 912972ad4dbc..e34bec3ae353 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -20,7 +20,7 @@ obj-y += core.o loadavg.o clock.o cputime.o
 obj-y += idle.o fair.o rt.o deadline.o
 obj-y += wait.o wait_bit.o swait.o completion.o
 
-obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o energy.o
+obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
 obj-$(CONFIG_SCHED_DEBUG) += debug.o
@@ -29,3 +29,6 @@ obj-$(CONFIG_CPU_FREQ) += cpufreq.o
 obj-$(CONFIG_CPU_FREQ_GOV_SCHEDUTIL) += cpufreq_schedutil.o
 obj-$(CONFIG_MEMBARRIER) += membarrier.o
 obj-$(CONFIG_CPU_ISOLATION) += isolation.o
+ifeq ($(CONFIG_PM_OPP),y)
+       obj-$(CONFIG_SMP) += energy.o
+endif
diff --git a/kernel/sched/energy.c b/kernel/sched/energy.c
index 4662c993e096..1de8226943b9 100644
--- a/kernel/sched/energy.c
+++ b/kernel/sched/energy.c
@@ -30,7 +30,6 @@ struct sched_energy_model ** __percpu energy_model;
  */
 LIST_HEAD(freq_domains);
 
-#ifdef CONFIG_PM_OPP
 static struct sched_energy_model *build_energy_model(int cpu)
 {
        unsigned long cap_scale = arch_scale_cpu_capacity(NULL, cpu);
@@ -185,6 +184,3 @@ void init_sched_energy(void)
 exit_fail:
        pr_err("Energy Aware Scheduling initialization failed.\n");
 }
-#else
-void init_sched_energy(void) {}
-#endif
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 65a1bead0773..b3e6a2656b68 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6456,27 +6456,6 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
        return !util_fits_capacity(task_util(p), min_cap);
 }
 
-static struct capacity_state *find_cap_state(int cpu, unsigned long util)
-{
-       struct sched_energy_model *em = *per_cpu_ptr(energy_model, cpu);
-       struct capacity_state *cs = NULL;
-       int i;
-
-       /*
-        * As the goal is to estimate the OPP reached for a specific util
-        * value, mimic the behaviour of schedutil with a 1.25 coefficient
-        */
-       util += util >> 2;
-
-       for (i = 0; i < em->nr_cap_states; i++) {
-               cs = &em->cap_states[i];
-               if (cs->cap >= util)
-                       break;
-       }
-
-       return cs;
-}
-
 static unsigned long compute_energy(struct task_struct *p, int dst_cpu)
 {
        unsigned long util, fdom_max_util;
@@ -6557,7 +6536,7 @@ static inline bool wake_energy(struct task_struct *p, int prev_cpu)
 {
        struct sched_domain *sd;
 
-       if (!static_branch_unlikely(&sched_energy_present))
+       if (!sched_energy_enabled())
                return false;
 
        sd = rcu_dereference_sched(cpu_rq(prev_cpu)->sd);
@@ -9252,8 +9231,7 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
                }
                max_cost += sd->max_newidle_lb_cost;
 
-               if (static_branch_unlikely(&sched_energy_present) &&
-                   !sd_overutilized(sd))
+               if (sched_energy_enabled() && !sd_overutilized(sd))
                        continue;
 
                if (!(sd->flags & SD_LOAD_BALANCE))
@@ -9823,8 +9801,7 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
                        break;
                }
 
-               if (static_branch_unlikely(&sched_energy_present) &&
-                   !sd_overutilized(sd))
+               if (sched_energy_enabled() && !sd_overutilized(sd))
                        continue;
 
                if (sd->flags & SD_BALANCE_NEWIDLE) {

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 3/6] sched: Add over-utilization/tipping point indicator
  2018-03-20  9:43 ` [RFC PATCH 3/6] sched: Add over-utilization/tipping point indicator Dietmar Eggemann
@ 2018-04-09  9:40   ` Peter Zijlstra
  2018-04-09  9:47     ` Peter Zijlstra
  2018-04-09  9:53     ` Dietmar Eggemann
  0 siblings, 2 replies; 55+ messages in thread
From: Peter Zijlstra @ 2018-04-09  9:40 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, Quentin Perret, Thara Gopinath, linux-pm,
	Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes


(I know there is a new version out; but I was reading through this to
catch up with the discussion)

On Tue, Mar 20, 2018 at 09:43:09AM +0000, Dietmar Eggemann wrote:
> +static inline int sd_overutilized(struct sched_domain *sd)
> +{
> +	return READ_ONCE(sd->shared->overutilized);
> +}
> +
> +static inline void update_overutilized_status(struct rq *rq)
> +{
> +	struct sched_domain *sd;
> +
> +	rcu_read_lock();
> +	sd = rcu_dereference(rq->sd);
> +	if (sd && !sd_overutilized(sd) && cpu_overutilized(rq->cpu))
> +		WRITE_ONCE(sd->shared->overutilized, 1);
> +	rcu_read_unlock();
> +}
> +#else

I think you ought to go have a look at the end of
kernel/sched/topology.c:sd_init(), where it says:

	/*
	 * For all levels sharing cache; connect a sched_domain_shared
	 * instance.
	 */
	if (sd->flags & SD_SHARE_PKG_RESOURCES) {
		sd->shared = *per_cpu_ptr(sdd->sds, sd_id);
		atomic_inc(&sd->shared->ref);
		atomic_set(&sd->shared->nr_busy_cpus, sd_weight);
	}

Because if I read all this correctly, your code assumes sd->shared
exists unconditionally, while the quoted bit only ensures it does so <=
LLC.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 3/6] sched: Add over-utilization/tipping point indicator
  2018-04-09  9:40   ` Peter Zijlstra
@ 2018-04-09  9:47     ` Peter Zijlstra
  2018-04-09  9:53     ` Dietmar Eggemann
  1 sibling, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2018-04-09  9:47 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, Quentin Perret, Thara Gopinath, linux-pm,
	Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

On Mon, Apr 09, 2018 at 11:40:01AM +0200, Peter Zijlstra wrote:
> 
> (I know there is a new version out; but I was reading through this to
> catch up with the discussion)
> 
> On Tue, Mar 20, 2018 at 09:43:09AM +0000, Dietmar Eggemann wrote:
> > +static inline int sd_overutilized(struct sched_domain *sd)
> > +{
> > +	return READ_ONCE(sd->shared->overutilized);
> > +}
> > +
> > +static inline void update_overutilized_status(struct rq *rq)
> > +{
> > +	struct sched_domain *sd;
> > +
> > +	rcu_read_lock();
> > +	sd = rcu_dereference(rq->sd);
> > +	if (sd && !sd_overutilized(sd) && cpu_overutilized(rq->cpu))
> > +		WRITE_ONCE(sd->shared->overutilized, 1);
> > +	rcu_read_unlock();
> > +}
> > +#else
> 
> I think you ought to go have a look at the end of
> kernel/sched/topology.c:sd_init(), where it says:
> 
> 	/*
> 	 * For all levels sharing cache; connect a sched_domain_shared
> 	 * instance.
> 	 */
> 	if (sd->flags & SD_SHARE_PKG_RESOURCES) {
> 		sd->shared = *per_cpu_ptr(sdd->sds, sd_id);
> 		atomic_inc(&sd->shared->ref);
> 		atomic_set(&sd->shared->nr_busy_cpus, sd_weight);
> 	}
> 
> Because if I read all this correctly, your code assumes sd->shared
> exists unconditionally, while the quoted bit only ensures it does so <=
> LLC.

Argh, n/m, I should read the whole patch before commenting I suppose ;-)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 3/6] sched: Add over-utilization/tipping point indicator
  2018-04-09  9:40   ` Peter Zijlstra
  2018-04-09  9:47     ` Peter Zijlstra
@ 2018-04-09  9:53     ` Dietmar Eggemann
  2018-04-09 11:49       ` Peter Zijlstra
  1 sibling, 1 reply; 55+ messages in thread
From: Dietmar Eggemann @ 2018-04-09  9:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Quentin Perret, Thara Gopinath, linux-pm,
	Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

On 04/09/2018 11:40 AM, Peter Zijlstra wrote:
> 
> (I know there is a new version out; but I was reading through this to
> catch up with the discussion)
> 
> On Tue, Mar 20, 2018 at 09:43:09AM +0000, Dietmar Eggemann wrote:
>> +static inline int sd_overutilized(struct sched_domain *sd)
>> +{
>> +	return READ_ONCE(sd->shared->overutilized);
>> +}
>> +
>> +static inline void update_overutilized_status(struct rq *rq)
>> +{
>> +	struct sched_domain *sd;
>> +
>> +	rcu_read_lock();
>> +	sd = rcu_dereference(rq->sd);
>> +	if (sd && !sd_overutilized(sd) && cpu_overutilized(rq->cpu))
>> +		WRITE_ONCE(sd->shared->overutilized, 1);
>> +	rcu_read_unlock();
>> +}
>> +#else
> 
> I think you ought to go have a look at the end of
> kernel/sched/topology.c:sd_init(), where it says:
> 
> 	/*
> 	 * For all levels sharing cache; connect a sched_domain_shared
> 	 * instance.
> 	 */
> 	if (sd->flags & SD_SHARE_PKG_RESOURCES) {
> 		sd->shared = *per_cpu_ptr(sdd->sds, sd_id);
> 		atomic_inc(&sd->shared->ref);
> 		atomic_set(&sd->shared->nr_busy_cpus, sd_weight);
> 	}
> 
> Because if I read all this correctly, your code assumes sd->shared
> exists unconditionally, while the quoted bit only ensures it does so <=
> LLC.
> 

But the patch changes this part further down.

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 64cc564f5255..c8b7c7665ab2 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1184,15 +1184,11 @@ sd_init(struct sched_domain_topology_level *tl,
  		sd->idle_idx = 1;
  	}

-	/*
-	 * For all levels sharing cache; connect a sched_domain_shared
-	 * instance.
-	 */
-	if (sd->flags & SD_SHARE_PKG_RESOURCES) {
-		sd->shared = *per_cpu_ptr(sdd->sds, sd_id);
-		atomic_inc(&sd->shared->ref);
+	sd->shared = *per_cpu_ptr(sdd->sds, sd_id);
+	atomic_inc(&sd->shared->ref);
+
+	if (sd->flags & SD_SHARE_PKG_RESOURCES)
  		atomic_set(&sd->shared->nr_busy_cpus, sd_weight);
-	}

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 3/6] sched: Add over-utilization/tipping point indicator
  2018-04-09  9:53     ` Dietmar Eggemann
@ 2018-04-09 11:49       ` Peter Zijlstra
  0 siblings, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2018-04-09 11:49 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, Quentin Perret, Thara Gopinath, linux-pm,
	Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

On Mon, Apr 09, 2018 at 11:53:00AM +0200, Dietmar Eggemann wrote:
> But the patch changes this part further down.

Yes.. let's call it Monday morning sickness :-)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs
  2018-03-20  9:43 ` [RFC PATCH 2/6] sched: Introduce energy models of CPUs Dietmar Eggemann
  2018-03-20  9:52   ` Greg Kroah-Hartman
@ 2018-04-09 12:01   ` Peter Zijlstra
  2018-04-09 13:45     ` Quentin Perret
  1 sibling, 1 reply; 55+ messages in thread
From: Peter Zijlstra @ 2018-04-09 12:01 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, Quentin Perret, Thara Gopinath, linux-pm,
	Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

On Tue, Mar 20, 2018 at 09:43:08AM +0000, Dietmar Eggemann wrote:
> From: Quentin Perret <quentin.perret@arm.com>
> 
> The energy consumption of each CPU in the system is modeled with a list
> of values representing its dissipated power and compute capacity at each
> available Operating Performance Point (OPP). These values are derived
> from existing information in the kernel (currently used by the thermal
> subsystem) and don't require the introduction of new platform-specific
> tunables. The energy model is also provided with a simple representation
> of all frequency domains as cpumasks, hence enabling the scheduler to be
> aware of dependencies between CPUs. The data required to build the energy
> model is provided by the OPP library which enables an abstract view of
> the platform from the scheduler. The new data structures holding these
> models and the routines to populate them are stored in
> kernel/sched/energy.c.
> 
> For the sake of simplicity, it is assumed in the energy model that all
> CPUs in a frequency domain share the same micro-architecture. As long as
> this assumption is correct, the energy models of different CPUs belonging
> to the same frequency domain are equal. Hence, this commit builds only one
> energy model per frequency domain, and links all relevant CPUs to it in
> order to save time and memory. If needed for future hardware platforms,
> relaxing this assumption should imply relatively simple modifications in
> the code but a significantly higher algorithmic complexity.

What this doesn't mention is why this isn't part of the regular topology
bits. IIRC this is because the frequency domains don't necessarily need
to align with the existing topology, but this completely fails to state
any of that.

Also, since I'm not at all familiar with DT and the OPP library stuff,
this code is completely unreadable to me and there isn't a nice comment
to help me along.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs
  2018-04-09 12:01   ` Peter Zijlstra
@ 2018-04-09 13:45     ` Quentin Perret
  2018-04-09 15:32       ` Peter Zijlstra
  0 siblings, 1 reply; 55+ messages in thread
From: Quentin Perret @ 2018-04-09 13:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dietmar Eggemann, linux-kernel, Thara Gopinath, linux-pm,
	Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

On Monday 09 Apr 2018 at 14:01:11 (+0200), Peter Zijlstra wrote:
> On Tue, Mar 20, 2018 at 09:43:08AM +0000, Dietmar Eggemann wrote:
> > From: Quentin Perret <quentin.perret@arm.com>
> > 
> > The energy consumption of each CPU in the system is modeled with a list
> > of values representing its dissipated power and compute capacity at each
> > available Operating Performance Point (OPP). These values are derived
> > from existing information in the kernel (currently used by the thermal
> > subsystem) and don't require the introduction of new platform-specific
> > tunables. The energy model is also provided with a simple representation
> > of all frequency domains as cpumasks, hence enabling the scheduler to be
> > aware of dependencies between CPUs. The data required to build the energy
> > model is provided by the OPP library which enables an abstract view of
> > the platform from the scheduler. The new data structures holding these
> > models and the routines to populate them are stored in
> > kernel/sched/energy.c.
> > 
> > For the sake of simplicity, it is assumed in the energy model that all
> > CPUs in a frequency domain share the same micro-architecture. As long as
> > this assumption is correct, the energy models of different CPUs belonging
> > to the same frequency domain are equal. Hence, this commit builds only one
> > energy model per frequency domain, and links all relevant CPUs to it in
> > order to save time and memory. If needed for future hardware platforms,
> > relaxing this assumption should imply relatively simple modifications in
> > the code but a significantly higher algorithmic complexity.
> 
> What this doesn't mention is why this isn't part of the regular topology
> bits. IIRC this is because the frequency domains don't necessarily need
> to align with the existing topology, but this completely fails to state
> any of that.

Yes, that's the main reason. Frequency domains and scheduling domains don't
necessarily align. They used to align on big.LITTLE platforms, but not
anymore with DynamIQ ...

> 
> Also, since I'm not at all familiar with DT and the OPP library stuff,
> this code is completely unreadable to me and there isn't a nice comment
> to help me along.

Right, so I can definitely fix that. Comments in the code and a better
commit message should help hopefully. And also, it has already been
suggested that a documentation file should be added alongside the code
for this patchset, so I'll make sure we add that for the next version.
In the meantime, here is a (hopefully) better explanation below.

In this specific patch, we are basically trying to figure out the
boundaries of frequency domains, and the power consumed by each CPU
at each OPP, to make them available to the scheduler. The important
thing here is that, in both cases, we rely on the OPP library to
keep the code as platform-agnostic as possible.

In the case of the frequency domains for example, the cpufreq driver is
in charge of specifying the CPUs that are sharing frequencies. That
information can come from DT, or SCPI, or SCMI, or whatever -- we
probably shouldn't have to care about that from the scheduler's
standpoint. That's why using dev_pm_opp_get_sharing_cpus() is handy,
the OPP library gives us the digested information we need.

The power values (dev_pm_opp_get_power) we use right now are those
already used by the thermal subsystem (IPA), which means we don't have
to introduce any new DT binding whatsoever. In a close future, the power
values could also come from other sources (SCMI for ex), and again it's
probably not the scheduler's job to care about those things, so the OPP
library is helping us again. As mentioned in the notes, as of today, this
approach has dependencies on other patches relating to these things which
are already on the list [1].

The rest of the code in this patch is just about iterating over the
CPUs/freq. domains/OPPs. The algorithm is more or less the following:

 1. find a frequency domain which hasn't been visited yet;
 2. estimate the power and capacity of a CPU in this freq domain at each
    possible OPP;
 3. map all CPUs in the freq domain to this list of <capacity, power> tuples;
 4. go to 1.
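
In pseudo-code, that loop amounts to something like the sketch below.
Note that dev_pm_opp_get_power() (used inside the model builder) comes
from the dependency series [1], build_energy_model() is the RFC's
per-domain builder, and cpumask allocation/error handling is elided:

	static void build_all_energy_models(void)
	{
		cpumask_var_t visited, fdom;
		int cpu, fcpu;

		for_each_possible_cpu(cpu) {
			struct sched_energy_model *em;

			if (cpumask_test_cpu(cpu, visited))	/* step 1 */
				continue;

			/* CPUs sharing a frequency, as told by the driver. */
			dev_pm_opp_get_sharing_cpus(get_cpu_device(cpu), fdom);

			em = build_energy_model(cpu);		/* step 2 */

			for_each_cpu(fcpu, fdom) {		/* step 3 */
				*per_cpu_ptr(energy_model, fcpu) = em;
				cpumask_set_cpu(fcpu, visited);
			}
		}	/* step 4: continue with the next unvisited domain */
	}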

I hope that makes sense.

Thanks,
Quentin

[1] https://marc.info/?l=linux-pm&m=151635516419249&w=2

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs
  2018-04-09 13:45     ` Quentin Perret
@ 2018-04-09 15:32       ` Peter Zijlstra
  2018-04-09 16:42         ` Quentin Perret
  0 siblings, 1 reply; 55+ messages in thread
From: Peter Zijlstra @ 2018-04-09 15:32 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Dietmar Eggemann, linux-kernel, Thara Gopinath, linux-pm,
	Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

On Mon, Apr 09, 2018 at 02:45:11PM +0100, Quentin Perret wrote:

> In this specific patch, we are basically trying to figure out the
> boundaries of frequency domains, and the power consumed by each CPU
> at each OPP, to make them available to the scheduler. The important
> thing here is that, in both cases, we rely on the OPP library to
> keep the code as platform-agnostic as possible.

AFAICT the only users of this PM_OPP stuff are a bunch of ARM platforms.
Granted, nobody else has built a big.LITTLE style system, so that might
all be fine I suppose.

It won't be until some !ARM chip comes along that we'll know how
generically usable any of this really is.

> In the case of the frequency domains for example, the cpufreq driver is
> in charge of specifying the CPUs that are sharing frequencies. That
> information can come from DT, or SCPI, or SCMI, or whatever -- we
> probably shouldn't have to care about that from the scheduler's
> standpoint. That's why using dev_pm_opp_get_sharing_cpus() is handy,
> the OPP library gives us the digested information we need.

So I kinda would've expected to just ask cpufreq, that after all already
knows these things. Why did we need to invent this pm_opp thing?

Cpufreq has a tons of supported architectures, pm_opp not so much.

> The power values (dev_pm_opp_get_power) we use right now are those
> already used by the thermal subsystem (IPA), which means we don't have

I love an IPA style beer, but I'm thinking that's not the same IPA,
right :-)

> to introduce any new DT binding whatsoever. In a close future, the power
> values could also come from other sources (SCMI for ex), and again it's
> probably not the scheduler's job to care about those things, so the OPP
> library is helping us again. As mentioned in the notes, as of today, this
> approach has dependencies on other patches relating to these things which
> are already on the list [1].

Is there any !ARM thermal driver? (clearly I'm not up-to-date on things
thermal).

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs
  2018-04-09 15:32       ` Peter Zijlstra
@ 2018-04-09 16:42         ` Quentin Perret
  2018-04-10  6:55           ` Rafael J. Wysocki
  0 siblings, 1 reply; 55+ messages in thread
From: Quentin Perret @ 2018-04-09 16:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dietmar Eggemann, linux-kernel, Thara Gopinath, linux-pm,
	Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

On Monday 09 Apr 2018 at 17:32:33 (+0200), Peter Zijlstra wrote:
> On Mon, Apr 09, 2018 at 02:45:11PM +0100, Quentin Perret wrote:
> 
> > In this specific patch, we are basically trying to figure out the
> > boundaries of frequency domains, and the power consumed by each CPU
> > at each OPP, to make them available to the scheduler. The important
> > thing here is that, in both cases, we rely on the OPP library to
> > keep the code as platform-agnostic as possible.
> 
> AFAICT the only users of this PM_OPP stuff is a bunch of ARM platforms.

That's correct.

> Granted, body else has build a big.little style system, so that might
> all be fine I suppose.
> 
> It won't be until some !ARM chip comes along that we'll know how
> generically usable any of this really is.
> 

Right. There is already a lot of diversity in the Arm ecosystem that has
to be managed. That's what I meant by platform-agnostic. Now, I agree
that it should be discussed whether or not this is enough for other
archs ...

It might be reasonable to expect archs that want to use EAS to expose
their OPPs in the OPP lib. That should be harmless, and EAS
needs to know about the OPPs, so they should be made visible, ideally
somewhere generic. Otherwise, that means the interface with
EAS has to be defined only by the energy model data structures, and the
actual energy model loading procedure becomes free-form arch code.

I quite like the first idea from a pure design standpoint, but I could
also understand if maintainers of other archs were reluctant to
have new dependencies on PM_OPP ...

> > In the case of the frequency domains for example, the cpufreq driver is
> > in charge of specifying the CPUs that are sharing frequencies. That
> > information can come from DT, or SCPI, or SCMI, or whatever -- we
> > probably shouldn't have to care about that from the scheduler's
> > standpoint. That's why using dev_pm_opp_get_sharing_cpus() is handy,
> > the OPP library gives us the digested information we need.
> 
> So I kinda would've expected to just ask cpufreq, that after all already
> knows these things. Why did we need to invent this pm_opp thing?

Yes, we can definitely rely on cpufreq for this one. There is a "strong"
dependency on PM_OPP to get power values, so I decided to use PM_OPP for
the frequency domains as well, for consistency. But I can change that if
needed.

> 
> Cpufreq has a tons of supported architectures, pm_opp not so much.
> 
> > The power values (dev_pm_opp_get_power) we use right now are those
> > already used by the thermal subsystem (IPA), which means we don't have
> 
> I love an IPA style beer, but I'm thinking that's not the same IPA,
> right :-)

Well, both can help to chill down in a way ... :-)

The IPA I'm talking about means Intelligent Power Allocator. It's a
thermal governor that uses a power model of the platform to allocate
power budgets to CPUs & GPUs using a control loop. The code is in
drivers/thermal/power_allocator.c if this is of interest.

> 
> > to introduce any new DT binding whatsoever. In a close future, the power
> > values could also come from other sources (SCMI for ex), and again it's
> > probably not the scheduler's job to care about those things, so the OPP
> > library is helping us again. As mentioned in the notes, as of today, this
> > approach has dependencies on other patches relating to these things which
> > are already on the list [1].
> 
> Is there any !ARM thermal driver? (clearly I'm not up-to-date on things
> thermal).

I don't think so.

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs
  2018-04-09 16:42         ` Quentin Perret
@ 2018-04-10  6:55           ` Rafael J. Wysocki
  2018-04-10  9:31             ` Quentin Perret
  0 siblings, 1 reply; 55+ messages in thread
From: Rafael J. Wysocki @ 2018-04-10  6:55 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Peter Zijlstra, Dietmar Eggemann, Linux Kernel Mailing List,
	Thara Gopinath, Linux PM, Morten Rasmussen, Chris Redpath,
	Patrick Bellasi, Valentin Schneider, Rafael J . Wysocki,
	Greg Kroah-Hartman, Vincent Guittot, Viresh Kumar, Todd Kjos,
	Joel Fernandes

On Mon, Apr 9, 2018 at 6:42 PM, Quentin Perret <quentin.perret@arm.com> wrote:
> On Monday 09 Apr 2018 at 17:32:33 (+0200), Peter Zijlstra wrote:
>> On Mon, Apr 09, 2018 at 02:45:11PM +0100, Quentin Perret wrote:
>>
>> > In this specific patch, we are basically trying to figure out the
>> > boundaries of frequency domains, and the power consumed by each CPU
>> > at each OPP, to make them available to the scheduler. The important
>> > thing here is that, in both cases, we rely on the OPP library to
>> > keep the code as platform-agnostic as possible.
>>
>> AFAICT the only users of this PM_OPP stuff are a bunch of ARM platforms.
>
> That's correct.
>
>> Granted, nobody else has built a big.LITTLE style system, so that might
>> all be fine I suppose.
>>
>> It won't be until some !ARM chip comes along that we'll know how
>> generically usable any of this really is.
>>
>
> Right. There is already a lot of diversity in the Arm ecosystem that has
> to be managed. That's what I meant by platform-agnostic. Now, I agree
> that it should be discussed whether or not this is enough for other
> archs ...

Even for ARM64 w/ ACPI, mind you.

> It might be reasonable to expect from the archs who want to use EAS that
> they expose their OPPs in the OPP lib. That should be harmless, and EAS
> needs to know about the OPPs, so they should be made visible, ideally
> somewhere generic. Otherwise, that means the interface with the
> EAS has to be defined only by the energy model data structures, and the
> actual energy model loading procedure becomes free-form arch code.
>
> I quiet like the first idea from a pure design standpoint, but I could
> also understand if maintainers of other archs were reluctant to
> have new dependencies on PM_OPP ...

Not just reluctant I would think.

Depending on PM_OPP directly here is like depending on ACPI directly.
Would you agree with the latter?

>> > In the case of the frequency domains for example, the cpufreq driver is
>> > in charge of specifying the CPUs that are sharing frequencies. That
>> > information can come from DT, or SCPI, or SCMI, or whatever -- we
>> > probably shouldn't have to care about that from the scheduler's
>> > standpoint. That's why using dev_pm_opp_get_sharing_cpus() is handy,
>> > the OPP library gives us the digested information we need.
>>
>> So I kinda would've expected to just ask cpufreq, that after all already
>> knows these things. Why did we need to invent this pm_opp thing?
>
> Yes, we can definitely rely on cpufreq for this one. There is a "strong"
> dependency on PM_OPP to get power values, so I decided to use PM_OPP for
> the frequency domains as well, for consistency. But I can change that if
> needed.

Yes, please.

>>
>> Cpufreq has a tons of supported architectures, pm_opp not so much.
>>
>> > The power values (dev_pm_opp_get_power) we use right now are those
>> > already used by the thermal subsystem (IPA), which means we don't have
>>
>> I love an IPA style beer, but I'm thinking that's not the same IPA,
>> right :-)
>
> Well, both can help to chill down in a way ... :-)
>
> The IPA I'm talking about means Intelligent Power Allocator. It's a
> thermal governor that uses a power model of the platform to allocate
> power budgets to CPUs & GPUs using a control loop. The code is in
> drivers/thermal/power_allocator.c if this is of interest.
>
>>
>> > to introduce any new DT binding whatsoever. In a close future, the power
>> > values could also come from other sources (SCMI for ex), and again it's
>> > probably not the scheduler's job to care about those things, so the OPP
>> > library is helping us again. As mentioned in the notes, as of today, this
>> > approach has dependencies on other patches relating to these things which
>> > are already on the list [1].
>>
>> Is there any !ARM thermal driver? (clearly I'm not up-to-date on things
>> thermal).
>
> I don't think so.

No, there isn't, AFAICS.

Thanks!

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs
  2018-04-10  6:55           ` Rafael J. Wysocki
@ 2018-04-10  9:31             ` Quentin Perret
  2018-04-10 10:20               ` Rafael J. Wysocki
  0 siblings, 1 reply; 55+ messages in thread
From: Quentin Perret @ 2018-04-10  9:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Dietmar Eggemann, Linux Kernel Mailing List,
	Thara Gopinath, Linux PM, Morten Rasmussen, Chris Redpath,
	Patrick Bellasi, Valentin Schneider, Rafael J . Wysocki,
	Greg Kroah-Hartman, Vincent Guittot, Viresh Kumar, Todd Kjos,
	Joel Fernandes

On Tuesday 10 Apr 2018 at 08:55:14 (+0200), Rafael J. Wysocki wrote:
> On Mon, Apr 9, 2018 at 6:42 PM, Quentin Perret <quentin.perret@arm.com> wrote:
> > On Monday 09 Apr 2018 at 17:32:33 (+0200), Peter Zijlstra wrote:
> >> On Mon, Apr 09, 2018 at 02:45:11PM +0100, Quentin Perret wrote:

[...]

> > I quite like the first idea from a pure design standpoint, but I could
> > also understand if maintainers of other archs were reluctant to
> > have new dependencies on PM_OPP ...
> 
> Not just reluctant I would think.
> 
> Depending on PM_OPP directly here is like depending on ACPI directly.
> Would you agree with the latter?

Right, I see your point. I was suggesting to use PM_OPP only to make the
OPPs *visible*, nothing else. That doesn't mean all archs would have
to use dev_pm_opp_set_rate() or anything, they could just keep on doing
DVFS their own way. PM_OPP would just be a common way to make OPPs
visible outside of their subsystem, which should be harmless. The point
is to keep the energy model loading code common to all archs.

Another solution would be to let the archs populate the energy model
data-structures themselves, and turn the current energy.c file into
arm/arm64-specific code for ex.

Overall, I guess the question is whether or not PM_OPP is the right
interface for EAS of multiple archs ... That sounds like an interesting
discussion topic for OSPM next week, so thanks a lot for raising this
point !

Regards,
Quentin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs
  2018-04-10  9:31             ` Quentin Perret
@ 2018-04-10 10:20               ` Rafael J. Wysocki
  0 siblings, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2018-04-10 10:20 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Rafael J. Wysocki, Peter Zijlstra, Dietmar Eggemann,
	Linux Kernel Mailing List, Thara Gopinath, Linux PM,
	Morten Rasmussen, Chris Redpath, Patrick Bellasi,
	Valentin Schneider, Rafael J . Wysocki, Greg Kroah-Hartman,
	Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes

On Tue, Apr 10, 2018 at 11:31 AM, Quentin Perret <quentin.perret@arm.com> wrote:
> On Tuesday 10 Apr 2018 at 08:55:14 (+0200), Rafael J. Wysocki wrote:
>> On Mon, Apr 9, 2018 at 6:42 PM, Quentin Perret <quentin.perret@arm.com> wrote:
>> > On Monday 09 Apr 2018 at 17:32:33 (+0200), Peter Zijlstra wrote:
>> >> On Mon, Apr 09, 2018 at 02:45:11PM +0100, Quentin Perret wrote:
>
> [...]
>
>> > I quite like the first idea from a pure design standpoint, but I could
>> > also understand if maintainers of other archs were reluctant to
>> > have new dependencies on PM_OPP ...
>>
>> Not just reluctant I would think.
>>
>> Depending on PM_OPP directly here is like depending on ACPI directly.
>> Would you agree with the latter?
>
> Right, I see your point. I was suggesting to use PM_OPP only to make the
> OPPs *visible*, nothing else. That doesn't mean all archs would have
> to use dev_pm_opp_set_rate() or anything, they could just keep on doing
> DVFS their own way. PM_OPP would just be a common way to make OPPs
> visible outside of their subsystem, which should be harmless. The point
> is to keep the energy model loading code common to all archs.
>
> Another solution would be to let the archs populate the energy model
> data-structures themselves, and turn the current energy.c file into
> arm/arm64-specific code for ex.
>
> Overall, I guess the question is whether or not PM_OPP is the right
> interface for EAS of multiple archs ... That sounds like an interesting
> discussion topic for OSPM next week,

I agree.

> so thanks a lot for raising this point !

And moreover, we already have cpufreq and cpuidle that use their own
representations of the same information, generally coming from lower
layers.  They do that because they need to work with different
platforms that generally represent the low-level information
differently.  I don't see why that principle doesn't apply to EAS.

Maybe there should be a common data structure to be used by them all,
but I'm quite confident that PM_OPP is not suitable for this purpose
in general.

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2018-04-10 10:20 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-20  9:43 [RFC PATCH 0/6] Energy Aware Scheduling Dietmar Eggemann
2018-03-20  9:43 ` [RFC PATCH 1/6] sched/fair: Create util_fits_capacity() Dietmar Eggemann
2018-03-20  9:43 ` [RFC PATCH 2/6] sched: Introduce energy models of CPUs Dietmar Eggemann
2018-03-20  9:52   ` Greg Kroah-Hartman
2018-03-21  0:45     ` Quentin Perret
2018-03-25 13:48     ` Quentin Perret
2018-03-26 22:26       ` Dietmar Eggemann
2018-04-09 12:01   ` Peter Zijlstra
2018-04-09 13:45     ` Quentin Perret
2018-04-09 15:32       ` Peter Zijlstra
2018-04-09 16:42         ` Quentin Perret
2018-04-10  6:55           ` Rafael J. Wysocki
2018-04-10  9:31             ` Quentin Perret
2018-04-10 10:20               ` Rafael J. Wysocki
2018-03-20  9:43 ` [RFC PATCH 3/6] sched: Add over-utilization/tipping point indicator Dietmar Eggemann
2018-04-09  9:40   ` Peter Zijlstra
2018-04-09  9:47     ` Peter Zijlstra
2018-04-09  9:53     ` Dietmar Eggemann
2018-04-09 11:49       ` Peter Zijlstra
2018-03-20  9:43 ` [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function Dietmar Eggemann
2018-03-21  9:04   ` Juri Lelli
2018-03-21 12:26     ` Patrick Bellasi
2018-03-21 12:59       ` Juri Lelli
2018-03-21 13:55         ` Quentin Perret
2018-03-21 15:15           ` Juri Lelli
2018-03-21 16:26             ` Morten Rasmussen
2018-03-21 17:02               ` Juri Lelli
2018-03-21 14:02       ` Quentin Perret
2018-03-21 21:15         ` Dietmar Eggemann
2018-03-21 12:39   ` Patrick Bellasi
2018-03-21 14:26     ` Quentin Perret
2018-03-21 14:50       ` Juri Lelli
2018-03-21 15:54       ` Patrick Bellasi
2018-03-22  5:05         ` Quentin Perret
2018-03-20  9:43 ` [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task wake-up Dietmar Eggemann
2018-03-21 15:35   ` Patrick Bellasi
2018-03-22 20:10     ` Joel Fernandes
2018-03-23 15:47       ` Morten Rasmussen
2018-03-24  1:13         ` Joel Fernandes
2018-03-24  1:34           ` Quentin Perret
2018-03-24  6:06             ` Joel Fernandes
2018-03-24  1:22         ` Quentin Perret
2018-03-25  1:52     ` Quentin Perret
2018-03-22 16:27   ` Joel Fernandes
2018-03-22 18:06     ` Patrick Bellasi
2018-03-22 20:19       ` Joel Fernandes
2018-03-24  1:47         ` Quentin Perret
2018-03-25  0:12           ` Joel Fernandes
2018-03-23 16:00     ` Morten Rasmussen
2018-03-24  0:36       ` Joel Fernandes
2018-03-25  1:38       ` Quentin Perret
2018-03-20  9:43 ` [RFC PATCH 6/6] drivers: base: arch_topology.c: Enable EAS for arm/arm64 platforms Dietmar Eggemann
2018-03-20  9:49   ` Greg Kroah-Hartman
2018-03-20 15:20     ` Dietmar Eggemann
