linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Documentation: Explain EAS and EM
@ 2019-01-10 11:05 Quentin Perret
  2019-01-10 11:05 ` [PATCH 1/2] PM / EM: Document the Energy Model framework Quentin Perret
  2019-01-10 11:05 ` [PATCH 2/2] sched: Document Energy Aware Scheduling Quentin Perret
  0 siblings, 2 replies; 22+ messages in thread
From: Quentin Perret @ 2019-01-10 11:05 UTC (permalink / raw)
  To: corbet, peterz, rjw
  Cc: mingo, morten.rasmussen, qais.yousef, patrick.bellasi,
	dietmar.eggemann, linux-doc, linux-pm, linux-kernel,
	quentin.perret

The recently introduced Energy Aware Scheduling (EAS) feature relies on
a large set of concepts, assumptions, and design choices that are
probably not obvious for an outsider. Moreover, enabling EAS on a
particular platform isn't straightforward because of all its
dependencies. This series tries to address this by introducing proper
documentation files for the scheduler's part of EAS and for the newly
introduced Energy Model (EM) framework. These are meant to explain not
only the design choices of EAS but also to list its dependencies in a
human-readable location.

The two new doc files are simple .txt to be consistent with the existing
documentation of the relevant subsystems, but I'm happy to translate to
.rst if deemed necessary.

All feedback is welcome.

Thanks !
Quentin

Quentin Perret (2):
  PM / EM: Document the Energy Model framework
  sched: Document Energy Aware Scheduling

 Documentation/power/energy-model.txt     | 144 ++++++++
 Documentation/scheduler/sched-energy.txt | 425 +++++++++++++++++++++++
 2 files changed, 569 insertions(+)
 create mode 100644 Documentation/power/energy-model.txt
 create mode 100644 Documentation/scheduler/sched-energy.txt

-- 
2.20.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/2] PM / EM: Document the Energy Model framework
  2019-01-10 11:05 [PATCH 0/2] Documentation: Explain EAS and EM Quentin Perret
@ 2019-01-10 11:05 ` Quentin Perret
  2019-01-17 14:47   ` Juri Lelli
                     ` (3 more replies)
  2019-01-10 11:05 ` [PATCH 2/2] sched: Document Energy Aware Scheduling Quentin Perret
  1 sibling, 4 replies; 22+ messages in thread
From: Quentin Perret @ 2019-01-10 11:05 UTC (permalink / raw)
  To: corbet, peterz, rjw
  Cc: mingo, morten.rasmussen, qais.yousef, patrick.bellasi,
	dietmar.eggemann, linux-doc, linux-pm, linux-kernel,
	quentin.perret

Introduce a documentation file summarizing the key design points and
APIs of the newly introduced Energy Model framework.

Signed-off-by: Quentin Perret <quentin.perret@arm.com>
---
 Documentation/power/energy-model.txt | 144 +++++++++++++++++++++++++++
 1 file changed, 144 insertions(+)
 create mode 100644 Documentation/power/energy-model.txt

diff --git a/Documentation/power/energy-model.txt b/Documentation/power/energy-model.txt
new file mode 100644
index 000000000000..a2b0ae4c76bd
--- /dev/null
+++ b/Documentation/power/energy-model.txt
@@ -0,0 +1,144 @@
+                           ====================
+                           Energy Model of CPUs
+                           ====================
+
+1. Overview
+-----------
+
+The Energy Model (EM) framework serves as an interface between drivers knowing
+the power consumed by CPUs at various performance levels, and the kernel
+subsystems willing to use that information to make energy-aware decisions.
+
+The source of the information about the power consumed by CPUs can vary greatly
+from one platform to another. These power costs can be estimated using
+devicetree data in some cases. In others, the firmware will know better.
+Alternatively, userspace might be best positioned. And so on. In order to avoid
+each and every client subsystem to re-implement support for each and every
+possible source of information on its own, the EM framework intervenes as an
+abstraction layer which standardizes the format of power cost tables in the
+kernel, hence enabling to avoid redundant work.
+
+The figure below depicts an example of drivers (Arm-specific here, but the
+approach is applicable to any architecture) providing power costs to the EM
+framework, and interested clients reading the data from it.
+
+       +---------------+  +-----------------+  +---------------+
+       | Thermal (IPA) |  | Scheduler (EAS) |  |     Other     |
+       +---------------+  +-----------------+  +---------------+
+               |                   | em_pd_energy()    |
+               |                   | em_cpu_get()      |
+               +---------+         |         +---------+
+                         |         |         |
+                         v         v         v
+                        +---------------------+
+                        |    Energy Model     |
+                        |     Framework       |
+                        +---------------------+
+                           ^       ^       ^
+                           |       |       | em_register_perf_domain()
+                +----------+       |       +---------+
+                |                  |                 |
+        +---------------+  +---------------+  +--------------+
+        |  cpufreq-dt   |  |   arm_scmi    |  |    Other     |
+        +---------------+  +---------------+  +--------------+
+                ^                  ^                 ^
+                |                  |                 |
+        +--------------+   +---------------+  +--------------+
+        | Device Tree  |   |   Firmware    |  |      ?       |
+        +--------------+   +---------------+  +--------------+
+
+The EM framework manages power cost tables per 'performance domain' in the
+system. A performance domain is a group of CPUs whose performance is scaled
+together. Performance domains generally have a 1-to-1 mapping with CPUFreq
+policies. All CPUs in a performance domain are required to have the same
+micro-architecture. CPUs in different performance domains can have different
+micro-architectures.
+
+
+2. Core APIs
+------------
+
+  2.1 Config options
+
+CONFIG_ENERGY_MODEL must be enabled to use the EM framework.
+
+
+  2.2 Registration of performance domains
+
+Drivers are expected to register performance domains into the EM framework by
+calling the following API:
+
+  int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
+			      struct em_data_callback *cb);
+
+Drivers must specify the CPUs of the performance domains using the cpumask
+argument, and provide a callback function returning <frequency, power> tuples
+for each capacity state. The callback function provided by the driver is free
+to fetch data from any relevant location (DT, firmware, ...), and by any mean
+deemed necessary. See Section 3. for an example of driver implementing this
+callback, and kernel/power/energy_model.c for further documentation on this
+API.
+
+
+  2.3 Accessing performance domains
+
+Subsystems interested in the energy model of a CPU can retrieve it using the
+em_cpu_get() API. The energy model tables are allocated once upon creation of
+the performance domains, and kept in memory untouched.
+
+The energy consumed by a performance domain can be estimated using the
+em_pd_energy() API. The estimation is performed assuming that the schedutil
+CPUfreq governor is in use.
+
+More details about the above APIs can be found in include/linux/energy_model.h.
+
+
+3. Example driver
+-----------------
+
+This section provides a simple example of a CPUFreq driver registering a
+performance domain in the Energy Model framework using the (fake) 'foo'
+protocol. The driver implements an est_power() function to be provided to the
+EM framework.
+
+ -> drivers/cpufreq/foo_cpufreq.c
+
+01	static int est_power(unsigned long *mW, unsigned long *KHz, int cpu)
+02	{
+03		long freq, power;
+04
+05		/* Use the 'foo' protocol to ceil the frequency */
+06		freq = foo_get_freq_ceil(cpu, *KHz);
+07		if (freq < 0);
+08			return freq;
+09
+10		/* Estimate the power cost for the CPU at the relevant freq. */
+11		power = foo_estimate_power(cpu, freq);
+12		if (power < 0);
+13			return power;
+14
+15		/* Return the values to the EM framework */
+16		*mW = power;
+17		*KHz = freq;
+18
+19		return 0;
+20	}
+21
+22	static int foo_cpufreq_init(struct cpufreq_policy *policy)
+23	{
+24		struct em_data_callback em_cb = EM_DATA_CB(est_power);
+25		int nr_opp, ret;
+26
+27		/* Do the actual CPUFreq init work ... */
+28		ret = do_foo_cpufreq_init(policy);
+29		if (ret)
+30			return ret;
+31
+32		/* Find the number of OPPs for this policy */
+33		nr_opp = foo_get_nr_opp(policy);
+34
+35		/* And register the new performance domain */
+36		em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
+37
+38	        return 0;
+39	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/2] sched: Document Energy Aware Scheduling
  2019-01-10 11:05 [PATCH 0/2] Documentation: Explain EAS and EM Quentin Perret
  2019-01-10 11:05 ` [PATCH 1/2] PM / EM: Document the Energy Model framework Quentin Perret
@ 2019-01-10 11:05 ` Quentin Perret
  2019-01-17 15:51   ` Juri Lelli
                     ` (3 more replies)
  1 sibling, 4 replies; 22+ messages in thread
From: Quentin Perret @ 2019-01-10 11:05 UTC (permalink / raw)
  To: corbet, peterz, rjw
  Cc: mingo, morten.rasmussen, qais.yousef, patrick.bellasi,
	dietmar.eggemann, linux-doc, linux-pm, linux-kernel,
	quentin.perret

Add some documentation detailing the main design points of EAS, as well
as a list of its dependencies.

Parts of this documentation are taken from Morten Rasmussen's original
EAS posting: https://lkml.org/lkml/2015/7/7/754

Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Co-authored-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
---
 Documentation/scheduler/sched-energy.txt | 425 +++++++++++++++++++++++
 1 file changed, 425 insertions(+)
 create mode 100644 Documentation/scheduler/sched-energy.txt

diff --git a/Documentation/scheduler/sched-energy.txt b/Documentation/scheduler/sched-energy.txt
new file mode 100644
index 000000000000..197d81f4b836
--- /dev/null
+++ b/Documentation/scheduler/sched-energy.txt
@@ -0,0 +1,425 @@
+			   =======================
+			   Energy Aware Scheduling
+			   =======================
+
+1. Introduction
+---------------
+
+Energy Aware Scheduling (or EAS) gives the scheduler the ability to predict
+the impact of its decisions on the energy consumed by CPUs. EAS relies on an
+Energy Model (EM) of the CPUs to select an energy efficient CPU for each task,
+with a minimal impact on throughput. This document aims at providing an
+introduction on how EAS works, what are the main design decisions behind it, and
+details what is needed to get it to run.
+
+Before going any further, please note that at the time of writing:
+
+   /!\ EAS does not support platforms with symmetric CPU topologies /!\
+
+EAS operates only on heterogeneous CPU topologies (such as Arm big.LITTLE)
+because this is where the potential for saving energy through scheduling is
+the highest.
+
+The actual EM used by EAS is _not_ maintained by the scheduler, but by a
+dedicated framework. For details about this framework and what it provides,
+please refer to its documentation (see Documentation/power/energy-model.txt).
+
+
+2. Background and Terminology
+-----------------------------
+
+To make it clear from the start:
+ - energy = [joule] (resource like a battery on powered devices)
+ - power = energy/time = [joule/second] = [watt]
+
+The goal of EAS is to minimize energy, while still getting the job done. That
+is, we want to maximize:
+
+	performance [inst/s]
+	--------------------
+	    power [W]
+
+which is equivalent to minimizing:
+
+	energy [J]
+	-----------
+	instruction
+
+while still getting 'good' performance. It is essentially an alternative
+optimization objective to the current performance-only objective for the
+scheduler. This alternative considers two objectives: energy-efficiency and
+performance.
+
+The idea behind introducing an EM is to allow the scheduler to evaluate the
+implications of its decisions rather than blindly applying energy-saving
+techniques that may have positive effects only on some platforms. At the same
+time, the EM must be as simple as possible to minimize the scheduler latency
+impact.
+
+In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time
+for the scheduler to decide where a task should run (during wake-up), the EM
+is used to break the tie between several good CPU candidates and pick the one
+that is predicted to yield the best energy consumption without harming the
+system's throughput. The predictions made by EAS rely on specific elements of
+knowledge about the platform's topology, which include the 'capacity' of CPUs,
+and their respective energy costs.
+
+
+3. Topology information
+-----------------------
+
+EAS (as well as the rest of the scheduler) uses the notion of 'capacity' to
+differentiate CPUs with different computing throughput. The 'capacity' of a CPU
+represents the amount of work it can absorb when running at its highest
+frequency compared to the most capable CPU of the system. Capacity values are
+normalized in a 1024 range, and are comparable with the utilization signals of
+tasks and CPUs computed by the Per-Entity Load Tracking (PELT) mechanism. Thanks
+to capacity and utilization values, EAS is able to estimate how big/busy a
+task/CPU is, and to take this into consideration when evaluating performance vs
+energy trade-offs. The capacity of CPUs is provided via arch-specific code
+through the arch_scale_cpu_capacity() callback.
+
+The rest of platform knowledge used by EAS is directly read from the Energy
+Model (EM) framework. The EM of a platform is composed of a power cost table
+per 'performance domain' in the system (see Documentation/power/energy-model.txt
+for futher details about performance domains).
+
+The scheduler manages references to the EM objects in the topology code when the
+scheduling domains are built, or re-built. For each root domain (rd), the
+scheduler maintains a singly linked list of all performance domains intersecting
+the current rd->span. Each node in the list contains a pointer to a struct
+em_perf_domain as provided by the EM framework.
+
+The lists are attached to the root domains in order to cope with exclusive
+cpuset configurations. Since the boundaries of exclusive cpusets do not
+necessarily match those of performance domains, the lists of different root
+domains can contain duplicate elements.
+
+Example 1.
+    Let us consider a platform with 12 CPUs, split in 3 performance domains
+    (pd0, pd4 and pd8), organized as follows:
+
+	          CPUs:   0 1 2 3 4 5 6 7 8 9 10 11
+	          PDs:   |--pd0--|--pd4--|---pd8---|
+	          RDs:   |----rd1----|-----rd2-----|
+
+    Now, consider that userspace decided to split the system with two
+    exclusive cpusets, hence creating two independent root domains, each
+    containing 6 CPUs. The two root domains are denoted rd1 and rd2 in the
+    above figure. Since pd4 intersects with both rd1 and rd2, it will be
+    present in the linked list '->pd' attached to each of them:
+       * rd1->pd: pd0 -> pd4
+       * rd2->pd: pd4 -> pd8
+
+    Please note that the scheduler will create two duplicate list nodes for
+    pd4 (one for each list). However, both just hold a pointer to the same
+    shared data structure of the EM framework.
+
+Since the access to these lists can happen concurrently with hotplug and other
+things, they are protected by RCU, like the rest of topology structures
+manipulated by the scheduler.
+
+EAS also maintains a static key (sched_energy_present) which is enabled when at
+least one root domain meets all conditions for EAS to start. Those conditions
+are summarized in Section 6.
+
+
+4. Energy-Aware task placement
+------------------------------
+
+EAS overrides the CFS task wake-up balancing code. It uses the EM of the
+platform and the PELT signals to choose an energy-efficient target CPU during
+wake-up balance. When EAS is enabled, select_task_rq_fair() calls
+find_energy_efficient_cpu() to do the placement decision. This function looks
+for the CPU with the highest spare capacity (CPU capacity - CPU utilization) in
+each performance domain since it is the one which will allow us to keep the
+frequency the lowest. Then, the function checks if placing the task there could
+save energy compared to leaving it on prev_cpu, i.e. the CPU where the task ran
+in its previous activation.
+
+find_energy_efficient_cpu() uses compute_energy() to estimate what will be the
+energy consumed by the system if the waking task was migrated. compute_energy()
+looks at the current utilization landscape of the CPUs and adjusts it to
+'simulate' the task migration. The EM framework provides the em_pd_energy() API
+which computes the expected energy consumption of each performance domain for
+the given utilization landscape.
+
+An example of energy-optimized task placement decision is detailed below.
+
+Example 2.
+    Let us consider a (fake) platform with 2 independent performance domains
+    composed of two CPUs each. CPU0 and CPU1 are little CPUs; CPU2 and CPU3
+    are big.
+
+    The scheduler must decide where to place a task P whose util_avg = 200
+    and prev_cpu = 0.
+
+    The current utilization landscape of the CPUs is depicted on the graph
+    below. CPUs 0-3 have a util_avg of 400, 100, 600 and 500 respectively
+    Each performance domain has three Operating Performance Points (OPPs).
+    The CPU capacity and power cost associated with each OPP is listed in
+    the Energy Model table. The util_avg of P is shown on the figures
+    below as 'PP'.
+
+    CPU util.
+      1024                 - - - - - - -              Energy Model
+                                               +-----------+-------------+
+                                               |  Little   |     Big     |
+       768                 =============       +-----+-----+------+------+
+                                               | Cap | Pwr | Cap  | Pwr  |
+                                               +-----+-----+------+------+
+       512  ===========    - ##- - - - -       | 170 | 50  | 512  | 400  |
+                             ##     ##         | 341 | 150 | 768  | 800  |
+       341  -PP - - - -      ##     ##         | 512 | 300 | 1024 | 1700 |
+             PP              ##     ##         +-----+-----+------+------+
+       170  -## - - - -      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+      Current OPP: =====       Other OPP: - - -     util_avg (100 each): ##
+
+
+    find_energy_efficient_cpu() will first look for the CPUs with the
+    maximum spare capacity in the two performance domains. In this example,
+    CPU1 and CPU3. Then it will estimate the energy of the system if P was
+    placed on either of them, and check if that would save some energy
+    compared to leaving P on CPU0. EAS assumes that OPPs follow utilization
+    (which is coherent with the behaviour of the schedutil CPUFreq
+    governor, see Section 6. for more details on this topic).
+
+    Case 1. P is migrated to CPU1
+    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+      1024                 - - - - - - -
+
+                                            Energy calculation:
+       768                 =============     * CPU0: 200 / 341 * 150 = 88
+                                             * CPU1: 300 / 341 * 150 = 131
+                                             * CPU2: 600 / 768 * 800 = 625
+       512  - - - - - -    - ##- - - - -     * CPU3: 500 / 768 * 800 = 520
+                             ##     ##          => total_energy = 1364
+       341  ===========      ##     ##
+                    PP       ##     ##
+       170  -## - - PP-      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+
+    Case 2. P is migrated to CPU3
+    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+      1024                 - - - - - - -
+
+                                            Energy calculation:
+       768                 =============     * CPU0: 200 / 341 * 150 = 88
+                                             * CPU1: 100 / 341 * 150 = 43
+                                    PP       * CPU2: 600 / 768 * 800 = 625
+       512  - - - - - -    - ##- - -PP -     * CPU3: 700 / 768 * 800 = 729
+                             ##     ##          => total_energy = 1485
+       341  ===========      ##     ##
+                             ##     ##
+       170  -## - - - -      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+
+    Case 3. P stays on prev_cpu / CPU 0
+    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+      1024                 - - - - - - -
+
+                                            Energy calculation:
+       768                 =============     * CPU0: 400 / 512 * 300 = 234
+                                             * CPU1: 100 / 512 * 300 = 58
+                                             * CPU2: 600 / 768 * 800 = 625
+       512  ===========    - ##- - - - -     * CPU3: 500 / 768 * 800 = 520
+                             ##     ##          => total_energy = 1437
+       341  -PP - - - -      ##     ##
+             PP              ##     ##
+       170  -## - - - -      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+
+    From these calculations, the Case 1 has the lowest total energy. So CPU 1
+    is be the best candidate from an energy-efficiency standpoint.
+
+Big CPUs are generally more power hungry than the little ones and are thus used
+mainly when a task doesn't fit the littles. However, little CPUs aren't always
+necessarily more energy-efficient than big CPUs. For some systems, the high OPPs
+of the little CPUs can be less energy-efficient than the lowest OPPs of the
+bigs, for example. So, if the little CPUs happen to have enough utilization at
+a specific point in time, a small task waking up at that moment could be better
+of executing on the big side in order to save energy, even though it would fit
+on the little side.
+
+And even in the case where all OPPs of the big CPUs are less energy-efficient
+than those of the little, using the big CPUs for a small task might still, under
+specific conditions, save energy. Indeed, placing a task on a little CPU can
+result in raising the OPP of the entire performance domain, and that will
+increase the cost of the tasks already running there. If the waking task is
+placed on a big CPU, its own execution cost might be higher than if it was
+running on a little, but it won't impact the other tasks of the little CPUs
+which will keep running at a lower OPP. So, when considering the total energy
+consumed by CPUs, the extra cost of running that one task on a big core can be
+smaller than the cost of raising the OPP on the little CPUs for all the other
+tasks.
+
+The examples above would be nearly impossible to get right in a generic way, and
+for all platforms, without knowing the cost of running at different OPPs on all
+CPUs of the system. Thanks to its EM-based design, EAS should cope with them
+correctly without too many troubles. However, in order to ensure a minimal
+impact on throughput for high-utilization scenarios, EAS also implements another
+mechanism called 'over-utilization'.
+
+
+5. Over-utilization
+-------------------
+
+From a general standpoint, the use-cases where EAS can help the most are those
+involving a light/medium CPU utilization. Whenever long CPU-bound tasks are
+being run, they will require all of the available CPU capacity, and there isn't
+much that can be done by the scheduler to save energy without severly harming
+throughput. In order to avoid hurting performance with EAS, CPUs are flagged as
+'over-utilized' as soon as they are used at more than 80% of their compute
+capacity. As long as no CPUs are over-utilized in a root domain, load balancing
+is disabled and EAS overridess the wake-up balancing code. EAS is likely to load
+the most energy efficient CPUs of the system more than the others if that can be
+done without harming throughput. So, the load-balancer is disabled to prevent
+it from breaking the energy-efficient task placement found by EAS. It is safe to
+do so when the system isn't overutilized since being below the 80% tipping point
+implies that:
+
+    a. there is some idle time on all CPUs, so the utilization signals used by
+       EAS are likely to accurately represent the 'size' of the various tasks
+       in the system;
+    b. all tasks should already be provided with enough CPU capacity,
+       regardless of their nice values;
+    c. since there is spare capacity all tasks must be blocking/sleeping
+       regularly and balancing at wake-up is sufficient.
+
+As soon as one CPU goes above the 80% tipping point, at least one of the three
+assumptions above becomes incorrect. In this scenario, the 'overutilized' flag
+is raised for the entire root domain, EAS is disabled, and the load-balancer is
+re-enabled. By doing so, the scheduler falls back onto load-based algorithms for
+wake-up and load balance under CPU-bound conditions. This provides a better
+respect of the nice values of tasks.
+
+Since the notion of overutilization largely relies on detecting whether or not
+there is some idle time in the system, the CPU capacity 'stolen' by higher
+(than CFS) scheduling classes (as well as IRQ) must be taken into account. As
+such, the detection of overutilization accounts for the capacity used not only
+by CFS tasks, but also by the other scheduling classes and IRQ.
+
+
+6. Dependencies and requirements for EAS
+----------------------------------------
+
+Energy Aware Scheduling depends on the CPUs of the system having specific
+hardware properties and on other features of the kernel being enabled. This
+section lists these dependencies and provides hints as to how they can be met.
+
+
+  6.1 - Asymmetric CPU topology
+
+As mentioned in the introduction, EAS is only supported on platforms with
+asymmetric CPU topologies for now. This requirement is checked at run-time by
+looking for the presence of the SD_ASYM_CPUCAPACITY flag when the scheduling
+domains are built.
+
+The flag is set/cleared automatically by the scheduler topology code whenever
+there are CPUs with different capacities in a root domain. The capacities of
+CPUs are provided by arch-specific code through the arch_scale_cpu_capacity()
+callback. As an example, arm and arm64 share an implementation of this callback
+which uses a combination of CPUFreq data and device-tree bindings to compute the
+capacity of CPUs (see drivers/base/arch_topology.c for more details).
+
+So, in order to use EAS on your platform your architecture must implement the
+arch_scale_cpu_capacity() callback, and some of the CPUs must have a lower
+capacity than others.
+
+Please note that EAS is not fundamentally incompatible with SMP, but no
+significant savings on SMP platforms have been observed yet. This restriction
+could be amended in the future if proven otherwise.
+
+
+  6.2 - Energy Model presence
+
+EAS uses the EM of a platform to estimate the impact of scheduling decisions on
+energy. So, your platform must provide power cost tables to the EM framework in
+order to make EAS start. To do so, please refer to documentation of the
+independent EM framework in Documentation/power/energy-model.txt.
+
+Please also note that the scheduling domains need to be re-built after the
+EM has been registered in order to start EAS.
+
+
+  6.3 - Energy Model complexity
+
+The task wake-up path is very latency-sensitive. When the EM of a platform is
+too complex (too many CPUs, too many performance domains, too many performance
+states, ...), the cost of using it in the wake-up path can become prohibitive.
+The energy-aware wake-up algorithm has a complexity of:
+
+	C = Nd * (Nc + Ns)
+
+with: Nd the number of performance domains; Nc the number of CPUs; and Ns the
+total number of OPPs (ex: for two perf. domains with 4 OPPs each, Ns = 8).
+
+A complexity check is performed at the root domain level, when scheduling
+domains are built. EAS will not start on a root domain if its C happens to be
+higher than the completely arbitrary EM_MAX_COMPLEXITY threshold (2048 at the
+time of writing).
+
+If you really want to use EAS but the complexity of your platform's Energy
+Model is too high to be used with a single root domain, you're left with only
+two possible options:
+
+    1. split your system into separate, smaller, root domains using exclusive
+       cpusets and enable EAS locally on each of them. This option has the
+       benefit to work out of the box but the drawback of preventing load
+       balance between root domains, which can result in an unbalanced system
+       overall;
+    2. submit patches to reduce the complexity of the EAS wake-up algorithm,
+       hence enabling it to cope with larger EMs in reasonable time.
+
+
+  6.4 - Schedutil governor
+
+EAS tries to predict at which OPP will the CPUs be running in the close future
+in order to estimate their energy consumption. To do so, it is assumed that OPPs
+of CPUs follow their utilization.
+
+Although it is very difficult to provide hard guarantees regarding the accuracy
+of this assumption in practice (because the hardware might not do what it is
+told to do, for example), schedutil as opposed to other CPUFreq governors at
+least _requests_ frequencies calculated using the utilization signals.
+Consequently, the only sane governor to use together with EAS is schedutil,
+because it is the only one providing some degree of consistency between
+frequency requests and energy predictions.
+
+Using EAS with any other governor than schedutil is not supported.
+
+
+  6.5 Scale-invariant utilization signals
+
+In order to make accurate prediction across CPUs and for all performance
+states, EAS needs frequency-invariant and CPU-invariant PELT signals. These can
+be obtained using the architecture-defined arch_scale{cpu,freq}_capacity()
+callbacks.
+
+Using EAS on a platform that doesn't implement these two callbacks is not
+supported.
+
+
+  6.6 Multithreading (SMT)
+
+EAS in its current form is SMT unaware and is not able to leverage
+multithreaded hardware to save energy. EAS considers threads as independent
+CPUs, which can actually be counter-productive for both performance and energy.
+
+EAS on SMT is not supported.
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] PM / EM: Document the Energy Model framework
  2019-01-10 11:05 ` [PATCH 1/2] PM / EM: Document the Energy Model framework Quentin Perret
@ 2019-01-17 14:47   ` Juri Lelli
  2019-01-17 14:53     ` Quentin Perret
  2019-01-21 11:37   ` [tip:sched/core] PM/EM: " tip-bot for Quentin Perret
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Juri Lelli @ 2019-01-17 14:47 UTC (permalink / raw)
  To: Quentin Perret
  Cc: corbet, peterz, rjw, mingo, morten.rasmussen, qais.yousef,
	patrick.bellasi, dietmar.eggemann, linux-doc, linux-pm,
	linux-kernel

Hi,

On 10/01/19 11:05, Quentin Perret wrote:
> Introduce a documentation file summarizing the key design points and
> APIs of the newly introduced Energy Model framework.
> 
> Signed-off-by: Quentin Perret <quentin.perret@arm.com>

Looks good to me.

Reviewed-by: Juri Lelli <juri.lelli@redhat.com>

Best,

- Juri

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] PM / EM: Document the Energy Model framework
  2019-01-17 14:47   ` Juri Lelli
@ 2019-01-17 14:53     ` Quentin Perret
  0 siblings, 0 replies; 22+ messages in thread
From: Quentin Perret @ 2019-01-17 14:53 UTC (permalink / raw)
  To: Juri Lelli
  Cc: corbet, peterz, rjw, mingo, morten.rasmussen, qais.yousef,
	patrick.bellasi, dietmar.eggemann, linux-doc, linux-pm,
	linux-kernel

On Thursday 17 Jan 2019 at 15:47:44 (+0100), Juri Lelli wrote:
> Hi,
> 
> On 10/01/19 11:05, Quentin Perret wrote:
> > Introduce a documentation file summarizing the key design points and
> > APIs of the newly introduced Energy Model framework.
> > 
> > Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> 
> Looks good to me.
> 
> Reviewed-by: Juri Lelli <juri.lelli@redhat.com>

Thanks !
Quentin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] sched: Document Energy Aware Scheduling
  2019-01-10 11:05 ` [PATCH 2/2] sched: Document Energy Aware Scheduling Quentin Perret
@ 2019-01-17 15:51   ` Juri Lelli
  2019-01-18  9:16     ` Quentin Perret
  2019-01-21 11:38   ` [tip:sched/core] sched/doc: " tip-bot for Quentin Perret
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Juri Lelli @ 2019-01-17 15:51 UTC (permalink / raw)
  To: Quentin Perret
  Cc: corbet, peterz, rjw, mingo, morten.rasmussen, qais.yousef,
	patrick.bellasi, dietmar.eggemann, linux-doc, linux-pm,
	linux-kernel

Hi,

On 10/01/19 11:05, Quentin Perret wrote:
> Add some documentation detailing the main design points of EAS, as well
> as a list of its dependencies.
> 
> Parts of this documentation are taken from Morten Rasmussen's original
> EAS posting: https://lkml.org/lkml/2015/7/7/754
> 
> Reviewed-by: Qais Yousef <qais.yousef@arm.com>
> Co-authored-by: Morten Rasmussen <morten.rasmussen@arm.com>
> Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> ---
>  Documentation/scheduler/sched-energy.txt | 425 +++++++++++++++++++++++
>  1 file changed, 425 insertions(+)
>  create mode 100644 Documentation/scheduler/sched-energy.txt
> 
> diff --git a/Documentation/scheduler/sched-energy.txt b/Documentation/scheduler/sched-energy.txt
> new file mode 100644
> index 000000000000..197d81f4b836
> --- /dev/null
> +++ b/Documentation/scheduler/sched-energy.txt
> @@ -0,0 +1,425 @@
> +			   =======================
> +			   Energy Aware Scheduling
> +			   =======================
> +
> +1. Introduction
> +---------------
> +
> +Energy Aware Scheduling (or EAS) gives the scheduler the ability to predict
> +the impact of its decisions on the energy consumed by CPUs. EAS relies on an
> +Energy Model (EM) of the CPUs to select an energy efficient CPU for each task,
> +with a minimal impact on throughput. This document aims at providing an
> +introduction on how EAS works, what are the main design decisions behind it, and
> +details what is needed to get it to run.
> +
> +Before going any further, please note that at the time of writing:
> +
> +   /!\ EAS does not support platforms with symmetric CPU topologies /!\
> +
> +EAS operates only on heterogeneous CPU topologies (such as Arm big.LITTLE)
> +because this is where the potential for saving energy through scheduling is
> +the highest.
> +
> +The actual EM used by EAS is _not_ maintained by the scheduler, but by a
> +dedicated framework. For details about this framework and what it provides,
> +please refer to its documentation (see Documentation/power/energy-model.txt).
> +
> +
> +2. Background and Terminology
> +-----------------------------
> +
> +To make it clear from the start:
> + - energy = [joule] (resource like a battery on powered devices)
> + - power = energy/time = [joule/second] = [watt]
> +
> +The goal of EAS is to minimize energy, while still getting the job done. That
> +is, we want to maximize:
> +
> +	performance [inst/s]
> +	--------------------
> +	    power [W]
> +
> +which is equivalent to minimizing:
> +
> +	energy [J]
> +	-----------
> +	instruction
> +
> +while still getting 'good' performance. It is essentially an alternative
> +optimization objective to the current performance-only objective for the
> +scheduler. This alternative considers two objectives: energy-efficiency and
> +performance.
> +
> +The idea behind introducing an EM is to allow the scheduler to evaluate the
> +implications of its decisions rather than blindly applying energy-saving
> +techniques that may have positive effects only on some platforms. At the same
> +time, the EM must be as simple as possible to minimize the scheduler latency
> +impact.
> +
> +In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time

Not sure if we want to remark the fact that EAS is looking at CFS tasks
only ATM.

> +for the scheduler to decide where a task should run (during wake-up), the EM
> +is used to break the tie between several good CPU candidates and pick the one
> +that is predicted to yield the best energy consumption without harming the
> +system's throughput. The predictions made by EAS rely on specific elements of
> +knowledge about the platform's topology, which include the 'capacity' of CPUs,

Add a reference to DT bindings docs defining 'capacity' (or define it
somewhere)?

> +and their respective energy costs.
> +
> +
> +3. Topology information
> +-----------------------
> +
> +EAS (as well as the rest of the scheduler) uses the notion of 'capacity' to
> +differentiate CPUs with different computing throughput. The 'capacity' of a CPU
> +represents the amount of work it can absorb when running at its highest
> +frequency compared to the most capable CPU of the system. Capacity values are
> +normalized in a 1024 range, and are comparable with the utilization signals of
> +tasks and CPUs computed by the Per-Entity Load Tracking (PELT) mechanism. Thanks
> +to capacity and utilization values, EAS is able to estimate how big/busy a
> +task/CPU is, and to take this into consideration when evaluating performance vs
> +energy trade-offs. The capacity of CPUs is provided via arch-specific code
> +through the arch_scale_cpu_capacity() callback.

Ah, it's here, mmm, maybe still introduce it before (or point here from
above) and still point to dt bindings doc?

> +
> +The rest of platform knowledge used by EAS is directly read from the Energy
> +Model (EM) framework. The EM of a platform is composed of a power cost table
> +per 'performance domain' in the system (see Documentation/power/energy-model.txt
> +for futher details about performance domains).
> +
> +The scheduler manages references to the EM objects in the topology code when the
> +scheduling domains are built, or re-built. For each root domain (rd), the
> +scheduler maintains a singly linked list of all performance domains intersecting
> +the current rd->span. Each node in the list contains a pointer to a struct
> +em_perf_domain as provided by the EM framework.
> +
> +The lists are attached to the root domains in order to cope with exclusive
> +cpuset configurations. Since the boundaries of exclusive cpusets do not
> +necessarily match those of performance domains, the lists of different root
> +domains can contain duplicate elements.
> +
> +Example 1.
> +    Let us consider a platform with 12 CPUs, split in 3 performance domains
> +    (pd0, pd4 and pd8), organized as follows:
> +
> +	          CPUs:   0 1 2 3 4 5 6 7 8 9 10 11
> +	          PDs:   |--pd0--|--pd4--|---pd8---|
> +	          RDs:   |----rd1----|-----rd2-----|
> +
> +    Now, consider that userspace decided to split the system with two
> +    exclusive cpusets, hence creating two independent root domains, each
> +    containing 6 CPUs. The two root domains are denoted rd1 and rd2 in the
> +    above figure. Since pd4 intersects with both rd1 and rd2, it will be
> +    present in the linked list '->pd' attached to each of them:
> +       * rd1->pd: pd0 -> pd4
> +       * rd2->pd: pd4 -> pd8
> +
> +    Please note that the scheduler will create two duplicate list nodes for
> +    pd4 (one for each list). However, both just hold a pointer to the same
> +    shared data structure of the EM framework.
> +
> +Since the access to these lists can happen concurrently with hotplug and other
> +things, they are protected by RCU, like the rest of topology structures
> +manipulated by the scheduler.
> +
> +EAS also maintains a static key (sched_energy_present) which is enabled when at
> +least one root domain meets all conditions for EAS to start. Those conditions
> +are summarized in Section 6.
> +
> +
> +4. Energy-Aware task placement
> +------------------------------
> +
> +EAS overrides the CFS task wake-up balancing code. It uses the EM of the
> +platform and the PELT signals to choose an energy-efficient target CPU during
> +wake-up balance. When EAS is enabled, select_task_rq_fair() calls
> +find_energy_efficient_cpu() to do the placement decision. This function looks
> +for the CPU with the highest spare capacity (CPU capacity - CPU utilization) in
> +each performance domain since it is the one which will allow us to keep the
> +frequency the lowest. Then, the function checks if placing the task there could
> +save energy compared to leaving it on prev_cpu, i.e. the CPU where the task ran
> +in its previous activation.
> +
> +find_energy_efficient_cpu() uses compute_energy() to estimate what will be the
> +energy consumed by the system if the waking task was migrated. compute_energy()
> +looks at the current utilization landscape of the CPUs and adjusts it to
> +'simulate' the task migration. The EM framework provides the em_pd_energy() API
> +which computes the expected energy consumption of each performance domain for
> +the given utilization landscape.
> +
> +An example of energy-optimized task placement decision is detailed below.
> +
> +Example 2.
> +    Let us consider a (fake) platform with 2 independent performance domains
> +    composed of two CPUs each. CPU0 and CPU1 are little CPUs; CPU2 and CPU3
> +    are big.
> +
> +    The scheduler must decide where to place a task P whose util_avg = 200
> +    and prev_cpu = 0.
> +
> +    The current utilization landscape of the CPUs is depicted on the graph
> +    below. CPUs 0-3 have a util_avg of 400, 100, 600 and 500 respectively
> +    Each performance domain has three Operating Performance Points (OPPs).
> +    The CPU capacity and power cost associated with each OPP is listed in
> +    the Energy Model table. The util_avg of P is shown on the figures
> +    below as 'PP'.
> +
> +    CPU util.
> +      1024                 - - - - - - -              Energy Model
> +                                               +-----------+-------------+
> +                                               |  Little   |     Big     |
> +       768                 =============       +-----+-----+------+------+
> +                                               | Cap | Pwr | Cap  | Pwr  |
> +                                               +-----+-----+------+------+
> +       512  ===========    - ##- - - - -       | 170 | 50  | 512  | 400  |
> +                             ##     ##         | 341 | 150 | 768  | 800  |
> +       341  -PP - - - -      ##     ##         | 512 | 300 | 1024 | 1700 |
> +             PP              ##     ##         +-----+-----+------+------+
> +       170  -## - - - -      ##     ##
> +             ##     ##       ##     ##
> +           ------------    -------------
> +            CPU0   CPU1     CPU2   CPU3
> +
> +      Current OPP: =====       Other OPP: - - -     util_avg (100 each): ##
> +
> +
> +    find_energy_efficient_cpu() will first look for the CPUs with the
> +    maximum spare capacity in the two performance domains. In this example,
> +    CPU1 and CPU3. Then it will estimate the energy of the system if P was
> +    placed on either of them, and check if that would save some energy
> +    compared to leaving P on CPU0. EAS assumes that OPPs follow utilization
> +    (which is coherent with the behaviour of the schedutil CPUFreq
> +    governor, see Section 6. for more details on this topic).
> +
> +    Case 1. P is migrated to CPU1
> +    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +      1024                 - - - - - - -
> +
> +                                            Energy calculation:
> +       768                 =============     * CPU0: 200 / 341 * 150 = 88
> +                                             * CPU1: 300 / 341 * 150 = 131
> +                                             * CPU2: 600 / 768 * 800 = 625
> +       512  - - - - - -    - ##- - - - -     * CPU3: 500 / 768 * 800 = 520
> +                             ##     ##          => total_energy = 1364
> +       341  ===========      ##     ##
> +                    PP       ##     ##
> +       170  -## - - PP-      ##     ##
> +             ##     ##       ##     ##
> +           ------------    -------------
> +            CPU0   CPU1     CPU2   CPU3
> +
> +
> +    Case 2. P is migrated to CPU3
> +    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +      1024                 - - - - - - -
> +
> +                                            Energy calculation:
> +       768                 =============     * CPU0: 200 / 341 * 150 = 88
> +                                             * CPU1: 100 / 341 * 150 = 43
> +                                    PP       * CPU2: 600 / 768 * 800 = 625
> +       512  - - - - - -    - ##- - -PP -     * CPU3: 700 / 768 * 800 = 729
> +                             ##     ##          => total_energy = 1485
> +       341  ===========      ##     ##
> +                             ##     ##
> +       170  -## - - - -      ##     ##
> +             ##     ##       ##     ##
> +           ------------    -------------
> +            CPU0   CPU1     CPU2   CPU3
> +
> +
> +    Case 3. P stays on prev_cpu / CPU 0
> +    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +      1024                 - - - - - - -
> +
> +                                            Energy calculation:
> +       768                 =============     * CPU0: 400 / 512 * 300 = 234
> +                                             * CPU1: 100 / 512 * 300 = 58
> +                                             * CPU2: 600 / 768 * 800 = 625
> +       512  ===========    - ##- - - - -     * CPU3: 500 / 768 * 800 = 520
> +                             ##     ##          => total_energy = 1437
> +       341  -PP - - - -      ##     ##
> +             PP              ##     ##
> +       170  -## - - - -      ##     ##
> +             ##     ##       ##     ##
> +           ------------    -------------
> +            CPU0   CPU1     CPU2   CPU3
> +
> +
> +    From these calculations, the Case 1 has the lowest total energy. So CPU 1
> +    is be the best candidate from an energy-efficiency standpoint.
> +
> +Big CPUs are generally more power hungry than the little ones and are thus used
> +mainly when a task doesn't fit the littles. However, little CPUs aren't always
> +necessarily more energy-efficient than big CPUs. For some systems, the high OPPs
> +of the little CPUs can be less energy-efficient than the lowest OPPs of the
> +bigs, for example. So, if the little CPUs happen to have enough utilization at
> +a specific point in time, a small task waking up at that moment could be better
> +of executing on the big side in order to save energy, even though it would fit
     ^ +f

> +on the little side.
> +
> +And even in the case where all OPPs of the big CPUs are less energy-efficient
> +than those of the little, using the big CPUs for a small task might still, under
> +specific conditions, save energy. Indeed, placing a task on a little CPU can
> +result in raising the OPP of the entire performance domain, and that will
> +increase the cost of the tasks already running there. If the waking task is
> +placed on a big CPU, its own execution cost might be higher than if it was
> +running on a little, but it won't impact the other tasks of the little CPUs
> +which will keep running at a lower OPP. So, when considering the total energy
> +consumed by CPUs, the extra cost of running that one task on a big core can be
> +smaller than the cost of raising the OPP on the little CPUs for all the other
> +tasks.
> +
> +The examples above would be nearly impossible to get right in a generic way, and
> +for all platforms, without knowing the cost of running at different OPPs on all
> +CPUs of the system. Thanks to its EM-based design, EAS should cope with them
> +correctly without too many troubles. However, in order to ensure a minimal
> +impact on throughput for high-utilization scenarios, EAS also implements another
> +mechanism called 'over-utilization'.
> +
> +
> +5. Over-utilization
> +-------------------
> +
> +From a general standpoint, the use-cases where EAS can help the most are those
> +involving a light/medium CPU utilization. Whenever long CPU-bound tasks are
> +being run, they will require all of the available CPU capacity, and there isn't
> +much that can be done by the scheduler to save energy without severly harming
                                                                  +e ^

> +throughput. In order to avoid hurting performance with EAS, CPUs are flagged as
> +'over-utilized' as soon as they are used at more than 80% of their compute
> +capacity. As long as no CPUs are over-utilized in a root domain, load balancing
> +is disabled and EAS overridess the wake-up balancing code. EAS is likely to load
                                ^ -s  :-)

> +the most energy efficient CPUs of the system more than the others if that can be
> +done without harming throughput. So, the load-balancer is disabled to prevent

Load-balancing being disabled in EAS mode it's quite an important thing
to notice IMHO. Maybe state it clearly somewhere above?

> +it from breaking the energy-efficient task placement found by EAS. It is safe to
> +do so when the system isn't overutilized since being below the 80% tipping point
> +implies that:
> +
> +    a. there is some idle time on all CPUs, so the utilization signals used by
> +       EAS are likely to accurately represent the 'size' of the various tasks
> +       in the system;
> +    b. all tasks should already be provided with enough CPU capacity,
> +       regardless of their nice values;
> +    c. since there is spare capacity all tasks must be blocking/sleeping
> +       regularly and balancing at wake-up is sufficient.
> +
> +As soon as one CPU goes above the 80% tipping point, at least one of the three
> +assumptions above becomes incorrect. In this scenario, the 'overutilized' flag
> +is raised for the entire root domain, EAS is disabled, and the load-balancer is
> +re-enabled. By doing so, the scheduler falls back onto load-based algorithms for
> +wake-up and load balance under CPU-bound conditions. This provides a better
> +respect of the nice values of tasks.
> +
> +Since the notion of overutilization largely relies on detecting whether or not
> +there is some idle time in the system, the CPU capacity 'stolen' by higher
> +(than CFS) scheduling classes (as well as IRQ) must be taken into account. As
> +such, the detection of overutilization accounts for the capacity used not only
> +by CFS tasks, but also by the other scheduling classes and IRQ.
> +
> +
> +6. Dependencies and requirements for EAS
> +----------------------------------------
> +
> +Energy Aware Scheduling depends on the CPUs of the system having specific
> +hardware properties and on other features of the kernel being enabled. This
> +section lists these dependencies and provides hints as to how they can be met.
> +
> +
> +  6.1 - Asymmetric CPU topology
> +
> +As mentioned in the introduction, EAS is only supported on platforms with
> +asymmetric CPU topologies for now. This requirement is checked at run-time by
> +looking for the presence of the SD_ASYM_CPUCAPACITY flag when the scheduling
> +domains are built.
> +
> +The flag is set/cleared automatically by the scheduler topology code whenever
> +there are CPUs with different capacities in a root domain. The capacities of
> +CPUs are provided by arch-specific code through the arch_scale_cpu_capacity()
> +callback. As an example, arm and arm64 share an implementation of this callback
> +which uses a combination of CPUFreq data and device-tree bindings to compute the
> +capacity of CPUs (see drivers/base/arch_topology.c for more details).
> +
> +So, in order to use EAS on your platform your architecture must implement the

Mmm, using 'your' form is a change of 'style', no?

> +arch_scale_cpu_capacity() callback, and some of the CPUs must have a lower
> +capacity than others.
> +
> +Please note that EAS is not fundamentally incompatible with SMP, but no
> +significant savings on SMP platforms have been observed yet. This restriction
> +could be amended in the future if proven otherwise.
> +
> +
> +  6.2 - Energy Model presence
> +
> +EAS uses the EM of a platform to estimate the impact of scheduling decisions on
> +energy. So, your platform must provide power cost tables to the EM framework in
> +order to make EAS start. To do so, please refer to documentation of the
> +independent EM framework in Documentation/power/energy-model.txt.
> +
> +Please also note that the scheduling domains need to be re-built after the
> +EM has been registered in order to start EAS.
> +
> +
> +  6.3 - Energy Model complexity
> +
> +The task wake-up path is very latency-sensitive. When the EM of a platform is
> +too complex (too many CPUs, too many performance domains, too many performance
> +states, ...), the cost of using it in the wake-up path can become prohibitive.
> +The energy-aware wake-up algorithm has a complexity of:
> +
> +	C = Nd * (Nc + Ns)
> +
> +with: Nd the number of performance domains; Nc the number of CPUs; and Ns the
> +total number of OPPs (ex: for two perf. domains with 4 OPPs each, Ns = 8).
> +
> +A complexity check is performed at the root domain level, when scheduling
> +domains are built. EAS will not start on a root domain if its C happens to be
> +higher than the completely arbitrary EM_MAX_COMPLEXITY threshold (2048 at the
> +time of writing).
> +
> +If you really want to use EAS but the complexity of your platform's Energy
> +Model is too high to be used with a single root domain, you're left with only
> +two possible options:
> +
> +    1. split your system into separate, smaller, root domains using exclusive
> +       cpusets and enable EAS locally on each of them. This option has the
> +       benefit to work out of the box but the drawback of preventing load
> +       balance between root domains, which can result in an unbalanced system
> +       overall;
> +    2. submit patches to reduce the complexity of the EAS wake-up algorithm,
> +       hence enabling it to cope with larger EMs in reasonable time.
> +
> +
> +  6.4 - Schedutil governor
> +
> +EAS tries to predict at which OPP will the CPUs be running in the close future
> +in order to estimate their energy consumption. To do so, it is assumed that OPPs
> +of CPUs follow their utilization.
> +
> +Although it is very difficult to provide hard guarantees regarding the accuracy
> +of this assumption in practice (because the hardware might not do what it is
> +told to do, for example), schedutil as opposed to other CPUFreq governors at
> +least _requests_ frequencies calculated using the utilization signals.
> +Consequently, the only sane governor to use together with EAS is schedutil,
> +because it is the only one providing some degree of consistency between
> +frequency requests and energy predictions.
> +
> +Using EAS with any other governor than schedutil is not supported.
> +
> +
> +  6.5 Scale-invariant utilization signals
> +
> +In order to make accurate prediction across CPUs and for all performance
> +states, EAS needs frequency-invariant and CPU-invariant PELT signals. These can
> +be obtained using the architecture-defined arch_scale{cpu,freq}_capacity()
> +callbacks.
> +
> +Using EAS on a platform that doesn't implement these two callbacks is not
> +supported.
> +
> +
> +  6.6 Multithreading (SMT)
> +
> +EAS in its current form is SMT unaware and is not able to leverage
> +multithreaded hardware to save energy. EAS considers threads as independent
> +CPUs, which can actually be counter-productive for both performance and energy.
> +
> +EAS on SMT is not supported.

Best,

- Juri

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] sched: Document Energy Aware Scheduling
  2019-01-17 15:51   ` Juri Lelli
@ 2019-01-18  9:16     ` Quentin Perret
  2019-01-18  9:57       ` Rafael J. Wysocki
  0 siblings, 1 reply; 22+ messages in thread
From: Quentin Perret @ 2019-01-18  9:16 UTC (permalink / raw)
  To: Juri Lelli
  Cc: corbet, peterz, rjw, mingo, morten.rasmussen, qais.yousef,
	patrick.bellasi, dietmar.eggemann, linux-doc, linux-pm,
	linux-kernel

Hi Juri,

On Thursday 17 Jan 2019 at 16:51:17 (+0100), Juri Lelli wrote:
> On 10/01/19 11:05, Quentin Perret wrote:
[...]
> > +The idea behind introducing an EM is to allow the scheduler to evaluate the
> > +implications of its decisions rather than blindly applying energy-saving
> > +techniques that may have positive effects only on some platforms. At the same
> > +time, the EM must be as simple as possible to minimize the scheduler latency
> > +impact.
> > +
> > +In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time
> 
> Not sure if we want to remark the fact that EAS is looking at CFS tasks
> only ATM.

Oh, what's wrong about mentioning it ? I mean, it is a fact ATM ...

> > +for the scheduler to decide where a task should run (during wake-up), the EM
> > +is used to break the tie between several good CPU candidates and pick the one
> > +that is predicted to yield the best energy consumption without harming the
> > +system's throughput. The predictions made by EAS rely on specific elements of
> > +knowledge about the platform's topology, which include the 'capacity' of CPUs,
> 
> Add a reference to DT bindings docs defining 'capacity' (or define it
> somewhere)?

Right, I can mention this is defined in the next section. But are you
sure about the reference to the DT bindings ? They're arm-specific right ?
Maybe I can give that as an example or something ...

> > +and their respective energy costs.
> > +
> > +
> > +3. Topology information
> > +-----------------------
> > +
> > +EAS (as well as the rest of the scheduler) uses the notion of 'capacity' to
> > +differentiate CPUs with different computing throughput. The 'capacity' of a CPU
> > +represents the amount of work it can absorb when running at its highest
> > +frequency compared to the most capable CPU of the system. Capacity values are
> > +normalized in a 1024 range, and are comparable with the utilization signals of
> > +tasks and CPUs computed by the Per-Entity Load Tracking (PELT) mechanism. Thanks
> > +to capacity and utilization values, EAS is able to estimate how big/busy a
> > +task/CPU is, and to take this into consideration when evaluating performance vs
> > +energy trade-offs. The capacity of CPUs is provided via arch-specific code
> > +through the arch_scale_cpu_capacity() callback.
> 
> Ah, it's here, mmm, maybe still introduce it before (or point here from
> above) and still point to dt bindings doc?

Yep, I'll add a pointer above.


[...]
> > +the most energy efficient CPUs of the system more than the others if that can be
> > +done without harming throughput. So, the load-balancer is disabled to prevent
> 
> Load-balancing being disabled in EAS mode it's quite an important thing
> to notice IMHO. Maybe state it clearly somewhere above?

Right, this needs to be emphasized more. I'll mention it in the
introduction.


[...]
> > +So, in order to use EAS on your platform your architecture must implement the
> 
> Mmm, using 'your' form is a change of 'style', no?

Good point, I will try to unify this section with the rest of the doc.
There are loads of 'your platform' below too.


Thank you very much for the typos as well :-)
Quentin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] sched: Document Energy Aware Scheduling
  2019-01-18  9:16     ` Quentin Perret
@ 2019-01-18  9:57       ` Rafael J. Wysocki
  2019-01-18 10:34         ` Quentin Perret
  0 siblings, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2019-01-18  9:57 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Juri Lelli, Jonathan Corbet, Peter Zijlstra, Rafael J. Wysocki,
	Ingo Molnar, Morten Rasmussen, qais.yousef, Patrick Bellasi,
	Dietmar Eggemann, open list:DOCUMENTATION, Linux PM,
	Linux Kernel Mailing List

On Fri, Jan 18, 2019 at 10:16 AM Quentin Perret <quentin.perret@arm.com> wrote:
>
> Hi Juri,
>
> On Thursday 17 Jan 2019 at 16:51:17 (+0100), Juri Lelli wrote:
> > On 10/01/19 11:05, Quentin Perret wrote:
> [...]
> > > +The idea behind introducing an EM is to allow the scheduler to evaluate the
> > > +implications of its decisions rather than blindly applying energy-saving
> > > +techniques that may have positive effects only on some platforms. At the same
> > > +time, the EM must be as simple as possible to minimize the scheduler latency
> > > +impact.
> > > +
> > > +In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time
> >
> > Not sure if we want to remark the fact that EAS is looking at CFS tasks
> > only ATM.
>
> Oh, what's wrong about mentioning it ? I mean, it is a fact ATM ...

But it won't hurt to mention that it may cover other scheduling
classes in the future.  IOW, the scope limit is not fundamental.

> > > +for the scheduler to decide where a task should run (during wake-up), the EM
> > > +is used to break the tie between several good CPU candidates and pick the one
> > > +that is predicted to yield the best energy consumption without harming the
> > > +system's throughput. The predictions made by EAS rely on specific elements of
> > > +knowledge about the platform's topology, which include the 'capacity' of CPUs,
> >
> > Add a reference to DT bindings docs defining 'capacity' (or define it
> > somewhere)?
>
> Right, I can mention this is defined in the next section. But are you
> sure about the reference to the DT bindings ? They're arm-specific right ?
> Maybe I can give that as an example or something ...

Example sounds right.

You also can point to the section below from here.

Side note: If the doc is in the .rst format (which Peter won't like
I'm sure :-)), you can actually use cross-references in it and you get
a translation to an HTML doc (hosted at kernel.org) for free and the
cross-references become clickable links in that.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] sched: Document Energy Aware Scheduling
  2019-01-18  9:57       ` Rafael J. Wysocki
@ 2019-01-18 10:34         ` Quentin Perret
  2019-01-18 10:58           ` Rafael J. Wysocki
  2019-01-18 12:34           ` Juri Lelli
  0 siblings, 2 replies; 22+ messages in thread
From: Quentin Perret @ 2019-01-18 10:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Juri Lelli, Jonathan Corbet, Peter Zijlstra, Rafael J. Wysocki,
	Ingo Molnar, Morten Rasmussen, qais.yousef, Patrick Bellasi,
	Dietmar Eggemann, open list:DOCUMENTATION, Linux PM,
	Linux Kernel Mailing List

Hi Rafael,

On Friday 18 Jan 2019 at 10:57:08 (+0100), Rafael J. Wysocki wrote:
> On Fri, Jan 18, 2019 at 10:16 AM Quentin Perret <quentin.perret@arm.com> wrote:
> >
> > Hi Juri,
> >
> > On Thursday 17 Jan 2019 at 16:51:17 (+0100), Juri Lelli wrote:
> > > On 10/01/19 11:05, Quentin Perret wrote:
> > [...]
> > > > +The idea behind introducing an EM is to allow the scheduler to evaluate the
> > > > +implications of its decisions rather than blindly applying energy-saving
> > > > +techniques that may have positive effects only on some platforms. At the same
> > > > +time, the EM must be as simple as possible to minimize the scheduler latency
> > > > +impact.
> > > > +
> > > > +In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time
> > >
> > > Not sure if we want to remark the fact that EAS is looking at CFS tasks
> > > only ATM.
> >
> > Oh, what's wrong about mentioning it ? I mean, it is a fact ATM ...
> 
> But it won't hurt to mention that it may cover other scheduling
> classes in the future.  IOW, the scope limit is not fundamental.

Agreed, I can do that.

> > > > +for the scheduler to decide where a task should run (during wake-up), the EM
> > > > +is used to break the tie between several good CPU candidates and pick the one
> > > > +that is predicted to yield the best energy consumption without harming the
> > > > +system's throughput. The predictions made by EAS rely on specific elements of
> > > > +knowledge about the platform's topology, which include the 'capacity' of CPUs,
> > >
> > > Add a reference to DT bindings docs defining 'capacity' (or define it
> > > somewhere)?
> >
> > Right, I can mention this is defined in the next section. But are you
> > sure about the reference to the DT bindings ? They're arm-specific right ?
> > Maybe I can give that as an example or something ...
> 
> Example sounds right.
> 
> You also can point to the section below from here.

Sounds good.

> Side note: If the doc is in the .rst format (which Peter won't like
> I'm sure :-)), you can actually use cross-references in it and you get
> a translation to an HTML doc (hosted at kernel.org) for free and the
> cross-references become clickable links in that.

Right, I personally don't mind the .rst format, but the existing files
in Documentation/power/ and Documentation/scheduler/ are good old txt
files so I just wanted to keep things consistent. I don't mind
converting to rst if necessary :-)

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] sched: Document Energy Aware Scheduling
  2019-01-18 10:34         ` Quentin Perret
@ 2019-01-18 10:58           ` Rafael J. Wysocki
  2019-01-18 11:01             ` Rafael J. Wysocki
  2019-01-18 12:34           ` Juri Lelli
  1 sibling, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2019-01-18 10:58 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Rafael J. Wysocki, Juri Lelli, Jonathan Corbet, Peter Zijlstra,
	Rafael J. Wysocki, Ingo Molnar, Morten Rasmussen, qais.yousef,
	Patrick Bellasi, Dietmar Eggemann, open list:DOCUMENTATION,
	Linux PM, Linux Kernel Mailing List

On Fri, Jan 18, 2019 at 11:34 AM Quentin Perret <quentin.perret@arm.com> wrote:
>
> Hi Rafael,
>
> On Friday 18 Jan 2019 at 10:57:08 (+0100), Rafael J. Wysocki wrote:
> > On Fri, Jan 18, 2019 at 10:16 AM Quentin Perret <quentin.perret@arm.com> wrote:
> > >
> > > Hi Juri,
> > >
> > > On Thursday 17 Jan 2019 at 16:51:17 (+0100), Juri Lelli wrote:
> > > > On 10/01/19 11:05, Quentin Perret wrote:
> > > [...]
> > > > > +The idea behind introducing an EM is to allow the scheduler to evaluate the
> > > > > +implications of its decisions rather than blindly applying energy-saving
> > > > > +techniques that may have positive effects only on some platforms. At the same
> > > > > +time, the EM must be as simple as possible to minimize the scheduler latency
> > > > > +impact.
> > > > > +
> > > > > +In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time
> > > >
> > > > Not sure if we want to remark the fact that EAS is looking at CFS tasks
> > > > only ATM.
> > >
> > > Oh, what's wrong about mentioning it ? I mean, it is a fact ATM ...
> >
> > But it won't hurt to mention that it may cover other scheduling
> > classes in the future.  IOW, the scope limit is not fundamental.
>
> Agreed, I can do that.
>
> > > > > +for the scheduler to decide where a task should run (during wake-up), the EM
> > > > > +is used to break the tie between several good CPU candidates and pick the one
> > > > > +that is predicted to yield the best energy consumption without harming the
> > > > > +system's throughput. The predictions made by EAS rely on specific elements of
> > > > > +knowledge about the platform's topology, which include the 'capacity' of CPUs,
> > > >
> > > > Add a reference to DT bindings docs defining 'capacity' (or define it
> > > > somewhere)?
> > >
> > > Right, I can mention this is defined in the next section. But are you
> > > sure about the reference to the DT bindings ? They're arm-specific right ?
> > > Maybe I can give that as an example or something ...
> >
> > Example sounds right.
> >
> > You also can point to the section below from here.
>
> Sounds good.
>
> > Side note: If the doc is in the .rst format (which Peter won't like
> > I'm sure :-)), you can actually use cross-references in it and you get
> > a translation to an HTML doc (hosted at kernel.org) for free and the
> > cross-references become clickable links in that.
>
> Right, I personally don't mind the .rst format, but the existing files
> in Documentation/power/ and Documentation/scheduler/ are good old txt
> files so I just wanted to keep things consistent.

In fact, Documentation/power/ is under a slow on-going transition to
.rst (due to the benefits mentioned above).

> I don't mind converting to rst if necessary :-)

It is not necessary, but maybe worth considering.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] sched: Document Energy Aware Scheduling
  2019-01-18 10:58           ` Rafael J. Wysocki
@ 2019-01-18 11:01             ` Rafael J. Wysocki
  2019-01-18 11:11               ` Quentin Perret
  0 siblings, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2019-01-18 11:01 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Rafael J. Wysocki, Juri Lelli, Jonathan Corbet, Peter Zijlstra,
	Rafael J. Wysocki, Ingo Molnar, Morten Rasmussen, qais.yousef,
	Patrick Bellasi, Dietmar Eggemann, open list:DOCUMENTATION,
	Linux PM, Linux Kernel Mailing List

On Fri, Jan 18, 2019 at 11:58 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, Jan 18, 2019 at 11:34 AM Quentin Perret <quentin.perret@arm.com> wrote:
> >
> > Hi Rafael,
> >
> > On Friday 18 Jan 2019 at 10:57:08 (+0100), Rafael J. Wysocki wrote:
> > > On Fri, Jan 18, 2019 at 10:16 AM Quentin Perret <quentin.perret@arm.com> wrote:
> > > >
> > > > Hi Juri,
> > > >
> > > > On Thursday 17 Jan 2019 at 16:51:17 (+0100), Juri Lelli wrote:
> > > > > On 10/01/19 11:05, Quentin Perret wrote:
> > > > [...]
> > > > > > +The idea behind introducing an EM is to allow the scheduler to evaluate the
> > > > > > +implications of its decisions rather than blindly applying energy-saving
> > > > > > +techniques that may have positive effects only on some platforms. At the same
> > > > > > +time, the EM must be as simple as possible to minimize the scheduler latency
> > > > > > +impact.
> > > > > > +
> > > > > > +In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time
> > > > >
> > > > > Not sure if we want to remark the fact that EAS is looking at CFS tasks
> > > > > only ATM.
> > > >
> > > > Oh, what's wrong about mentioning it ? I mean, it is a fact ATM ...
> > >
> > > But it won't hurt to mention that it may cover other scheduling
> > > classes in the future.  IOW, the scope limit is not fundamental.
> >
> > Agreed, I can do that.
> >
> > > > > > +for the scheduler to decide where a task should run (during wake-up), the EM
> > > > > > +is used to break the tie between several good CPU candidates and pick the one
> > > > > > +that is predicted to yield the best energy consumption without harming the
> > > > > > +system's throughput. The predictions made by EAS rely on specific elements of
> > > > > > +knowledge about the platform's topology, which include the 'capacity' of CPUs,
> > > > >
> > > > > Add a reference to DT bindings docs defining 'capacity' (or define it
> > > > > somewhere)?
> > > >
> > > > Right, I can mention this is defined in the next section. But are you
> > > > sure about the reference to the DT bindings ? They're arm-specific right ?
> > > > Maybe I can give that as an example or something ...
> > >
> > > Example sounds right.
> > >
> > > You also can point to the section below from here.
> >
> > Sounds good.
> >
> > > Side note: If the doc is in the .rst format (which Peter won't like
> > > I'm sure :-)), you can actually use cross-references in it and you get
> > > a translation to an HTML doc (hosted at kernel.org) for free and the
> > > cross-references become clickable links in that.
> >
> > Right, I personally don't mind the .rst format, but the existing files
> > in Documentation/power/ and Documentation/scheduler/ are good old txt
> > files so I just wanted to keep things consistent.
>
> In fact, Documentation/power/ is under a slow on-going transition to
> .rst (due to the benefits mentioned above).
>
> > I don't mind converting to rst if necessary :-)
>
> It is not necessary, but maybe worth considering.

That said, as this is targeted at Documentation/scheduler/, being
consistent with the other material in there is probably more
important.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] sched: Document Energy Aware Scheduling
  2019-01-18 11:01             ` Rafael J. Wysocki
@ 2019-01-18 11:11               ` Quentin Perret
  2019-01-18 11:20                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 22+ messages in thread
From: Quentin Perret @ 2019-01-18 11:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Juri Lelli, Jonathan Corbet, Peter Zijlstra, Rafael J. Wysocki,
	Ingo Molnar, Morten Rasmussen, qais.yousef, Patrick Bellasi,
	Dietmar Eggemann, open list:DOCUMENTATION, Linux PM,
	Linux Kernel Mailing List

On Friday 18 Jan 2019 at 12:01:35 (+0100), Rafael J. Wysocki wrote:
> On Fri, Jan 18, 2019 at 11:58 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Fri, Jan 18, 2019 at 11:34 AM Quentin Perret <quentin.perret@arm.com> wrote:
> > >
> > > Hi Rafael,
> > >
> > > On Friday 18 Jan 2019 at 10:57:08 (+0100), Rafael J. Wysocki wrote:
> > > > On Fri, Jan 18, 2019 at 10:16 AM Quentin Perret <quentin.perret@arm.com> wrote:
> > > > >
> > > > > Hi Juri,
> > > > >
> > > > > On Thursday 17 Jan 2019 at 16:51:17 (+0100), Juri Lelli wrote:
> > > > > > On 10/01/19 11:05, Quentin Perret wrote:
> > > > > [...]
> > > > > > > +The idea behind introducing an EM is to allow the scheduler to evaluate the
> > > > > > > +implications of its decisions rather than blindly applying energy-saving
> > > > > > > +techniques that may have positive effects only on some platforms. At the same
> > > > > > > +time, the EM must be as simple as possible to minimize the scheduler latency
> > > > > > > +impact.
> > > > > > > +
> > > > > > > +In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time
> > > > > >
> > > > > > Not sure if we want to remark the fact that EAS is looking at CFS tasks
> > > > > > only ATM.
> > > > >
> > > > > Oh, what's wrong about mentioning it ? I mean, it is a fact ATM ...
> > > >
> > > > But it won't hurt to mention that it may cover other scheduling
> > > > classes in the future.  IOW, the scope limit is not fundamental.
> > >
> > > Agreed, I can do that.
> > >
> > > > > > > +for the scheduler to decide where a task should run (during wake-up), the EM
> > > > > > > +is used to break the tie between several good CPU candidates and pick the one
> > > > > > > +that is predicted to yield the best energy consumption without harming the
> > > > > > > +system's throughput. The predictions made by EAS rely on specific elements of
> > > > > > > +knowledge about the platform's topology, which include the 'capacity' of CPUs,
> > > > > >
> > > > > > Add a reference to DT bindings docs defining 'capacity' (or define it
> > > > > > somewhere)?
> > > > >
> > > > > Right, I can mention this is defined in the next section. But are you
> > > > > sure about the reference to the DT bindings ? They're arm-specific right ?
> > > > > Maybe I can give that as an example or something ...
> > > >
> > > > Example sounds right.
> > > >
> > > > You also can point to the section below from here.
> > >
> > > Sounds good.
> > >
> > > > Side note: If the doc is in the .rst format (which Peter won't like
> > > > I'm sure :-)), you can actually use cross-references in it and you get
> > > > a translation to an HTML doc (hosted at kernel.org) for free and the
> > > > cross-references become clickable links in that.
> > >
> > > Right, I personally don't mind the .rst format, but the existing files
> > > in Documentation/power/ and Documentation/scheduler/ are good old txt
> > > files so I just wanted to keep things consistent.
> >
> > In fact, Documentation/power/ is under a slow on-going transition to
> > .rst (due to the benefits mentioned above).
> >
> > > I don't mind converting to rst if necessary :-)
> >
> > It is not necessary, but maybe worth considering.
> 
> That said, as this is targeted at Documentation/scheduler/, being
> consistent with the other material in there is probably more
> important.

Right. Patch 01/02 is targeted at Documentation/power/ though. So if
that makes your life easier I can turn that one into a .rst file, no
problem at all.

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] sched: Document Energy Aware Scheduling
  2019-01-18 11:11               ` Quentin Perret
@ 2019-01-18 11:20                 ` Rafael J. Wysocki
  2019-01-18 11:26                   ` Quentin Perret
  0 siblings, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2019-01-18 11:20 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Rafael J. Wysocki, Juri Lelli, Jonathan Corbet, Peter Zijlstra,
	Rafael J. Wysocki, Ingo Molnar, Morten Rasmussen, qais.yousef,
	Patrick Bellasi, Dietmar Eggemann, open list:DOCUMENTATION,
	Linux PM, Linux Kernel Mailing List

On Fri, Jan 18, 2019 at 12:11 PM Quentin Perret <quentin.perret@arm.com> wrote:
>
> On Friday 18 Jan 2019 at 12:01:35 (+0100), Rafael J. Wysocki wrote:
> > On Fri, Jan 18, 2019 at 11:58 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Fri, Jan 18, 2019 at 11:34 AM Quentin Perret <quentin.perret@arm.com> wrote:
> > > >
> > > > Hi Rafael,
> > > >
> > > > On Friday 18 Jan 2019 at 10:57:08 (+0100), Rafael J. Wysocki wrote:
> > > > > On Fri, Jan 18, 2019 at 10:16 AM Quentin Perret <quentin.perret@arm.com> wrote:
> > > > > >
> > > > > > Hi Juri,
> > > > > >
> > > > > > On Thursday 17 Jan 2019 at 16:51:17 (+0100), Juri Lelli wrote:
> > > > > > > On 10/01/19 11:05, Quentin Perret wrote:
> > > > > > [...]
> > > > > > > > +The idea behind introducing an EM is to allow the scheduler to evaluate the
> > > > > > > > +implications of its decisions rather than blindly applying energy-saving
> > > > > > > > +techniques that may have positive effects only on some platforms. At the same
> > > > > > > > +time, the EM must be as simple as possible to minimize the scheduler latency
> > > > > > > > +impact.
> > > > > > > > +
> > > > > > > > +In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time
> > > > > > >
> > > > > > > Not sure if we want to remark the fact that EAS is looking at CFS tasks
> > > > > > > only ATM.
> > > > > >
> > > > > > Oh, what's wrong about mentioning it ? I mean, it is a fact ATM ...
> > > > >
> > > > > But it won't hurt to mention that it may cover other scheduling
> > > > > classes in the future.  IOW, the scope limit is not fundamental.
> > > >
> > > > Agreed, I can do that.
> > > >
> > > > > > > > +for the scheduler to decide where a task should run (during wake-up), the EM
> > > > > > > > +is used to break the tie between several good CPU candidates and pick the one
> > > > > > > > +that is predicted to yield the best energy consumption without harming the
> > > > > > > > +system's throughput. The predictions made by EAS rely on specific elements of
> > > > > > > > +knowledge about the platform's topology, which include the 'capacity' of CPUs,
> > > > > > >
> > > > > > > Add a reference to DT bindings docs defining 'capacity' (or define it
> > > > > > > somewhere)?
> > > > > >
> > > > > > Right, I can mention this is defined in the next section. But are you
> > > > > > sure about the reference to the DT bindings ? They're arm-specific right ?
> > > > > > Maybe I can give that as an example or something ...
> > > > >
> > > > > Example sounds right.
> > > > >
> > > > > You also can point to the section below from here.
> > > >
> > > > Sounds good.
> > > >
> > > > > Side note: If the doc is in the .rst format (which Peter won't like
> > > > > I'm sure :-)), you can actually use cross-references in it and you get
> > > > > a translation to an HTML doc (hosted at kernel.org) for free and the
> > > > > cross-references become clickable links in that.
> > > >
> > > > Right, I personally don't mind the .rst format, but the existing files
> > > > in Documentation/power/ and Documentation/scheduler/ are good old txt
> > > > files so I just wanted to keep things consistent.
> > >
> > > In fact, Documentation/power/ is under a slow on-going transition to
> > > .rst (due to the benefits mentioned above).
> > >
> > > > I don't mind converting to rst if necessary :-)
> > >
> > > It is not necessary, but maybe worth considering.
> >
> > That said, as this is targeted at Documentation/scheduler/, being
> > consistent with the other material in there is probably more
> > important.
>
> Right. Patch 01/02 is targeted at Documentation/power/ though. So if
> that makes your life easier I can turn that one into a .rst file, no
> problem at all.

Yes, I'd prefer it that way.  And please put it into
Documentation/driver-api/pm/.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] sched: Document Energy Aware Scheduling
  2019-01-18 11:20                 ` Rafael J. Wysocki
@ 2019-01-18 11:26                   ` Quentin Perret
  0 siblings, 0 replies; 22+ messages in thread
From: Quentin Perret @ 2019-01-18 11:26 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Juri Lelli, Jonathan Corbet, Peter Zijlstra, Rafael J. Wysocki,
	Ingo Molnar, Morten Rasmussen, qais.yousef, Patrick Bellasi,
	Dietmar Eggemann, open list:DOCUMENTATION, Linux PM,
	Linux Kernel Mailing List

On Friday 18 Jan 2019 at 12:20:57 (+0100), Rafael J. Wysocki wrote:
> On Fri, Jan 18, 2019 at 12:11 PM Quentin Perret <quentin.perret@arm.com> wrote:
> >
> > On Friday 18 Jan 2019 at 12:01:35 (+0100), Rafael J. Wysocki wrote:
> > > On Fri, Jan 18, 2019 at 11:58 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > >
> > > > On Fri, Jan 18, 2019 at 11:34 AM Quentin Perret <quentin.perret@arm.com> wrote:
> > > > >
> > > > > Hi Rafael,
> > > > >
> > > > > On Friday 18 Jan 2019 at 10:57:08 (+0100), Rafael J. Wysocki wrote:
> > > > > > On Fri, Jan 18, 2019 at 10:16 AM Quentin Perret <quentin.perret@arm.com> wrote:
> > > > > > >
> > > > > > > Hi Juri,
> > > > > > >
> > > > > > > On Thursday 17 Jan 2019 at 16:51:17 (+0100), Juri Lelli wrote:
> > > > > > > > On 10/01/19 11:05, Quentin Perret wrote:
> > > > > > > [...]
> > > > > > > > > +The idea behind introducing an EM is to allow the scheduler to evaluate the
> > > > > > > > > +implications of its decisions rather than blindly applying energy-saving
> > > > > > > > > +techniques that may have positive effects only on some platforms. At the same
> > > > > > > > > +time, the EM must be as simple as possible to minimize the scheduler latency
> > > > > > > > > +impact.
> > > > > > > > > +
> > > > > > > > > +In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time
> > > > > > > >
> > > > > > > > Not sure if we want to remark the fact that EAS is looking at CFS tasks
> > > > > > > > only ATM.
> > > > > > >
> > > > > > > Oh, what's wrong about mentioning it ? I mean, it is a fact ATM ...
> > > > > >
> > > > > > But it won't hurt to mention that it may cover other scheduling
> > > > > > classes in the future.  IOW, the scope limit is not fundamental.
> > > > >
> > > > > Agreed, I can do that.
> > > > >
> > > > > > > > > +for the scheduler to decide where a task should run (during wake-up), the EM
> > > > > > > > > +is used to break the tie between several good CPU candidates and pick the one
> > > > > > > > > +that is predicted to yield the best energy consumption without harming the
> > > > > > > > > +system's throughput. The predictions made by EAS rely on specific elements of
> > > > > > > > > +knowledge about the platform's topology, which include the 'capacity' of CPUs,
> > > > > > > >
> > > > > > > > Add a reference to DT bindings docs defining 'capacity' (or define it
> > > > > > > > somewhere)?
> > > > > > >
> > > > > > > Right, I can mention this is defined in the next section. But are you
> > > > > > > sure about the reference to the DT bindings ? They're arm-specific right ?
> > > > > > > Maybe I can give that as an example or something ...
> > > > > >
> > > > > > Example sounds right.
> > > > > >
> > > > > > You also can point to the section below from here.
> > > > >
> > > > > Sounds good.
> > > > >
> > > > > > Side note: If the doc is in the .rst format (which Peter won't like
> > > > > > I'm sure :-)), you can actually use cross-references in it and you get
> > > > > > a translation to an HTML doc (hosted at kernel.org) for free and the
> > > > > > cross-references become clickable links in that.
> > > > >
> > > > > Right, I personally don't mind the .rst format, but the existing files
> > > > > in Documentation/power/ and Documentation/scheduler/ are good old txt
> > > > > files so I just wanted to keep things consistent.
> > > >
> > > > In fact, Documentation/power/ is under a slow on-going transition to
> > > > .rst (due to the benefits mentioned above).
> > > >
> > > > > I don't mind converting to rst if necessary :-)
> > > >
> > > > It is not necessary, but maybe worth considering.
> > >
> > > That said, as this is targeted at Documentation/scheduler/, being
> > > consistent with the other material in there is probably more
> > > important.
> >
> > Right. Patch 01/02 is targeted at Documentation/power/ though. So if
> > that makes your life easier I can turn that one into a .rst file, no
> > problem at all.
> 
> Yes, I'd prefer it that way.  And please put it into
> Documentation/driver-api/pm/.

Will do in v2.

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] sched: Document Energy Aware Scheduling
  2019-01-18 10:34         ` Quentin Perret
  2019-01-18 10:58           ` Rafael J. Wysocki
@ 2019-01-18 12:34           ` Juri Lelli
  1 sibling, 0 replies; 22+ messages in thread
From: Juri Lelli @ 2019-01-18 12:34 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Rafael J. Wysocki, Jonathan Corbet, Peter Zijlstra,
	Rafael J. Wysocki, Ingo Molnar, Morten Rasmussen, qais.yousef,
	Patrick Bellasi, Dietmar Eggemann, open list:DOCUMENTATION,
	Linux PM, Linux Kernel Mailing List

Hi,

On 18/01/19 10:34, Quentin Perret wrote:
> Hi Rafael,
> 
> On Friday 18 Jan 2019 at 10:57:08 (+0100), Rafael J. Wysocki wrote:
> > On Fri, Jan 18, 2019 at 10:16 AM Quentin Perret <quentin.perret@arm.com> wrote:
> > >
> > > Hi Juri,
> > >
> > > On Thursday 17 Jan 2019 at 16:51:17 (+0100), Juri Lelli wrote:
> > > > On 10/01/19 11:05, Quentin Perret wrote:
> > > [...]
> > > > > +The idea behind introducing an EM is to allow the scheduler to evaluate the
> > > > > +implications of its decisions rather than blindly applying energy-saving
> > > > > +techniques that may have positive effects only on some platforms. At the same
> > > > > +time, the EM must be as simple as possible to minimize the scheduler latency
> > > > > +impact.
> > > > > +
> > > > > +In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time
> > > >
> > > > Not sure if we want to remark the fact that EAS is looking at CFS tasks
> > > > only ATM.
> > >
> > > Oh, what's wrong about mentioning it ? I mean, it is a fact ATM ...
> > 
> > But it won't hurt to mention that it may cover other scheduling
> > classes in the future.  IOW, the scope limit is not fundamental.
> 
> Agreed, I can do that.

Oh, sorry, bad phrasing from my side. I meant that we should probably
state clearly somewhere that EAS deals with CFS only ATM, but extending
it to other classes (DEADLINE in particular) makes certainly sense and
people are welcome to experiment with that.

So, yeah, I agree with both of you. :-)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [tip:sched/core] PM/EM: Document the Energy Model framework
  2019-01-10 11:05 ` [PATCH 1/2] PM / EM: Document the Energy Model framework Quentin Perret
  2019-01-17 14:47   ` Juri Lelli
@ 2019-01-21 11:37   ` tip-bot for Quentin Perret
  2019-01-21 13:53   ` tip-bot for Quentin Perret
  2019-01-27 11:36   ` tip-bot for Quentin Perret
  3 siblings, 0 replies; 22+ messages in thread
From: tip-bot for Quentin Perret @ 2019-01-21 11:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, linux-kernel, quentin.perret, peterz, torvalds, tglx, hpa

Commit-ID:  c8c12bb2c6c1808b1d9a47bb4b260073e5caaf2f
Gitweb:     https://git.kernel.org/tip/c8c12bb2c6c1808b1d9a47bb4b260073e5caaf2f
Author:     Quentin Perret <quentin.perret@arm.com>
AuthorDate: Thu, 10 Jan 2019 11:05:45 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 21 Jan 2019 11:27:56 +0100

PM/EM: Document the Energy Model framework

Introduce a documentation file summarizing the key design points and
APIs of the newly introduced Energy Model framework.

Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: dietmar.eggemann@arm.com
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: qais.yousef@arm.com
Cc: rjw@rjwysocki.net
Link: https://lkml.kernel.org/r/20190110110546.8101-2-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/power/energy-model.txt | 144 +++++++++++++++++++++++++++++++++++
 1 file changed, 144 insertions(+)

diff --git a/Documentation/power/energy-model.txt b/Documentation/power/energy-model.txt
new file mode 100644
index 000000000000..a2b0ae4c76bd
--- /dev/null
+++ b/Documentation/power/energy-model.txt
@@ -0,0 +1,144 @@
+                           ====================
+                           Energy Model of CPUs
+                           ====================
+
+1. Overview
+-----------
+
+The Energy Model (EM) framework serves as an interface between drivers knowing
+the power consumed by CPUs at various performance levels, and the kernel
+subsystems willing to use that information to make energy-aware decisions.
+
+The source of the information about the power consumed by CPUs can vary greatly
+from one platform to another. These power costs can be estimated using
+devicetree data in some cases. In others, the firmware will know better.
+Alternatively, userspace might be best positioned. And so on. In order to avoid
+each and every client subsystem to re-implement support for each and every
+possible source of information on its own, the EM framework intervenes as an
+abstraction layer which standardizes the format of power cost tables in the
+kernel, hence enabling to avoid redundant work.
+
+The figure below depicts an example of drivers (Arm-specific here, but the
+approach is applicable to any architecture) providing power costs to the EM
+framework, and interested clients reading the data from it.
+
+       +---------------+  +-----------------+  +---------------+
+       | Thermal (IPA) |  | Scheduler (EAS) |  |     Other     |
+       +---------------+  +-----------------+  +---------------+
+               |                   | em_pd_energy()    |
+               |                   | em_cpu_get()      |
+               +---------+         |         +---------+
+                         |         |         |
+                         v         v         v
+                        +---------------------+
+                        |    Energy Model     |
+                        |     Framework       |
+                        +---------------------+
+                           ^       ^       ^
+                           |       |       | em_register_perf_domain()
+                +----------+       |       +---------+
+                |                  |                 |
+        +---------------+  +---------------+  +--------------+
+        |  cpufreq-dt   |  |   arm_scmi    |  |    Other     |
+        +---------------+  +---------------+  +--------------+
+                ^                  ^                 ^
+                |                  |                 |
+        +--------------+   +---------------+  +--------------+
+        | Device Tree  |   |   Firmware    |  |      ?       |
+        +--------------+   +---------------+  +--------------+
+
+The EM framework manages power cost tables per 'performance domain' in the
+system. A performance domain is a group of CPUs whose performance is scaled
+together. Performance domains generally have a 1-to-1 mapping with CPUFreq
+policies. All CPUs in a performance domain are required to have the same
+micro-architecture. CPUs in different performance domains can have different
+micro-architectures.
+
+
+2. Core APIs
+------------
+
+  2.1 Config options
+
+CONFIG_ENERGY_MODEL must be enabled to use the EM framework.
+
+
+  2.2 Registration of performance domains
+
+Drivers are expected to register performance domains into the EM framework by
+calling the following API:
+
+  int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
+			      struct em_data_callback *cb);
+
+Drivers must specify the CPUs of the performance domains using the cpumask
+argument, and provide a callback function returning <frequency, power> tuples
+for each capacity state. The callback function provided by the driver is free
+to fetch data from any relevant location (DT, firmware, ...), and by any mean
+deemed necessary. See Section 3. for an example of driver implementing this
+callback, and kernel/power/energy_model.c for further documentation on this
+API.
+
+
+  2.3 Accessing performance domains
+
+Subsystems interested in the energy model of a CPU can retrieve it using the
+em_cpu_get() API. The energy model tables are allocated once upon creation of
+the performance domains, and kept in memory untouched.
+
+The energy consumed by a performance domain can be estimated using the
+em_pd_energy() API. The estimation is performed assuming that the schedutil
+CPUfreq governor is in use.
+
+More details about the above APIs can be found in include/linux/energy_model.h.
+
+
+3. Example driver
+-----------------
+
+This section provides a simple example of a CPUFreq driver registering a
+performance domain in the Energy Model framework using the (fake) 'foo'
+protocol. The driver implements an est_power() function to be provided to the
+EM framework.
+
+ -> drivers/cpufreq/foo_cpufreq.c
+
+01	static int est_power(unsigned long *mW, unsigned long *KHz, int cpu)
+02	{
+03		long freq, power;
+04
+05		/* Use the 'foo' protocol to ceil the frequency */
+06		freq = foo_get_freq_ceil(cpu, *KHz);
+07		if (freq < 0);
+08			return freq;
+09
+10		/* Estimate the power cost for the CPU at the relevant freq. */
+11		power = foo_estimate_power(cpu, freq);
+12		if (power < 0);
+13			return power;
+14
+15		/* Return the values to the EM framework */
+16		*mW = power;
+17		*KHz = freq;
+18
+19		return 0;
+20	}
+21
+22	static int foo_cpufreq_init(struct cpufreq_policy *policy)
+23	{
+24		struct em_data_callback em_cb = EM_DATA_CB(est_power);
+25		int nr_opp, ret;
+26
+27		/* Do the actual CPUFreq init work ... */
+28		ret = do_foo_cpufreq_init(policy);
+29		if (ret)
+30			return ret;
+31
+32		/* Find the number of OPPs for this policy */
+33		nr_opp = foo_get_nr_opp(policy);
+34
+35		/* And register the new performance domain */
+36		em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
+37
+38	        return 0;
+39	}

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [tip:sched/core] sched/doc: Document Energy Aware Scheduling
  2019-01-10 11:05 ` [PATCH 2/2] sched: Document Energy Aware Scheduling Quentin Perret
  2019-01-17 15:51   ` Juri Lelli
@ 2019-01-21 11:38   ` tip-bot for Quentin Perret
  2019-01-21 13:54   ` tip-bot for Quentin Perret
  2019-01-27 11:37   ` tip-bot for Quentin Perret
  3 siblings, 0 replies; 22+ messages in thread
From: tip-bot for Quentin Perret @ 2019-01-21 11:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, qais.yousef, mingo, tglx, quentin.perret, torvalds,
	morten.rasmussen, peterz, linux-kernel

Commit-ID:  cd0638a84e77a456d4bbab28368d31b0bdb7eeb3
Gitweb:     https://git.kernel.org/tip/cd0638a84e77a456d4bbab28368d31b0bdb7eeb3
Author:     Quentin Perret <quentin.perret@arm.com>
AuthorDate: Thu, 10 Jan 2019 11:05:46 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 21 Jan 2019 11:27:57 +0100

sched/doc: Document Energy Aware Scheduling

Add some documentation detailing the main design points of EAS, as well
as a list of its dependencies.

Parts of this documentation are taken from Morten Rasmussen's original
EAS posting: https://lkml.org/lkml/2015/7/7/754

Co-authored-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: dietmar.eggemann@arm.com
Cc: patrick.bellasi@arm.com
Cc: rjw@rjwysocki.net
Link: https://lkml.kernel.org/r/20190110110546.8101-3-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/scheduler/sched-energy.txt | 425 +++++++++++++++++++++++++++++++
 1 file changed, 425 insertions(+)

diff --git a/Documentation/scheduler/sched-energy.txt b/Documentation/scheduler/sched-energy.txt
new file mode 100644
index 000000000000..197d81f4b836
--- /dev/null
+++ b/Documentation/scheduler/sched-energy.txt
@@ -0,0 +1,425 @@
+			   =======================
+			   Energy Aware Scheduling
+			   =======================
+
+1. Introduction
+---------------
+
+Energy Aware Scheduling (or EAS) gives the scheduler the ability to predict
+the impact of its decisions on the energy consumed by CPUs. EAS relies on an
+Energy Model (EM) of the CPUs to select an energy efficient CPU for each task,
+with a minimal impact on throughput. This document aims at providing an
+introduction on how EAS works, what are the main design decisions behind it, and
+details what is needed to get it to run.
+
+Before going any further, please note that at the time of writing:
+
+   /!\ EAS does not support platforms with symmetric CPU topologies /!\
+
+EAS operates only on heterogeneous CPU topologies (such as Arm big.LITTLE)
+because this is where the potential for saving energy through scheduling is
+the highest.
+
+The actual EM used by EAS is _not_ maintained by the scheduler, but by a
+dedicated framework. For details about this framework and what it provides,
+please refer to its documentation (see Documentation/power/energy-model.txt).
+
+
+2. Background and Terminology
+-----------------------------
+
+To make it clear from the start:
+ - energy = [joule] (resource like a battery on powered devices)
+ - power = energy/time = [joule/second] = [watt]
+
+The goal of EAS is to minimize energy, while still getting the job done. That
+is, we want to maximize:
+
+	performance [inst/s]
+	--------------------
+	    power [W]
+
+which is equivalent to minimizing:
+
+	energy [J]
+	-----------
+	instruction
+
+while still getting 'good' performance. It is essentially an alternative
+optimization objective to the current performance-only objective for the
+scheduler. This alternative considers two objectives: energy-efficiency and
+performance.
+
+The idea behind introducing an EM is to allow the scheduler to evaluate the
+implications of its decisions rather than blindly applying energy-saving
+techniques that may have positive effects only on some platforms. At the same
+time, the EM must be as simple as possible to minimize the scheduler latency
+impact.
+
+In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time
+for the scheduler to decide where a task should run (during wake-up), the EM
+is used to break the tie between several good CPU candidates and pick the one
+that is predicted to yield the best energy consumption without harming the
+system's throughput. The predictions made by EAS rely on specific elements of
+knowledge about the platform's topology, which include the 'capacity' of CPUs,
+and their respective energy costs.
+
+
+3. Topology information
+-----------------------
+
+EAS (as well as the rest of the scheduler) uses the notion of 'capacity' to
+differentiate CPUs with different computing throughput. The 'capacity' of a CPU
+represents the amount of work it can absorb when running at its highest
+frequency compared to the most capable CPU of the system. Capacity values are
+normalized in a 1024 range, and are comparable with the utilization signals of
+tasks and CPUs computed by the Per-Entity Load Tracking (PELT) mechanism. Thanks
+to capacity and utilization values, EAS is able to estimate how big/busy a
+task/CPU is, and to take this into consideration when evaluating performance vs
+energy trade-offs. The capacity of CPUs is provided via arch-specific code
+through the arch_scale_cpu_capacity() callback.
+
+The rest of platform knowledge used by EAS is directly read from the Energy
+Model (EM) framework. The EM of a platform is composed of a power cost table
+per 'performance domain' in the system (see Documentation/power/energy-model.txt
+for futher details about performance domains).
+
+The scheduler manages references to the EM objects in the topology code when the
+scheduling domains are built, or re-built. For each root domain (rd), the
+scheduler maintains a singly linked list of all performance domains intersecting
+the current rd->span. Each node in the list contains a pointer to a struct
+em_perf_domain as provided by the EM framework.
+
+The lists are attached to the root domains in order to cope with exclusive
+cpuset configurations. Since the boundaries of exclusive cpusets do not
+necessarily match those of performance domains, the lists of different root
+domains can contain duplicate elements.
+
+Example 1.
+    Let us consider a platform with 12 CPUs, split in 3 performance domains
+    (pd0, pd4 and pd8), organized as follows:
+
+	          CPUs:   0 1 2 3 4 5 6 7 8 9 10 11
+	          PDs:   |--pd0--|--pd4--|---pd8---|
+	          RDs:   |----rd1----|-----rd2-----|
+
+    Now, consider that userspace decided to split the system with two
+    exclusive cpusets, hence creating two independent root domains, each
+    containing 6 CPUs. The two root domains are denoted rd1 and rd2 in the
+    above figure. Since pd4 intersects with both rd1 and rd2, it will be
+    present in the linked list '->pd' attached to each of them:
+       * rd1->pd: pd0 -> pd4
+       * rd2->pd: pd4 -> pd8
+
+    Please note that the scheduler will create two duplicate list nodes for
+    pd4 (one for each list). However, both just hold a pointer to the same
+    shared data structure of the EM framework.
+
+Since the access to these lists can happen concurrently with hotplug and other
+things, they are protected by RCU, like the rest of topology structures
+manipulated by the scheduler.
+
+EAS also maintains a static key (sched_energy_present) which is enabled when at
+least one root domain meets all conditions for EAS to start. Those conditions
+are summarized in Section 6.
+
+
+4. Energy-Aware task placement
+------------------------------
+
+EAS overrides the CFS task wake-up balancing code. It uses the EM of the
+platform and the PELT signals to choose an energy-efficient target CPU during
+wake-up balance. When EAS is enabled, select_task_rq_fair() calls
+find_energy_efficient_cpu() to do the placement decision. This function looks
+for the CPU with the highest spare capacity (CPU capacity - CPU utilization) in
+each performance domain since it is the one which will allow us to keep the
+frequency the lowest. Then, the function checks if placing the task there could
+save energy compared to leaving it on prev_cpu, i.e. the CPU where the task ran
+in its previous activation.
+
+find_energy_efficient_cpu() uses compute_energy() to estimate what will be the
+energy consumed by the system if the waking task was migrated. compute_energy()
+looks at the current utilization landscape of the CPUs and adjusts it to
+'simulate' the task migration. The EM framework provides the em_pd_energy() API
+which computes the expected energy consumption of each performance domain for
+the given utilization landscape.
+
+An example of energy-optimized task placement decision is detailed below.
+
+Example 2.
+    Let us consider a (fake) platform with 2 independent performance domains
+    composed of two CPUs each. CPU0 and CPU1 are little CPUs; CPU2 and CPU3
+    are big.
+
+    The scheduler must decide where to place a task P whose util_avg = 200
+    and prev_cpu = 0.
+
+    The current utilization landscape of the CPUs is depicted on the graph
+    below. CPUs 0-3 have a util_avg of 400, 100, 600 and 500 respectively
+    Each performance domain has three Operating Performance Points (OPPs).
+    The CPU capacity and power cost associated with each OPP is listed in
+    the Energy Model table. The util_avg of P is shown on the figures
+    below as 'PP'.
+
+    CPU util.
+      1024                 - - - - - - -              Energy Model
+                                               +-----------+-------------+
+                                               |  Little   |     Big     |
+       768                 =============       +-----+-----+------+------+
+                                               | Cap | Pwr | Cap  | Pwr  |
+                                               +-----+-----+------+------+
+       512  ===========    - ##- - - - -       | 170 | 50  | 512  | 400  |
+                             ##     ##         | 341 | 150 | 768  | 800  |
+       341  -PP - - - -      ##     ##         | 512 | 300 | 1024 | 1700 |
+             PP              ##     ##         +-----+-----+------+------+
+       170  -## - - - -      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+      Current OPP: =====       Other OPP: - - -     util_avg (100 each): ##
+
+
+    find_energy_efficient_cpu() will first look for the CPUs with the
+    maximum spare capacity in the two performance domains. In this example,
+    CPU1 and CPU3. Then it will estimate the energy of the system if P was
+    placed on either of them, and check if that would save some energy
+    compared to leaving P on CPU0. EAS assumes that OPPs follow utilization
+    (which is coherent with the behaviour of the schedutil CPUFreq
+    governor, see Section 6. for more details on this topic).
+
+    Case 1. P is migrated to CPU1
+    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+      1024                 - - - - - - -
+
+                                            Energy calculation:
+       768                 =============     * CPU0: 200 / 341 * 150 = 88
+                                             * CPU1: 300 / 341 * 150 = 131
+                                             * CPU2: 600 / 768 * 800 = 625
+       512  - - - - - -    - ##- - - - -     * CPU3: 500 / 768 * 800 = 520
+                             ##     ##          => total_energy = 1364
+       341  ===========      ##     ##
+                    PP       ##     ##
+       170  -## - - PP-      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+
+    Case 2. P is migrated to CPU3
+    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+      1024                 - - - - - - -
+
+                                            Energy calculation:
+       768                 =============     * CPU0: 200 / 341 * 150 = 88
+                                             * CPU1: 100 / 341 * 150 = 43
+                                    PP       * CPU2: 600 / 768 * 800 = 625
+       512  - - - - - -    - ##- - -PP -     * CPU3: 700 / 768 * 800 = 729
+                             ##     ##          => total_energy = 1485
+       341  ===========      ##     ##
+                             ##     ##
+       170  -## - - - -      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+
+    Case 3. P stays on prev_cpu / CPU 0
+    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+      1024                 - - - - - - -
+
+                                            Energy calculation:
+       768                 =============     * CPU0: 400 / 512 * 300 = 234
+                                             * CPU1: 100 / 512 * 300 = 58
+                                             * CPU2: 600 / 768 * 800 = 625
+       512  ===========    - ##- - - - -     * CPU3: 500 / 768 * 800 = 520
+                             ##     ##          => total_energy = 1437
+       341  -PP - - - -      ##     ##
+             PP              ##     ##
+       170  -## - - - -      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+
+    From these calculations, the Case 1 has the lowest total energy. So CPU 1
+    is be the best candidate from an energy-efficiency standpoint.
+
+Big CPUs are generally more power hungry than the little ones and are thus used
+mainly when a task doesn't fit the littles. However, little CPUs aren't always
+necessarily more energy-efficient than big CPUs. For some systems, the high OPPs
+of the little CPUs can be less energy-efficient than the lowest OPPs of the
+bigs, for example. So, if the little CPUs happen to have enough utilization at
+a specific point in time, a small task waking up at that moment could be better
+of executing on the big side in order to save energy, even though it would fit
+on the little side.
+
+And even in the case where all OPPs of the big CPUs are less energy-efficient
+than those of the little, using the big CPUs for a small task might still, under
+specific conditions, save energy. Indeed, placing a task on a little CPU can
+result in raising the OPP of the entire performance domain, and that will
+increase the cost of the tasks already running there. If the waking task is
+placed on a big CPU, its own execution cost might be higher than if it was
+running on a little, but it won't impact the other tasks of the little CPUs
+which will keep running at a lower OPP. So, when considering the total energy
+consumed by CPUs, the extra cost of running that one task on a big core can be
+smaller than the cost of raising the OPP on the little CPUs for all the other
+tasks.
+
+The examples above would be nearly impossible to get right in a generic way, and
+for all platforms, without knowing the cost of running at different OPPs on all
+CPUs of the system. Thanks to its EM-based design, EAS should cope with them
+correctly without too many troubles. However, in order to ensure a minimal
+impact on throughput for high-utilization scenarios, EAS also implements another
+mechanism called 'over-utilization'.
+
+
+5. Over-utilization
+-------------------
+
+From a general standpoint, the use-cases where EAS can help the most are those
+involving a light/medium CPU utilization. Whenever long CPU-bound tasks are
+being run, they will require all of the available CPU capacity, and there isn't
+much that can be done by the scheduler to save energy without severly harming
+throughput. In order to avoid hurting performance with EAS, CPUs are flagged as
+'over-utilized' as soon as they are used at more than 80% of their compute
+capacity. As long as no CPUs are over-utilized in a root domain, load balancing
+is disabled and EAS overridess the wake-up balancing code. EAS is likely to load
+the most energy efficient CPUs of the system more than the others if that can be
+done without harming throughput. So, the load-balancer is disabled to prevent
+it from breaking the energy-efficient task placement found by EAS. It is safe to
+do so when the system isn't overutilized since being below the 80% tipping point
+implies that:
+
+    a. there is some idle time on all CPUs, so the utilization signals used by
+       EAS are likely to accurately represent the 'size' of the various tasks
+       in the system;
+    b. all tasks should already be provided with enough CPU capacity,
+       regardless of their nice values;
+    c. since there is spare capacity all tasks must be blocking/sleeping
+       regularly and balancing at wake-up is sufficient.
+
+As soon as one CPU goes above the 80% tipping point, at least one of the three
+assumptions above becomes incorrect. In this scenario, the 'overutilized' flag
+is raised for the entire root domain, EAS is disabled, and the load-balancer is
+re-enabled. By doing so, the scheduler falls back onto load-based algorithms for
+wake-up and load balance under CPU-bound conditions. This provides a better
+respect of the nice values of tasks.
+
+Since the notion of overutilization largely relies on detecting whether or not
+there is some idle time in the system, the CPU capacity 'stolen' by higher
+(than CFS) scheduling classes (as well as IRQ) must be taken into account. As
+such, the detection of overutilization accounts for the capacity used not only
+by CFS tasks, but also by the other scheduling classes and IRQ.
+
+
+6. Dependencies and requirements for EAS
+----------------------------------------
+
+Energy Aware Scheduling depends on the CPUs of the system having specific
+hardware properties and on other features of the kernel being enabled. This
+section lists these dependencies and provides hints as to how they can be met.
+
+
+  6.1 - Asymmetric CPU topology
+
+As mentioned in the introduction, EAS is only supported on platforms with
+asymmetric CPU topologies for now. This requirement is checked at run-time by
+looking for the presence of the SD_ASYM_CPUCAPACITY flag when the scheduling
+domains are built.
+
+The flag is set/cleared automatically by the scheduler topology code whenever
+there are CPUs with different capacities in a root domain. The capacities of
+CPUs are provided by arch-specific code through the arch_scale_cpu_capacity()
+callback. As an example, arm and arm64 share an implementation of this callback
+which uses a combination of CPUFreq data and device-tree bindings to compute the
+capacity of CPUs (see drivers/base/arch_topology.c for more details).
+
+So, in order to use EAS on your platform your architecture must implement the
+arch_scale_cpu_capacity() callback, and some of the CPUs must have a lower
+capacity than others.
+
+Please note that EAS is not fundamentally incompatible with SMP, but no
+significant savings on SMP platforms have been observed yet. This restriction
+could be amended in the future if proven otherwise.
+
+
+  6.2 - Energy Model presence
+
+EAS uses the EM of a platform to estimate the impact of scheduling decisions on
+energy. So, your platform must provide power cost tables to the EM framework in
+order to make EAS start. To do so, please refer to documentation of the
+independent EM framework in Documentation/power/energy-model.txt.
+
+Please also note that the scheduling domains need to be re-built after the
+EM has been registered in order to start EAS.
+
+
+  6.3 - Energy Model complexity
+
+The task wake-up path is very latency-sensitive. When the EM of a platform is
+too complex (too many CPUs, too many performance domains, too many performance
+states, ...), the cost of using it in the wake-up path can become prohibitive.
+The energy-aware wake-up algorithm has a complexity of:
+
+	C = Nd * (Nc + Ns)
+
+with: Nd the number of performance domains; Nc the number of CPUs; and Ns the
+total number of OPPs (ex: for two perf. domains with 4 OPPs each, Ns = 8).
+
+A complexity check is performed at the root domain level, when scheduling
+domains are built. EAS will not start on a root domain if its C happens to be
+higher than the completely arbitrary EM_MAX_COMPLEXITY threshold (2048 at the
+time of writing).
+
+If you really want to use EAS but the complexity of your platform's Energy
+Model is too high to be used with a single root domain, you're left with only
+two possible options:
+
+    1. split your system into separate, smaller, root domains using exclusive
+       cpusets and enable EAS locally on each of them. This option has the
+       benefit to work out of the box but the drawback of preventing load
+       balance between root domains, which can result in an unbalanced system
+       overall;
+    2. submit patches to reduce the complexity of the EAS wake-up algorithm,
+       hence enabling it to cope with larger EMs in reasonable time.
+
+
+  6.4 - Schedutil governor
+
+EAS tries to predict at which OPP will the CPUs be running in the close future
+in order to estimate their energy consumption. To do so, it is assumed that OPPs
+of CPUs follow their utilization.
+
+Although it is very difficult to provide hard guarantees regarding the accuracy
+of this assumption in practice (because the hardware might not do what it is
+told to do, for example), schedutil as opposed to other CPUFreq governors at
+least _requests_ frequencies calculated using the utilization signals.
+Consequently, the only sane governor to use together with EAS is schedutil,
+because it is the only one providing some degree of consistency between
+frequency requests and energy predictions.
+
+Using EAS with any other governor than schedutil is not supported.
+
+
+  6.5 Scale-invariant utilization signals
+
+In order to make accurate prediction across CPUs and for all performance
+states, EAS needs frequency-invariant and CPU-invariant PELT signals. These can
+be obtained using the architecture-defined arch_scale{cpu,freq}_capacity()
+callbacks.
+
+Using EAS on a platform that doesn't implement these two callbacks is not
+supported.
+
+
+  6.6 Multithreading (SMT)
+
+EAS in its current form is SMT unaware and is not able to leverage
+multithreaded hardware to save energy. EAS considers threads as independent
+CPUs, which can actually be counter-productive for both performance and energy.
+
+EAS on SMT is not supported.

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [tip:sched/core] PM/EM: Document the Energy Model framework
  2019-01-10 11:05 ` [PATCH 1/2] PM / EM: Document the Energy Model framework Quentin Perret
  2019-01-17 14:47   ` Juri Lelli
  2019-01-21 11:37   ` [tip:sched/core] PM/EM: " tip-bot for Quentin Perret
@ 2019-01-21 13:53   ` tip-bot for Quentin Perret
  2019-01-21 14:10     ` Quentin Perret
  2019-01-27 11:36   ` tip-bot for Quentin Perret
  3 siblings, 1 reply; 22+ messages in thread
From: tip-bot for Quentin Perret @ 2019-01-21 13:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, mingo, tglx, hpa, torvalds, quentin.perret, linux-kernel

Commit-ID:  a6a2333618df721d942d37564f8c4b28d1f6924b
Gitweb:     https://git.kernel.org/tip/a6a2333618df721d942d37564f8c4b28d1f6924b
Author:     Quentin Perret <quentin.perret@arm.com>
AuthorDate: Thu, 10 Jan 2019 11:05:45 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 21 Jan 2019 14:40:28 +0100

PM/EM: Document the Energy Model framework

Introduce a documentation file summarizing the key design points and
APIs of the newly introduced Energy Model framework.

Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: dietmar.eggemann@arm.com
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: qais.yousef@arm.com
Cc: rjw@rjwysocki.net
Link: https://lkml.kernel.org/r/20190110110546.8101-2-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/power/energy-model.txt | 144 +++++++++++++++++++++++++++++++++++
 1 file changed, 144 insertions(+)

diff --git a/Documentation/power/energy-model.txt b/Documentation/power/energy-model.txt
new file mode 100644
index 000000000000..a2b0ae4c76bd
--- /dev/null
+++ b/Documentation/power/energy-model.txt
@@ -0,0 +1,144 @@
+                           ====================
+                           Energy Model of CPUs
+                           ====================
+
+1. Overview
+-----------
+
+The Energy Model (EM) framework serves as an interface between drivers knowing
+the power consumed by CPUs at various performance levels, and the kernel
+subsystems willing to use that information to make energy-aware decisions.
+
+The source of the information about the power consumed by CPUs can vary greatly
+from one platform to another. These power costs can be estimated using
+devicetree data in some cases. In others, the firmware will know better.
+Alternatively, userspace might be best positioned. And so on. In order to avoid
+each and every client subsystem to re-implement support for each and every
+possible source of information on its own, the EM framework intervenes as an
+abstraction layer which standardizes the format of power cost tables in the
+kernel, hence enabling to avoid redundant work.
+
+The figure below depicts an example of drivers (Arm-specific here, but the
+approach is applicable to any architecture) providing power costs to the EM
+framework, and interested clients reading the data from it.
+
+       +---------------+  +-----------------+  +---------------+
+       | Thermal (IPA) |  | Scheduler (EAS) |  |     Other     |
+       +---------------+  +-----------------+  +---------------+
+               |                   | em_pd_energy()    |
+               |                   | em_cpu_get()      |
+               +---------+         |         +---------+
+                         |         |         |
+                         v         v         v
+                        +---------------------+
+                        |    Energy Model     |
+                        |     Framework       |
+                        +---------------------+
+                           ^       ^       ^
+                           |       |       | em_register_perf_domain()
+                +----------+       |       +---------+
+                |                  |                 |
+        +---------------+  +---------------+  +--------------+
+        |  cpufreq-dt   |  |   arm_scmi    |  |    Other     |
+        +---------------+  +---------------+  +--------------+
+                ^                  ^                 ^
+                |                  |                 |
+        +--------------+   +---------------+  +--------------+
+        | Device Tree  |   |   Firmware    |  |      ?       |
+        +--------------+   +---------------+  +--------------+
+
+The EM framework manages power cost tables per 'performance domain' in the
+system. A performance domain is a group of CPUs whose performance is scaled
+together. Performance domains generally have a 1-to-1 mapping with CPUFreq
+policies. All CPUs in a performance domain are required to have the same
+micro-architecture. CPUs in different performance domains can have different
+micro-architectures.
+
+
+2. Core APIs
+------------
+
+  2.1 Config options
+
+CONFIG_ENERGY_MODEL must be enabled to use the EM framework.
+
+
+  2.2 Registration of performance domains
+
+Drivers are expected to register performance domains into the EM framework by
+calling the following API:
+
+  int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
+			      struct em_data_callback *cb);
+
+Drivers must specify the CPUs of the performance domains using the cpumask
+argument, and provide a callback function returning <frequency, power> tuples
+for each capacity state. The callback function provided by the driver is free
+to fetch data from any relevant location (DT, firmware, ...), and by any mean
+deemed necessary. See Section 3. for an example of driver implementing this
+callback, and kernel/power/energy_model.c for further documentation on this
+API.
+
+
+  2.3 Accessing performance domains
+
+Subsystems interested in the energy model of a CPU can retrieve it using the
+em_cpu_get() API. The energy model tables are allocated once upon creation of
+the performance domains, and kept in memory untouched.
+
+The energy consumed by a performance domain can be estimated using the
+em_pd_energy() API. The estimation is performed assuming that the schedutil
+CPUfreq governor is in use.
+
+More details about the above APIs can be found in include/linux/energy_model.h.
+
+
+3. Example driver
+-----------------
+
+This section provides a simple example of a CPUFreq driver registering a
+performance domain in the Energy Model framework using the (fake) 'foo'
+protocol. The driver implements an est_power() function to be provided to the
+EM framework.
+
+ -> drivers/cpufreq/foo_cpufreq.c
+
+01	static int est_power(unsigned long *mW, unsigned long *KHz, int cpu)
+02	{
+03		long freq, power;
+04
+05		/* Use the 'foo' protocol to ceil the frequency */
+06		freq = foo_get_freq_ceil(cpu, *KHz);
+07		if (freq < 0);
+08			return freq;
+09
+10		/* Estimate the power cost for the CPU at the relevant freq. */
+11		power = foo_estimate_power(cpu, freq);
+12		if (power < 0);
+13			return power;
+14
+15		/* Return the values to the EM framework */
+16		*mW = power;
+17		*KHz = freq;
+18
+19		return 0;
+20	}
+21
+22	static int foo_cpufreq_init(struct cpufreq_policy *policy)
+23	{
+24		struct em_data_callback em_cb = EM_DATA_CB(est_power);
+25		int nr_opp, ret;
+26
+27		/* Do the actual CPUFreq init work ... */
+28		ret = do_foo_cpufreq_init(policy);
+29		if (ret)
+30			return ret;
+31
+32		/* Find the number of OPPs for this policy */
+33		nr_opp = foo_get_nr_opp(policy);
+34
+35		/* And register the new performance domain */
+36		em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
+37
+38	        return 0;
+39	}

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [tip:sched/core] sched/doc: Document Energy Aware Scheduling
  2019-01-10 11:05 ` [PATCH 2/2] sched: Document Energy Aware Scheduling Quentin Perret
  2019-01-17 15:51   ` Juri Lelli
  2019-01-21 11:38   ` [tip:sched/core] sched/doc: " tip-bot for Quentin Perret
@ 2019-01-21 13:54   ` tip-bot for Quentin Perret
  2019-01-27 11:37   ` tip-bot for Quentin Perret
  3 siblings, 0 replies; 22+ messages in thread
From: tip-bot for Quentin Perret @ 2019-01-21 13:54 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, torvalds, morten.rasmussen, tglx, qais.yousef, peterz,
	mingo, linux-kernel, quentin.perret

Commit-ID:  3948e120a22eb19276de2ecf9b5aea592af7031a
Gitweb:     https://git.kernel.org/tip/3948e120a22eb19276de2ecf9b5aea592af7031a
Author:     Quentin Perret <quentin.perret@arm.com>
AuthorDate: Thu, 10 Jan 2019 11:05:46 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 21 Jan 2019 14:40:28 +0100

sched/doc: Document Energy Aware Scheduling

Add some documentation detailing the main design points of EAS, as well
as a list of its dependencies.

Parts of this documentation are taken from Morten Rasmussen's original
EAS posting: https://lkml.org/lkml/2015/7/7/754

Co-authored-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: dietmar.eggemann@arm.com
Cc: patrick.bellasi@arm.com
Cc: rjw@rjwysocki.net
Link: https://lkml.kernel.org/r/20190110110546.8101-3-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/scheduler/sched-energy.txt | 425 +++++++++++++++++++++++++++++++
 1 file changed, 425 insertions(+)

diff --git a/Documentation/scheduler/sched-energy.txt b/Documentation/scheduler/sched-energy.txt
new file mode 100644
index 000000000000..197d81f4b836
--- /dev/null
+++ b/Documentation/scheduler/sched-energy.txt
@@ -0,0 +1,425 @@
+			   =======================
+			   Energy Aware Scheduling
+			   =======================
+
+1. Introduction
+---------------
+
+Energy Aware Scheduling (or EAS) gives the scheduler the ability to predict
+the impact of its decisions on the energy consumed by CPUs. EAS relies on an
+Energy Model (EM) of the CPUs to select an energy efficient CPU for each task,
+with a minimal impact on throughput. This document aims at providing an
+introduction on how EAS works, what are the main design decisions behind it, and
+details what is needed to get it to run.
+
+Before going any further, please note that at the time of writing:
+
+   /!\ EAS does not support platforms with symmetric CPU topologies /!\
+
+EAS operates only on heterogeneous CPU topologies (such as Arm big.LITTLE)
+because this is where the potential for saving energy through scheduling is
+the highest.
+
+The actual EM used by EAS is _not_ maintained by the scheduler, but by a
+dedicated framework. For details about this framework and what it provides,
+please refer to its documentation (see Documentation/power/energy-model.txt).
+
+
+2. Background and Terminology
+-----------------------------
+
+To make it clear from the start:
+ - energy = [joule] (resource like a battery on powered devices)
+ - power = energy/time = [joule/second] = [watt]
+
+The goal of EAS is to minimize energy, while still getting the job done. That
+is, we want to maximize:
+
+	performance [inst/s]
+	--------------------
+	    power [W]
+
+which is equivalent to minimizing:
+
+	energy [J]
+	-----------
+	instruction
+
+while still getting 'good' performance. It is essentially an alternative
+optimization objective to the current performance-only objective for the
+scheduler. This alternative considers two objectives: energy-efficiency and
+performance.
+
+The idea behind introducing an EM is to allow the scheduler to evaluate the
+implications of its decisions rather than blindly applying energy-saving
+techniques that may have positive effects only on some platforms. At the same
+time, the EM must be as simple as possible to minimize the scheduler latency
+impact.
+
+In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time
+for the scheduler to decide where a task should run (during wake-up), the EM
+is used to break the tie between several good CPU candidates and pick the one
+that is predicted to yield the best energy consumption without harming the
+system's throughput. The predictions made by EAS rely on specific elements of
+knowledge about the platform's topology, which include the 'capacity' of CPUs,
+and their respective energy costs.
+
+
+3. Topology information
+-----------------------
+
+EAS (as well as the rest of the scheduler) uses the notion of 'capacity' to
+differentiate CPUs with different computing throughput. The 'capacity' of a CPU
+represents the amount of work it can absorb when running at its highest
+frequency compared to the most capable CPU of the system. Capacity values are
+normalized in a 1024 range, and are comparable with the utilization signals of
+tasks and CPUs computed by the Per-Entity Load Tracking (PELT) mechanism. Thanks
+to capacity and utilization values, EAS is able to estimate how big/busy a
+task/CPU is, and to take this into consideration when evaluating performance vs
+energy trade-offs. The capacity of CPUs is provided via arch-specific code
+through the arch_scale_cpu_capacity() callback.
+
+The rest of platform knowledge used by EAS is directly read from the Energy
+Model (EM) framework. The EM of a platform is composed of a power cost table
+per 'performance domain' in the system (see Documentation/power/energy-model.txt
+for futher details about performance domains).
+
+The scheduler manages references to the EM objects in the topology code when the
+scheduling domains are built, or re-built. For each root domain (rd), the
+scheduler maintains a singly linked list of all performance domains intersecting
+the current rd->span. Each node in the list contains a pointer to a struct
+em_perf_domain as provided by the EM framework.
+
+The lists are attached to the root domains in order to cope with exclusive
+cpuset configurations. Since the boundaries of exclusive cpusets do not
+necessarily match those of performance domains, the lists of different root
+domains can contain duplicate elements.
+
+Example 1.
+    Let us consider a platform with 12 CPUs, split in 3 performance domains
+    (pd0, pd4 and pd8), organized as follows:
+
+	          CPUs:   0 1 2 3 4 5 6 7 8 9 10 11
+	          PDs:   |--pd0--|--pd4--|---pd8---|
+	          RDs:   |----rd1----|-----rd2-----|
+
+    Now, consider that userspace decided to split the system with two
+    exclusive cpusets, hence creating two independent root domains, each
+    containing 6 CPUs. The two root domains are denoted rd1 and rd2 in the
+    above figure. Since pd4 intersects with both rd1 and rd2, it will be
+    present in the linked list '->pd' attached to each of them:
+       * rd1->pd: pd0 -> pd4
+       * rd2->pd: pd4 -> pd8
+
+    Please note that the scheduler will create two duplicate list nodes for
+    pd4 (one for each list). However, both just hold a pointer to the same
+    shared data structure of the EM framework.
+
+Since the access to these lists can happen concurrently with hotplug and other
+things, they are protected by RCU, like the rest of topology structures
+manipulated by the scheduler.
+
+EAS also maintains a static key (sched_energy_present) which is enabled when at
+least one root domain meets all conditions for EAS to start. Those conditions
+are summarized in Section 6.
+
+
+4. Energy-Aware task placement
+------------------------------
+
+EAS overrides the CFS task wake-up balancing code. It uses the EM of the
+platform and the PELT signals to choose an energy-efficient target CPU during
+wake-up balance. When EAS is enabled, select_task_rq_fair() calls
+find_energy_efficient_cpu() to do the placement decision. This function looks
+for the CPU with the highest spare capacity (CPU capacity - CPU utilization) in
+each performance domain since it is the one which will allow us to keep the
+frequency the lowest. Then, the function checks if placing the task there could
+save energy compared to leaving it on prev_cpu, i.e. the CPU where the task ran
+in its previous activation.
+
+find_energy_efficient_cpu() uses compute_energy() to estimate what will be the
+energy consumed by the system if the waking task was migrated. compute_energy()
+looks at the current utilization landscape of the CPUs and adjusts it to
+'simulate' the task migration. The EM framework provides the em_pd_energy() API
+which computes the expected energy consumption of each performance domain for
+the given utilization landscape.
+
+An example of energy-optimized task placement decision is detailed below.
+
+Example 2.
+    Let us consider a (fake) platform with 2 independent performance domains
+    composed of two CPUs each. CPU0 and CPU1 are little CPUs; CPU2 and CPU3
+    are big.
+
+    The scheduler must decide where to place a task P whose util_avg = 200
+    and prev_cpu = 0.
+
+    The current utilization landscape of the CPUs is depicted on the graph
+    below. CPUs 0-3 have a util_avg of 400, 100, 600 and 500 respectively
+    Each performance domain has three Operating Performance Points (OPPs).
+    The CPU capacity and power cost associated with each OPP is listed in
+    the Energy Model table. The util_avg of P is shown on the figures
+    below as 'PP'.
+
+    CPU util.
+      1024                 - - - - - - -              Energy Model
+                                               +-----------+-------------+
+                                               |  Little   |     Big     |
+       768                 =============       +-----+-----+------+------+
+                                               | Cap | Pwr | Cap  | Pwr  |
+                                               +-----+-----+------+------+
+       512  ===========    - ##- - - - -       | 170 | 50  | 512  | 400  |
+                             ##     ##         | 341 | 150 | 768  | 800  |
+       341  -PP - - - -      ##     ##         | 512 | 300 | 1024 | 1700 |
+             PP              ##     ##         +-----+-----+------+------+
+       170  -## - - - -      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+      Current OPP: =====       Other OPP: - - -     util_avg (100 each): ##
+
+
+    find_energy_efficient_cpu() will first look for the CPUs with the
+    maximum spare capacity in the two performance domains. In this example,
+    CPU1 and CPU3. Then it will estimate the energy of the system if P was
+    placed on either of them, and check if that would save some energy
+    compared to leaving P on CPU0. EAS assumes that OPPs follow utilization
+    (which is coherent with the behaviour of the schedutil CPUFreq
+    governor, see Section 6. for more details on this topic).
+
+    Case 1. P is migrated to CPU1
+    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+      1024                 - - - - - - -
+
+                                            Energy calculation:
+       768                 =============     * CPU0: 200 / 341 * 150 = 88
+                                             * CPU1: 300 / 341 * 150 = 131
+                                             * CPU2: 600 / 768 * 800 = 625
+       512  - - - - - -    - ##- - - - -     * CPU3: 500 / 768 * 800 = 520
+                             ##     ##          => total_energy = 1364
+       341  ===========      ##     ##
+                    PP       ##     ##
+       170  -## - - PP-      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+
+    Case 2. P is migrated to CPU3
+    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+      1024                 - - - - - - -
+
+                                            Energy calculation:
+       768                 =============     * CPU0: 200 / 341 * 150 = 88
+                                             * CPU1: 100 / 341 * 150 = 43
+                                    PP       * CPU2: 600 / 768 * 800 = 625
+       512  - - - - - -    - ##- - -PP -     * CPU3: 700 / 768 * 800 = 729
+                             ##     ##          => total_energy = 1485
+       341  ===========      ##     ##
+                             ##     ##
+       170  -## - - - -      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+
+    Case 3. P stays on prev_cpu / CPU 0
+    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+      1024                 - - - - - - -
+
+                                            Energy calculation:
+       768                 =============     * CPU0: 400 / 512 * 300 = 234
+                                             * CPU1: 100 / 512 * 300 = 58
+                                             * CPU2: 600 / 768 * 800 = 625
+       512  ===========    - ##- - - - -     * CPU3: 500 / 768 * 800 = 520
+                             ##     ##          => total_energy = 1437
+       341  -PP - - - -      ##     ##
+             PP              ##     ##
+       170  -## - - - -      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+
+    From these calculations, the Case 1 has the lowest total energy. So CPU 1
+    is be the best candidate from an energy-efficiency standpoint.
+
+Big CPUs are generally more power hungry than the little ones and are thus used
+mainly when a task doesn't fit the littles. However, little CPUs aren't always
+necessarily more energy-efficient than big CPUs. For some systems, the high OPPs
+of the little CPUs can be less energy-efficient than the lowest OPPs of the
+bigs, for example. So, if the little CPUs happen to have enough utilization at
+a specific point in time, a small task waking up at that moment could be better
+of executing on the big side in order to save energy, even though it would fit
+on the little side.
+
+And even in the case where all OPPs of the big CPUs are less energy-efficient
+than those of the little, using the big CPUs for a small task might still, under
+specific conditions, save energy. Indeed, placing a task on a little CPU can
+result in raising the OPP of the entire performance domain, and that will
+increase the cost of the tasks already running there. If the waking task is
+placed on a big CPU, its own execution cost might be higher than if it was
+running on a little, but it won't impact the other tasks of the little CPUs
+which will keep running at a lower OPP. So, when considering the total energy
+consumed by CPUs, the extra cost of running that one task on a big core can be
+smaller than the cost of raising the OPP on the little CPUs for all the other
+tasks.
+
+The examples above would be nearly impossible to get right in a generic way, and
+for all platforms, without knowing the cost of running at different OPPs on all
+CPUs of the system. Thanks to its EM-based design, EAS should cope with them
+correctly without too many troubles. However, in order to ensure a minimal
+impact on throughput for high-utilization scenarios, EAS also implements another
+mechanism called 'over-utilization'.
+
+
+5. Over-utilization
+-------------------
+
+From a general standpoint, the use-cases where EAS can help the most are those
+involving a light/medium CPU utilization. Whenever long CPU-bound tasks are
+being run, they will require all of the available CPU capacity, and there isn't
+much that can be done by the scheduler to save energy without severly harming
+throughput. In order to avoid hurting performance with EAS, CPUs are flagged as
+'over-utilized' as soon as they are used at more than 80% of their compute
+capacity. As long as no CPUs are over-utilized in a root domain, load balancing
+is disabled and EAS overridess the wake-up balancing code. EAS is likely to load
+the most energy efficient CPUs of the system more than the others if that can be
+done without harming throughput. So, the load-balancer is disabled to prevent
+it from breaking the energy-efficient task placement found by EAS. It is safe to
+do so when the system isn't overutilized since being below the 80% tipping point
+implies that:
+
+    a. there is some idle time on all CPUs, so the utilization signals used by
+       EAS are likely to accurately represent the 'size' of the various tasks
+       in the system;
+    b. all tasks should already be provided with enough CPU capacity,
+       regardless of their nice values;
+    c. since there is spare capacity all tasks must be blocking/sleeping
+       regularly and balancing at wake-up is sufficient.
+
+As soon as one CPU goes above the 80% tipping point, at least one of the three
+assumptions above becomes incorrect. In this scenario, the 'overutilized' flag
+is raised for the entire root domain, EAS is disabled, and the load-balancer is
+re-enabled. By doing so, the scheduler falls back onto load-based algorithms for
+wake-up and load balance under CPU-bound conditions. This provides a better
+respect of the nice values of tasks.
+
+Since the notion of overutilization largely relies on detecting whether or not
+there is some idle time in the system, the CPU capacity 'stolen' by higher
+(than CFS) scheduling classes (as well as IRQ) must be taken into account. As
+such, the detection of overutilization accounts for the capacity used not only
+by CFS tasks, but also by the other scheduling classes and IRQ.
+
+
+6. Dependencies and requirements for EAS
+----------------------------------------
+
+Energy Aware Scheduling depends on the CPUs of the system having specific
+hardware properties and on other features of the kernel being enabled. This
+section lists these dependencies and provides hints as to how they can be met.
+
+
+  6.1 - Asymmetric CPU topology
+
+As mentioned in the introduction, EAS is only supported on platforms with
+asymmetric CPU topologies for now. This requirement is checked at run-time by
+looking for the presence of the SD_ASYM_CPUCAPACITY flag when the scheduling
+domains are built.
+
+The flag is set/cleared automatically by the scheduler topology code whenever
+there are CPUs with different capacities in a root domain. The capacities of
+CPUs are provided by arch-specific code through the arch_scale_cpu_capacity()
+callback. As an example, arm and arm64 share an implementation of this callback
+which uses a combination of CPUFreq data and device-tree bindings to compute the
+capacity of CPUs (see drivers/base/arch_topology.c for more details).
+
+So, in order to use EAS on your platform your architecture must implement the
+arch_scale_cpu_capacity() callback, and some of the CPUs must have a lower
+capacity than others.
+
+Please note that EAS is not fundamentally incompatible with SMP, but no
+significant savings on SMP platforms have been observed yet. This restriction
+could be amended in the future if proven otherwise.
+
+
+  6.2 - Energy Model presence
+
+EAS uses the EM of a platform to estimate the impact of scheduling decisions on
+energy. So, your platform must provide power cost tables to the EM framework in
+order to make EAS start. To do so, please refer to documentation of the
+independent EM framework in Documentation/power/energy-model.txt.
+
+Please also note that the scheduling domains need to be re-built after the
+EM has been registered in order to start EAS.
+
+
+  6.3 - Energy Model complexity
+
+The task wake-up path is very latency-sensitive. When the EM of a platform is
+too complex (too many CPUs, too many performance domains, too many performance
+states, ...), the cost of using it in the wake-up path can become prohibitive.
+The energy-aware wake-up algorithm has a complexity of:
+
+	C = Nd * (Nc + Ns)
+
+with: Nd the number of performance domains; Nc the number of CPUs; and Ns the
+total number of OPPs (ex: for two perf. domains with 4 OPPs each, Ns = 8).
+
+A complexity check is performed at the root domain level, when scheduling
+domains are built. EAS will not start on a root domain if its C happens to be
+higher than the completely arbitrary EM_MAX_COMPLEXITY threshold (2048 at the
+time of writing).
+
+If you really want to use EAS but the complexity of your platform's Energy
+Model is too high to be used with a single root domain, you're left with only
+two possible options:
+
+    1. split your system into separate, smaller, root domains using exclusive
+       cpusets and enable EAS locally on each of them. This option has the
+       benefit to work out of the box but the drawback of preventing load
+       balance between root domains, which can result in an unbalanced system
+       overall;
+    2. submit patches to reduce the complexity of the EAS wake-up algorithm,
+       hence enabling it to cope with larger EMs in reasonable time.
+
+
+  6.4 - Schedutil governor
+
+EAS tries to predict at which OPP will the CPUs be running in the close future
+in order to estimate their energy consumption. To do so, it is assumed that OPPs
+of CPUs follow their utilization.
+
+Although it is very difficult to provide hard guarantees regarding the accuracy
+of this assumption in practice (because the hardware might not do what it is
+told to do, for example), schedutil as opposed to other CPUFreq governors at
+least _requests_ frequencies calculated using the utilization signals.
+Consequently, the only sane governor to use together with EAS is schedutil,
+because it is the only one providing some degree of consistency between
+frequency requests and energy predictions.
+
+Using EAS with any other governor than schedutil is not supported.
+
+
+  6.5 Scale-invariant utilization signals
+
+In order to make accurate prediction across CPUs and for all performance
+states, EAS needs frequency-invariant and CPU-invariant PELT signals. These can
+be obtained using the architecture-defined arch_scale{cpu,freq}_capacity()
+callbacks.
+
+Using EAS on a platform that doesn't implement these two callbacks is not
+supported.
+
+
+  6.6 Multithreading (SMT)
+
+EAS in its current form is SMT unaware and is not able to leverage
+multithreaded hardware to save energy. EAS considers threads as independent
+CPUs, which can actually be counter-productive for both performance and energy.
+
+EAS on SMT is not supported.

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [tip:sched/core] PM/EM: Document the Energy Model framework
  2019-01-21 13:53   ` tip-bot for Quentin Perret
@ 2019-01-21 14:10     ` Quentin Perret
  0 siblings, 0 replies; 22+ messages in thread
From: Quentin Perret @ 2019-01-21 14:10 UTC (permalink / raw)
  To: mingo, tglx, hpa, peterz, linux-kernel, torvalds; +Cc: linux-tip-commits

On Monday 21 Jan 2019 at 05:53:49 (-0800), tip-bot for Quentin Perret wrote:
> Commit-ID:  a6a2333618df721d942d37564f8c4b28d1f6924b
> Gitweb:     https://git.kernel.org/tip/a6a2333618df721d942d37564f8c4b28d1f6924b
> Author:     Quentin Perret <quentin.perret@arm.com>
> AuthorDate: Thu, 10 Jan 2019 11:05:45 +0000
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Mon, 21 Jan 2019 14:40:28 +0100
> 
> PM/EM: Document the Energy Model framework
> 
> Introduce a documentation file summarizing the key design points and
> APIs of the newly introduced Energy Model framework.
> 
> Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: corbet@lwn.net
> Cc: dietmar.eggemann@arm.com
> Cc: morten.rasmussen@arm.com
> Cc: patrick.bellasi@arm.com
> Cc: qais.yousef@arm.com
> Cc: rjw@rjwysocki.net
> Link: https://lkml.kernel.org/r/20190110110546.8101-2-quentin.perret@arm.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>

Argh, I sent a v2 addressing Juri's and Rafael's comments on the list
just today [1]. Should I send a diff patch later ? Or is it still
possible to pick up the v2 ?

Thanks,
Quentin

[1] https://lore.kernel.org/lkml/20190121111724.18234-1-quentin.perret@arm.com/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [tip:sched/core] PM/EM: Document the Energy Model framework
  2019-01-10 11:05 ` [PATCH 1/2] PM / EM: Document the Energy Model framework Quentin Perret
                     ` (2 preceding siblings ...)
  2019-01-21 13:53   ` tip-bot for Quentin Perret
@ 2019-01-27 11:36   ` tip-bot for Quentin Perret
  3 siblings, 0 replies; 22+ messages in thread
From: tip-bot for Quentin Perret @ 2019-01-27 11:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, linux-kernel, tglx, torvalds, hpa, quentin.perret, mingo

Commit-ID:  1017b48ccc11a70634a7b8ec4ba3a6acb234c17b
Gitweb:     https://git.kernel.org/tip/1017b48ccc11a70634a7b8ec4ba3a6acb234c17b
Author:     Quentin Perret <quentin.perret@arm.com>
AuthorDate: Thu, 10 Jan 2019 11:05:45 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 27 Jan 2019 12:29:37 +0100

PM/EM: Document the Energy Model framework

Introduce a documentation file summarizing the key design points and
APIs of the newly introduced Energy Model framework.

Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: dietmar.eggemann@arm.com
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: qais.yousef@arm.com
Cc: rjw@rjwysocki.net
Link: https://lkml.kernel.org/r/20190110110546.8101-2-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/power/energy-model.txt | 144 +++++++++++++++++++++++++++++++++++
 1 file changed, 144 insertions(+)

diff --git a/Documentation/power/energy-model.txt b/Documentation/power/energy-model.txt
new file mode 100644
index 000000000000..a2b0ae4c76bd
--- /dev/null
+++ b/Documentation/power/energy-model.txt
@@ -0,0 +1,144 @@
+                           ====================
+                           Energy Model of CPUs
+                           ====================
+
+1. Overview
+-----------
+
+The Energy Model (EM) framework serves as an interface between drivers knowing
+the power consumed by CPUs at various performance levels, and the kernel
+subsystems willing to use that information to make energy-aware decisions.
+
+The source of the information about the power consumed by CPUs can vary greatly
+from one platform to another. These power costs can be estimated using
+devicetree data in some cases. In others, the firmware will know better.
+Alternatively, userspace might be best positioned. And so on. In order to avoid
+each and every client subsystem to re-implement support for each and every
+possible source of information on its own, the EM framework intervenes as an
+abstraction layer which standardizes the format of power cost tables in the
+kernel, hence enabling to avoid redundant work.
+
+The figure below depicts an example of drivers (Arm-specific here, but the
+approach is applicable to any architecture) providing power costs to the EM
+framework, and interested clients reading the data from it.
+
+       +---------------+  +-----------------+  +---------------+
+       | Thermal (IPA) |  | Scheduler (EAS) |  |     Other     |
+       +---------------+  +-----------------+  +---------------+
+               |                   | em_pd_energy()    |
+               |                   | em_cpu_get()      |
+               +---------+         |         +---------+
+                         |         |         |
+                         v         v         v
+                        +---------------------+
+                        |    Energy Model     |
+                        |     Framework       |
+                        +---------------------+
+                           ^       ^       ^
+                           |       |       | em_register_perf_domain()
+                +----------+       |       +---------+
+                |                  |                 |
+        +---------------+  +---------------+  +--------------+
+        |  cpufreq-dt   |  |   arm_scmi    |  |    Other     |
+        +---------------+  +---------------+  +--------------+
+                ^                  ^                 ^
+                |                  |                 |
+        +--------------+   +---------------+  +--------------+
+        | Device Tree  |   |   Firmware    |  |      ?       |
+        +--------------+   +---------------+  +--------------+
+
+The EM framework manages power cost tables per 'performance domain' in the
+system. A performance domain is a group of CPUs whose performance is scaled
+together. Performance domains generally have a 1-to-1 mapping with CPUFreq
+policies. All CPUs in a performance domain are required to have the same
+micro-architecture. CPUs in different performance domains can have different
+micro-architectures.
+
+
+2. Core APIs
+------------
+
+  2.1 Config options
+
+CONFIG_ENERGY_MODEL must be enabled to use the EM framework.
+
+
+  2.2 Registration of performance domains
+
+Drivers are expected to register performance domains into the EM framework by
+calling the following API:
+
+  int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
+			      struct em_data_callback *cb);
+
+Drivers must specify the CPUs of the performance domains using the cpumask
+argument, and provide a callback function returning <frequency, power> tuples
+for each capacity state. The callback function provided by the driver is free
+to fetch data from any relevant location (DT, firmware, ...), and by any mean
+deemed necessary. See Section 3. for an example of driver implementing this
+callback, and kernel/power/energy_model.c for further documentation on this
+API.
+
+
+  2.3 Accessing performance domains
+
+Subsystems interested in the energy model of a CPU can retrieve it using the
+em_cpu_get() API. The energy model tables are allocated once upon creation of
+the performance domains, and kept in memory untouched.
+
+The energy consumed by a performance domain can be estimated using the
+em_pd_energy() API. The estimation is performed assuming that the schedutil
+CPUfreq governor is in use.
+
+More details about the above APIs can be found in include/linux/energy_model.h.
+
+
+3. Example driver
+-----------------
+
+This section provides a simple example of a CPUFreq driver registering a
+performance domain in the Energy Model framework using the (fake) 'foo'
+protocol. The driver implements an est_power() function to be provided to the
+EM framework.
+
+ -> drivers/cpufreq/foo_cpufreq.c
+
+01	static int est_power(unsigned long *mW, unsigned long *KHz, int cpu)
+02	{
+03		long freq, power;
+04
+05		/* Use the 'foo' protocol to ceil the frequency */
+06		freq = foo_get_freq_ceil(cpu, *KHz);
+07		if (freq < 0);
+08			return freq;
+09
+10		/* Estimate the power cost for the CPU at the relevant freq. */
+11		power = foo_estimate_power(cpu, freq);
+12		if (power < 0);
+13			return power;
+14
+15		/* Return the values to the EM framework */
+16		*mW = power;
+17		*KHz = freq;
+18
+19		return 0;
+20	}
+21
+22	static int foo_cpufreq_init(struct cpufreq_policy *policy)
+23	{
+24		struct em_data_callback em_cb = EM_DATA_CB(est_power);
+25		int nr_opp, ret;
+26
+27		/* Do the actual CPUFreq init work ... */
+28		ret = do_foo_cpufreq_init(policy);
+29		if (ret)
+30			return ret;
+31
+32		/* Find the number of OPPs for this policy */
+33		nr_opp = foo_get_nr_opp(policy);
+34
+35		/* And register the new performance domain */
+36		em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
+37
+38	        return 0;
+39	}

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [tip:sched/core] sched/doc: Document Energy Aware Scheduling
  2019-01-10 11:05 ` [PATCH 2/2] sched: Document Energy Aware Scheduling Quentin Perret
                     ` (2 preceding siblings ...)
  2019-01-21 13:54   ` tip-bot for Quentin Perret
@ 2019-01-27 11:37   ` tip-bot for Quentin Perret
  3 siblings, 0 replies; 22+ messages in thread
From: tip-bot for Quentin Perret @ 2019-01-27 11:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, qais.yousef, torvalds, linux-kernel, hpa, mingo,
	quentin.perret, peterz, morten.rasmussen

Commit-ID:  81a930d3a64a00c5adb2aab28dd1c904045adf57
Gitweb:     https://git.kernel.org/tip/81a930d3a64a00c5adb2aab28dd1c904045adf57
Author:     Quentin Perret <quentin.perret@arm.com>
AuthorDate: Thu, 10 Jan 2019 11:05:46 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 27 Jan 2019 12:29:37 +0100

sched/doc: Document Energy Aware Scheduling

Add some documentation detailing the main design points of EAS, as well
as a list of its dependencies.

Parts of this documentation are taken from Morten Rasmussen's original
EAS posting: https://lkml.org/lkml/2015/7/7/754

Co-authored-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: dietmar.eggemann@arm.com
Cc: patrick.bellasi@arm.com
Cc: rjw@rjwysocki.net
Link: https://lkml.kernel.org/r/20190110110546.8101-3-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/scheduler/sched-energy.txt | 425 +++++++++++++++++++++++++++++++
 1 file changed, 425 insertions(+)

diff --git a/Documentation/scheduler/sched-energy.txt b/Documentation/scheduler/sched-energy.txt
new file mode 100644
index 000000000000..197d81f4b836
--- /dev/null
+++ b/Documentation/scheduler/sched-energy.txt
@@ -0,0 +1,425 @@
+			   =======================
+			   Energy Aware Scheduling
+			   =======================
+
+1. Introduction
+---------------
+
+Energy Aware Scheduling (or EAS) gives the scheduler the ability to predict
+the impact of its decisions on the energy consumed by CPUs. EAS relies on an
+Energy Model (EM) of the CPUs to select an energy efficient CPU for each task,
+with a minimal impact on throughput. This document aims at providing an
+introduction on how EAS works, what are the main design decisions behind it, and
+details what is needed to get it to run.
+
+Before going any further, please note that at the time of writing:
+
+   /!\ EAS does not support platforms with symmetric CPU topologies /!\
+
+EAS operates only on heterogeneous CPU topologies (such as Arm big.LITTLE)
+because this is where the potential for saving energy through scheduling is
+the highest.
+
+The actual EM used by EAS is _not_ maintained by the scheduler, but by a
+dedicated framework. For details about this framework and what it provides,
+please refer to its documentation (see Documentation/power/energy-model.txt).
+
+
+2. Background and Terminology
+-----------------------------
+
+To make it clear from the start:
+ - energy = [joule] (resource like a battery on powered devices)
+ - power = energy/time = [joule/second] = [watt]
+
+The goal of EAS is to minimize energy, while still getting the job done. That
+is, we want to maximize:
+
+	performance [inst/s]
+	--------------------
+	    power [W]
+
+which is equivalent to minimizing:
+
+	energy [J]
+	-----------
+	instruction
+
+while still getting 'good' performance. It is essentially an alternative
+optimization objective to the current performance-only objective for the
+scheduler. This alternative considers two objectives: energy-efficiency and
+performance.
+
+The idea behind introducing an EM is to allow the scheduler to evaluate the
+implications of its decisions rather than blindly applying energy-saving
+techniques that may have positive effects only on some platforms. At the same
+time, the EM must be as simple as possible to minimize the scheduler latency
+impact.
+
+In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time
+for the scheduler to decide where a task should run (during wake-up), the EM
+is used to break the tie between several good CPU candidates and pick the one
+that is predicted to yield the best energy consumption without harming the
+system's throughput. The predictions made by EAS rely on specific elements of
+knowledge about the platform's topology, which include the 'capacity' of CPUs,
+and their respective energy costs.
+
+
+3. Topology information
+-----------------------
+
+EAS (as well as the rest of the scheduler) uses the notion of 'capacity' to
+differentiate CPUs with different computing throughput. The 'capacity' of a CPU
+represents the amount of work it can absorb when running at its highest
+frequency compared to the most capable CPU of the system. Capacity values are
+normalized in a 1024 range, and are comparable with the utilization signals of
+tasks and CPUs computed by the Per-Entity Load Tracking (PELT) mechanism. Thanks
+to capacity and utilization values, EAS is able to estimate how big/busy a
+task/CPU is, and to take this into consideration when evaluating performance vs
+energy trade-offs. The capacity of CPUs is provided via arch-specific code
+through the arch_scale_cpu_capacity() callback.
+
+The rest of platform knowledge used by EAS is directly read from the Energy
+Model (EM) framework. The EM of a platform is composed of a power cost table
+per 'performance domain' in the system (see Documentation/power/energy-model.txt
+for futher details about performance domains).
+
+The scheduler manages references to the EM objects in the topology code when the
+scheduling domains are built, or re-built. For each root domain (rd), the
+scheduler maintains a singly linked list of all performance domains intersecting
+the current rd->span. Each node in the list contains a pointer to a struct
+em_perf_domain as provided by the EM framework.
+
+The lists are attached to the root domains in order to cope with exclusive
+cpuset configurations. Since the boundaries of exclusive cpusets do not
+necessarily match those of performance domains, the lists of different root
+domains can contain duplicate elements.
+
+Example 1.
+    Let us consider a platform with 12 CPUs, split in 3 performance domains
+    (pd0, pd4 and pd8), organized as follows:
+
+	          CPUs:   0 1 2 3 4 5 6 7 8 9 10 11
+	          PDs:   |--pd0--|--pd4--|---pd8---|
+	          RDs:   |----rd1----|-----rd2-----|
+
+    Now, consider that userspace decided to split the system with two
+    exclusive cpusets, hence creating two independent root domains, each
+    containing 6 CPUs. The two root domains are denoted rd1 and rd2 in the
+    above figure. Since pd4 intersects with both rd1 and rd2, it will be
+    present in the linked list '->pd' attached to each of them:
+       * rd1->pd: pd0 -> pd4
+       * rd2->pd: pd4 -> pd8
+
+    Please note that the scheduler will create two duplicate list nodes for
+    pd4 (one for each list). However, both just hold a pointer to the same
+    shared data structure of the EM framework.
+
+Since the access to these lists can happen concurrently with hotplug and other
+things, they are protected by RCU, like the rest of topology structures
+manipulated by the scheduler.
+
+EAS also maintains a static key (sched_energy_present) which is enabled when at
+least one root domain meets all conditions for EAS to start. Those conditions
+are summarized in Section 6.
+
+
+4. Energy-Aware task placement
+------------------------------
+
+EAS overrides the CFS task wake-up balancing code. It uses the EM of the
+platform and the PELT signals to choose an energy-efficient target CPU during
+wake-up balance. When EAS is enabled, select_task_rq_fair() calls
+find_energy_efficient_cpu() to do the placement decision. This function looks
+for the CPU with the highest spare capacity (CPU capacity - CPU utilization) in
+each performance domain since it is the one which will allow us to keep the
+frequency the lowest. Then, the function checks if placing the task there could
+save energy compared to leaving it on prev_cpu, i.e. the CPU where the task ran
+in its previous activation.
+
+find_energy_efficient_cpu() uses compute_energy() to estimate what will be the
+energy consumed by the system if the waking task was migrated. compute_energy()
+looks at the current utilization landscape of the CPUs and adjusts it to
+'simulate' the task migration. The EM framework provides the em_pd_energy() API
+which computes the expected energy consumption of each performance domain for
+the given utilization landscape.
+
+An example of energy-optimized task placement decision is detailed below.
+
+Example 2.
+    Let us consider a (fake) platform with 2 independent performance domains
+    composed of two CPUs each. CPU0 and CPU1 are little CPUs; CPU2 and CPU3
+    are big.
+
+    The scheduler must decide where to place a task P whose util_avg = 200
+    and prev_cpu = 0.
+
+    The current utilization landscape of the CPUs is depicted on the graph
+    below. CPUs 0-3 have a util_avg of 400, 100, 600 and 500 respectively
+    Each performance domain has three Operating Performance Points (OPPs).
+    The CPU capacity and power cost associated with each OPP is listed in
+    the Energy Model table. The util_avg of P is shown on the figures
+    below as 'PP'.
+
+    CPU util.
+      1024                 - - - - - - -              Energy Model
+                                               +-----------+-------------+
+                                               |  Little   |     Big     |
+       768                 =============       +-----+-----+------+------+
+                                               | Cap | Pwr | Cap  | Pwr  |
+                                               +-----+-----+------+------+
+       512  ===========    - ##- - - - -       | 170 | 50  | 512  | 400  |
+                             ##     ##         | 341 | 150 | 768  | 800  |
+       341  -PP - - - -      ##     ##         | 512 | 300 | 1024 | 1700 |
+             PP              ##     ##         +-----+-----+------+------+
+       170  -## - - - -      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+      Current OPP: =====       Other OPP: - - -     util_avg (100 each): ##
+
+
+    find_energy_efficient_cpu() will first look for the CPUs with the
+    maximum spare capacity in the two performance domains. In this example,
+    CPU1 and CPU3. Then it will estimate the energy of the system if P was
+    placed on either of them, and check if that would save some energy
+    compared to leaving P on CPU0. EAS assumes that OPPs follow utilization
+    (which is coherent with the behaviour of the schedutil CPUFreq
+    governor, see Section 6. for more details on this topic).
+
+    Case 1. P is migrated to CPU1
+    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+      1024                 - - - - - - -
+
+                                            Energy calculation:
+       768                 =============     * CPU0: 200 / 341 * 150 = 88
+                                             * CPU1: 300 / 341 * 150 = 131
+                                             * CPU2: 600 / 768 * 800 = 625
+       512  - - - - - -    - ##- - - - -     * CPU3: 500 / 768 * 800 = 520
+                             ##     ##          => total_energy = 1364
+       341  ===========      ##     ##
+                    PP       ##     ##
+       170  -## - - PP-      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+
+    Case 2. P is migrated to CPU3
+    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+      1024                 - - - - - - -
+
+                                            Energy calculation:
+       768                 =============     * CPU0: 200 / 341 * 150 = 88
+                                             * CPU1: 100 / 341 * 150 = 43
+                                    PP       * CPU2: 600 / 768 * 800 = 625
+       512  - - - - - -    - ##- - -PP -     * CPU3: 700 / 768 * 800 = 729
+                             ##     ##          => total_energy = 1485
+       341  ===========      ##     ##
+                             ##     ##
+       170  -## - - - -      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+
+    Case 3. P stays on prev_cpu / CPU 0
+    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+      1024                 - - - - - - -
+
+                                            Energy calculation:
+       768                 =============     * CPU0: 400 / 512 * 300 = 234
+                                             * CPU1: 100 / 512 * 300 = 58
+                                             * CPU2: 600 / 768 * 800 = 625
+       512  ===========    - ##- - - - -     * CPU3: 500 / 768 * 800 = 520
+                             ##     ##          => total_energy = 1437
+       341  -PP - - - -      ##     ##
+             PP              ##     ##
+       170  -## - - - -      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+
+    From these calculations, the Case 1 has the lowest total energy. So CPU 1
+    is be the best candidate from an energy-efficiency standpoint.
+
+Big CPUs are generally more power hungry than the little ones and are thus used
+mainly when a task doesn't fit the littles. However, little CPUs aren't always
+necessarily more energy-efficient than big CPUs. For some systems, the high OPPs
+of the little CPUs can be less energy-efficient than the lowest OPPs of the
+bigs, for example. So, if the little CPUs happen to have enough utilization at
+a specific point in time, a small task waking up at that moment could be better
+of executing on the big side in order to save energy, even though it would fit
+on the little side.
+
+And even in the case where all OPPs of the big CPUs are less energy-efficient
+than those of the little, using the big CPUs for a small task might still, under
+specific conditions, save energy. Indeed, placing a task on a little CPU can
+result in raising the OPP of the entire performance domain, and that will
+increase the cost of the tasks already running there. If the waking task is
+placed on a big CPU, its own execution cost might be higher than if it was
+running on a little, but it won't impact the other tasks of the little CPUs
+which will keep running at a lower OPP. So, when considering the total energy
+consumed by CPUs, the extra cost of running that one task on a big core can be
+smaller than the cost of raising the OPP on the little CPUs for all the other
+tasks.
+
+The examples above would be nearly impossible to get right in a generic way, and
+for all platforms, without knowing the cost of running at different OPPs on all
+CPUs of the system. Thanks to its EM-based design, EAS should cope with them
+correctly without too many troubles. However, in order to ensure a minimal
+impact on throughput for high-utilization scenarios, EAS also implements another
+mechanism called 'over-utilization'.
+
+
+5. Over-utilization
+-------------------
+
+From a general standpoint, the use-cases where EAS can help the most are those
+involving a light/medium CPU utilization. Whenever long CPU-bound tasks are
+being run, they will require all of the available CPU capacity, and there isn't
+much that can be done by the scheduler to save energy without severly harming
+throughput. In order to avoid hurting performance with EAS, CPUs are flagged as
+'over-utilized' as soon as they are used at more than 80% of their compute
+capacity. As long as no CPUs are over-utilized in a root domain, load balancing
+is disabled and EAS overridess the wake-up balancing code. EAS is likely to load
+the most energy efficient CPUs of the system more than the others if that can be
+done without harming throughput. So, the load-balancer is disabled to prevent
+it from breaking the energy-efficient task placement found by EAS. It is safe to
+do so when the system isn't overutilized since being below the 80% tipping point
+implies that:
+
+    a. there is some idle time on all CPUs, so the utilization signals used by
+       EAS are likely to accurately represent the 'size' of the various tasks
+       in the system;
+    b. all tasks should already be provided with enough CPU capacity,
+       regardless of their nice values;
+    c. since there is spare capacity all tasks must be blocking/sleeping
+       regularly and balancing at wake-up is sufficient.
+
+As soon as one CPU goes above the 80% tipping point, at least one of the three
+assumptions above becomes incorrect. In this scenario, the 'overutilized' flag
+is raised for the entire root domain, EAS is disabled, and the load-balancer is
+re-enabled. By doing so, the scheduler falls back onto load-based algorithms for
+wake-up and load balance under CPU-bound conditions. This provides a better
+respect of the nice values of tasks.
+
+Since the notion of overutilization largely relies on detecting whether or not
+there is some idle time in the system, the CPU capacity 'stolen' by higher
+(than CFS) scheduling classes (as well as IRQ) must be taken into account. As
+such, the detection of overutilization accounts for the capacity used not only
+by CFS tasks, but also by the other scheduling classes and IRQ.
+
+
+6. Dependencies and requirements for EAS
+----------------------------------------
+
+Energy Aware Scheduling depends on the CPUs of the system having specific
+hardware properties and on other features of the kernel being enabled. This
+section lists these dependencies and provides hints as to how they can be met.
+
+
+  6.1 - Asymmetric CPU topology
+
+As mentioned in the introduction, EAS is only supported on platforms with
+asymmetric CPU topologies for now. This requirement is checked at run-time by
+looking for the presence of the SD_ASYM_CPUCAPACITY flag when the scheduling
+domains are built.
+
+The flag is set/cleared automatically by the scheduler topology code whenever
+there are CPUs with different capacities in a root domain. The capacities of
+CPUs are provided by arch-specific code through the arch_scale_cpu_capacity()
+callback. As an example, arm and arm64 share an implementation of this callback
+which uses a combination of CPUFreq data and device-tree bindings to compute the
+capacity of CPUs (see drivers/base/arch_topology.c for more details).
+
+So, in order to use EAS on your platform your architecture must implement the
+arch_scale_cpu_capacity() callback, and some of the CPUs must have a lower
+capacity than others.
+
+Please note that EAS is not fundamentally incompatible with SMP, but no
+significant savings on SMP platforms have been observed yet. This restriction
+could be amended in the future if proven otherwise.
+
+
+  6.2 - Energy Model presence
+
+EAS uses the EM of a platform to estimate the impact of scheduling decisions on
+energy. So, your platform must provide power cost tables to the EM framework in
+order to make EAS start. To do so, please refer to documentation of the
+independent EM framework in Documentation/power/energy-model.txt.
+
+Please also note that the scheduling domains need to be re-built after the
+EM has been registered in order to start EAS.
+
+
+  6.3 - Energy Model complexity
+
+The task wake-up path is very latency-sensitive. When the EM of a platform is
+too complex (too many CPUs, too many performance domains, too many performance
+states, ...), the cost of using it in the wake-up path can become prohibitive.
+The energy-aware wake-up algorithm has a complexity of:
+
+	C = Nd * (Nc + Ns)
+
+with: Nd the number of performance domains; Nc the number of CPUs; and Ns the
+total number of OPPs (ex: for two perf. domains with 4 OPPs each, Ns = 8).
+
+A complexity check is performed at the root domain level, when scheduling
+domains are built. EAS will not start on a root domain if its C happens to be
+higher than the completely arbitrary EM_MAX_COMPLEXITY threshold (2048 at the
+time of writing).
+
+If you really want to use EAS but the complexity of your platform's Energy
+Model is too high to be used with a single root domain, you're left with only
+two possible options:
+
+    1. split your system into separate, smaller, root domains using exclusive
+       cpusets and enable EAS locally on each of them. This option has the
+       benefit to work out of the box but the drawback of preventing load
+       balance between root domains, which can result in an unbalanced system
+       overall;
+    2. submit patches to reduce the complexity of the EAS wake-up algorithm,
+       hence enabling it to cope with larger EMs in reasonable time.
+
+
+  6.4 - Schedutil governor
+
+EAS tries to predict at which OPP will the CPUs be running in the close future
+in order to estimate their energy consumption. To do so, it is assumed that OPPs
+of CPUs follow their utilization.
+
+Although it is very difficult to provide hard guarantees regarding the accuracy
+of this assumption in practice (because the hardware might not do what it is
+told to do, for example), schedutil as opposed to other CPUFreq governors at
+least _requests_ frequencies calculated using the utilization signals.
+Consequently, the only sane governor to use together with EAS is schedutil,
+because it is the only one providing some degree of consistency between
+frequency requests and energy predictions.
+
+Using EAS with any other governor than schedutil is not supported.
+
+
+  6.5 Scale-invariant utilization signals
+
+In order to make accurate prediction across CPUs and for all performance
+states, EAS needs frequency-invariant and CPU-invariant PELT signals. These can
+be obtained using the architecture-defined arch_scale{cpu,freq}_capacity()
+callbacks.
+
+Using EAS on a platform that doesn't implement these two callbacks is not
+supported.
+
+
+  6.6 Multithreading (SMT)
+
+EAS in its current form is SMT unaware and is not able to leverage
+multithreaded hardware to save energy. EAS considers threads as independent
+CPUs, which can actually be counter-productive for both performance and energy.
+
+EAS on SMT is not supported.

^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-01-27 11:38 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-10 11:05 [PATCH 0/2] Documentation: Explain EAS and EM Quentin Perret
2019-01-10 11:05 ` [PATCH 1/2] PM / EM: Document the Energy Model framework Quentin Perret
2019-01-17 14:47   ` Juri Lelli
2019-01-17 14:53     ` Quentin Perret
2019-01-21 11:37   ` [tip:sched/core] PM/EM: " tip-bot for Quentin Perret
2019-01-21 13:53   ` tip-bot for Quentin Perret
2019-01-21 14:10     ` Quentin Perret
2019-01-27 11:36   ` tip-bot for Quentin Perret
2019-01-10 11:05 ` [PATCH 2/2] sched: Document Energy Aware Scheduling Quentin Perret
2019-01-17 15:51   ` Juri Lelli
2019-01-18  9:16     ` Quentin Perret
2019-01-18  9:57       ` Rafael J. Wysocki
2019-01-18 10:34         ` Quentin Perret
2019-01-18 10:58           ` Rafael J. Wysocki
2019-01-18 11:01             ` Rafael J. Wysocki
2019-01-18 11:11               ` Quentin Perret
2019-01-18 11:20                 ` Rafael J. Wysocki
2019-01-18 11:26                   ` Quentin Perret
2019-01-18 12:34           ` Juri Lelli
2019-01-21 11:38   ` [tip:sched/core] sched/doc: " tip-bot for Quentin Perret
2019-01-21 13:54   ` tip-bot for Quentin Perret
2019-01-27 11:37   ` tip-bot for Quentin Perret

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).