* Re: [PATCH v4 7/7] sched/fair: Remove the energy margin in feec()
@ 2022-04-15 11:58 kernel test robot
0 siblings, 0 replies; 2+ messages in thread
From: kernel test robot @ 2022-04-15 11:58 UTC (permalink / raw)
To: kbuild
CC: kbuild-all(a)lists.01.org
BCC: lkp(a)intel.com
In-Reply-To: <20220412134220.1588482-8-vincent.donnefort@arm.com>
References: <20220412134220.1588482-8-vincent.donnefort@arm.com>
TO: Vincent Donnefort <vincent.donnefort@arm.com>
Hi Vincent,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on tip/sched/core]
[also build test WARNING on rafael-pm/linux-next rafael-pm/thermal v5.18-rc2 next-20220414]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/intel-lab-lkp/linux/commits/Vincent-Donnefort/feec-energy-margin-removal/20220412-214441
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 089c02ae2771a14af2928c59c56abfb9b885a8d7
:::::: branch date: 3 days ago
:::::: commit date: 3 days ago
config: i386-randconfig-m021 (https://download.01.org/0day-ci/archive/20220415/202204151914.NKAe2ef1-lkp(a)intel.com/config)
compiler: gcc-11 (Debian 11.2.0-19) 11.2.0
If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
smatch warnings:
kernel/sched/fair.c:6975 find_energy_efficient_cpu() error: uninitialized symbol 'best_energy_cpu'.
vim +/best_energy_cpu +6975 kernel/sched/fair.c
390031e4c309c9 Quentin Perret 2018-12-03 6814
732cd75b8c920d Quentin Perret 2018-12-03 6815 /*
732cd75b8c920d Quentin Perret 2018-12-03 6816 * find_energy_efficient_cpu(): Find most energy-efficient target CPU for the
732cd75b8c920d Quentin Perret 2018-12-03 6817 * waking task. find_energy_efficient_cpu() looks for the CPU with maximum
732cd75b8c920d Quentin Perret 2018-12-03 6818 * spare capacity in each performance domain and uses it as a potential
732cd75b8c920d Quentin Perret 2018-12-03 6819 * candidate to execute the task. Then, it uses the Energy Model to figure
732cd75b8c920d Quentin Perret 2018-12-03 6820 * out which of the CPU candidates is the most energy-efficient.
732cd75b8c920d Quentin Perret 2018-12-03 6821 *
732cd75b8c920d Quentin Perret 2018-12-03 6822 * The rationale for this heuristic is as follows. In a performance domain,
732cd75b8c920d Quentin Perret 2018-12-03 6823 * all the most energy efficient CPU candidates (according to the Energy
732cd75b8c920d Quentin Perret 2018-12-03 6824 * Model) are those for which we'll request a low frequency. When there are
732cd75b8c920d Quentin Perret 2018-12-03 6825 * several CPUs for which the frequency request will be the same, we don't
732cd75b8c920d Quentin Perret 2018-12-03 6826 * have enough data to break the tie between them, because the Energy Model
732cd75b8c920d Quentin Perret 2018-12-03 6827 * only includes active power costs. With this model, if we assume that
732cd75b8c920d Quentin Perret 2018-12-03 6828 * frequency requests follow utilization (e.g. using schedutil), the CPU with
732cd75b8c920d Quentin Perret 2018-12-03 6829 * the maximum spare capacity in a performance domain is guaranteed to be among
732cd75b8c920d Quentin Perret 2018-12-03 6830 * the best candidates of the performance domain.
732cd75b8c920d Quentin Perret 2018-12-03 6831 *
732cd75b8c920d Quentin Perret 2018-12-03 6832 * In practice, it could be preferable from an energy standpoint to pack
732cd75b8c920d Quentin Perret 2018-12-03 6833 * small tasks on a CPU in order to let other CPUs go in deeper idle states,
732cd75b8c920d Quentin Perret 2018-12-03 6834 * but that could also hurt our chances to go cluster idle, and we have no
732cd75b8c920d Quentin Perret 2018-12-03 6835 * ways to tell with the current Energy Model if this is actually a good
732cd75b8c920d Quentin Perret 2018-12-03 6836 * idea or not. So, find_energy_efficient_cpu() basically favors
732cd75b8c920d Quentin Perret 2018-12-03 6837 * cluster-packing, and spreading inside a cluster. That should at least be
732cd75b8c920d Quentin Perret 2018-12-03 6838 * a good thing for latency, and this is consistent with the idea that most
732cd75b8c920d Quentin Perret 2018-12-03 6839 * of the energy savings of EAS come from the asymmetry of the system, and
732cd75b8c920d Quentin Perret 2018-12-03 6840 * not so much from breaking the tie between identical CPUs. That's also the
732cd75b8c920d Quentin Perret 2018-12-03 6841 * reason why EAS is enabled in the topology code only for systems where
732cd75b8c920d Quentin Perret 2018-12-03 6842 * SD_ASYM_CPUCAPACITY is set.
732cd75b8c920d Quentin Perret 2018-12-03 6843 *
732cd75b8c920d Quentin Perret 2018-12-03 6844 * NOTE: Forkees are not accepted in the energy-aware wake-up path because
732cd75b8c920d Quentin Perret 2018-12-03 6845 * they don't have any useful utilization data yet and it's not possible to
732cd75b8c920d Quentin Perret 2018-12-03 6846 * forecast their impact on energy consumption. Consequently, they will be
732cd75b8c920d Quentin Perret 2018-12-03 6847 * placed by find_idlest_cpu() on the least loaded CPU, which might turn out
732cd75b8c920d Quentin Perret 2018-12-03 6848 * to be energy-inefficient in some use-cases. The alternative would be to
732cd75b8c920d Quentin Perret 2018-12-03 6849 * bias new tasks towards specific types of CPUs first, or to try to infer
732cd75b8c920d Quentin Perret 2018-12-03 6850 * their util_avg from the parent task, but those heuristics could hurt
732cd75b8c920d Quentin Perret 2018-12-03 6851 * other use-cases too. So, until someone finds a better way to solve this,
732cd75b8c920d Quentin Perret 2018-12-03 6852 * let's keep things simple by re-using the existing slow path.
732cd75b8c920d Quentin Perret 2018-12-03 6853 */
732cd75b8c920d Quentin Perret 2018-12-03 6854 static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
732cd75b8c920d Quentin Perret 2018-12-03 6855 {
04409022d37d2c Dietmar Eggemann 2022-04-12 6856 struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_rq_mask);
eb92692b2544d3 Quentin Perret 2019-09-12 6857 unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX;
ce557de247bd64 Vincent Donnefort 2022-04-12 6858 struct root_domain *rd = this_rq()->rd;
5311f1261af84b Vincent Donnefort 2022-04-12 6859 int cpu, best_energy_cpu, target = -1;
732cd75b8c920d Quentin Perret 2018-12-03 6860 struct sched_domain *sd;
eb92692b2544d3 Quentin Perret 2019-09-12 6861 struct perf_domain *pd;
ce557de247bd64 Vincent Donnefort 2022-04-12 6862 struct energy_env eenv;
732cd75b8c920d Quentin Perret 2018-12-03 6863
732cd75b8c920d Quentin Perret 2018-12-03 6864 rcu_read_lock();
732cd75b8c920d Quentin Perret 2018-12-03 6865 pd = rcu_dereference(rd->pd);
732cd75b8c920d Quentin Perret 2018-12-03 6866 if (!pd || READ_ONCE(rd->overutilized))
619e090c8e409e Pierre Gondois 2021-05-04 6867 goto unlock;
732cd75b8c920d Quentin Perret 2018-12-03 6868
732cd75b8c920d Quentin Perret 2018-12-03 6869 /*
732cd75b8c920d Quentin Perret 2018-12-03 6870 * Energy-aware wake-up happens on the lowest sched_domain starting
732cd75b8c920d Quentin Perret 2018-12-03 6871 * from sd_asym_cpucapacity spanning over this_cpu and prev_cpu.
732cd75b8c920d Quentin Perret 2018-12-03 6872 */
732cd75b8c920d Quentin Perret 2018-12-03 6873 sd = rcu_dereference(*this_cpu_ptr(&sd_asym_cpucapacity));
732cd75b8c920d Quentin Perret 2018-12-03 6874 while (sd && !cpumask_test_cpu(prev_cpu, sched_domain_span(sd)))
732cd75b8c920d Quentin Perret 2018-12-03 6875 sd = sd->parent;
732cd75b8c920d Quentin Perret 2018-12-03 6876 if (!sd)
619e090c8e409e Pierre Gondois 2021-05-04 6877 goto unlock;
619e090c8e409e Pierre Gondois 2021-05-04 6878
619e090c8e409e Pierre Gondois 2021-05-04 6879 target = prev_cpu;
732cd75b8c920d Quentin Perret 2018-12-03 6880
732cd75b8c920d Quentin Perret 2018-12-03 6881 sync_entity_load_avg(&p->se);
732cd75b8c920d Quentin Perret 2018-12-03 6882 if (!task_util_est(p))
732cd75b8c920d Quentin Perret 2018-12-03 6883 goto unlock;
732cd75b8c920d Quentin Perret 2018-12-03 6884
ce557de247bd64 Vincent Donnefort 2022-04-12 6885 eenv_task_busy_time(&eenv, p, prev_cpu);
ce557de247bd64 Vincent Donnefort 2022-04-12 6886
732cd75b8c920d Quentin Perret 2018-12-03 6887 for (; pd; pd = pd->next) {
ce557de247bd64 Vincent Donnefort 2022-04-12 6888 unsigned long cpu_cap, cpu_thermal_cap, util;
ce557de247bd64 Vincent Donnefort 2022-04-12 6889 unsigned long cur_delta, max_spare_cap = 0;
8d4c97c105ca07 Pierre Gondois 2021-05-04 6890 bool compute_prev_delta = false;
732cd75b8c920d Quentin Perret 2018-12-03 6891 int max_spare_cap_cpu = -1;
5311f1261af84b Vincent Donnefort 2022-04-12 6892 unsigned long base_energy;
732cd75b8c920d Quentin Perret 2018-12-03 6893
04409022d37d2c Dietmar Eggemann 2022-04-12 6894 cpumask_and(cpus, perf_domain_span(pd), cpu_online_mask);
04409022d37d2c Dietmar Eggemann 2022-04-12 6895
ce557de247bd64 Vincent Donnefort 2022-04-12 6896 /* Account thermal pressure for the energy estimation */
ce557de247bd64 Vincent Donnefort 2022-04-12 6897 cpu = cpumask_first(cpus);
ce557de247bd64 Vincent Donnefort 2022-04-12 6898 cpu_thermal_cap = arch_scale_cpu_capacity(cpu);
ce557de247bd64 Vincent Donnefort 2022-04-12 6899 cpu_thermal_cap -= arch_scale_thermal_pressure(cpu);
ce557de247bd64 Vincent Donnefort 2022-04-12 6900
ce557de247bd64 Vincent Donnefort 2022-04-12 6901 eenv.cpu_cap = cpu_thermal_cap;
ce557de247bd64 Vincent Donnefort 2022-04-12 6902 eenv.pd_cap = 0;
ce557de247bd64 Vincent Donnefort 2022-04-12 6903
ce557de247bd64 Vincent Donnefort 2022-04-12 6904 for_each_cpu(cpu, cpus) {
ce557de247bd64 Vincent Donnefort 2022-04-12 6905 eenv.pd_cap += cpu_thermal_cap;
ce557de247bd64 Vincent Donnefort 2022-04-12 6906
ce557de247bd64 Vincent Donnefort 2022-04-12 6907 if (!cpumask_test_cpu(cpu, sched_domain_span(sd)))
ce557de247bd64 Vincent Donnefort 2022-04-12 6908 continue;
ce557de247bd64 Vincent Donnefort 2022-04-12 6909
3bd3706251ee8a Sebastian Andrzej Siewior 2019-04-23 6910 if (!cpumask_test_cpu(cpu, p->cpus_ptr))
732cd75b8c920d Quentin Perret 2018-12-03 6911 continue;
732cd75b8c920d Quentin Perret 2018-12-03 6912
732cd75b8c920d Quentin Perret 2018-12-03 6913 util = cpu_util_next(cpu, p, cpu);
732cd75b8c920d Quentin Perret 2018-12-03 6914 cpu_cap = capacity_of(cpu);
1d42509e475cdc Valentin Schneider 2019-12-11 6915
1d42509e475cdc Valentin Schneider 2019-12-11 6916 /*
1d42509e475cdc Valentin Schneider 2019-12-11 6917 * Skip CPUs that cannot satisfy the capacity request.
1d42509e475cdc Valentin Schneider 2019-12-11 6918 * IOW, placing the task there would make the CPU
1d42509e475cdc Valentin Schneider 2019-12-11 6919 * overutilized. Take uclamp into account to see how
1d42509e475cdc Valentin Schneider 2019-12-11 6920 * much capacity we can get out of the CPU; this is
a5418be9dffe70 Viresh Kumar 2020-12-08 6921 * aligned with sched_cpu_util().
1d42509e475cdc Valentin Schneider 2019-12-11 6922 */
1d42509e475cdc Valentin Schneider 2019-12-11 6923 util = uclamp_rq_util_with(cpu_rq(cpu), util, p);
60e17f5cef838e Viresh Kumar 2019-06-04 6924 if (!fits_capacity(util, cpu_cap))
732cd75b8c920d Quentin Perret 2018-12-03 6925 continue;
732cd75b8c920d Quentin Perret 2018-12-03 6926
ce557de247bd64 Vincent Donnefort 2022-04-12 6927 lsub_positive(&cpu_cap, util);
ce557de247bd64 Vincent Donnefort 2022-04-12 6928
732cd75b8c920d Quentin Perret 2018-12-03 6929 if (cpu == prev_cpu) {
8d4c97c105ca07 Pierre Gondois 2021-05-04 6930 /* Always use prev_cpu as a candidate. */
8d4c97c105ca07 Pierre Gondois 2021-05-04 6931 compute_prev_delta = true;
ce557de247bd64 Vincent Donnefort 2022-04-12 6932 } else if (cpu_cap > max_spare_cap) {
732cd75b8c920d Quentin Perret 2018-12-03 6933 /*
8d4c97c105ca07 Pierre Gondois 2021-05-04 6934 * Find the CPU with the maximum spare capacity
8d4c97c105ca07 Pierre Gondois 2021-05-04 6935 * in the performance domain.
732cd75b8c920d Quentin Perret 2018-12-03 6936 */
ce557de247bd64 Vincent Donnefort 2022-04-12 6937 max_spare_cap = cpu_cap;
732cd75b8c920d Quentin Perret 2018-12-03 6938 max_spare_cap_cpu = cpu;
732cd75b8c920d Quentin Perret 2018-12-03 6939 }
732cd75b8c920d Quentin Perret 2018-12-03 6940 }
732cd75b8c920d Quentin Perret 2018-12-03 6941
8d4c97c105ca07 Pierre Gondois 2021-05-04 6942 if (max_spare_cap_cpu < 0 && !compute_prev_delta)
8d4c97c105ca07 Pierre Gondois 2021-05-04 6943 continue;
8d4c97c105ca07 Pierre Gondois 2021-05-04 6944
8d4c97c105ca07 Pierre Gondois 2021-05-04 6945 /* Compute the 'base' energy of the pd, without @p */
ce557de247bd64 Vincent Donnefort 2022-04-12 6946 eenv_pd_busy_time(&eenv, cpus, p);
5311f1261af84b Vincent Donnefort 2022-04-12 6947 base_energy = compute_energy(&eenv, pd, cpus, p, -1);
8d4c97c105ca07 Pierre Gondois 2021-05-04 6948
8d4c97c105ca07 Pierre Gondois 2021-05-04 6949 /* Evaluate the energy impact of using prev_cpu. */
8d4c97c105ca07 Pierre Gondois 2021-05-04 6950 if (compute_prev_delta) {
ce557de247bd64 Vincent Donnefort 2022-04-12 6951 prev_delta = compute_energy(&eenv, pd, cpus, p,
ce557de247bd64 Vincent Donnefort 2022-04-12 6952 prev_cpu);
5311f1261af84b Vincent Donnefort 2022-04-12 6953 if (prev_delta < base_energy)
619e090c8e409e Pierre Gondois 2021-05-04 6954 goto unlock;
5311f1261af84b Vincent Donnefort 2022-04-12 6955 prev_delta -= base_energy;
8d4c97c105ca07 Pierre Gondois 2021-05-04 6956 best_delta = min(best_delta, prev_delta);
8d4c97c105ca07 Pierre Gondois 2021-05-04 6957 }
8d4c97c105ca07 Pierre Gondois 2021-05-04 6958
8d4c97c105ca07 Pierre Gondois 2021-05-04 6959 /* Evaluate the energy impact of using max_spare_cap_cpu. */
8d4c97c105ca07 Pierre Gondois 2021-05-04 6960 if (max_spare_cap_cpu >= 0) {
ce557de247bd64 Vincent Donnefort 2022-04-12 6961 cur_delta = compute_energy(&eenv, pd, cpus, p,
ce557de247bd64 Vincent Donnefort 2022-04-12 6962 max_spare_cap_cpu);
5311f1261af84b Vincent Donnefort 2022-04-12 6963 if (cur_delta < base_energy)
619e090c8e409e Pierre Gondois 2021-05-04 6964 goto unlock;
5311f1261af84b Vincent Donnefort 2022-04-12 6965 cur_delta -= base_energy;
eb92692b2544d3 Quentin Perret 2019-09-12 6966 if (cur_delta < best_delta) {
eb92692b2544d3 Quentin Perret 2019-09-12 6967 best_delta = cur_delta;
732cd75b8c920d Quentin Perret 2018-12-03 6968 best_energy_cpu = max_spare_cap_cpu;
732cd75b8c920d Quentin Perret 2018-12-03 6969 }
732cd75b8c920d Quentin Perret 2018-12-03 6970 }
732cd75b8c920d Quentin Perret 2018-12-03 6971 }
732cd75b8c920d Quentin Perret 2018-12-03 6972 rcu_read_unlock();
732cd75b8c920d Quentin Perret 2018-12-03 6973
5311f1261af84b Vincent Donnefort 2022-04-12 6974 if (best_delta < prev_delta)
619e090c8e409e Pierre Gondois 2021-05-04 @6975 target = best_energy_cpu;
732cd75b8c920d Quentin Perret 2018-12-03 6976
619e090c8e409e Pierre Gondois 2021-05-04 6977 return target;
732cd75b8c920d Quentin Perret 2018-12-03 6978
619e090c8e409e Pierre Gondois 2021-05-04 6979 unlock:
732cd75b8c920d Quentin Perret 2018-12-03 6980 rcu_read_unlock();
732cd75b8c920d Quentin Perret 2018-12-03 6981
619e090c8e409e Pierre Gondois 2021-05-04 6982 return target;
732cd75b8c920d Quentin Perret 2018-12-03 6983 }
732cd75b8c920d Quentin Perret 2018-12-03 6984
--
0-DAY CI Kernel Test Service
https://01.org/lkp
* [PATCH v3 0/7] feec() energy margin removal
@ 2022-04-12 13:42 Vincent Donnefort
2022-04-12 13:42 ` [PATCH v4 7/7] sched/fair: Remove the energy margin in feec() Vincent Donnefort
0 siblings, 1 reply; 2+ messages in thread
From: Vincent Donnefort @ 2022-04-12 13:42 UTC (permalink / raw)
To: peterz, mingo, vincent.guittot
Cc: linux-kernel, dietmar.eggemann, morten.rasmussen, chris.redpath,
qperret, Vincent Donnefort
find_energy_efficient_cpu() (feec()) will migrate a task to save energy only
if the move saves at least 6% of the total energy consumed by the system. This
conservative approach is a problem on a system where a lot of small tasks
create a huge overall load: very few of them will be allowed to migrate to a
smaller CPU, wasting a lot of energy. Instead of trying to determine yet
another margin, let's try to remove it entirely.
The first elements of this patch-set are various fixes and improvements that
stabilize task_util and ensure energy-comparison fairness across all CPUs of
the topology. Only once those are fixed can we completely remove the margin
and let feec() aggressively place tasks and save energy.
This has been validated in two different ways:
First, using LISA's eas_behaviour test suite. This is composed of a set of
scenarios and verifies whether the task placement is optimal. No failures
have been observed, and the change also improved some tests such as Ramp-Down
(as the placement is now more energy oriented) and *ThreeSmall (as no
bouncing between clusters happens anymore).
* Hikey960: 100% PASSED
* DB-845C: 100% PASSED
* RB5: 100% PASSED
Second, using an Android benchmark: PCMark2 on a Pixel4, with a lot of
backports to have a scheduler as close as possible to mainline.
+------------+-----------------+-----------------+
| Test | Perf | Energy [1] |
+------------+-----------------+-----------------+
| Web2 | -0.3% pval 0.03 | -1.8% pval 0.00 |
| Video2 | -0.3% pval 0.13 | -5.6% pval 0.00 |
| Photo2 [2] | -3.8% pval 0.00 | -1% pval 0.00 |
| Writing2 | 0% pval 0.13 | -1% pval 0.00 |
| Data2      | 0% pval 0.8     | -0.43% pval 0.00|
+------------+-----------------+-----------------+
The margin removal lets the kernel make the best use of the Energy Model:
tasks are more likely to be placed where they fit, and this saves a
substantial amount of energy while having a limited impact on performance.
[1] This is an energy estimation based on the CPU activity and the Energy
Model for this device. "All models are wrong but some are useful"; yes, this
is an imperfect estimation that doesn't take into account some idle states
and shared power rails. Nonetheless, it is based on the information the
kernel has during runtime, and it shows the scheduler can make better
decisions based solely on that data.
[2] This is the only performance impact observed. The debugging of this test
showed no issue with task placement. The previously better score was solely
due to some critical threads being held on better-performing CPUs. If a
thread needs a higher-capacity CPU, the placement must result from a user
input (with e.g. uclamp min) instead of the thread being artificially held
on less efficient CPUs by feec(). Note also that the experiment didn't use
the Android-only latency_sensitive feature, which would hide this problem on
a real-life device.
v3 -> v4:
- Minor cosmetic changes (Dietmar)
v2 -> v3:
- feec(): introduce energy_env struct (Dietmar)
- PELT migration decay: Only apply when src CPU is idle (Vincent G.)
- PELT migration decay: Do not apply when cfs_rq is throttled
- PELT migration decay: Snapshot the lag at cfs_rq's level
v1 -> v2:
- Fix PELT migration last_update_time (previously root cfs_rq's).
- Add Dietmar's patches to refactor feec()'s CPU loop.
- feec(): renaming busy time functions get_{pd,tsk}_busy_time()
- feec(): pd_cap computation in the first for_each_cpu loop.
- feec(): create get_pd_max_util() function (previously within compute_energy())
- feec(): rename base_energy_pd to base_energy.
Dietmar Eggemann (3):
sched, drivers: Remove max param from
effective_cpu_util()/sched_cpu_util()
sched/fair: Rename select_idle_mask to select_rq_mask
sched/fair: Use the same cpumask per-PD throughout
find_energy_efficient_cpu()
Vincent Donnefort (4):
sched/fair: Provide u64 read for 32-bits arch helper
sched/fair: Decay task PELT values during wakeup migration
sched/fair: Remove task_util from effective utilization in feec()
sched/fair: Remove the energy margin in feec()
drivers/powercap/dtpm_cpu.c | 33 +--
drivers/thermal/cpufreq_cooling.c | 6 +-
include/linux/sched.h | 2 +-
kernel/sched/core.c | 15 +-
kernel/sched/cpufreq_schedutil.c | 5 +-
kernel/sched/fair.c | 385 ++++++++++++++++++------------
kernel/sched/sched.h | 49 +++-
7 files changed, 298 insertions(+), 197 deletions(-)
--
2.25.1
* [PATCH v4 7/7] sched/fair: Remove the energy margin in feec()
2022-04-12 13:42 [PATCH v3 0/7] feec() energy margin removal Vincent Donnefort
@ 2022-04-12 13:42 ` Vincent Donnefort
0 siblings, 0 replies; 2+ messages in thread
From: Vincent Donnefort @ 2022-04-12 13:42 UTC (permalink / raw)
To: peterz, mingo, vincent.guittot
Cc: linux-kernel, dietmar.eggemann, morten.rasmussen, chris.redpath,
qperret, Vincent Donnefort
find_energy_efficient_cpu() integrates a margin to protect tasks from
bouncing back and forth from one CPU to another. This margin is set to 6%
of the total current energy estimated on the system. This, however, does
not work, for two reasons:
1. The energy estimation is not a good absolute value:
compute_energy() used in feec() is a good estimation for task placement as
it allows comparing the energy with and without a task. The computed delta
gives a good overview of the cost of a certain task placement. It, however,
doesn't work as an absolute estimation of the total energy of the system.
First, it adds the contribution of idle CPUs into the energy; second, it
mixes util_avg with util_est values. util_avg reflects the recent history
of a CPU's usage; it doesn't tell what the current utilization is. A system
that has been quite busy in the near past will report a very high energy
and hence a high margin, preventing any task migration to a lower-capacity
CPU and wasting energy. It even creates a negative feedback loop: by
holding the tasks on a less efficient CPU, the margin contributes to
keeping the energy high.
2. The margin handicaps small tasks:
On a system where the workload is composed mostly of small tasks (which is
often the case on Android), the overall energy will be high enough to
create a margin none of those tasks can cross. On a Pixel4, a small
utilization of 5% on all the CPUs creates a global estimated energy of 140
joules, as per the Energy Model declaration of that same device. This means
that, after applying the 6% margin, any migration must save more than 8
joules to happen. No task with a utilization lower than 40 would then be
able to migrate away from the biggest CPU of the system.
The 6% of the overall system energy was brought in by the following patch:
(eb92692b2544 sched/fair: Speed-up energy-aware wake-ups)
It was previously 6% of the prev_cpu energy. Also, the following one made
this margin value conditional on the clusters where the task fits:
(8d4c97c105ca sched/fair: Only compute base_energy_pd if necessary)
We could simply revert that margin change to what it was, but the original
version didn't have strong grounds either, and as demonstrated in (1.) the
estimated energy isn't a good absolute value. Instead, remove it
completely. This is made possible by recent changes that improved
energy-estimation comparison fairness (sched/fair: Remove task_util from
effective utilization in feec()) (PM: EM: Increase energy calculation
precision) and task utilization stabilization (sched/fair: Decay task
util_avg during migration).
Without a margin, one could fear tasks bouncing between CPUs. But running
LISA's eas_behaviour test coverage on three different platforms (Hikey960,
RB-5 and DB-845) showed no such issue.
Removing the energy margin enables more energy-optimized placements for a
more energy efficient system.
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d17ef80487e4..1779405c377b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6855,9 +6855,8 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
{
struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_rq_mask);
unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX;
- int cpu, best_energy_cpu = prev_cpu, target = -1;
struct root_domain *rd = this_rq()->rd;
- unsigned long base_energy = 0;
+ int cpu, best_energy_cpu, target = -1;
struct sched_domain *sd;
struct perf_domain *pd;
struct energy_env eenv;
@@ -6889,8 +6888,8 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
unsigned long cpu_cap, cpu_thermal_cap, util;
unsigned long cur_delta, max_spare_cap = 0;
bool compute_prev_delta = false;
- unsigned long base_energy_pd;
int max_spare_cap_cpu = -1;
+ unsigned long base_energy;
cpumask_and(cpus, perf_domain_span(pd), cpu_online_mask);
@@ -6945,16 +6944,15 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
/* Compute the 'base' energy of the pd, without @p */
eenv_pd_busy_time(&eenv, cpus, p);
- base_energy_pd = compute_energy(&eenv, pd, cpus, p, -1);
- base_energy += base_energy_pd;
+ base_energy = compute_energy(&eenv, pd, cpus, p, -1);
/* Evaluate the energy impact of using prev_cpu. */
if (compute_prev_delta) {
prev_delta = compute_energy(&eenv, pd, cpus, p,
prev_cpu);
- if (prev_delta < base_energy_pd)
+ if (prev_delta < base_energy)
goto unlock;
- prev_delta -= base_energy_pd;
+ prev_delta -= base_energy;
best_delta = min(best_delta, prev_delta);
}
@@ -6962,9 +6960,9 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
if (max_spare_cap_cpu >= 0) {
cur_delta = compute_energy(&eenv, pd, cpus, p,
max_spare_cap_cpu);
- if (cur_delta < base_energy_pd)
+ if (cur_delta < base_energy)
goto unlock;
- cur_delta -= base_energy_pd;
+ cur_delta -= base_energy;
if (cur_delta < best_delta) {
best_delta = cur_delta;
best_energy_cpu = max_spare_cap_cpu;
@@ -6973,12 +6971,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
}
rcu_read_unlock();
- /*
- * Pick the best CPU if prev_cpu cannot be used, or if it saves at
- * least 6% of the energy used by prev_cpu.
- */
- if ((prev_delta == ULONG_MAX) ||
- (prev_delta - best_delta) > ((prev_delta + base_energy) >> 4))
+ if (best_delta < prev_delta)
target = best_energy_cpu;
return target;
--
2.25.1