* [PATCH v2 0/7] feec() energy margin removal
@ 2022-01-12 16:12 Vincent Donnefort
  2022-01-12 16:12 ` [PATCH v2 1/7] sched/fair: Provide u64 read for 32-bits arch helper Vincent Donnefort
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Vincent Donnefort @ 2022-01-12 16:12 UTC (permalink / raw)
  To: peterz, mingo, vincent.guittot
  Cc: linux-kernel, dietmar.eggemann, valentin.schneider,
	morten.rasmussen, chris.redpath, qperret, lukasz.luba,
	Vincent Donnefort

find_energy_efficient_cpu() (feec()) will migrate a task to save energy only
if doing so saves at least 6% of the total energy consumed by the system.
This conservative approach is a problem on a system where many small tasks
create a large overall load: very few of them are allowed to migrate to a
smaller CPU, wasting a lot of energy. Instead of trying to determine yet
another margin, let's try to remove it.

The first elements of this patch-set are various fixes and improvements
that stabilize task_util and ensure energy-comparison fairness across all
CPUs of the topology. Only once those are fixed can we completely remove
the margin and let feec() aggressively place tasks and save energy.

This has been validated in two different ways:

First, using LISA's eas_behaviour test suite. It is composed of a set of
scenarios that verify whether the task placement is optimal. No failures
were observed, and the change also improved some tests, such as Ramp-Down
(as the placement is now more energy oriented) and *ThreeSmall (as no
bouncing between clusters happens anymore).

  * Hikey960: 100% PASSED
  * DB-845C:  100% PASSED
  * RB5:      100% PASSED

Second, using an Android benchmark: PCMark2 on a Pixel4, with a lot of
backports to keep the scheduler as close as possible to mainline.

  +------------+-----------------+-----------------+
  |    Test    |      Perf       |    Energy [1]   |
  +------------+-----------------+-----------------+
  | Web2       | -0.3% pval 0.03 | -1.8% pval 0.00 |
  | Video2     | -0.3% pval 0.13 | -5.6% pval 0.00 |
  | Photo2 [2] | -3.8% pval 0.00 | -1%   pval 0.00 |
  | Writing2   |  0%   pval 0.13 | -1%   pval 0.00 |
  | Data2      |  0%   pval 0.8  | -0.43 pval 0.00 |
  +------------+-----------------+-----------------+ 

The margin removal lets the kernel make the best use of the Energy Model:
tasks are more likely to be placed where they fit, and this saves a
substantial amount of energy while having a limited impact on performance.

[1] This is an energy estimation based on the CPU activity and the Energy
Model for this device. "All models are wrong but some are useful"; yes, this
is an imperfect estimation that doesn't take into account some idle states
and shared power rails. Nonetheless, it is based on the information the
kernel has at runtime, and it proves the scheduler can take better decisions
based solely on that data.

[2] This is the only performance impact observed. The debugging of this test
showed no issue with task placement. The better score was solely due to some

v1 -> v2:
  - Fix PELT migration last_update_time (previously root cfs_rq's).
  - Add Dietmar's patches to refactor feec()'s CPU loop.
  - feec(): renaming busy time functions get_{pd,tsk}_busy_time()
  - feec(): pd_cap computation in the first for_each_cpu loop.
  - feec(): create get_pd_max_util() function (previously within compute_energy())
  - feec(): rename base_energy_pd to base_energy.

Dietmar Eggemann (3):
  sched, drivers: Remove max param from effective_cpu_util()/sched_cpu_util()
  sched/fair: Rename select_idle_mask to select_rq_mask
  sched/fair: Use the same cpumask per-PD throughout find_energy_efficient_cpu()

Vincent Donnefort (4):
  sched/fair: Provide u64 read for 32-bits arch helper
  sched/fair: Decay task PELT values during migration
  sched/fair: Remove task_util from effective utilization in feec()
  sched/fair: Remove the energy margin in feec()

 drivers/powercap/dtpm_cpu.c       |  33 +---
 drivers/thermal/cpufreq_cooling.c |   6 +-
 include/linux/sched.h             |   2 +-
 kernel/sched/core.c               |  22 ++-
 kernel/sched/cpufreq_schedutil.c  |   5 +-
 kernel/sched/fair.c               | 313 ++++++++++++++++--------------
 kernel/sched/sched.h              |  48 ++++-
 7 files changed, 238 insertions(+), 191 deletions(-)

-- 
2.25.1


* Re: [PATCH v2 6/7] sched/fair: Remove task_util from effective utilization in feec()
@ 2022-01-13 11:35 kernel test robot
  0 siblings, 0 replies; 23+ messages in thread
From: kernel test robot @ 2022-01-13 11:35 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 32496 bytes --]

CC: llvm@lists.linux.dev
CC: kbuild-all@lists.01.org
In-Reply-To: <20220112161230.836326-7-vincent.donnefort@arm.com>
References: <20220112161230.836326-7-vincent.donnefort@arm.com>
TO: Vincent Donnefort <vincent.donnefort@arm.com>

Hi Vincent,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on next-20220113]
[cannot apply to rafael-pm/linux-next rafael-pm/thermal v5.16]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patches, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Vincent-Donnefort/feec-energy-margin-removal/20220113-002104
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 82762d2af31a60081162890983a83499c9c7dd74
:::::: branch date: 19 hours ago
:::::: commit date: 19 hours ago
config: riscv-randconfig-c006-20220112 (https://download.01.org/0day-ci/archive/20220113/202201131912.xXKka1RZ-lkp(a)intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 244dd2913a43a200f5a6544d424cdc37b771028b)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # https://github.com/0day-ci/linux/commit/ce70047d014b32af0102fca5681c1e8aebc4b7ae
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Vincent-Donnefort/feec-energy-margin-removal/20220113-002104
        git checkout ce70047d014b32af0102fca5681c1e8aebc4b7ae
        # save the config file to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv clang-analyzer 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


clang-analyzer warnings: (new ones prefixed by >>)
   drivers/char/ipmi/ipmi_si_intf.c:770:3: note: Calling 'handle_transaction_done'
                   handle_transaction_done(smi_info);
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/char/ipmi/ipmi_si_intf.c:542:2: note: Control jumps to 'case SI_GETTING_EVENTS:'  at line 601
           switch (smi_info->si_state) {
           ^
   drivers/char/ipmi/ipmi_si_intf.c:604:4: note: Access to field 'rsp_size' results in a dereference of a null pointer (loaded from field 'curr_msg')
                           = smi_info->handlers->get_result(
                           ^
   drivers/char/ipmi/ipmi_si_intf.c:642:4: warning: Access to field 'rsp_size' results in a dereference of a null pointer (loaded from field 'curr_msg') [clang-analyzer-core.NullDereference]
                           = smi_info->handlers->get_result(
                           ^
   drivers/char/ipmi/ipmi_si_intf.c:2159:6: note: Assuming field 'dev_group_added' is false
           if (smi_info->dev_group_added) {
               ^~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/char/ipmi/ipmi_si_intf.c:2159:2: note: Taking false branch
           if (smi_info->dev_group_added) {
           ^
   drivers/char/ipmi/ipmi_si_intf.c:2163:6: note: Assuming field 'dev' is null
           if (smi_info->io.dev)
               ^~~~~~~~~~~~~~~~
   drivers/char/ipmi/ipmi_si_intf.c:2163:2: note: Taking false branch
           if (smi_info->io.dev)
           ^
   drivers/char/ipmi/ipmi_si_intf.c:2171:6: note: Assuming field 'irq_cleanup' is null
           if (smi_info->io.irq_cleanup) {
               ^~~~~~~~~~~~~~~~~~~~~~~~
   drivers/char/ipmi/ipmi_si_intf.c:2171:2: note: Taking false branch
           if (smi_info->io.irq_cleanup) {
           ^
   drivers/char/ipmi/ipmi_si_intf.c:2175:2: note: Calling 'stop_timer_and_thread'
           stop_timer_and_thread(smi_info);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/char/ipmi/ipmi_si_intf.c:1835:6: note: Assuming field 'thread' is equal to NULL
           if (smi_info->thread != NULL) {
               ^~~~~~~~~~~~~~~~~~~~~~~~
   drivers/char/ipmi/ipmi_si_intf.c:1835:2: note: Taking false branch
           if (smi_info->thread != NULL) {
           ^
   drivers/char/ipmi/ipmi_si_intf.c:1841:2: note: Value assigned to field 'curr_msg'
           del_timer_sync(&smi_info->si_timer);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/char/ipmi/ipmi_si_intf.c:2175:2: note: Returning from 'stop_timer_and_thread'
           stop_timer_and_thread(smi_info);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/char/ipmi/ipmi_si_intf.c:2189:9: note: Assuming field 'curr_msg' is null
           while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) {
                  ^~~~~~~~~~~~~~~~~~
   drivers/char/ipmi/ipmi_si_intf.c:2189:9: note: Left side of '||' is false
   drivers/char/ipmi/ipmi_si_intf.c:2189:32: note: Assuming field 'si_state' is not equal to SI_NORMAL
           while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) {
                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/char/ipmi/ipmi_si_intf.c:2189:2: note: Loop condition is true.  Entering loop body
           while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) {
           ^
   drivers/char/ipmi/ipmi_si_intf.c:2190:3: note: Calling 'poll'
                   poll(smi_info);
                   ^~~~~~~~~~~~~~
   drivers/char/ipmi/ipmi_si_intf.c:1041:6: note: Assuming 'run_to_completion' is true
           if (!run_to_completion)
               ^~~~~~~~~~~~~~~~~~
   drivers/char/ipmi/ipmi_si_intf.c:1041:2: note: Taking false branch
           if (!run_to_completion)
           ^
   drivers/char/ipmi/ipmi_si_intf.c:1043:2: note: Calling 'smi_event_handler'
           smi_event_handler(smi_info, 10);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/char/ipmi/ipmi_si_intf.c:764:9: note: Assuming 'si_sm_result' is not equal to SI_SM_CALL_WITHOUT_DELAY
           while (si_sm_result == SI_SM_CALL_WITHOUT_DELAY)
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/char/ipmi/ipmi_si_intf.c:764:2: note: Loop condition is false. Execution continues on line 767
           while (si_sm_result == SI_SM_CALL_WITHOUT_DELAY)
           ^
   drivers/char/ipmi/ipmi_si_intf.c:767:6: note: Assuming 'si_sm_result' is equal to SI_SM_TRANSACTION_COMPLETE
           if (si_sm_result == SI_SM_TRANSACTION_COMPLETE) {
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/char/ipmi/ipmi_si_intf.c:767:2: note: Taking true branch
           if (si_sm_result == SI_SM_TRANSACTION_COMPLETE) {
           ^
   drivers/char/ipmi/ipmi_si_intf.c:770:3: note: Calling 'handle_transaction_done'
                   handle_transaction_done(smi_info);
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/char/ipmi/ipmi_si_intf.c:542:2: note: Control jumps to 'case SI_GETTING_MESSAGES:'  at line 639
           switch (smi_info->si_state) {
           ^
   drivers/char/ipmi/ipmi_si_intf.c:642:4: note: Access to field 'rsp_size' results in a dereference of a null pointer (loaded from field 'curr_msg')
                           = smi_info->handlers->get_result(
                           ^
   Suppressed 5 warnings (5 in non-user code).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   13 warnings generated.
   Suppressed 13 warnings (13 in non-user code).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   17 warnings generated.
   kernel/sched/fair.c:683:15: warning: Value stored to 'nr_running' during its initialization is never read [clang-analyzer-deadcode.DeadStores]
           unsigned int nr_running = cfs_rq->nr_running;
                        ^~~~~~~~~~   ~~~~~~~~~~~~~~~~~~
   kernel/sched/fair.c:683:15: note: Value stored to 'nr_running' during its initialization is never read
           unsigned int nr_running = cfs_rq->nr_running;
                        ^~~~~~~~~~   ~~~~~~~~~~~~~~~~~~
>> kernel/sched/fair.c:6738:11: warning: The left expression of the compound assignment is an uninitialized value. The computed value will also be garbage [clang-analyzer-core.uninitialized.Assign]
                           pd_cap += cpu_thermal_cap;
                           ~~~~~~ ^
   kernel/sched/fair.c:6691:25: note: Loop condition is false.  Exiting loop
           struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_rq_mask);
                                  ^
   include/linux/cpumask.h:735:37: note: expanded from macro 'this_cpu_cpumask_var_ptr'
   #define this_cpu_cpumask_var_ptr(x) this_cpu_ptr(x)
                                       ^
   include/linux/percpu-defs.h:252:27: note: expanded from macro 'this_cpu_ptr'
   #define this_cpu_ptr(ptr) raw_cpu_ptr(ptr)
                             ^
   include/linux/percpu-defs.h:241:2: note: expanded from macro 'raw_cpu_ptr'
           __verify_pcpu_ptr(ptr);                                         \
           ^
   include/linux/percpu-defs.h:217:37: note: expanded from macro '__verify_pcpu_ptr'
   #define __verify_pcpu_ptr(ptr)                                          \
                                                                           ^
   kernel/sched/fair.c:6693:52: note: 'pd_cap' declared without an initial value
           unsigned long busy_time, tsk_busy_time, max_util, pd_cap;
                                                             ^~~~~~
   kernel/sched/fair.c:6694:27: note: Loop condition is false.  Exiting loop
           struct root_domain *rd = cpu_rq(smp_processor_id())->rd;
                                    ^
   kernel/sched/sched.h:1411:24: note: expanded from macro 'cpu_rq'
   #define cpu_rq(cpu)             (&per_cpu(runqueues, (cpu)))
                                     ^
   include/linux/percpu-defs.h:269:29: note: expanded from macro 'per_cpu'
   #define per_cpu(var, cpu)       (*per_cpu_ptr(&(var), cpu))
                                     ^
   include/linux/percpu-defs.h:235:2: note: expanded from macro 'per_cpu_ptr'
           __verify_pcpu_ptr(ptr);                                         \
           ^
   include/linux/percpu-defs.h:217:37: note: expanded from macro '__verify_pcpu_ptr'
   #define __verify_pcpu_ptr(ptr)                                          \
                                                                           ^
   kernel/sched/fair.c:6702:7: note: Left side of '||' is false
           pd = rcu_dereference(rd->pd);
                ^
   include/linux/rcupdate.h:597:28: note: expanded from macro 'rcu_dereference'
   #define rcu_dereference(p) rcu_dereference_check(p, 0)
                              ^
   include/linux/rcupdate.h:529:2: note: expanded from macro 'rcu_dereference_check'
           __rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)
           ^
   include/linux/rcupdate.h:390:48: note: expanded from macro '__rcu_dereference_check'
           typeof(*p) *________p1 = (typeof(*p) *__force)READ_ONCE(p); \
                                                         ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
                              ^
   include/linux/compiler_types.h:302:3: note: expanded from macro '__native_word'
           (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
            ^
   kernel/sched/fair.c:6702:7: note: Left side of '||' is false
           pd = rcu_dereference(rd->pd);
                ^
   include/linux/rcupdate.h:597:28: note: expanded from macro 'rcu_dereference'
   #define rcu_dereference(p) rcu_dereference_check(p, 0)
                              ^
   include/linux/rcupdate.h:529:2: note: expanded from macro 'rcu_dereference_check'
           __rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)
           ^
   include/linux/rcupdate.h:390:48: note: expanded from macro '__rcu_dereference_check'
           typeof(*p) *________p1 = (typeof(*p) *__force)READ_ONCE(p); \
                                                         ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
                              ^
   include/linux/compiler_types.h:302:3: note: expanded from macro '__native_word'
           (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
            ^
   kernel/sched/fair.c:6702:7: note: Left side of '||' is true
           pd = rcu_dereference(rd->pd);
                ^
   include/linux/rcupdate.h:597:28: note: expanded from macro 'rcu_dereference'
   #define rcu_dereference(p) rcu_dereference_check(p, 0)
                              ^
   include/linux/rcupdate.h:529:2: note: expanded from macro 'rcu_dereference_check'
           __rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)
           ^
   include/linux/rcupdate.h:390:48: note: expanded from macro '__rcu_dereference_check'
           typeof(*p) *________p1 = (typeof(*p) *__force)READ_ONCE(p); \
                                                         ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
                              ^
   include/linux/compiler_types.h:303:28: note: expanded from macro '__native_word'
            sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
                                     ^
   kernel/sched/fair.c:6702:7: note: Taking false branch
           pd = rcu_dereference(rd->pd);

vim +6738 kernel/sched/fair.c

390031e4c309c9 Quentin Perret            2018-12-03  6649  
732cd75b8c920d Quentin Perret            2018-12-03  6650  /*
732cd75b8c920d Quentin Perret            2018-12-03  6651   * find_energy_efficient_cpu(): Find most energy-efficient target CPU for the
732cd75b8c920d Quentin Perret            2018-12-03  6652   * waking task. find_energy_efficient_cpu() looks for the CPU with maximum
732cd75b8c920d Quentin Perret            2018-12-03  6653   * spare capacity in each performance domain and uses it as a potential
732cd75b8c920d Quentin Perret            2018-12-03  6654   * candidate to execute the task. Then, it uses the Energy Model to figure
732cd75b8c920d Quentin Perret            2018-12-03  6655   * out which of the CPU candidates is the most energy-efficient.
732cd75b8c920d Quentin Perret            2018-12-03  6656   *
732cd75b8c920d Quentin Perret            2018-12-03  6657   * The rationale for this heuristic is as follows. In a performance domain,
732cd75b8c920d Quentin Perret            2018-12-03  6658   * all the most energy efficient CPU candidates (according to the Energy
732cd75b8c920d Quentin Perret            2018-12-03  6659   * Model) are those for which we'll request a low frequency. When there are
732cd75b8c920d Quentin Perret            2018-12-03  6660   * several CPUs for which the frequency request will be the same, we don't
732cd75b8c920d Quentin Perret            2018-12-03  6661   * have enough data to break the tie between them, because the Energy Model
732cd75b8c920d Quentin Perret            2018-12-03  6662   * only includes active power costs. With this model, if we assume that
732cd75b8c920d Quentin Perret            2018-12-03  6663   * frequency requests follow utilization (e.g. using schedutil), the CPU with
732cd75b8c920d Quentin Perret            2018-12-03  6664   * the maximum spare capacity in a performance domain is guaranteed to be among
732cd75b8c920d Quentin Perret            2018-12-03  6665   * the best candidates of the performance domain.
732cd75b8c920d Quentin Perret            2018-12-03  6666   *
732cd75b8c920d Quentin Perret            2018-12-03  6667   * In practice, it could be preferable from an energy standpoint to pack
732cd75b8c920d Quentin Perret            2018-12-03  6668   * small tasks on a CPU in order to let other CPUs go in deeper idle states,
732cd75b8c920d Quentin Perret            2018-12-03  6669   * but that could also hurt our chances to go cluster idle, and we have no
732cd75b8c920d Quentin Perret            2018-12-03  6670   * ways to tell with the current Energy Model if this is actually a good
732cd75b8c920d Quentin Perret            2018-12-03  6671   * idea or not. So, find_energy_efficient_cpu() basically favors
732cd75b8c920d Quentin Perret            2018-12-03  6672   * cluster-packing, and spreading inside a cluster. That should at least be
732cd75b8c920d Quentin Perret            2018-12-03  6673   * a good thing for latency, and this is consistent with the idea that most
732cd75b8c920d Quentin Perret            2018-12-03  6674   * of the energy savings of EAS come from the asymmetry of the system, and
732cd75b8c920d Quentin Perret            2018-12-03  6675   * not so much from breaking the tie between identical CPUs. That's also the
732cd75b8c920d Quentin Perret            2018-12-03  6676   * reason why EAS is enabled in the topology code only for systems where
732cd75b8c920d Quentin Perret            2018-12-03  6677   * SD_ASYM_CPUCAPACITY is set.
732cd75b8c920d Quentin Perret            2018-12-03  6678   *
732cd75b8c920d Quentin Perret            2018-12-03  6679   * NOTE: Forkees are not accepted in the energy-aware wake-up path because
732cd75b8c920d Quentin Perret            2018-12-03  6680   * they don't have any useful utilization data yet and it's not possible to
732cd75b8c920d Quentin Perret            2018-12-03  6681   * forecast their impact on energy consumption. Consequently, they will be
732cd75b8c920d Quentin Perret            2018-12-03  6682   * placed by find_idlest_cpu() on the least loaded CPU, which might turn out
732cd75b8c920d Quentin Perret            2018-12-03  6683   * to be energy-inefficient in some use-cases. The alternative would be to
732cd75b8c920d Quentin Perret            2018-12-03  6684   * bias new tasks towards specific types of CPUs first, or to try to infer
732cd75b8c920d Quentin Perret            2018-12-03  6685   * their util_avg from the parent task, but those heuristics could hurt
732cd75b8c920d Quentin Perret            2018-12-03  6686   * other use-cases too. So, until someone finds a better way to solve this,
732cd75b8c920d Quentin Perret            2018-12-03  6687   * let's keep things simple by re-using the existing slow path.
732cd75b8c920d Quentin Perret            2018-12-03  6688   */
732cd75b8c920d Quentin Perret            2018-12-03  6689  static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
732cd75b8c920d Quentin Perret            2018-12-03  6690  {
245c43dc401377 Dietmar Eggemann          2022-01-12  6691  	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_rq_mask);
eb92692b2544d3 Quentin Perret            2019-09-12  6692  	unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX;
ce70047d014b32 Vincent Donnefort         2022-01-12  6693  	unsigned long busy_time, tsk_busy_time, max_util, pd_cap;
732cd75b8c920d Quentin Perret            2018-12-03  6694  	struct root_domain *rd = cpu_rq(smp_processor_id())->rd;
619e090c8e409e Pierre Gondois            2021-05-04  6695  	int cpu, best_energy_cpu = prev_cpu, target = -1;
ce70047d014b32 Vincent Donnefort         2022-01-12  6696  	unsigned long cpu_cap, cpu_thermal_cap, util;
ce70047d014b32 Vincent Donnefort         2022-01-12  6697  	unsigned long base_energy = 0;
732cd75b8c920d Quentin Perret            2018-12-03  6698  	struct sched_domain *sd;
eb92692b2544d3 Quentin Perret            2019-09-12  6699  	struct perf_domain *pd;
732cd75b8c920d Quentin Perret            2018-12-03  6700  
732cd75b8c920d Quentin Perret            2018-12-03  6701  	rcu_read_lock();
732cd75b8c920d Quentin Perret            2018-12-03  6702  	pd = rcu_dereference(rd->pd);
732cd75b8c920d Quentin Perret            2018-12-03  6703  	if (!pd || READ_ONCE(rd->overutilized))
619e090c8e409e Pierre Gondois            2021-05-04  6704  		goto unlock;
732cd75b8c920d Quentin Perret            2018-12-03  6705  
732cd75b8c920d Quentin Perret            2018-12-03  6706  	/*
732cd75b8c920d Quentin Perret            2018-12-03  6707  	 * Energy-aware wake-up happens on the lowest sched_domain starting
732cd75b8c920d Quentin Perret            2018-12-03  6708  	 * from sd_asym_cpucapacity spanning over this_cpu and prev_cpu.
732cd75b8c920d Quentin Perret            2018-12-03  6709  	 */
732cd75b8c920d Quentin Perret            2018-12-03  6710  	sd = rcu_dereference(*this_cpu_ptr(&sd_asym_cpucapacity));
732cd75b8c920d Quentin Perret            2018-12-03  6711  	while (sd && !cpumask_test_cpu(prev_cpu, sched_domain_span(sd)))
732cd75b8c920d Quentin Perret            2018-12-03  6712  		sd = sd->parent;
732cd75b8c920d Quentin Perret            2018-12-03  6713  	if (!sd)
619e090c8e409e Pierre Gondois            2021-05-04  6714  		goto unlock;
619e090c8e409e Pierre Gondois            2021-05-04  6715  
619e090c8e409e Pierre Gondois            2021-05-04  6716  	target = prev_cpu;
732cd75b8c920d Quentin Perret            2018-12-03  6717  
732cd75b8c920d Quentin Perret            2018-12-03  6718  	sync_entity_load_avg(&p->se);
732cd75b8c920d Quentin Perret            2018-12-03  6719  	if (!task_util_est(p))
732cd75b8c920d Quentin Perret            2018-12-03  6720  		goto unlock;
732cd75b8c920d Quentin Perret            2018-12-03  6721  
ce70047d014b32 Vincent Donnefort         2022-01-12  6722  	tsk_busy_time = get_task_busy_time(p, prev_cpu);
ce70047d014b32 Vincent Donnefort         2022-01-12  6723  
732cd75b8c920d Quentin Perret            2018-12-03  6724  	for (; pd; pd = pd->next) {
eb92692b2544d3 Quentin Perret            2019-09-12  6725  		unsigned long cur_delta, spare_cap, max_spare_cap = 0;
8d4c97c105ca07 Pierre Gondois            2021-05-04  6726  		bool compute_prev_delta = false;
eb92692b2544d3 Quentin Perret            2019-09-12  6727  		unsigned long base_energy_pd;
732cd75b8c920d Quentin Perret            2018-12-03  6728  		int max_spare_cap_cpu = -1;
732cd75b8c920d Quentin Perret            2018-12-03  6729  
245c43dc401377 Dietmar Eggemann          2022-01-12  6730  		cpumask_and(cpus, perf_domain_span(pd), cpu_online_mask);
245c43dc401377 Dietmar Eggemann          2022-01-12  6731  
ce70047d014b32 Vincent Donnefort         2022-01-12  6732  		/* Account thermal pressure for the energy estimation */
ce70047d014b32 Vincent Donnefort         2022-01-12  6733  		cpu = cpumask_first(cpus);
ce70047d014b32 Vincent Donnefort         2022-01-12  6734  		cpu_thermal_cap = arch_scale_cpu_capacity(cpu);
ce70047d014b32 Vincent Donnefort         2022-01-12  6735  		cpu_thermal_cap -= arch_scale_thermal_pressure(cpu);
ce70047d014b32 Vincent Donnefort         2022-01-12  6736  
ce70047d014b32 Vincent Donnefort         2022-01-12  6737  		for_each_cpu(cpu, cpus) {
ce70047d014b32 Vincent Donnefort         2022-01-12 @6738  			pd_cap += cpu_thermal_cap;
ce70047d014b32 Vincent Donnefort         2022-01-12  6739  
ce70047d014b32 Vincent Donnefort         2022-01-12  6740  			if (!cpumask_test_cpu(cpu, sched_domain_span(sd)))
ce70047d014b32 Vincent Donnefort         2022-01-12  6741  				continue;
ce70047d014b32 Vincent Donnefort         2022-01-12  6742  
3bd3706251ee8a Sebastian Andrzej Siewior 2019-04-23  6743  			if (!cpumask_test_cpu(cpu, p->cpus_ptr))
732cd75b8c920d Quentin Perret            2018-12-03  6744  				continue;
732cd75b8c920d Quentin Perret            2018-12-03  6745  
732cd75b8c920d Quentin Perret            2018-12-03  6746  			util = cpu_util_next(cpu, p, cpu);
732cd75b8c920d Quentin Perret            2018-12-03  6747  			cpu_cap = capacity_of(cpu);
da0777d35f4789 Lukasz Luba               2020-08-10  6748  			spare_cap = cpu_cap;
da0777d35f4789 Lukasz Luba               2020-08-10  6749  			lsub_positive(&spare_cap, util);
1d42509e475cdc Valentin Schneider        2019-12-11  6750  
1d42509e475cdc Valentin Schneider        2019-12-11  6751  			/*
1d42509e475cdc Valentin Schneider        2019-12-11  6752  			 * Skip CPUs that cannot satisfy the capacity request.
1d42509e475cdc Valentin Schneider        2019-12-11  6753  			 * IOW, placing the task there would make the CPU
1d42509e475cdc Valentin Schneider        2019-12-11  6754  			 * overutilized. Take uclamp into account to see how
1d42509e475cdc Valentin Schneider        2019-12-11  6755  			 * much capacity we can get out of the CPU; this is
a5418be9dffe70 Viresh Kumar              2020-12-08  6756  			 * aligned with sched_cpu_util().
1d42509e475cdc Valentin Schneider        2019-12-11  6757  			 */
1d42509e475cdc Valentin Schneider        2019-12-11  6758  			util = uclamp_rq_util_with(cpu_rq(cpu), util, p);
60e17f5cef838e Viresh Kumar              2019-06-04  6759  			if (!fits_capacity(util, cpu_cap))
732cd75b8c920d Quentin Perret            2018-12-03  6760  				continue;
732cd75b8c920d Quentin Perret            2018-12-03  6761  
732cd75b8c920d Quentin Perret            2018-12-03  6762  			if (cpu == prev_cpu) {
8d4c97c105ca07 Pierre Gondois            2021-05-04  6763  				/* Always use prev_cpu as a candidate. */
8d4c97c105ca07 Pierre Gondois            2021-05-04  6764  				compute_prev_delta = true;
8d4c97c105ca07 Pierre Gondois            2021-05-04  6765  			} else if (spare_cap > max_spare_cap) {
732cd75b8c920d Quentin Perret            2018-12-03  6766  				/*
8d4c97c105ca07 Pierre Gondois            2021-05-04  6767  				 * Find the CPU with the maximum spare capacity
8d4c97c105ca07 Pierre Gondois            2021-05-04  6768  				 * in the performance domain.
732cd75b8c920d Quentin Perret            2018-12-03  6769  				 */
732cd75b8c920d Quentin Perret            2018-12-03  6770  				max_spare_cap = spare_cap;
732cd75b8c920d Quentin Perret            2018-12-03  6771  				max_spare_cap_cpu = cpu;
732cd75b8c920d Quentin Perret            2018-12-03  6772  			}
732cd75b8c920d Quentin Perret            2018-12-03  6773  		}
732cd75b8c920d Quentin Perret            2018-12-03  6774  
8d4c97c105ca07 Pierre Gondois            2021-05-04  6775  		if (max_spare_cap_cpu < 0 && !compute_prev_delta)
8d4c97c105ca07 Pierre Gondois            2021-05-04  6776  			continue;
8d4c97c105ca07 Pierre Gondois            2021-05-04  6777  
8d4c97c105ca07 Pierre Gondois            2021-05-04  6778  		/* Compute the 'base' energy of the pd, without @p */
ce70047d014b32 Vincent Donnefort         2022-01-12  6779  		busy_time = get_pd_busy_time(p, cpus, pd_cap);
ce70047d014b32 Vincent Donnefort         2022-01-12  6780  		max_util = get_pd_max_util(p, -1, cpus, cpu_thermal_cap);
ce70047d014b32 Vincent Donnefort         2022-01-12  6781  		base_energy_pd = compute_energy(pd, max_util, busy_time,
ce70047d014b32 Vincent Donnefort         2022-01-12  6782  						cpu_thermal_cap);
8d4c97c105ca07 Pierre Gondois            2021-05-04  6783  		base_energy += base_energy_pd;
8d4c97c105ca07 Pierre Gondois            2021-05-04  6784  
ce70047d014b32 Vincent Donnefort         2022-01-12  6785  		/* Take task into account for the next energy computations */
ce70047d014b32 Vincent Donnefort         2022-01-12  6786  		busy_time = min(pd_cap, busy_time + tsk_busy_time);
ce70047d014b32 Vincent Donnefort         2022-01-12  6787  
8d4c97c105ca07 Pierre Gondois            2021-05-04  6788  		/* Evaluate the energy impact of using prev_cpu. */
8d4c97c105ca07 Pierre Gondois            2021-05-04  6789  		if (compute_prev_delta) {
ce70047d014b32 Vincent Donnefort         2022-01-12  6790  			max_util = get_pd_max_util(p, prev_cpu, cpus,
ce70047d014b32 Vincent Donnefort         2022-01-12  6791  						   cpu_thermal_cap);
ce70047d014b32 Vincent Donnefort         2022-01-12  6792  			prev_delta = compute_energy(pd, max_util, busy_time,
ce70047d014b32 Vincent Donnefort         2022-01-12  6793  						    cpu_thermal_cap);
619e090c8e409e Pierre Gondois            2021-05-04  6794  			if (prev_delta < base_energy_pd)
619e090c8e409e Pierre Gondois            2021-05-04  6795  				goto unlock;
8d4c97c105ca07 Pierre Gondois            2021-05-04  6796  			prev_delta -= base_energy_pd;
8d4c97c105ca07 Pierre Gondois            2021-05-04  6797  			best_delta = min(best_delta, prev_delta);
8d4c97c105ca07 Pierre Gondois            2021-05-04  6798  		}
8d4c97c105ca07 Pierre Gondois            2021-05-04  6799  
8d4c97c105ca07 Pierre Gondois            2021-05-04  6800  		/* Evaluate the energy impact of using max_spare_cap_cpu. */
8d4c97c105ca07 Pierre Gondois            2021-05-04  6801  		if (max_spare_cap_cpu >= 0) {
ce70047d014b32 Vincent Donnefort         2022-01-12  6802  			max_util = get_pd_max_util(p, max_spare_cap_cpu, cpus,
ce70047d014b32 Vincent Donnefort         2022-01-12  6803  						   cpu_thermal_cap);
ce70047d014b32 Vincent Donnefort         2022-01-12  6804  			cur_delta = compute_energy(pd, max_util, busy_time,
ce70047d014b32 Vincent Donnefort         2022-01-12  6805  						   cpu_thermal_cap);
619e090c8e409e Pierre Gondois            2021-05-04  6806  			if (cur_delta < base_energy_pd)
619e090c8e409e Pierre Gondois            2021-05-04  6807  				goto unlock;
eb92692b2544d3 Quentin Perret            2019-09-12  6808  			cur_delta -= base_energy_pd;
eb92692b2544d3 Quentin Perret            2019-09-12  6809  			if (cur_delta < best_delta) {
eb92692b2544d3 Quentin Perret            2019-09-12  6810  				best_delta = cur_delta;
732cd75b8c920d Quentin Perret            2018-12-03  6811  				best_energy_cpu = max_spare_cap_cpu;
732cd75b8c920d Quentin Perret            2018-12-03  6812  			}
732cd75b8c920d Quentin Perret            2018-12-03  6813  		}
732cd75b8c920d Quentin Perret            2018-12-03  6814  	}
732cd75b8c920d Quentin Perret            2018-12-03  6815  	rcu_read_unlock();
732cd75b8c920d Quentin Perret            2018-12-03  6816  
732cd75b8c920d Quentin Perret            2018-12-03  6817  	/*
732cd75b8c920d Quentin Perret            2018-12-03  6818  	 * Pick the best CPU if prev_cpu cannot be used, or if it saves at
732cd75b8c920d Quentin Perret            2018-12-03  6819  	 * least 6% of the energy used by prev_cpu.
732cd75b8c920d Quentin Perret            2018-12-03  6820  	 */
619e090c8e409e Pierre Gondois            2021-05-04  6821  	if ((prev_delta == ULONG_MAX) ||
619e090c8e409e Pierre Gondois            2021-05-04  6822  	    (prev_delta - best_delta) > ((prev_delta + base_energy) >> 4))
619e090c8e409e Pierre Gondois            2021-05-04  6823  		target = best_energy_cpu;
732cd75b8c920d Quentin Perret            2018-12-03  6824  
619e090c8e409e Pierre Gondois            2021-05-04  6825  	return target;
732cd75b8c920d Quentin Perret            2018-12-03  6826  
619e090c8e409e Pierre Gondois            2021-05-04  6827  unlock:
732cd75b8c920d Quentin Perret            2018-12-03  6828  	rcu_read_unlock();
732cd75b8c920d Quentin Perret            2018-12-03  6829  
619e090c8e409e Pierre Gondois            2021-05-04  6830  	return target;
732cd75b8c920d Quentin Perret            2018-12-03  6831  }
732cd75b8c920d Quentin Perret            2018-12-03  6832  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

Thread overview: 23+ messages
2022-01-12 16:12 [PATCH v2 0/7] feec() energy margin removal Vincent Donnefort
2022-01-12 16:12 ` [PATCH v2 1/7] sched/fair: Provide u64 read for 32-bits arch helper Vincent Donnefort
2022-01-17 16:11   ` Tao Zhou
2022-01-17 19:42     ` Vincent Donnefort
2022-01-17 23:44       ` Tao Zhou
2022-01-12 16:12 ` [PATCH v2 2/7] sched/fair: Decay task PELT values during migration Vincent Donnefort
2022-01-17 17:31   ` Vincent Guittot
2022-01-18 10:56     ` Vincent Donnefort
2022-01-19  9:54       ` Vincent Guittot
2022-01-19 11:59         ` Vincent Donnefort
2022-01-19 13:22           ` Vincent Guittot
2022-01-20 21:12             ` Vincent Donnefort
2022-01-21 15:27               ` Vincent Guittot
2022-01-12 16:12 ` [PATCH v2 3/7] sched, drivers: Remove max param from effective_cpu_util()/sched_cpu_util() Vincent Donnefort
2022-01-12 16:12 ` [PATCH v2 4/7] sched/fair: Rename select_idle_mask to select_rq_mask Vincent Donnefort
2022-01-12 16:12 ` [PATCH v2 5/7] sched/fair: Use the same cpumask per-PD throughout find_energy_efficient_cpu() Vincent Donnefort
2022-01-12 16:12 ` [PATCH v2 6/7] sched/fair: Remove task_util from effective utilization in feec() Vincent Donnefort
2022-01-12 19:44   ` kernel test robot
2022-01-12 19:44     ` kernel test robot
2022-01-17 13:17   ` Dietmar Eggemann
2022-01-18  9:46     ` Vincent Donnefort
2022-01-12 16:12 ` [PATCH v2 7/7] sched/fair: Remove the energy margin " Vincent Donnefort
2022-01-13 11:35 [PATCH v2 6/7] sched/fair: Remove task_util from effective utilization " kernel test robot
