linux-kernel.vger.kernel.org archive mirror
* [PATCH] sched/topology: Remove EM_MAX_COMPLEXITY limit
@ 2022-08-12 10:16 Pierre Gondois
  2022-08-17 14:21 ` Ionela Voinescu
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Pierre Gondois @ 2022-08-12 10:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ionela.Voinescu, Lukasz.Luba, Pierre Gondois, Jonathan Corbet,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-doc

From: Pierre Gondois <Pierre.Gondois@arm.com>

The Energy Aware Scheduler (EAS) estimates the energy consumption
of placing a task on different CPUs. The goal is to minimize this
energy consumption. Estimating the energy of different task placements
is increasingly complex with the size of the platform. To avoid having
a slow wake-up path, EAS is only enabled if this complexity is low
enough.

The current complexity limit was set in:
commit b68a4c0dba3b1 ("sched/topology: Disable EAS on inappropriate
platforms").
based on the first implementation of EAS, which was re-computing
the power of the whole platform for each task placement scenario, cf:
commit 390031e4c309 ("sched/fair: Introduce an energy estimation helper
function").
but the complexity of EAS was reduced in:
commit eb92692b2544d ("sched/fair: Speed-up energy-aware wake-ups")
and find_energy_efficient_cpu() (feec) algorithm was updated in:
commit 3e8c6c9aac42 ("sched/fair: Remove task_util from effective
utilization in feec()")

find_energy_efficient_cpu() (feec) is now doing:
feec()
\_ for_each_pd(pd) [0]
  // get max_spare_cap_cpu and compute_prev_delta
  \_ for_each_cpu(pd) [1]

  \_ get_pd_busy_time(pd) [2]
    \_ for_each_cpu(pd)

  // evaluate pd energy without the task
  \_ get_pd_max_util(pd, -1) [3.0]
    \_ for_each_cpu(pd)
  \_ compute_energy(pd, -1)
    \_ for_each_ps(pd)

  // evaluate pd energy with the task on prev_cpu
  \_ get_pd_max_util(pd, prev_cpu) [3.1]
    \_ for_each_cpu(pd)
  \_ compute_energy(pd, prev_cpu)
    \_ for_each_ps(pd)

  // evaluate pd energy with the task on max_spare_cap_cpu
  \_ get_pd_max_util(pd, max_spare_cap_cpu) [3.2]
    \_ for_each_cpu(pd)
  \_ compute_energy(pd, max_spare_cap_cpu)
    \_ for_each_ps(pd)

[3.1] happens only once since prev_cpu is unique. To have an upper
bound of the complexity, [3.1] is taken into account for all pds.
So with the same definitions for nr_pd, nr_cpus and nr_ps,
the complexity is of:
nr_pd * (2 * [nr_cpus in pd] + 3 * ([nr_cpus in pd] + [nr_ps in pd]))
 [0]  * (     [1] + [2]      +       [3.0] + [3.1] + [3.2]          )
= 5 * nr_cpus + 3 * nr_ps
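
As a rough illustration of the bound above (a sketch, not kernel code), the
per-domain terms can be summed in plain C. The struct pd_shape type and the
feec_complexity_bound() helper are hypothetical names introduced only for
this example:

```c
#include <stddef.h>

/* Hypothetical description of one performance domain (not a kernel type). */
struct pd_shape {
	unsigned int nr_cpus;	/* CPUs in this performance domain */
	unsigned int nr_ps;	/* performance states in this domain */
};

/*
 * Upper bound from the commit message: each pd costs
 * 2 * nr_cpus ([1] + [2]) plus 3 * (nr_cpus + nr_ps) ([3.0] + [3.1] + [3.2]).
 * Summed over all pds, this equals 5 * nr_cpus + 3 * nr_ps, where nr_cpus
 * and nr_ps are the totals over the whole root domain.
 */
unsigned long feec_complexity_bound(const struct pd_shape *pds, size_t nr_pd)
{
	unsigned long c = 0;
	size_t i;

	for (i = 0; i < nr_pd; i++)
		c += 2UL * pds[i].nr_cpus +
		     3UL * (pds[i].nr_cpus + pds[i].nr_ps);

	return c;
}
```

For two domains of 4 CPUs each, with 10 and 12 performance states, the helper
returns 5 * 8 + 3 * 22 = 106.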

The complexity limit was set to 2048 in:
commit b68a4c0dba3b1 ("sched/topology: Disable EAS on inappropriate
platforms")
to make "EAS usable up to 16 CPUs with per-CPU DVFS and less than 8
performance states each". For the same platform, the complexity would
actually be of:
5 * 16 + 3 * 7 = 101

Since the EAS complexity was greatly reduced, bigger platforms can
handle EAS. For instance, a platform with 256 CPUs with 256
performance states each would reach it. To reflect this improvement,
remove the EAS complexity check.

Signed-off-by: Pierre Gondois <Pierre.Gondois@arm.com>
---
 Documentation/scheduler/sched-energy.rst | 37 ++--------------------
 kernel/sched/topology.c                  | 39 ++----------------------
 2 files changed, 6 insertions(+), 70 deletions(-)

diff --git a/Documentation/scheduler/sched-energy.rst b/Documentation/scheduler/sched-energy.rst
index 8fbce5e767d9..3d1d71134d16 100644
--- a/Documentation/scheduler/sched-energy.rst
+++ b/Documentation/scheduler/sched-energy.rst
@@ -356,38 +356,7 @@ placement. For EAS it doesn't matter whether the EM power values are expressed
 in milli-Watts or in an 'abstract scale'.
 
 
-6.3 - Energy Model complexity
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-The task wake-up path is very latency-sensitive. When the EM of a platform is
-too complex (too many CPUs, too many performance domains, too many performance
-states, ...), the cost of using it in the wake-up path can become prohibitive.
-The energy-aware wake-up algorithm has a complexity of:
-
-	C = Nd * (Nc + Ns)
-
-with: Nd the number of performance domains; Nc the number of CPUs; and Ns the
-total number of OPPs (ex: for two perf. domains with 4 OPPs each, Ns = 8).
-
-A complexity check is performed at the root domain level, when scheduling
-domains are built. EAS will not start on a root domain if its C happens to be
-higher than the completely arbitrary EM_MAX_COMPLEXITY threshold (2048 at the
-time of writing).
-
-If you really want to use EAS but the complexity of your platform's Energy
-Model is too high to be used with a single root domain, you're left with only
-two possible options:
-
-    1. split your system into separate, smaller, root domains using exclusive
-       cpusets and enable EAS locally on each of them. This option has the
-       benefit to work out of the box but the drawback of preventing load
-       balance between root domains, which can result in an unbalanced system
-       overall;
-    2. submit patches to reduce the complexity of the EAS wake-up algorithm,
-       hence enabling it to cope with larger EMs in reasonable time.
-
-
-6.4 - Schedutil governor
+6.3 - Schedutil governor
 ^^^^^^^^^^^^^^^^^^^^^^^^
 
 EAS tries to predict at which OPP will the CPUs be running in the close future
@@ -405,7 +374,7 @@ frequency requests and energy predictions.
 Using EAS with any other governor than schedutil is not supported.
 
 
-6.5 Scale-invariant utilization signals
+6.4 Scale-invariant utilization signals
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 In order to make accurate prediction across CPUs and for all performance
@@ -417,7 +386,7 @@ Using EAS on a platform that doesn't implement these two callbacks is not
 supported.
 
 
-6.6 Multithreading (SMT)
+6.5 Multithreading (SMT)
 ^^^^^^^^^^^^^^^^^^^^^^^^
 
 EAS in its current form is SMT unaware and is not able to leverage
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8739c2a5a54e..ce2fa85b2362 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -346,32 +346,13 @@ static void sched_energy_set(bool has_eas)
  *    1. an Energy Model (EM) is available;
  *    2. the SD_ASYM_CPUCAPACITY flag is set in the sched_domain hierarchy.
  *    3. no SMT is detected.
- *    4. the EM complexity is low enough to keep scheduling overheads low;
- *    5. schedutil is driving the frequency of all CPUs of the rd;
- *    6. frequency invariance support is present;
- *
- * The complexity of the Energy Model is defined as:
- *
- *              C = nr_pd * (nr_cpus + nr_ps)
- *
- * with parameters defined as:
- *  - nr_pd:    the number of performance domains
- *  - nr_cpus:  the number of CPUs
- *  - nr_ps:    the sum of the number of performance states of all performance
- *              domains (for example, on a system with 2 performance domains,
- *              with 10 performance states each, nr_ps = 2 * 10 = 20).
- *
- * It is generally not a good idea to use such a model in the wake-up path on
- * very complex platforms because of the associated scheduling overheads. The
- * arbitrary constraint below prevents that. It makes EAS usable up to 16 CPUs
- * with per-CPU DVFS and less than 8 performance states each, for example.
+ *    4. schedutil is driving the frequency of all CPUs of the rd;
+ *    5. frequency invariance support is present;
  */
-#define EM_MAX_COMPLEXITY 2048
-
 extern struct cpufreq_governor schedutil_gov;
 static bool build_perf_domains(const struct cpumask *cpu_map)
 {
-	int i, nr_pd = 0, nr_ps = 0, nr_cpus = cpumask_weight(cpu_map);
+	int i;
 	struct perf_domain *pd = NULL, *tmp;
 	int cpu = cpumask_first(cpu_map);
 	struct root_domain *rd = cpu_rq(cpu)->rd;
@@ -429,20 +410,6 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
 			goto free;
 		tmp->next = pd;
 		pd = tmp;
-
-		/*
-		 * Count performance domains and performance states for the
-		 * complexity check.
-		 */
-		nr_pd++;
-		nr_ps += em_pd_nr_perf_states(pd->em_pd);
-	}
-
-	/* Bail out if the Energy Model complexity is too high. */
-	if (nr_pd * (nr_ps + nr_cpus) > EM_MAX_COMPLEXITY) {
-		WARN(1, "rd %*pbl: Failed to start EAS, EM complexity is too high\n",
-						cpumask_pr_args(cpu_map));
-		goto free;
 	}
 
 	perf_domain_debug(cpu_map, pd);
-- 
2.25.1



* Re: [PATCH] sched/topology: Remove EM_MAX_COMPLEXITY limit
  2022-08-12 10:16 [PATCH] sched/topology: Remove EM_MAX_COMPLEXITY limit Pierre Gondois
@ 2022-08-17 14:21 ` Ionela Voinescu
  2022-08-17 15:03   ` Pierre Gondois
  2022-08-18 12:19 ` Dietmar Eggemann
  2022-10-26 12:23 ` Lukasz Luba
  2 siblings, 1 reply; 5+ messages in thread
From: Ionela Voinescu @ 2022-08-17 14:21 UTC (permalink / raw)
  To: Pierre Gondois
  Cc: linux-kernel, Lukasz.Luba, Jonathan Corbet, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-doc

Hi Pierre,

On Friday 12 Aug 2022 at 12:16:19 (+0200), Pierre Gondois wrote:
> From: Pierre Gondois <Pierre.Gondois@arm.com>
> 
> The Energy Aware Scheduler (EAS) estimates the energy consumption
> of placing a task on different CPUs. The goal is to minimize this
> energy consumption. Estimating the energy of different task placements
> is increasingly complex with the size of the platform. To avoid having
> a slow wake-up path, EAS is only enabled if this complexity is low
> enough.
> 
> The current complexity limit was set in:
> commit b68a4c0dba3b1 ("sched/topology: Disable EAS on inappropriate
> platforms").
> based on the first implementation of EAS, which was re-computing
> the power of the whole platform for each task placement scenario, cf:
> commit 390031e4c309 ("sched/fair: Introduce an energy estimation helper
> function").
> but the complexity of EAS was reduced in:
> commit eb92692b2544d ("sched/fair: Speed-up energy-aware wake-ups")
> and find_energy_efficient_cpu() (feec) algorithm was updated in:
> commit 3e8c6c9aac42 ("sched/fair: Remove task_util from effective
> utilization in feec()")
> 
> find_energy_efficient_cpu() (feec) is now doing:
> feec()
> \_ for_each_pd(pd) [0]
>   // get max_spare_cap_cpu and compute_prev_delta
>   \_ for_each_cpu(pd) [1]
> 
>   \_ get_pd_busy_time(pd) [2]
>     \_ for_each_cpu(pd)
> 
>   // evaluate pd energy without the task
>   \_ get_pd_max_util(pd, -1) [3.0]
>     \_ for_each_cpu(pd)
>   \_ compute_energy(pd, -1)
>     \_ for_each_ps(pd)
> 
>   // evaluate pd energy with the task on prev_cpu
>   \_ get_pd_max_util(pd, prev_cpu) [3.1]
>     \_ for_each_cpu(pd)
>   \_ compute_energy(pd, prev_cpu)
>     \_ for_each_ps(pd)
> 
>   // evaluate pd energy with the task on max_spare_cap_cpu
>   \_ get_pd_max_util(pd, max_spare_cap_cpu) [3.2]
>     \_ for_each_cpu(pd)
>   \_ compute_energy(pd, max_spare_cap_cpu)
>     \_ for_each_ps(pd)
> 
> [3.1] happens only once since prev_cpu is unique. To have an upper
> bound of the complexity, [3.1] is taken into account for all pds.
> So with the same definitions for nr_pd, nr_cpus and nr_ps,
> the complexity is of:
> nr_pd * (2 * [nr_cpus in pd] + 3 * ([nr_cpus in pd] + [nr_ps in pd]))
>  [0]  * (     [1] + [2]      +       [3.0] + [3.1] + [3.2]          )
> = 5 * nr_cpus + 3 * nr_ps
> 

I just want to draw your attention to [1] and the fact that the
structure of the function changed. Your calculations largely remain the
same - 3 calls to compute_energy() which in turn now calls
eenv_pd_max_util() with operations for each cpu, plus some scattered
calls to eenv_pd_busy_time(), all for each pd.

[1]
https://lore.kernel.org/lkml/20220621090414.433602-7-vdonnefort@google.com/

Thanks,
Ionela.


* Re: [PATCH] sched/topology: Remove EM_MAX_COMPLEXITY limit
  2022-08-17 14:21 ` Ionela Voinescu
@ 2022-08-17 15:03   ` Pierre Gondois
  0 siblings, 0 replies; 5+ messages in thread
From: Pierre Gondois @ 2022-08-17 15:03 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: linux-kernel, Lukasz.Luba, Jonathan Corbet, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-doc

Hi Ionela,

On 8/17/22 16:21, Ionela Voinescu wrote:
> Hi Pierre,
> 
> On Friday 12 Aug 2022 at 12:16:19 (+0200), Pierre Gondois wrote:
>> From: Pierre Gondois <Pierre.Gondois@arm.com>
>>
>> The Energy Aware Scheduler (EAS) estimates the energy consumption
>> of placing a task on different CPUs. The goal is to minimize this
>> energy consumption. Estimating the energy of different task placements
>> is increasingly complex with the size of the platform. To avoid having
>> a slow wake-up path, EAS is only enabled if this complexity is low
>> enough.
>>
>> The current complexity limit was set in:
>> commit b68a4c0dba3b1 ("sched/topology: Disable EAS on inappropriate
>> platforms").
>> based on the first implementation of EAS, which was re-computing
>> the power of the whole platform for each task placement scenario, cf:
>> commit 390031e4c309 ("sched/fair: Introduce an energy estimation helper
>> function").
>> but the complexity of EAS was reduced in:
>> commit eb92692b2544d ("sched/fair: Speed-up energy-aware wake-ups")
>> and find_energy_efficient_cpu() (feec) algorithm was updated in:
>> commit 3e8c6c9aac42 ("sched/fair: Remove task_util from effective
>> utilization in feec()")
>>
>> find_energy_efficient_cpu() (feec) is now doing:
>> feec()
>> \_ for_each_pd(pd) [0]
>>    // get max_spare_cap_cpu and compute_prev_delta
>>    \_ for_each_cpu(pd) [1]
>>
>>    \_ get_pd_busy_time(pd) [2]
>>      \_ for_each_cpu(pd)
>>
>>    // evaluate pd energy without the task
>>    \_ get_pd_max_util(pd, -1) [3.0]
>>      \_ for_each_cpu(pd)
>>    \_ compute_energy(pd, -1)
>>      \_ for_each_ps(pd)
>>
>>    // evaluate pd energy with the task on prev_cpu
>>    \_ get_pd_max_util(pd, prev_cpu) [3.1]
>>      \_ for_each_cpu(pd)
>>    \_ compute_energy(pd, prev_cpu)
>>      \_ for_each_ps(pd)
>>
>>    // evaluate pd energy with the task on max_spare_cap_cpu
>>    \_ get_pd_max_util(pd, max_spare_cap_cpu) [3.2]
>>      \_ for_each_cpu(pd)
>>    \_ compute_energy(pd, max_spare_cap_cpu)
>>      \_ for_each_ps(pd)
>>
>> [3.1] happens only once since prev_cpu is unique. To have an upper
>> bound of the complexity, [3.1] is taken into account for all pds.
>> So with the same definitions for nr_pd, nr_cpus and nr_ps,
>> the complexity is of:
>> nr_pd * (2 * [nr_cpus in pd] + 3 * ([nr_cpus in pd] + [nr_ps in pd]))
>>   [0]  * (     [1] + [2]      +       [3.0] + [3.1] + [3.2]          )
>> = 5 * nr_cpus + 3 * nr_ps
>>
> 
> I just want to draw your attention to [1] and the fact that the
> structure of the function changed. Your calculations largely remain the
> same - 3 calls to compute_energy() which in turn now calls
> eenv_pd_max_util() with operations for each cpu, plus some scattered
> calls to eenv_pd_busy_time(), all for each pd.

Yes indeed, there is:
s/get_pd_max_util/eenv_pd_max_util

and also as you spotted, the following pattern:
\_ eenv_pd_max_util(pd, dst_cpu)
   \_ for_each_cpu(pd)
\_ compute_energy(pd, dst_cpu)
   \_ for_each_ps(pd)

should actually be:
\_ compute_energy(pd, dst_cpu)
   \_ eenv_pd_max_util(pd, dst_cpu)
     \_ for_each_cpu(pd)
   \_ for_each_ps(pd)
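
A small counting sketch of the corrected structure, assuming one "step" per
loop iteration, suggests that the total stays the same. The struct pd_dims
type and both count_*() helpers are hypothetical and only mimic the loop
nesting, they do not call any scheduler code:

```c
#include <stddef.h>

/* Hypothetical description of one performance domain (not a kernel type). */
struct pd_dims {
	unsigned int nr_cpus;	/* CPUs in this performance domain */
	unsigned int nr_ps;	/* performance states in this domain */
};

/* Mimics compute_energy(): eenv_pd_max_util() is nested inside it. */
static unsigned long count_compute_energy(const struct pd_dims *pd)
{
	unsigned long steps = 0;
	unsigned int i;

	for (i = 0; i < pd->nr_cpus; i++)	/* eenv_pd_max_util(): for_each_cpu(pd) */
		steps++;
	for (i = 0; i < pd->nr_ps; i++)		/* for_each_ps(pd) */
		steps++;

	return steps;
}

/* Mimics the feec() loop nest and returns the number of loop iterations. */
unsigned long count_feec_steps(const struct pd_dims *pds, size_t nr_pd)
{
	unsigned long steps = 0;
	unsigned int i;
	size_t p;
	int scenario;

	for (p = 0; p < nr_pd; p++) {			/* [0] for_each_pd(pd) */
		for (i = 0; i < pds[p].nr_cpus; i++)	/* [1] max_spare_cap_cpu scan */
			steps++;
		for (i = 0; i < pds[p].nr_cpus; i++)	/* [2] eenv_pd_busy_time() */
			steps++;
		/* [3.0], [3.1], [3.2]: no task, prev_cpu, max_spare_cap_cpu */
		for (scenario = 0; scenario < 3; scenario++)
			steps += count_compute_energy(&pds[p]);
	}

	return steps;
}
```

The count still matches 5 * nr_cpus + 3 * nr_ps, since only the nesting of
the per-CPU loop changes, not the number of iterations.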

Thanks,
Pierre

> 
> [1]
> https://lore.kernel.org/lkml/20220621090414.433602-7-vdonnefort@google.com/
> 
> Thanks,
> Ionela.


* Re: [PATCH] sched/topology: Remove EM_MAX_COMPLEXITY limit
  2022-08-12 10:16 [PATCH] sched/topology: Remove EM_MAX_COMPLEXITY limit Pierre Gondois
  2022-08-17 14:21 ` Ionela Voinescu
@ 2022-08-18 12:19 ` Dietmar Eggemann
  2022-10-26 12:23 ` Lukasz Luba
  2 siblings, 0 replies; 5+ messages in thread
From: Dietmar Eggemann @ 2022-08-18 12:19 UTC (permalink / raw)
  To: Pierre Gondois, linux-kernel
  Cc: Ionela.Voinescu, Lukasz.Luba, Jonathan Corbet, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider, linux-doc

On 12/08/2022 12:16, Pierre Gondois wrote:
> From: Pierre Gondois <Pierre.Gondois@arm.com>

[...]

> find_energy_efficient_cpu() (feec) is now doing:
> feec()
> \_ for_each_pd(pd) [0]
>   // get max_spare_cap_cpu and compute_prev_delta
>   \_ for_each_cpu(pd) [1]
> 
>   \_ get_pd_busy_time(pd) [2]
>     \_ for_each_cpu(pd)
> 
>   // evaluate pd energy without the task
>   \_ get_pd_max_util(pd, -1) [3.0]
>     \_ for_each_cpu(pd)
>   \_ compute_energy(pd, -1)
>     \_ for_each_ps(pd)
> 
>   // evaluate pd energy with the task on prev_cpu
>   \_ get_pd_max_util(pd, prev_cpu) [3.1]
>     \_ for_each_cpu(pd)
>   \_ compute_energy(pd, prev_cpu)
>     \_ for_each_ps(pd)
> 
>   // evaluate pd energy with the task on max_spare_cap_cpu
>   \_ get_pd_max_util(pd, max_spare_cap_cpu) [3.2]
>     \_ for_each_cpu(pd)
>   \_ compute_energy(pd, max_spare_cap_cpu)
>     \_ for_each_ps(pd)
> 
> [3.1] happens only once since prev_cpu is unique. To have an upper
> bound of the complexity, [3.1] is taken into account for all pds.
> So with the same definitions for nr_pd, nr_cpus and nr_ps,
> the complexity is of:
> nr_pd * (2 * [nr_cpus in pd] + 3 * ([nr_cpus in pd] + [nr_ps in pd]))
>  [0]  * (     [1] + [2]      +       [3.0] + [3.1] + [3.2]          )
> = 5 * nr_cpus + 3 * nr_ps
> 
> The complexity limit was set to 2048 in:
> commit b68a4c0dba3b1 ("sched/topology: Disable EAS on inappropriate
> platforms")
> to make "EAS usable up to 16 CPUs with per-CPU DVFS and less than 8
> performance states each". For the same platform, the complexity would
> actually be of:
> 5 * 16 + 3 * 7 = 101

This is somewhat hard to grasp.

Example: 16 CPUs w/ per-CPU DVFS and < 8 performance states (OPPs) each

C  : Complexity

Nc : #CPUs in system
Ns : Sum of PSs (Performance States) over all PDs
Nd : #PDs

Nc' : #CPUs in PD
Ns' : #PSs in PD

(1) Currently we have:

    C = Nd * (Nc + Ns)

    Nc = 16, Nd = 16, Ns = 16 * 7

    C = 16 * (16 + 16 * 7)

      = 2048

(2) Your new formula is:

    Nc' = 1, Ns' = 7

    C = Nd * (2 * Nc' + 3 * (Nc' + Ns'))

      = Nd * (5 * Nc' + 3 * Ns')

      = 16 * (5 * 1 + 3 * 7)

      = 416

      = 5 * Nc + 3 * Ns

I would update the example and leave C ~ at 2048.
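
For reference, a minimal sketch of both formulas in C, using the same
symbols as above. The two helper names are made up for this example and do
not exist in the kernel:

```c
/* Complexity as currently checked: C = Nd * (Nc + Ns). */
unsigned long em_complexity_current(unsigned long nd, unsigned long nc,
				    unsigned long ns)
{
	return nd * (nc + ns);
}

/*
 * Complexity with the new formula: C = 5 * Nc + 3 * Ns, where Nc and Ns
 * are the totals over all PDs (Ns is the sum of the PSs of every PD).
 */
unsigned long em_complexity_proposed(unsigned long nc, unsigned long ns)
{
	return 5 * nc + 3 * ns;
}
```

With Nd = 16, Nc = 16 and Ns = 16 * 7 = 112, em_complexity_current() gives
2048 and em_complexity_proposed() gives 416.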

> Since the EAS complexity was greatly reduced, bigger platforms can
> handle EAS. For instance, a platform with 256 CPUs with 256
> performance states each would reach it. To reflect this improvement,
> remove the EAS complexity check.
> 
> Signed-off-by: Pierre Gondois <Pierre.Gondois@arm.com>

We should definitely align feec()'s implementation with the EM
complexity check and documentation. I would suggest that we keep both in
place but we update them.

> ---
>  Documentation/scheduler/sched-energy.rst | 37 ++--------------------
>  kernel/sched/topology.c                  | 39 ++----------------------
>  2 files changed, 6 insertions(+), 70 deletions(-)
> 
> diff --git a/Documentation/scheduler/sched-energy.rst b/Documentation/scheduler/sched-energy.rst
> index 8fbce5e767d9..3d1d71134d16 100644
> --- a/Documentation/scheduler/sched-energy.rst
> +++ b/Documentation/scheduler/sched-energy.rst
> @@ -356,38 +356,7 @@ placement. For EAS it doesn't matter whether the EM power values are expressed
>  in milli-Watts or in an 'abstract scale'.
>  
>  
> -6.3 - Energy Model complexity
> -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> -
> -The task wake-up path is very latency-sensitive. When the EM of a platform is
> -too complex (too many CPUs, too many performance domains, too many performance
> -states, ...), the cost of using it in the wake-up path can become prohibitive.
> -The energy-aware wake-up algorithm has a complexity of:
> -
> -	C = Nd * (Nc + Ns)
> -
> -with: Nd the number of performance domains; Nc the number of CPUs; and Ns the
> -total number of OPPs (ex: for two perf. domains with 4 OPPs each, Ns = 8).
> -
> -A complexity check is performed at the root domain level, when scheduling
> -domains are built. EAS will not start on a root domain if its C happens to be
> -higher than the completely arbitrary EM_MAX_COMPLEXITY threshold (2048 at the
> -time of writing).
> -
> -If you really want to use EAS but the complexity of your platform's Energy
> -Model is too high to be used with a single root domain, you're left with only
> -two possible options:
> -
> -    1. split your system into separate, smaller, root domains using exclusive
> -       cpusets and enable EAS locally on each of them. This option has the
> -       benefit to work out of the box but the drawback of preventing load
> -       balance between root domains, which can result in an unbalanced system
> -       overall;
> -    2. submit patches to reduce the complexity of the EAS wake-up algorithm,
> -       hence enabling it to cope with larger EMs in reasonable time.
> -
> -

I see value in this paragraph. Obviously it has to match the actual
feec() implementation.

[...]


* Re: [PATCH] sched/topology: Remove EM_MAX_COMPLEXITY limit
  2022-08-12 10:16 [PATCH] sched/topology: Remove EM_MAX_COMPLEXITY limit Pierre Gondois
  2022-08-17 14:21 ` Ionela Voinescu
  2022-08-18 12:19 ` Dietmar Eggemann
@ 2022-10-26 12:23 ` Lukasz Luba
  2 siblings, 0 replies; 5+ messages in thread
From: Lukasz Luba @ 2022-10-26 12:23 UTC (permalink / raw)
  To: Pierre Gondois
  Cc: Ionela.Voinescu, Jonathan Corbet, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider, linux-doc, linux-kernel

Hi Pierre,

On 8/12/22 11:16, Pierre Gondois wrote:
> From: Pierre Gondois <Pierre.Gondois@arm.com>
> 
> The Energy Aware Scheduler (EAS) estimates the energy consumption
> of placing a task on different CPUs. The goal is to minimize this
> energy consumption. Estimating the energy of different task placements
> is increasingly complex with the size of the platform. To avoid having
> a slow wake-up path, EAS is only enabled if this complexity is low
> enough.
> 
> The current complexity limit was set in:
> commit b68a4c0dba3b1 ("sched/topology: Disable EAS on inappropriate
> platforms").
> based on the first implementation of EAS, which was re-computing
> the power of the whole platform for each task placement scenario, cf:
> commit 390031e4c309 ("sched/fair: Introduce an energy estimation helper
> function").
> but the complexity of EAS was reduced in:
> commit eb92692b2544d ("sched/fair: Speed-up energy-aware wake-ups")
> and find_energy_efficient_cpu() (feec) algorithm was updated in:
> commit 3e8c6c9aac42 ("sched/fair: Remove task_util from effective
> utilization in feec()")
> 
> find_energy_efficient_cpu() (feec) is now doing:
> feec()
> \_ for_each_pd(pd) [0]
>    // get max_spare_cap_cpu and compute_prev_delta
>    \_ for_each_cpu(pd) [1]
> 
>    \_ get_pd_busy_time(pd) [2]
>      \_ for_each_cpu(pd)
> 
>    // evaluate pd energy without the task
>    \_ get_pd_max_util(pd, -1) [3.0]
>      \_ for_each_cpu(pd)
>    \_ compute_energy(pd, -1)
>      \_ for_each_ps(pd)
> 
>    // evaluate pd energy with the task on prev_cpu
>    \_ get_pd_max_util(pd, prev_cpu) [3.1]
>      \_ for_each_cpu(pd)
>    \_ compute_energy(pd, prev_cpu)
>      \_ for_each_ps(pd)
> 
>    // evaluate pd energy with the task on max_spare_cap_cpu
>    \_ get_pd_max_util(pd, max_spare_cap_cpu) [3.2]
>      \_ for_each_cpu(pd)
>    \_ compute_energy(pd, max_spare_cap_cpu)
>      \_ for_each_ps(pd)
> 
> [3.1] happens only once since prev_cpu is unique. To have an upper
> bound of the complexity, [3.1] is taken into account for all pds.
> So with the same definitions for nr_pd, nr_cpus and nr_ps,
> the complexity is of:
> nr_pd * (2 * [nr_cpus in pd] + 3 * ([nr_cpus in pd] + [nr_ps in pd]))
>   [0]  * (     [1] + [2]      +       [3.0] + [3.1] + [3.2]          )
> = 5 * nr_cpus + 3 * nr_ps
> 
> The complexity limit was set to 2048 in:
> commit b68a4c0dba3b1 ("sched/topology: Disable EAS on inappropriate
> platforms")
> to make "EAS usable up to 16 CPUs with per-CPU DVFS and less than 8
> performance states each". For the same platform, the complexity would
> actually be of:
> 5 * 16 + 3 * 7 = 101
> 
> Since the EAS complexity was greatly reduced, bigger platforms can
> handle EAS. For instance, a platform with 256 CPUs with 256
> performance states each would reach it. To reflect this improvement,
> remove the EAS complexity check.
> 
> Signed-off-by: Pierre Gondois <Pierre.Gondois@arm.com>
> ---
>   Documentation/scheduler/sched-energy.rst | 37 ++--------------------
>   kernel/sched/topology.c                  | 39 ++----------------------
>   2 files changed, 6 insertions(+), 70 deletions(-)
> 

The patch looks good for both: documentation bit and code removal.

We have new safety checks inside the Energy Model during the setup
of the EM for a perf domain. They are even stricter and more precise
(32-bit arch or 64-bit arch), so that our calculations do not overflow
(when we estimate energy). This is documented in the Energy Model, so
IMO you can easily drop this paragraph as the patch does. The same
applies to the checks in the code.

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>


