linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] Improve EAS energy estimation and increase precision
@ 2021-06-25 15:26 Lukasz Luba
  2021-06-25 15:26 ` [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy Lukasz Luba
                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Lukasz Luba @ 2021-06-25 15:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chris.Redpath, lukasz.luba, dietmar.eggemann, morten.rasmussen,
	qperret, linux-pm, peterz, rjw, viresh.kumar, vincent.guittot,
	mingo, juri.lelli, rostedt, segall, mgorman, bristot, CCj.Yeh

Hi all,

The patch set aims to address a scenario for Energy Aware Scheduler,
where we estimate and compare energy values and miss a more precised results.
In some use cases estimations for two CPUs might give the same values
for a given task and it's utilization. Those values would be different
when we have a better precision and avoid this rounding error.
Thus, the decision of choosing a CPU for a waking-up task might also
be better.

We have received this feedback from our partners.

Address this rounding error issue and increase the precision of Energy Model
em_perf_state::cost values. This change should not affect other
subsystems in kernel: thermal IPA, PowerCap DTPM, etc, since they use
em_perf_state::power field, which is not touched. It also doesn't
trigger the need for updating all existing platforms to register EM and report
power values in different scale.

Regards,
Lukasz

Lukasz Luba (3):
  sched/fair: Prepare variables for increased precision of EAS estimated
    energy
  PM: EM: Make em_cpu_energy() able to return bigger values
  PM: EM: Increase energy calculation precision

 include/linux/energy_model.h | 11 +++++++----
 kernel/power/energy_model.c  |  3 ++-
 kernel/sched/fair.c          | 13 +++++++------
 3 files changed, 16 insertions(+), 11 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-06-25 15:26 [PATCH 0/3] Improve EAS energy estimation and increase precision Lukasz Luba
@ 2021-06-25 15:26 ` Lukasz Luba
  2021-06-30 17:01   ` Rafael J. Wysocki
  2021-07-07  7:07   ` Vincent Guittot
  2021-06-25 15:26 ` [PATCH 2/3] PM: EM: Make em_cpu_energy() able to return bigger values Lukasz Luba
  2021-06-25 15:26 ` [PATCH 3/3] PM: EM: Increase energy calculation precision Lukasz Luba
  2 siblings, 2 replies; 33+ messages in thread
From: Lukasz Luba @ 2021-06-25 15:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chris.Redpath, lukasz.luba, dietmar.eggemann, morten.rasmussen,
	qperret, linux-pm, peterz, rjw, viresh.kumar, vincent.guittot,
	mingo, juri.lelli, rostedt, segall, mgorman, bristot, CCj.Yeh

The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up
task. It probes many possibilities and compares the estimated energy values
for different scenarios. For calculating those energy values it relies on
Energy Model (EM) data and em_cpu_energy(). The precision which is used in
EM data is in milli-Watts (or abstract scale), which sometimes is not
sufficient. In some cases it might happen that two CPUs from different
Performance Domains (PDs) get the same calculated value for a given task
placement, but in more precised scale, they might differ. This rounding
error has to be addressed. This patch prepares EAS code for better
precision in the coming EM improvements.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 kernel/sched/fair.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7b8990fd4896..b517c9e79768 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6582,7 +6582,7 @@ static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
  * to compute what would be the energy if we decided to actually migrate that
  * task.
  */
-static long
+static u64
 compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
 {
 	struct cpumask *pd_mask = perf_domain_span(pd);
@@ -6689,12 +6689,13 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
  */
 static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 {
-	unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX;
 	struct root_domain *rd = cpu_rq(smp_processor_id())->rd;
+	u64 prev_delta = ULLONG_MAX, best_delta = ULLONG_MAX;
 	int cpu, best_energy_cpu = prev_cpu, target = -1;
-	unsigned long cpu_cap, util, base_energy = 0;
+	unsigned long cpu_cap, util;
 	struct sched_domain *sd;
 	struct perf_domain *pd;
+	u64 base_energy = 0;
 
 	rcu_read_lock();
 	pd = rcu_dereference(rd->pd);
@@ -6718,9 +6719,9 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 		goto unlock;
 
 	for (; pd; pd = pd->next) {
-		unsigned long cur_delta, spare_cap, max_spare_cap = 0;
+		unsigned long spare_cap, max_spare_cap = 0;
 		bool compute_prev_delta = false;
-		unsigned long base_energy_pd;
+		u64 base_energy_pd, cur_delta;
 		int max_spare_cap_cpu = -1;
 
 		for_each_cpu_and(cpu, perf_domain_span(pd), sched_domain_span(sd)) {
@@ -6790,7 +6791,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 	 * Pick the best CPU if prev_cpu cannot be used, or if it saves at
 	 * least 6% of the energy used by prev_cpu.
 	 */
-	if ((prev_delta == ULONG_MAX) ||
+	if ((prev_delta == ULLONG_MAX) ||
 	    (prev_delta - best_delta) > ((prev_delta + base_energy) >> 4))
 		target = best_energy_cpu;
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 2/3] PM: EM: Make em_cpu_energy() able to return bigger values
  2021-06-25 15:26 [PATCH 0/3] Improve EAS energy estimation and increase precision Lukasz Luba
  2021-06-25 15:26 ` [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy Lukasz Luba
@ 2021-06-25 15:26 ` Lukasz Luba
  2021-07-05 12:44   ` Dietmar Eggemann
  2021-07-07  7:07   ` Peter Zijlstra
  2021-06-25 15:26 ` [PATCH 3/3] PM: EM: Increase energy calculation precision Lukasz Luba
  2 siblings, 2 replies; 33+ messages in thread
From: Lukasz Luba @ 2021-06-25 15:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chris.Redpath, lukasz.luba, dietmar.eggemann, morten.rasmussen,
	qperret, linux-pm, peterz, rjw, viresh.kumar, vincent.guittot,
	mingo, juri.lelli, rostedt, segall, mgorman, bristot, CCj.Yeh

The Energy Model (EM) em_cpu_energy() is responsible for providing good
estimation regarding CPUs energy. It contains proper data structures which
are then used during calculation. The values stored in there are in
milli-Watts precision (or in abstract scale) smaller that 0xffff, which use
sufficient unsigned long even on 32-bit machines. There are scenarios where
we would like to provide calculated estimations in a better precision and
the values might be 1000 times bigger. This patch makes possible to use
quite big values for also 32-bit machines.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 3f221dbf5f95..2016f5a706e0 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -101,7 +101,7 @@ void em_dev_unregister_perf_domain(struct device *dev);
  * Return: the sum of the energy consumed by the CPUs of the domain assuming
  * a capacity state satisfying the max utilization of the domain.
  */
-static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
+static inline u64 em_cpu_energy(struct em_perf_domain *pd,
 				unsigned long max_util, unsigned long sum_util,
 				unsigned long allowed_cpu_cap)
 {
@@ -180,7 +180,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 *   pd_nrg = ------------------------                       (4)
 	 *                  scale_cpu
 	 */
-	return ps->cost * sum_util / scale_cpu;
+	return div_u64((u64)ps->cost * sum_util, scale_cpu);
 }
 
 /**
@@ -217,7 +217,7 @@ static inline struct em_perf_domain *em_pd_get(struct device *dev)
 {
 	return NULL;
 }
-static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
+static inline u64 em_cpu_energy(struct em_perf_domain *pd,
 			unsigned long max_util, unsigned long sum_util,
 			unsigned long allowed_cpu_cap)
 {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 3/3] PM: EM: Increase energy calculation precision
  2021-06-25 15:26 [PATCH 0/3] Improve EAS energy estimation and increase precision Lukasz Luba
  2021-06-25 15:26 ` [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy Lukasz Luba
  2021-06-25 15:26 ` [PATCH 2/3] PM: EM: Make em_cpu_energy() able to return bigger values Lukasz Luba
@ 2021-06-25 15:26 ` Lukasz Luba
  2021-07-05 12:45   ` Dietmar Eggemann
  2 siblings, 1 reply; 33+ messages in thread
From: Lukasz Luba @ 2021-06-25 15:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chris.Redpath, lukasz.luba, dietmar.eggemann, morten.rasmussen,
	qperret, linux-pm, peterz, rjw, viresh.kumar, vincent.guittot,
	mingo, juri.lelli, rostedt, segall, mgorman, bristot, CCj.Yeh

The Energy Model (EM) provides useful information about device power in
each performance state to other subsystems like: Energy Aware Scheduler
(EAS). The energy calculation in EAS does arithmetic operation based on
the EM em_cpu_energy(). Current implementation of that function uses
em_perf_state::cost as a pre-computed cost coefficient equal to:
cost = power * max_frequency / frequency.
The 'power' is expressed in milli-Watts (or in abstract scale).

There are corner cases then the EAS energy calculation for two Performance
Domains (PDs) return the same value, e.g. 10mW. The EAS compares these
values to choose smaller one. It might happen that this values are equal
due to rounding error. In such scenario, we need better precision, e.g.
10000 times better. To provide this possibility increase the precision on
the em_perf_state::cost.

This patch allows to avoid the rounding to milli-Watt errors, which might
occur in EAS energy estimation for each Performance Domains (PD). The
rounding error is common for small tasks which have small utilization
values.

The rest of the EM code doesn't change, em_perf_state::power is still
expressed in milli-Watts (or in abstract scale). Thus, all existing
platforms don't have to change their reported power. The same applies to
EM clients, like thermal or DTPM (they use em_perf_state::power).

Reported-by: CCJ Yeh <CCj.Yeh@mediatek.com>
Suggested-by: CCJ Yeh <CCj.Yeh@mediatek.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h | 5 ++++-
 kernel/power/energy_model.c  | 3 ++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 2016f5a706e0..91037dd57e61 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -16,7 +16,10 @@
  * @power:	The power consumed at this level (by 1 CPU or by a registered
  *		device). It can be a total power: static and dynamic.
  * @cost:	The cost coefficient associated with this level, used during
- *		energy calculation. Equal to: power * max_frequency / frequency
+ *		energy calculation. Equal to:
+		power * 10000 * max_frequency / frequency
+ *		To increase the energy estimation presision use different
+ *		scale in this coefficient than in @power field.
  */
 struct em_perf_state {
 	unsigned long frequency;
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 0f4530b3a8cd..2724f0ac417d 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -170,7 +170,8 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 	/* Compute the cost of each performance state. */
 	fmax = (u64) table[nr_states - 1].frequency;
 	for (i = 0; i < nr_states; i++) {
-		table[i].cost = div64_u64(fmax * table[i].power,
+		u64 power_res = (u64)table[i].power * 10000;
+		table[i].cost = div64_u64(fmax * power_res,
 					  table[i].frequency);
 	}
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-06-25 15:26 ` [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy Lukasz Luba
@ 2021-06-30 17:01   ` Rafael J. Wysocki
  2021-06-30 17:28     ` Lukasz Luba
  2021-07-07  7:07   ` Vincent Guittot
  1 sibling, 1 reply; 33+ messages in thread
From: Rafael J. Wysocki @ 2021-06-30 17:01 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Linux Kernel Mailing List, Chris Redpath, Dietmar Eggemann,
	Morten Rasmussen, Quentin Perret, Linux PM, Peter Zijlstra,
	Rafael J. Wysocki, Viresh Kumar, Vincent Guittot, Ingo Molnar,
	Juri Lelli, Steven Rostedt, segall, Mel Gorman,
	Daniel Bristot de Oliveira, CCj.Yeh

On Fri, Jun 25, 2021 at 5:26 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up
> task. It probes many possibilities and compares the estimated energy values
> for different scenarios. For calculating those energy values it relies on
> Energy Model (EM) data and em_cpu_energy(). The precision which is used in
> EM data is in milli-Watts (or abstract scale), which sometimes is not
> sufficient. In some cases it might happen that two CPUs from different
> Performance Domains (PDs) get the same calculated value for a given task
> placement, but in more precised scale, they might differ. This rounding
> error has to be addressed. This patch prepares EAS code for better
> precision in the coming EM improvements.
>
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>

If you want me to pick up this series, this patch requires an ACK from
the scheduler maintainers.

> ---
>  kernel/sched/fair.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7b8990fd4896..b517c9e79768 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6582,7 +6582,7 @@ static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
>   * to compute what would be the energy if we decided to actually migrate that
>   * task.
>   */
> -static long
> +static u64
>  compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
>  {
>         struct cpumask *pd_mask = perf_domain_span(pd);
> @@ -6689,12 +6689,13 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
>   */
>  static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>  {
> -       unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX;
>         struct root_domain *rd = cpu_rq(smp_processor_id())->rd;
> +       u64 prev_delta = ULLONG_MAX, best_delta = ULLONG_MAX;
>         int cpu, best_energy_cpu = prev_cpu, target = -1;
> -       unsigned long cpu_cap, util, base_energy = 0;
> +       unsigned long cpu_cap, util;
>         struct sched_domain *sd;
>         struct perf_domain *pd;
> +       u64 base_energy = 0;
>
>         rcu_read_lock();
>         pd = rcu_dereference(rd->pd);
> @@ -6718,9 +6719,9 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>                 goto unlock;
>
>         for (; pd; pd = pd->next) {
> -               unsigned long cur_delta, spare_cap, max_spare_cap = 0;
> +               unsigned long spare_cap, max_spare_cap = 0;
>                 bool compute_prev_delta = false;
> -               unsigned long base_energy_pd;
> +               u64 base_energy_pd, cur_delta;
>                 int max_spare_cap_cpu = -1;
>
>                 for_each_cpu_and(cpu, perf_domain_span(pd), sched_domain_span(sd)) {
> @@ -6790,7 +6791,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>          * Pick the best CPU if prev_cpu cannot be used, or if it saves at
>          * least 6% of the energy used by prev_cpu.
>          */
> -       if ((prev_delta == ULONG_MAX) ||
> +       if ((prev_delta == ULLONG_MAX) ||
>             (prev_delta - best_delta) > ((prev_delta + base_energy) >> 4))
>                 target = best_energy_cpu;
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-06-30 17:01   ` Rafael J. Wysocki
@ 2021-06-30 17:28     ` Lukasz Luba
  2021-07-02 19:07       ` Lukasz Luba
  0 siblings, 1 reply; 33+ messages in thread
From: Lukasz Luba @ 2021-06-30 17:28 UTC (permalink / raw)
  To: Rafael J. Wysocki, Peter Zijlstra
  Cc: Linux Kernel Mailing List, Chris Redpath, Dietmar Eggemann,
	Morten Rasmussen, Quentin Perret, Linux PM, Rafael J. Wysocki,
	Viresh Kumar, Vincent Guittot, Ingo Molnar, Juri Lelli,
	Steven Rostedt, segall, Mel Gorman, Daniel Bristot de Oliveira,
	CCj.Yeh



On 6/30/21 6:01 PM, Rafael J. Wysocki wrote:
> On Fri, Jun 25, 2021 at 5:26 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up
>> task. It probes many possibilities and compares the estimated energy values
>> for different scenarios. For calculating those energy values it relies on
>> Energy Model (EM) data and em_cpu_energy(). The precision which is used in
>> EM data is in milli-Watts (or abstract scale), which sometimes is not
>> sufficient. In some cases it might happen that two CPUs from different
>> Performance Domains (PDs) get the same calculated value for a given task
>> placement, but in more precised scale, they might differ. This rounding
>> error has to be addressed. This patch prepares EAS code for better
>> precision in the coming EM improvements.
>>
>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> 
> If you want me to pick up this series, this patch requires an ACK from
> the scheduler maintainers.
> 

It would be great, if you could take it after e.g. Peter ACK it.

Peter could you have a look at it, please?
In this patch 1/3 we have only variables upgrade.

Regards,
Lukasz

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-06-30 17:28     ` Lukasz Luba
@ 2021-07-02 19:07       ` Lukasz Luba
  0 siblings, 0 replies; 33+ messages in thread
From: Lukasz Luba @ 2021-07-02 19:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Chris Redpath,
	Dietmar Eggemann, Morten Rasmussen, Quentin Perret, Linux PM,
	Rafael J. Wysocki, Viresh Kumar, Vincent Guittot, Ingo Molnar,
	Juri Lelli, Steven Rostedt, segall, Mel Gorman,
	Daniel Bristot de Oliveira, CCj.Yeh

Hi Peter,

Gentle ping.
You might missed my previous email.

On 6/30/21 6:28 PM, Lukasz Luba wrote:
> 
> 
> On 6/30/21 6:01 PM, Rafael J. Wysocki wrote:
>> On Fri, Jun 25, 2021 at 5:26 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>
>>> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up
>>> task. It probes many possibilities and compares the estimated energy 
>>> values
>>> for different scenarios. For calculating those energy values it 
>>> relies on
>>> Energy Model (EM) data and em_cpu_energy(). The precision which is 
>>> used in
>>> EM data is in milli-Watts (or abstract scale), which sometimes is not
>>> sufficient. In some cases it might happen that two CPUs from different
>>> Performance Domains (PDs) get the same calculated value for a given task
>>> placement, but in more precised scale, they might differ. This rounding
>>> error has to be addressed. This patch prepares EAS code for better
>>> precision in the coming EM improvements.
>>>
>>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
>>
>> If you want me to pick up this series, this patch requires an ACK from
>> the scheduler maintainers.
>>
> 
> It would be great, if you could take it after e.g. Peter ACK it.
> 
> Peter could you have a look at it, please?
> In this patch 1/3 we have only variables upgrade.
> 
> Regards,
> Lukasz

Could you have a look and ACK the patch 1/3, please?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] PM: EM: Make em_cpu_energy() able to return bigger values
  2021-06-25 15:26 ` [PATCH 2/3] PM: EM: Make em_cpu_energy() able to return bigger values Lukasz Luba
@ 2021-07-05 12:44   ` Dietmar Eggemann
  2021-07-06 19:44     ` Lukasz Luba
  2021-07-07  7:07   ` Peter Zijlstra
  1 sibling, 1 reply; 33+ messages in thread
From: Dietmar Eggemann @ 2021-07-05 12:44 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel
  Cc: Chris.Redpath, morten.rasmussen, qperret, linux-pm, peterz, rjw,
	viresh.kumar, vincent.guittot, mingo, juri.lelli, rostedt,
	segall, mgorman, bristot, CCj.Yeh

On 25/06/2021 17:26, Lukasz Luba wrote:
> The Energy Model (EM) em_cpu_energy() is responsible for providing good
> estimation regarding CPUs energy. It contains proper data structures which
> are then used during calculation. The values stored in there are in
> milli-Watts precision (or in abstract scale) smaller that 0xffff, which use

I guess you refer to 'if (... || power > EM_MAX_POWER)' check in
em_create_perf_table() [kernel/power/energy_model.c].

> sufficient unsigned long even on 32-bit machines. There are scenarios where
                                                              ^^^^^^^^^

Can you describe these scenarios better with one example (EAS placement
of an example task on a 2 PD system) which highlights the issue and how
it this patch-set solves it?

In this example you can list all the things which must be there to
create a situation in EAS in which the patch-set helps.

> we would like to provide calculated estimations in a better precision and
> the values might be 1000 times bigger. This patch makes possible to use

Where is this `1000` coming from?

> quite big values for also 32-bit machines.
> 
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
>  include/linux/energy_model.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index 3f221dbf5f95..2016f5a706e0 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -101,7 +101,7 @@ void em_dev_unregister_perf_domain(struct device *dev);
>   * Return: the sum of the energy consumed by the CPUs of the domain assuming
>   * a capacity state satisfying the max utilization of the domain.
>   */
> -static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
> +static inline u64 em_cpu_energy(struct em_perf_domain *pd,
>  				unsigned long max_util, unsigned long sum_util,
>  				unsigned long allowed_cpu_cap)
>  {
> @@ -180,7 +180,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>  	 *   pd_nrg = ------------------------                       (4)
>  	 *                  scale_cpu
>  	 */
> -	return ps->cost * sum_util / scale_cpu;
> +	return div_u64((u64)ps->cost * sum_util, scale_cpu);
>  }
>  
>  /**
> @@ -217,7 +217,7 @@ static inline struct em_perf_domain *em_pd_get(struct device *dev)
>  {
>  	return NULL;
>  }
> -static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
> +static inline u64 em_cpu_energy(struct em_perf_domain *pd,
>  			unsigned long max_util, unsigned long sum_util,
>  			unsigned long allowed_cpu_cap)
>  {
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/3] PM: EM: Increase energy calculation precision
  2021-06-25 15:26 ` [PATCH 3/3] PM: EM: Increase energy calculation precision Lukasz Luba
@ 2021-07-05 12:45   ` Dietmar Eggemann
  2021-07-06 19:51     ` Lukasz Luba
  0 siblings, 1 reply; 33+ messages in thread
From: Dietmar Eggemann @ 2021-07-05 12:45 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel
  Cc: Chris.Redpath, morten.rasmussen, qperret, linux-pm, peterz, rjw,
	viresh.kumar, vincent.guittot, mingo, juri.lelli, rostedt,
	segall, mgorman, bristot, CCj.Yeh

On 25/06/2021 17:26, Lukasz Luba wrote:
> The Energy Model (EM) provides useful information about device power in
> each performance state to other subsystems like: Energy Aware Scheduler
> (EAS). The energy calculation in EAS does arithmetic operation based on
> the EM em_cpu_energy(). Current implementation of that function uses
> em_perf_state::cost as a pre-computed cost coefficient equal to:
> cost = power * max_frequency / frequency.
> The 'power' is expressed in milli-Watts (or in abstract scale).
> 
> There are corner cases then the EAS energy calculation for two Performance
            ^^^^^^^^^^^^

Again, an easy to understand example to describe in which situation this
change would bring a benefit would help.

> Domains (PDs) return the same value, e.g. 10mW. The EAS compares these
> values to choose smaller one. It might happen that this values are equal
> due to rounding error. In such scenario, we need better precision, e.g.
> 10000 times better. To provide this possibility increase the precision on
> the em_perf_state::cost.
> 
> This patch allows to avoid the rounding to milli-Watt errors, which might
> occur in EAS energy estimation for each Performance Domains (PD). The
> rounding error is common for small tasks which have small utilization
> values.

What's the influence of the CPU utilization 'cpu_util_next()' here?

compute_energy()
    em_cpu_energy()
            return ps->cost * sum_util / scale_cpu
                              ^^^^^^^^
> The rest of the EM code doesn't change, em_perf_state::power is still
> expressed in milli-Watts (or in abstract scale). Thus, all existing
> platforms don't have to change their reported power. The same applies to

Not only existing platforms since there are no changes. So why
highlighting `existing` here.?

> EM clients, like thermal or DTPM (they use em_perf_state::power).
> 
> Reported-by: CCJ Yeh <CCj.Yeh@mediatek.com>
> Suggested-by: CCJ Yeh <CCj.Yeh@mediatek.com>
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
>  include/linux/energy_model.h | 5 ++++-
>  kernel/power/energy_model.c  | 3 ++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index 2016f5a706e0..91037dd57e61 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -16,7 +16,10 @@
>   * @power:	The power consumed at this level (by 1 CPU or by a registered
>   *		device). It can be a total power: static and dynamic.
>   * @cost:	The cost coefficient associated with this level, used during
> - *		energy calculation. Equal to: power * max_frequency / frequency
> + *		energy calculation. Equal to:
> +		power * 10000 * max_frequency / frequency
> + *		To increase the energy estimation presision use different
> + *		scale in this coefficient than in @power field.
>   */
>  struct em_perf_state {
>  	unsigned long frequency;
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index 0f4530b3a8cd..2724f0ac417d 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -170,7 +170,8 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
>  	/* Compute the cost of each performance state. */
>  	fmax = (u64) table[nr_states - 1].frequency;
>  	for (i = 0; i < nr_states; i++) {
> -		table[i].cost = div64_u64(fmax * table[i].power,
> +		u64 power_res = (u64)table[i].power * 10000;
> +		table[i].cost = div64_u64(fmax * power_res,
>  					  table[i].frequency);
>  	}
>  
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] PM: EM: Make em_cpu_energy() able to return bigger values
  2021-07-05 12:44   ` Dietmar Eggemann
@ 2021-07-06 19:44     ` Lukasz Luba
  0 siblings, 0 replies; 33+ messages in thread
From: Lukasz Luba @ 2021-07-06 19:44 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, Chris.Redpath, morten.rasmussen, qperret, linux-pm,
	peterz, rjw, viresh.kumar, vincent.guittot, mingo, juri.lelli,
	rostedt, segall, mgorman, bristot, CCj.Yeh



On 7/5/21 1:44 PM, Dietmar Eggemann wrote:
> On 25/06/2021 17:26, Lukasz Luba wrote:
>> The Energy Model (EM) em_cpu_energy() is responsible for providing good
>> estimation regarding CPUs energy. It contains proper data structures which
>> are then used during calculation. The values stored in there are in
>> milli-Watts precision (or in abstract scale) smaller that 0xffff, which use
> 
> I guess you refer to 'if (... || power > EM_MAX_POWER)' check in
> em_create_perf_table() [kernel/power/energy_model.c].

Correct

> 
>> sufficient unsigned long even on 32-bit machines. There are scenarios where
>                                                                ^^^^^^^^^
> 
> Can you describe these scenarios better with one example (EAS placement
> of an example task on a 2 PD system) which highlights the issue and how
> it this patch-set solves it?

There are two places in the code where it makes a difference:

1. In the find_energy_efficient_cpu() where we are searching for
best_delta. We might suffer there when two PDs return the same result,
like in the example below.

Scenario:
Low utilized system e.g. ~200 sum_util for PD0 and ~220 for PD1. There
are quite a few small tasks ~10-15 util. These tasks would suffer for
the rounding error. Such system utilization has been seen while playing
some simple games. In such condition our partner reported 5..10mA less
battery drain.

Some details:
We have two Perf Domains (PDs): PD0 (big) and PD1 (little)
Let's compare w/o patch set ('old') and w/ patch set ('new')
We are comparing energy w/ task and w/o task placed in the PDs

a) 'old' w/o patch set, PD0
task_util = 13
cost = 480
sum_util_w/o_task = 215
sum_util_w_task = 228
scale_cpu = 1024
energy_w/o_task = 480 * 215 / 1024 = 100.78 => 100
energy_w_task = 480 * 228 / 1024 = 106.87 => 106
energy_diff = 106 - 100 = 6 (this is equal to 'old' PD1's energy_diff in 
'c)')

b) 'new' w/ patch set, PD0
task_util = 13
cost = 480 * 10000 = 4800000
sum_util_w/o_task = 215
sum_util_w_task = 228
energy_w/o_task = 4800000 * 215 / 1024 = 1007812
energy_w_task = 4800000 * 228 / 1024  = 1068750
energy_diff = 1068750 - 1007812 = 60938 (this is not equal to 'new' 
PD1's energy_diff in 'd)')

c) 'old' w/o patch set, PD1
task_util = 13
cost = 160
sum_util_w/o_task = 283
sum_util_w_task = 293
scale_cpu = 355
energy_w/o_task = 160 * 283 / 355 = 127.55 => 127
energy_w_task = 160 * 296 / 355 = 133.41 => 133
energy_diff = 133 - 127 = 6 (this is equal to 'old' PD0's energy_diff in 
'a)')

d) 'new' w/ patch set, PD1
task_util = 13
cost = 160 * 10000 = 1600000
sum_util_w/o_task = 283
sum_util_w_task = 293
scale_cpu = 355
(no '/ scale_cpu' needed here)
energy_w/o_task = 1600000 * 283 / 355 = 1275492
energy_w_task = 1600000 * 296 / 355 =   1334084
energy_diff = 1334084 - 1275492 = 58592 (this is not equal to 'new' 
PD0's energy_diff in 'b)')

2. Difference in the the last feec() step: margin filter
With the patch set the margin comparison also has better resolution,
so it's possible to hit better placement thanks to that.

Please see the traces below.
How to interpret these values:
In the first trace below, there is diff=124964 and margin=123381
the EM 'cost' is multiplied by 10000, so we we divide these two,
it will be '12 > 12', so it won't be placed into the better PD
with lower best delta.

In the last 2 examples you would see close values in the
prev_delta=49390 best_delta=43945
Without the patch they would be rounded to
prev_delta=4 best_delta=4
and the task might be placed wrongly.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   systemd-logind-440     [000] d..5    82.164218: compute_energy: 
energy=43945, sum_util=9 cpu=4
   systemd-logind-440     [000] d..5    82.164232: compute_energy: 
energy=766601, sum_util=157 cpu=4
   systemd-logind-440     [000] d..5    82.164242: compute_energy: 
energy=766601, sum_util=157 cpu=4
   systemd-logind-440     [000] d..5    82.164253: compute_energy: 
energy=1207500, sum_util=299 cpu=0
   systemd-logind-440     [000] d..5    82.164263: compute_energy: 
energy=1805192, sum_util=447 cpu=0
   systemd-logind-440     [000] d..5    82.164273: select_task_rq_fair: 
EAS: prev_delta=722656 best_delta=597692 diff=124964 margin=123381
   systemd-logind-440     [000] d..5    82.164278: select_task_rq_fair: 
EAS: hit!!!


   systemd-logind-440     [000] d.h4   134.954038: compute_energy: 
energy=366210, sum_util=75 cpu=4
   systemd-logind-440     [000] d.h4   134.954067: compute_energy: 
energy=463867, sum_util=95 cpu=4
   systemd-logind-440     [000] d.h4   134.954090: compute_energy: 
energy=463867, sum_util=95 cpu=4
   systemd-logind-440     [000] d.h4   134.954117: compute_energy: 
energy=257347, sum_util=99 cpu=0
   systemd-logind-440     [000] d.h4   134.954137: compute_energy: 
energy=309336, sum_util=119 cpu=0
   systemd-logind-440     [000] d.h4   134.954160: select_task_rq_fair: 
EAS: prev_delta=97657 best_delta=51989 diff=45668 margin=45075
   systemd-logind-440     [000] d.h4   134.954171: select_task_rq_fair: 
EAS: hit!!!


           <idle>-0       [001] d.s4   226.019763: compute_energy: 
energy=0, sum_util=0 cpu=4
           <idle>-0       [001] d.s4   226.019790: compute_energy: 
energy=43945, sum_util=9 cpu=4
           <idle>-0       [001] d.s4   226.019817: compute_energy: 
energy=5198, sum_util=2 cpu=0
           <idle>-0       [001] d.s4   226.019838: compute_energy: 
energy=54588, sum_util=21 cpu=0
           <idle>-0       [001] d.s4   226.019858: compute_energy: 
energy=54588, sum_util=21 cpu=0
           <idle>-0       [001] d.s4   226.019881: select_task_rq_fair: 
EAS: prev_delta=49390 best_delta=43945 diff=5445 margin=3411
           <idle>-0       [001] d.s4   226.019891: select_task_rq_fair: 
EAS: hit!!!


           <idle>-0       [001] d.s4   270.019780: compute_energy: 
energy=0, sum_util=0 cpu=4
           <idle>-0       [001] d.s4   270.019807: compute_energy: 
energy=43945, sum_util=9 cpu=4
           <idle>-0       [001] d.s4   270.019833: compute_energy: 
energy=5198, sum_util=2 cpu=0
           <idle>-0       [001] d.s4   270.019854: compute_energy: 
energy=54588, sum_util=21 cpu=0
           <idle>-0       [001] d.s4   270.019874: compute_energy: 
energy=54588, sum_util=21 cpu=0
           <idle>-0       [001] d.s4   270.019897: select_task_rq_fair: 
EAS: prev_delta=49390 best_delta=43945 diff=5445 margin=3411
           <idle>-0       [001] d.s4   270.019908: select_task_rq_fair: 
EAS: hit!!!

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


> 
> In this example you can list all the things which must be there to
> create a situation in EAS in which the patch-set helps.

I hope the description above now add more light into this issue.

> 
>> we would like to provide calculated estimations in a better precision and
>> the values might be 1000 times bigger. This patch makes possible to use
> 
> Where is this `1000` coming from?

It's just a statement that in the next patches we would increase the
resolution by a few orders of magnitude. In patch 3/3 it's 10000.
I can align with that value also in this statement.

Thank you Dietmar for having a look at this!

Regards,
Lukasz


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/3] PM: EM: Increase energy calculation precision
  2021-07-05 12:45   ` Dietmar Eggemann
@ 2021-07-06 19:51     ` Lukasz Luba
  0 siblings, 0 replies; 33+ messages in thread
From: Lukasz Luba @ 2021-07-06 19:51 UTC (permalink / raw)
  To: Dietmar Eggemann, linux-kernel
  Cc: Chris.Redpath, morten.rasmussen, qperret, linux-pm, peterz, rjw,
	viresh.kumar, vincent.guittot, mingo, juri.lelli, rostedt,
	segall, mgorman, bristot, CCj.Yeh



On 7/5/21 1:45 PM, Dietmar Eggemann wrote:
> On 25/06/2021 17:26, Lukasz Luba wrote:
>> The Energy Model (EM) provides useful information about device power in
>> each performance state to other subsystems like: Energy Aware Scheduler
>> (EAS). The energy calculation in EAS does arithmetic operation based on
>> the EM em_cpu_energy(). Current implementation of that function uses
>> em_perf_state::cost as a pre-computed cost coefficient equal to:
>> cost = power * max_frequency / frequency.
>> The 'power' is expressed in milli-Watts (or in abstract scale).
>>
>> There are corner cases then the EAS energy calculation for two Performance
>              ^^^^^^^^^^^^
> 
> Again, an easy to understand example to describe in which situation this
> change would bring a benefit would help.
> 
>> Domains (PDs) return the same value, e.g. 10mW. The EAS compares these
>> values to choose smaller one. It might happen that this values are equal
>> due to rounding error. In such scenario, we need better precision, e.g.
>> 10000 times better. To provide this possibility increase the precision on
>> the em_perf_state::cost.
>>
>> This patch allows to avoid the rounding to milli-Watt errors, which might
>> occur in EAS energy estimation for each Performance Domains (PD). The
>> rounding error is common for small tasks which have small utilization
>> values.
> 
> What's the influence of the CPU utilization 'cpu_util_next()' here?
> 
> compute_energy()
>      em_cpu_energy()
>              return ps->cost * sum_util / scale_cpu
>                                ^^^^^^^^

This is the place where the rounding error triggers. If sum_util is
small and scale_cpu is e.g. 1024, then we have a small fraction here.
It depends on the EM 'cost', but for most platforms we have small
power and cost values, so we suffer this rounding.
The example that I gave in my response in patch 2/3 shows this.

>> The rest of the EM code doesn't change, em_perf_state::power is still
>> expressed in milli-Watts (or in abstract scale). Thus, all existing
>> platforms don't have to change their reported power. The same applies to
> 
> Not only existing platforms since there are no changes. So why
> highlighting `existing` here.?

I just wanted to be clear that it doesn't affect existing platforms
at all. We don't require to report power in better resolution e.g.
micro-Watts.
Also, the clients in the kernel won't be affected, since they use
EM 'power' filed, not 'cost'.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-06-25 15:26 ` [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy Lukasz Luba
  2021-06-30 17:01   ` Rafael J. Wysocki
@ 2021-07-07  7:07   ` Vincent Guittot
  2021-07-07  7:49     ` Lukasz Luba
  1 sibling, 1 reply; 33+ messages in thread
From: Vincent Guittot @ 2021-07-07  7:07 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, Chris Redpath, Dietmar Eggemann, Morten Rasmussen,
	Quentin Perret, open list:THERMAL, Peter Zijlstra,
	Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Steven Rostedt, segall, Mel Gorman, Daniel Bristot de Oliveira,
	CCj.Yeh

On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:
>
> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up
> task. It probes many possibilities and compares the estimated energy values
> for different scenarios. For calculating those energy values it relies on
> Energy Model (EM) data and em_cpu_energy(). The precision which is used in
> EM data is in milli-Watts (or abstract scale), which sometimes is not
> sufficient. In some cases it might happen that two CPUs from different
> Performance Domains (PDs) get the same calculated value for a given task
> placement, but in more precised scale, they might differ. This rounding
> error has to be addressed. This patch prepares EAS code for better
> precision in the coming EM improvements.

Could you explain why 32bits results are not enough and you need to
move to 64bits ?

Right now the result is in the range [0..2^32[ mW. If you need more
precision and you want to return uW instead, you will have a result in
the range  [0..4kW[ which seems to be still enough

>
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
>  kernel/sched/fair.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7b8990fd4896..b517c9e79768 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6582,7 +6582,7 @@ static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
>   * to compute what would be the energy if we decided to actually migrate that
>   * task.
>   */
> -static long
> +static u64
>  compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
>  {
>         struct cpumask *pd_mask = perf_domain_span(pd);
> @@ -6689,12 +6689,13 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
>   */
>  static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>  {
> -       unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX;
>         struct root_domain *rd = cpu_rq(smp_processor_id())->rd;
> +       u64 prev_delta = ULLONG_MAX, best_delta = ULLONG_MAX;
>         int cpu, best_energy_cpu = prev_cpu, target = -1;
> -       unsigned long cpu_cap, util, base_energy = 0;
> +       unsigned long cpu_cap, util;
>         struct sched_domain *sd;
>         struct perf_domain *pd;
> +       u64 base_energy = 0;
>
>         rcu_read_lock();
>         pd = rcu_dereference(rd->pd);
> @@ -6718,9 +6719,9 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>                 goto unlock;
>
>         for (; pd; pd = pd->next) {
> -               unsigned long cur_delta, spare_cap, max_spare_cap = 0;
> +               unsigned long spare_cap, max_spare_cap = 0;
>                 bool compute_prev_delta = false;
> -               unsigned long base_energy_pd;
> +               u64 base_energy_pd, cur_delta;
>                 int max_spare_cap_cpu = -1;
>
>                 for_each_cpu_and(cpu, perf_domain_span(pd), sched_domain_span(sd)) {
> @@ -6790,7 +6791,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>          * Pick the best CPU if prev_cpu cannot be used, or if it saves at
>          * least 6% of the energy used by prev_cpu.
>          */
> -       if ((prev_delta == ULONG_MAX) ||
> +       if ((prev_delta == ULLONG_MAX) ||
>             (prev_delta - best_delta) > ((prev_delta + base_energy) >> 4))
>                 target = best_energy_cpu;
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] PM: EM: Make em_cpu_energy() able to return bigger values
  2021-06-25 15:26 ` [PATCH 2/3] PM: EM: Make em_cpu_energy() able to return bigger values Lukasz Luba
  2021-07-05 12:44   ` Dietmar Eggemann
@ 2021-07-07  7:07   ` Peter Zijlstra
  2021-07-07  8:09     ` Lukasz Luba
  1 sibling, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2021-07-07  7:07 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, Chris.Redpath, dietmar.eggemann, morten.rasmussen,
	qperret, linux-pm, rjw, viresh.kumar, vincent.guittot, mingo,
	juri.lelli, rostedt, segall, mgorman, bristot, CCj.Yeh

On Fri, Jun 25, 2021 at 04:26:02PM +0100, Lukasz Luba wrote:
> The Energy Model (EM) em_cpu_energy() is responsible for providing good
> estimation regarding CPUs energy. It contains proper data structures which
> are then used during calculation. The values stored in there are in
> milli-Watts precision (or in abstract scale) smaller that 0xffff, which use
> sufficient unsigned long even on 32-bit machines. There are scenarios where
> we would like to provide calculated estimations in a better precision and
> the values might be 1000 times bigger. This patch makes possible to use
> quite big values for also 32-bit machines.
> 
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
>  include/linux/energy_model.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index 3f221dbf5f95..2016f5a706e0 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -101,7 +101,7 @@ void em_dev_unregister_perf_domain(struct device *dev);
>   * Return: the sum of the energy consumed by the CPUs of the domain assuming
>   * a capacity state satisfying the max utilization of the domain.
>   */
> -static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
> +static inline u64 em_cpu_energy(struct em_perf_domain *pd,
>  				unsigned long max_util, unsigned long sum_util,
>  				unsigned long allowed_cpu_cap)
>  {
> @@ -180,7 +180,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>  	 *   pd_nrg = ------------------------                       (4)
>  	 *                  scale_cpu
>  	 */
> -	return ps->cost * sum_util / scale_cpu;
> +	return div_u64((u64)ps->cost * sum_util, scale_cpu);

So these patches are all rather straight forward, however.. the above is
pretty horrific on a 32bit box, and we do quite a few of them per
wakeup. Is this really worth the performance penalty on 32bit CPUs?

Do you really still care about 32bit CPUs, or is this mostly an artifact
of wanting to unconditionally increase the precision?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-07-07  7:07   ` Vincent Guittot
@ 2021-07-07  7:49     ` Lukasz Luba
  2021-07-07  8:00       ` Vincent Guittot
  0 siblings, 1 reply; 33+ messages in thread
From: Lukasz Luba @ 2021-07-07  7:49 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Chris Redpath, Dietmar Eggemann, Morten Rasmussen,
	Quentin Perret, open list:THERMAL, Peter Zijlstra,
	Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Steven Rostedt, segall, Mel Gorman, Daniel Bristot de Oliveira,
	CCj.Yeh



On 7/7/21 8:07 AM, Vincent Guittot wrote:
> On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up
>> task. It probes many possibilities and compares the estimated energy values
>> for different scenarios. For calculating those energy values it relies on
>> Energy Model (EM) data and em_cpu_energy(). The precision which is used in
>> EM data is in milli-Watts (or abstract scale), which sometimes is not
>> sufficient. In some cases it might happen that two CPUs from different
>> Performance Domains (PDs) get the same calculated value for a given task
>> placement, but in more precised scale, they might differ. This rounding
>> error has to be addressed. This patch prepares EAS code for better
>> precision in the coming EM improvements.
> 
> Could you explain why 32bits results are not enough and you need to
> move to 64bits ?
> 
> Right now the result is in the range [0..2^32[ mW. If you need more
> precision and you want to return uW instead, you will have a result in
> the range  [0..4kW[ which seems to be still enough
> 

Currently we have the max value limit for 'power' in EM which is
EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power
values ~64k mW (~64Watts) for an OPP. Then based on 'power' we
pre-calculate 'cost' fields:
cost[i] = power[i] * freq_max / freq[i]
So, for max freq the cost == power. Let's use that in the example.

Then the em_cpu_energy() calculates as follow:
cost * sum_util / scale_cpu
We are interested in the first part - the value of multiplication.

The sum_util values that we can see for x CPUs which have scale_cap=1024
can be close to 800, let's use it in the example:
cost * sum_util = 64k * (x * 800), where
x=4: ~200mln
x=8: ~400mln
x=16: ~800mln
x=64: ~3200mln (last one which would fit in u32)

When we increase the precision by even 100, then the above values won't
fit in the u32. Even a max cost of e.g. 10k mW and 100 precision has
issues:
cost * sum_util = (10k *100) * (x * 800), where
x=4: ~3200mln
x=8: ~6400mln

For *1000 precision even a power of 1Watt becomes an issue:
cost * sum_util = (1k *1000) * (x * 800), where
x=4: ~3200mln
x=8: ~6400mln

That's why to make the code safe for bigger power values, I had to use
the u64 on 32bit machines.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-07-07  7:49     ` Lukasz Luba
@ 2021-07-07  8:00       ` Vincent Guittot
  2021-07-07  8:23         ` Lukasz Luba
  0 siblings, 1 reply; 33+ messages in thread
From: Vincent Guittot @ 2021-07-07  8:00 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, Chris Redpath, Dietmar Eggemann, Morten Rasmussen,
	Quentin Perret, open list:THERMAL, Peter Zijlstra,
	Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Steven Rostedt, segall, Mel Gorman, Daniel Bristot de Oliveira,
	CCj.Yeh

On Wed, 7 Jul 2021 at 09:49, Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 7/7/21 8:07 AM, Vincent Guittot wrote:
> > On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>
> >> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up
> >> task. It probes many possibilities and compares the estimated energy values
> >> for different scenarios. For calculating those energy values it relies on
> >> Energy Model (EM) data and em_cpu_energy(). The precision which is used in
> >> EM data is in milli-Watts (or abstract scale), which sometimes is not
> >> sufficient. In some cases it might happen that two CPUs from different
> >> Performance Domains (PDs) get the same calculated value for a given task
> >> placement, but in more precised scale, they might differ. This rounding
> >> error has to be addressed. This patch prepares EAS code for better
> >> precision in the coming EM improvements.
> >
> > Could you explain why 32bits results are not enough and you need to
> > move to 64bits ?
> >
> > Right now the result is in the range [0..2^32[ mW. If you need more
> > precision and you want to return uW instead, you will have a result in
> > the range  [0..4kW[ which seems to be still enough
> >
>
> Currently we have the max value limit for 'power' in EM which is
> EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power
> values ~64k mW (~64Watts) for an OPP. Then based on 'power' we
> pre-calculate 'cost' fields:
> cost[i] = power[i] * freq_max / freq[i]
> So, for max freq the cost == power. Let's use that in the example.
>
> Then the em_cpu_energy() calculates as follow:
> cost * sum_util / scale_cpu
> We are interested in the first part - the value of multiplication.

But all these are internal computations of the energy model. At the
end, the computed energy that is returned by compute_energy() and
em_cpu_energy(), fits in a long

>
> The sum_util values that we can see for x CPUs which have scale_cap=1024
> can be close to 800, let's use it in the example:
> cost * sum_util = 64k * (x * 800), where
> x=4: ~200mln
> x=8: ~400mln
> x=16: ~800mln
> x=64: ~3200mln (last one which would fit in u32)
>
> When we increase the precision by even 100, then the above values won't
> fit in the u32. Even a max cost of e.g. 10k mW and 100 precision has
> issues:
> cost * sum_util = (10k *100) * (x * 800), where
> x=4: ~3200mln
> x=8: ~6400mln
>
> For *1000 precision even a power of 1Watt becomes an issue:
> cost * sum_util = (1k *1000) * (x * 800), where
> x=4: ~3200mln
> x=8: ~6400mln
>
> That's why to make the code safe for bigger power values, I had to use
> the u64 on 32bit machines.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] PM: EM: Make em_cpu_energy() able to return bigger values
  2021-07-07  7:07   ` Peter Zijlstra
@ 2021-07-07  8:09     ` Lukasz Luba
  2021-07-07 10:01       ` Peter Zijlstra
  0 siblings, 1 reply; 33+ messages in thread
From: Lukasz Luba @ 2021-07-07  8:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Chris.Redpath, dietmar.eggemann, morten.rasmussen,
	qperret, linux-pm, rjw, viresh.kumar, vincent.guittot, mingo,
	juri.lelli, rostedt, segall, mgorman, bristot, CCj.Yeh



On 7/7/21 8:07 AM, Peter Zijlstra wrote:
> On Fri, Jun 25, 2021 at 04:26:02PM +0100, Lukasz Luba wrote:
>> The Energy Model (EM) em_cpu_energy() is responsible for providing good
>> estimation regarding CPUs energy. It contains proper data structures which
>> are then used during calculation. The values stored in there are in
>> milli-Watts precision (or in abstract scale) smaller that 0xffff, which use
>> sufficient unsigned long even on 32-bit machines. There are scenarios where
>> we would like to provide calculated estimations in a better precision and
>> the values might be 1000 times bigger. This patch makes possible to use
>> quite big values for also 32-bit machines.
>>
>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
>> ---
>>   include/linux/energy_model.h | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
>> index 3f221dbf5f95..2016f5a706e0 100644
>> --- a/include/linux/energy_model.h
>> +++ b/include/linux/energy_model.h
>> @@ -101,7 +101,7 @@ void em_dev_unregister_perf_domain(struct device *dev);
>>    * Return: the sum of the energy consumed by the CPUs of the domain assuming
>>    * a capacity state satisfying the max utilization of the domain.
>>    */
>> -static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>> +static inline u64 em_cpu_energy(struct em_perf_domain *pd,
>>   				unsigned long max_util, unsigned long sum_util,
>>   				unsigned long allowed_cpu_cap)
>>   {
>> @@ -180,7 +180,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>>   	 *   pd_nrg = ------------------------                       (4)
>>   	 *                  scale_cpu
>>   	 */
>> -	return ps->cost * sum_util / scale_cpu;
>> +	return div_u64((u64)ps->cost * sum_util, scale_cpu);
> 
> So these patches are all rather straight forward, however.. the above is
> pretty horrific on a 32bit box, and we do quite a few of them per
> wakeup. Is this really worth the performance penalty on 32bit CPUs?

True, for 2 cluster SoC we might do this 5 times (or less, depends on
system state). We don't have new 32bit big.LITTLE platforms, the newest
is ~7years old and is actually the only one using EAS. It's not put
into new devices AFAIK.

> 
> Do you really still care about 32bit CPUs, or is this mostly an artifact
> of wanting to unconditionally increase the precision?
> 

We discussed this internally and weighted the 32bit old big.little.

There is a solution, but needs more work and a lot of changes in the
whole kernel due to modified EM (affects IPA, DTPM, registration, ...).

I have been working on a next step for code that you've pointed:
get rid of this runtime division.
It would be possible to pre-calculate the:
'ps->cost / scale_cpu' at the moment when EM is registered and store
it in the ps->cost. So we would have just:
return ps->cost * sum_util

The only issue is a late boot of biggest cores, which would destroy
the old scale_cpu values for other PDs. I need to probably add
RCU locking into the EM and update the other PDs' EMs when
the last biggest CPU boots after a few second and registers its
EM.

For now we would live with this simple code which improves
all recent 64bit platforms and is easy to take it into Android
common kernel. The next step would be more scattered across
other subsystems, so harder to backport to Android 5.4 and others.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-07-07  8:00       ` Vincent Guittot
@ 2021-07-07  8:23         ` Lukasz Luba
  2021-07-07  9:37           ` Vincent Guittot
  2021-07-07  9:45           ` Dietmar Eggemann
  0 siblings, 2 replies; 33+ messages in thread
From: Lukasz Luba @ 2021-07-07  8:23 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Chris Redpath, Dietmar Eggemann, Morten Rasmussen,
	Quentin Perret, open list:THERMAL, Peter Zijlstra,
	Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Steven Rostedt, segall, Mel Gorman, Daniel Bristot de Oliveira,
	CCj.Yeh



On 7/7/21 9:00 AM, Vincent Guittot wrote:
> On Wed, 7 Jul 2021 at 09:49, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>>
>>
>> On 7/7/21 8:07 AM, Vincent Guittot wrote:
>>> On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>
>>>> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up
>>>> task. It probes many possibilities and compares the estimated energy values
>>>> for different scenarios. For calculating those energy values it relies on
>>>> Energy Model (EM) data and em_cpu_energy(). The precision which is used in
>>>> EM data is in milli-Watts (or abstract scale), which sometimes is not
>>>> sufficient. In some cases it might happen that two CPUs from different
>>>> Performance Domains (PDs) get the same calculated value for a given task
>>>> placement, but in more precised scale, they might differ. This rounding
>>>> error has to be addressed. This patch prepares EAS code for better
>>>> precision in the coming EM improvements.
>>>
>>> Could you explain why 32bits results are not enough and you need to
>>> move to 64bits ?
>>>
>>> Right now the result is in the range [0..2^32[ mW. If you need more
>>> precision and you want to return uW instead, you will have a result in
>>> the range  [0..4kW[ which seems to be still enough
>>>
>>
>> Currently we have the max value limit for 'power' in EM which is
>> EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power
>> values ~64k mW (~64Watts) for an OPP. Then based on 'power' we
>> pre-calculate 'cost' fields:
>> cost[i] = power[i] * freq_max / freq[i]
>> So, for max freq the cost == power. Let's use that in the example.
>>
>> Then the em_cpu_energy() calculates as follow:
>> cost * sum_util / scale_cpu
>> We are interested in the first part - the value of multiplication.
> 
> But all these are internal computations of the energy model. At the
> end, the computed energy that is returned by compute_energy() and
> em_cpu_energy(), fits in a long

Let's take a look at existing *10000 precision for x CPUs:
cost * sum_util / scale_cpu =
(64k *10000) * (x * 800) / 1024
which is:
x * ~500mln

So to be close to overflowing u32 the 'x' has to be > (?=) 8
(depends on sum_util).

> 
>>
>> The sum_util values that we can see for x CPUs which have scale_cap=1024
>> can be close to 800, let's use it in the example:
>> cost * sum_util = 64k * (x * 800), where
>> x=4: ~200mln
>> x=8: ~400mln
>> x=16: ~800mln
>> x=64: ~3200mln (last one which would fit in u32)
>>
>> When we increase the precision by even 100, then the above values won't
>> fit in the u32. Even a max cost of e.g. 10k mW and 100 precision has
>> issues:
>> cost * sum_util = (10k *100) * (x * 800), where
>> x=4: ~3200mln
>> x=8: ~6400mln
>>
>> For *1000 precision even a power of 1Watt becomes an issue:
>> cost * sum_util = (1k *1000) * (x * 800), where
>> x=4: ~3200mln
>> x=8: ~6400mln
>>
>> That's why to make the code safe for bigger power values, I had to use
>> the u64 on 32bit machines.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-07-07  8:23         ` Lukasz Luba
@ 2021-07-07  9:37           ` Vincent Guittot
  2021-07-07  9:48             ` Lukasz Luba
  2021-07-07  9:45           ` Dietmar Eggemann
  1 sibling, 1 reply; 33+ messages in thread
From: Vincent Guittot @ 2021-07-07  9:37 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, Chris Redpath, Dietmar Eggemann, Morten Rasmussen,
	Quentin Perret, open list:THERMAL, Peter Zijlstra,
	Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Steven Rostedt, segall, Mel Gorman, Daniel Bristot de Oliveira,
	CCj.Yeh

On Wed, 7 Jul 2021 at 10:23, Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 7/7/21 9:00 AM, Vincent Guittot wrote:
> > On Wed, 7 Jul 2021 at 09:49, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>
> >>
> >>
> >> On 7/7/21 8:07 AM, Vincent Guittot wrote:
> >>> On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>>>
> >>>> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up
> >>>> task. It probes many possibilities and compares the estimated energy values
> >>>> for different scenarios. For calculating those energy values it relies on
> >>>> Energy Model (EM) data and em_cpu_energy(). The precision which is used in
> >>>> EM data is in milli-Watts (or abstract scale), which sometimes is not
> >>>> sufficient. In some cases it might happen that two CPUs from different
> >>>> Performance Domains (PDs) get the same calculated value for a given task
> >>>> placement, but in more precised scale, they might differ. This rounding
> >>>> error has to be addressed. This patch prepares EAS code for better
> >>>> precision in the coming EM improvements.
> >>>
> >>> Could you explain why 32bits results are not enough and you need to
> >>> move to 64bits ?
> >>>
> >>> Right now the result is in the range [0..2^32[ mW. If you need more
> >>> precision and you want to return uW instead, you will have a result in
> >>> the range  [0..4kW[ which seems to be still enough
> >>>
> >>
> >> Currently we have the max value limit for 'power' in EM which is
> >> EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power
> >> values ~64k mW (~64Watts) for an OPP. Then based on 'power' we
> >> pre-calculate 'cost' fields:
> >> cost[i] = power[i] * freq_max / freq[i]
> >> So, for max freq the cost == power. Let's use that in the example.
> >>
> >> Then the em_cpu_energy() calculates as follow:
> >> cost * sum_util / scale_cpu
> >> We are interested in the first part - the value of multiplication.
> >
> > But all these are internal computations of the energy model. At the
> > end, the computed energy that is returned by compute_energy() and
> > em_cpu_energy(), fits in a long
>
> Let's take a look at existing *10000 precision for x CPUs:
> cost * sum_util / scale_cpu =
> (64k *10000) * (x * 800) / 1024
> which is:
> x * ~500mln
>
> So to be close to overflowing u32 the 'x' has to be > (?=) 8
> (depends on sum_util).

Sorry but I don't get your point.
This patch is about the return type of compute_energy() and
em_cpu_energy(). And even if we decide to return uW instead of mW,
there is still a lot of margin.

It's not because you need u64 for computing intermediate value that
you must returns u64

>
> >
> >>
> >> The sum_util values that we can see for x CPUs which have scale_cap=1024
> >> can be close to 800, let's use it in the example:
> >> cost * sum_util = 64k * (x * 800), where
> >> x=4: ~200mln
> >> x=8: ~400mln
> >> x=16: ~800mln
> >> x=64: ~3200mln (last one which would fit in u32)
> >>
> >> When we increase the precision by even 100, then the above values won't
> >> fit in the u32. Even a max cost of e.g. 10k mW and 100 precision has
> >> issues:
> >> cost * sum_util = (10k *100) * (x * 800), where
> >> x=4: ~3200mln
> >> x=8: ~6400mln
> >>
> >> For *1000 precision even a power of 1Watt becomes an issue:
> >> cost * sum_util = (1k *1000) * (x * 800), where
> >> x=4: ~3200mln
> >> x=8: ~6400mln
> >>
> >> That's why to make the code safe for bigger power values, I had to use
> >> the u64 on 32bit machines.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-07-07  8:23         ` Lukasz Luba
  2021-07-07  9:37           ` Vincent Guittot
@ 2021-07-07  9:45           ` Dietmar Eggemann
  2021-07-07  9:54             ` Lukasz Luba
  1 sibling, 1 reply; 33+ messages in thread
From: Dietmar Eggemann @ 2021-07-07  9:45 UTC (permalink / raw)
  To: Lukasz Luba, Vincent Guittot
  Cc: linux-kernel, Chris Redpath, Morten Rasmussen, Quentin Perret,
	open list:THERMAL, Peter Zijlstra, Rafael J. Wysocki,
	Viresh Kumar, Ingo Molnar, Juri Lelli, Steven Rostedt, segall,
	Mel Gorman, Daniel Bristot de Oliveira, CCj.Yeh

On 07/07/2021 10:23, Lukasz Luba wrote:
>  
> On 7/7/21 9:00 AM, Vincent Guittot wrote:
>> On Wed, 7 Jul 2021 at 09:49, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>
>>>
>>>
>>> On 7/7/21 8:07 AM, Vincent Guittot wrote:
>>>> On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:

[...]

>>>> Could you explain why 32bits results are not enough and you need to
>>>> move to 64bits ?
>>>>
>>>> Right now the result is in the range [0..2^32[ mW. If you need more
>>>> precision and you want to return uW instead, you will have a result in
>>>> the range  [0..4kW[ which seems to be still enough
>>>>
>>>
>>> Currently we have the max value limit for 'power' in EM which is
>>> EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power
>>> values ~64k mW (~64Watts) for an OPP. Then based on 'power' we
>>> pre-calculate 'cost' fields:
>>> cost[i] = power[i] * freq_max / freq[i]
>>> So, for max freq the cost == power. Let's use that in the example.
>>>
>>> Then the em_cpu_energy() calculates as follow:
>>> cost * sum_util / scale_cpu
>>> We are interested in the first part - the value of multiplication.
>>
>> But all these are internal computations of the energy model. At the
>> end, the computed energy that is returned by compute_energy() and
>> em_cpu_energy(), fits in a long
> 
> Let's take a look at existing *10000 precision for x CPUs:
> cost * sum_util / scale_cpu =
> (64k *10000) * (x * 800) / 1024
> which is:
> x * ~500mln
> 
> So to be close to overflowing u32 the 'x' has to be > (?=) 8
> (depends on sum_util).

I assume the worst case is `x * 1024` (max return value of
effective_cpu_util = effective_cpu_util()) so x ~ 6.7.

I'm not aware of any arm32 b.L. systems with > 4 CPUs in a PD.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-07-07  9:37           ` Vincent Guittot
@ 2021-07-07  9:48             ` Lukasz Luba
  2021-07-07  9:56               ` Vincent Guittot
  0 siblings, 1 reply; 33+ messages in thread
From: Lukasz Luba @ 2021-07-07  9:48 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Chris Redpath, Dietmar Eggemann, Morten Rasmussen,
	Quentin Perret, open list:THERMAL, Peter Zijlstra,
	Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Steven Rostedt, segall, Mel Gorman, Daniel Bristot de Oliveira,
	CCj.Yeh



On 7/7/21 10:37 AM, Vincent Guittot wrote:
> On Wed, 7 Jul 2021 at 10:23, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>>
>>
>> On 7/7/21 9:00 AM, Vincent Guittot wrote:
>>> On Wed, 7 Jul 2021 at 09:49, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>
>>>>
>>>>
>>>> On 7/7/21 8:07 AM, Vincent Guittot wrote:
>>>>> On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>>>
>>>>>> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up
>>>>>> task. It probes many possibilities and compares the estimated energy values
>>>>>> for different scenarios. For calculating those energy values it relies on
>>>>>> Energy Model (EM) data and em_cpu_energy(). The precision which is used in
>>>>>> EM data is in milli-Watts (or abstract scale), which sometimes is not
>>>>>> sufficient. In some cases it might happen that two CPUs from different
>>>>>> Performance Domains (PDs) get the same calculated value for a given task
>>>>>> placement, but in more precised scale, they might differ. This rounding
>>>>>> error has to be addressed. This patch prepares EAS code for better
>>>>>> precision in the coming EM improvements.
>>>>>
>>>>> Could you explain why 32bits results are not enough and you need to
>>>>> move to 64bits ?
>>>>>
>>>>> Right now the result is in the range [0..2^32[ mW. If you need more
>>>>> precision and you want to return uW instead, you will have a result in
>>>>> the range  [0..4kW[ which seems to be still enough
>>>>>
>>>>
>>>> Currently we have the max value limit for 'power' in EM which is
>>>> EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power
>>>> values ~64k mW (~64Watts) for an OPP. Then based on 'power' we
>>>> pre-calculate 'cost' fields:
>>>> cost[i] = power[i] * freq_max / freq[i]
>>>> So, for max freq the cost == power. Let's use that in the example.
>>>>
>>>> Then the em_cpu_energy() calculates as follow:
>>>> cost * sum_util / scale_cpu
>>>> We are interested in the first part - the value of multiplication.
>>>
>>> But all these are internal computations of the energy model. At the
>>> end, the computed energy that is returned by compute_energy() and
>>> em_cpu_energy(), fits in a long
>>
>> Let's take a look at existing *10000 precision for x CPUs:
>> cost * sum_util / scale_cpu =
>> (64k *10000) * (x * 800) / 1024
>> which is:
>> x * ~500mln
>>
>> So to be close to overflowing u32 the 'x' has to be > (?=) 8
>> (depends on sum_util).
> 
> Sorry but I don't get your point.
> This patch is about the return type of compute_energy() and
> em_cpu_energy(). And even if we decide to return uW instead of mW,
> there is still a lot of margin.
> 
> It's not because you need u64 for computing intermediate value that
> you must returns u64

The example above shows the need of u64 return value for platforms
which are:
- 32bit
- have e.g. 16 CPUs
- has big power value e.g. ~64k mW
Then let's to the calc:
(64k * 10000) * (16 * 800) / 1024 = ~8000mln = ~8bln

The returned value after applying the whole patch set
won't fit in u32 for such cluster.

We might make *assumption* that the 32bit platforms will not
have bigger number of CPUs in the cluster or won't report
big power values. But I didn't wanted to make such assumption.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-07-07  9:45           ` Dietmar Eggemann
@ 2021-07-07  9:54             ` Lukasz Luba
  0 siblings, 0 replies; 33+ messages in thread
From: Lukasz Luba @ 2021-07-07  9:54 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Vincent Guittot, linux-kernel, Chris Redpath, Morten Rasmussen,
	Quentin Perret, open list:THERMAL, Peter Zijlstra,
	Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Steven Rostedt, segall, Mel Gorman, Daniel Bristot de Oliveira,
	CCj.Yeh



On 7/7/21 10:45 AM, Dietmar Eggemann wrote:
> On 07/07/2021 10:23, Lukasz Luba wrote:
>>   
>> On 7/7/21 9:00 AM, Vincent Guittot wrote:
>>> On Wed, 7 Jul 2021 at 09:49, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>
>>>>
>>>>
>>>> On 7/7/21 8:07 AM, Vincent Guittot wrote:
>>>>> On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:
> 
> [...]
> 
>>>>> Could you explain why 32bits results are not enough and you need to
>>>>> move to 64bits ?
>>>>>
>>>>> Right now the result is in the range [0..2^32[ mW. If you need more
>>>>> precision and you want to return uW instead, you will have a result in
>>>>> the range  [0..4kW[ which seems to be still enough
>>>>>
>>>>
>>>> Currently we have the max value limit for 'power' in EM which is
>>>> EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power
>>>> values ~64k mW (~64Watts) for an OPP. Then based on 'power' we
>>>> pre-calculate 'cost' fields:
>>>> cost[i] = power[i] * freq_max / freq[i]
>>>> So, for max freq the cost == power. Let's use that in the example.
>>>>
>>>> Then the em_cpu_energy() calculates as follow:
>>>> cost * sum_util / scale_cpu
>>>> We are interested in the first part - the value of multiplication.
>>>
>>> But all these are internal computations of the energy model. At the
>>> end, the computed energy that is returned by compute_energy() and
>>> em_cpu_energy(), fits in a long
>>
>> Let's take a look at existing *10000 precision for x CPUs:
>> cost * sum_util / scale_cpu =
>> (64k *10000) * (x * 800) / 1024
>> which is:
>> x * ~500mln
>>
>> So to be close to overflowing u32 the 'x' has to be > (?=) 8
>> (depends on sum_util).
> 
> I assume the worst case is `x * 1024` (max return value of
> effective_cpu_util = effective_cpu_util()) so x ~ 6.7.
> 
> I'm not aware of any arm32 b.L. systems with > 4 CPUs in a PD.
> 

True, arm32 didn't support bigger number than 4 CPUs in the cluster.
We would be safe for them, but I don't want to break with this
assumption any other 32bit platform from competitors, which might
create such 32bit 16cores clusters.

If Peter, Vincent and you are OK to put this assumption about
max safe CPUs number, then we can get rid of patch 1/3.

But the temporary division of u64 must stay, because there is
arm32 platform which need it. So returning also u64 is not a big
harm and looks more consistent.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-07-07  9:48             ` Lukasz Luba
@ 2021-07-07  9:56               ` Vincent Guittot
  2021-07-07 10:06                 ` Lukasz Luba
  0 siblings, 1 reply; 33+ messages in thread
From: Vincent Guittot @ 2021-07-07  9:56 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, Chris Redpath, Dietmar Eggemann, Morten Rasmussen,
	Quentin Perret, open list:THERMAL, Peter Zijlstra,
	Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Steven Rostedt, segall, Mel Gorman, Daniel Bristot de Oliveira,
	CCj.Yeh

On Wed, 7 Jul 2021 at 11:48, Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 7/7/21 10:37 AM, Vincent Guittot wrote:
> > On Wed, 7 Jul 2021 at 10:23, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>
> >>
> >>
> >> On 7/7/21 9:00 AM, Vincent Guittot wrote:
> >>> On Wed, 7 Jul 2021 at 09:49, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 7/7/21 8:07 AM, Vincent Guittot wrote:
> >>>>> On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>>>>>
> >>>>>> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up
> >>>>>> task. It probes many possibilities and compares the estimated energy values
> >>>>>> for different scenarios. For calculating those energy values it relies on
> >>>>>> Energy Model (EM) data and em_cpu_energy(). The precision which is used in
> >>>>>> EM data is in milli-Watts (or abstract scale), which sometimes is not
> >>>>>> sufficient. In some cases it might happen that two CPUs from different
> >>>>>> Performance Domains (PDs) get the same calculated value for a given task
> >>>>>> placement, but in more precised scale, they might differ. This rounding
> >>>>>> error has to be addressed. This patch prepares EAS code for better
> >>>>>> precision in the coming EM improvements.
> >>>>>
> >>>>> Could you explain why 32bits results are not enough and you need to
> >>>>> move to 64bits ?
> >>>>>
> >>>>> Right now the result is in the range [0..2^32[ mW. If you need more
> >>>>> precision and you want to return uW instead, you will have a result in
> >>>>> the range  [0..4kW[ which seems to be still enough
> >>>>>
> >>>>
> >>>> Currently we have the max value limit for 'power' in EM which is
> >>>> EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power
> >>>> values ~64k mW (~64Watts) for an OPP. Then based on 'power' we
> >>>> pre-calculate 'cost' fields:
> >>>> cost[i] = power[i] * freq_max / freq[i]
> >>>> So, for max freq the cost == power. Let's use that in the example.
> >>>>
> >>>> Then the em_cpu_energy() calculates as follow:
> >>>> cost * sum_util / scale_cpu
> >>>> We are interested in the first part - the value of multiplication.
> >>>
> >>> But all these are internal computations of the energy model. At the
> >>> end, the computed energy that is returned by compute_energy() and
> >>> em_cpu_energy(), fits in a long
> >>
> >> Let's take a look at existing *10000 precision for x CPUs:
> >> cost * sum_util / scale_cpu =
> >> (64k *10000) * (x * 800) / 1024
> >> which is:
> >> x * ~500mln
> >>
> >> So to be close to overflowing u32 the 'x' has to be > (?=) 8
> >> (depends on sum_util).
> >
> > Sorry but I don't get your point.
> > This patch is about the return type of compute_energy() and
> > em_cpu_energy(). And even if we decide to return uW instead of mW,
> > there is still a lot of margin.
> >
> > It's not because you need u64 for computing intermediate value that
> > you must returns u64
>
> The example above shows the need of u64 return value for platforms
> which are:
> - 32bit
> - have e.g. 16 CPUs
> - has big power value e.g. ~64k mW
> Then let's to the calc:
> (64k * 10000) * (16 * 800) / 1024 = ~8000mln = ~8bln

so you return a power consumption of 8kW !!!

>
> The returned value after applying the whole patch set
> won't fit in u32 for such cluster.
>
> We might make *assumption* that the 32bit platforms will not
> have bigger number of CPUs in the cluster or won't report
> big power values. But I didn't wanted to make such assumption.
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] PM: EM: Make em_cpu_energy() able to return bigger values
  2021-07-07  8:09     ` Lukasz Luba
@ 2021-07-07 10:01       ` Peter Zijlstra
  2021-07-07 10:23         ` Lukasz Luba
  0 siblings, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2021-07-07 10:01 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, Chris.Redpath, dietmar.eggemann, morten.rasmussen,
	qperret, linux-pm, rjw, viresh.kumar, vincent.guittot, mingo,
	juri.lelli, rostedt, segall, mgorman, bristot, CCj.Yeh

On Wed, Jul 07, 2021 at 09:09:08AM +0100, Lukasz Luba wrote:
> For now we would live with this simple code which improves
> all recent 64bit platforms and is easy to take it into Android
> common kernel. The next step would be more scattered across
> other subsystems, so harder to backport to Android 5.4 and others.

Ah, you *do* only care about 64bit :-) So one option is to only increase
precision for 64BIT builds, just like we do for scale_load() and
friends.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-07-07  9:56               ` Vincent Guittot
@ 2021-07-07 10:06                 ` Lukasz Luba
  2021-07-07 10:11                   ` Vincent Guittot
  0 siblings, 1 reply; 33+ messages in thread
From: Lukasz Luba @ 2021-07-07 10:06 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Chris Redpath, Dietmar Eggemann, Morten Rasmussen,
	Quentin Perret, open list:THERMAL, Peter Zijlstra,
	Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Steven Rostedt, segall, Mel Gorman, Daniel Bristot de Oliveira,
	CCj.Yeh



On 7/7/21 10:56 AM, Vincent Guittot wrote:
> On Wed, 7 Jul 2021 at 11:48, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>>
>>
>> On 7/7/21 10:37 AM, Vincent Guittot wrote:
>>> On Wed, 7 Jul 2021 at 10:23, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>
>>>>
>>>>
>>>> On 7/7/21 9:00 AM, Vincent Guittot wrote:
>>>>> On Wed, 7 Jul 2021 at 09:49, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 7/7/21 8:07 AM, Vincent Guittot wrote:
>>>>>>> On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>>>>>
>>>>>>>> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up
>>>>>>>> task. It probes many possibilities and compares the estimated energy values
>>>>>>>> for different scenarios. For calculating those energy values it relies on
>>>>>>>> Energy Model (EM) data and em_cpu_energy(). The precision which is used in
>>>>>>>> EM data is in milli-Watts (or abstract scale), which sometimes is not
>>>>>>>> sufficient. In some cases it might happen that two CPUs from different
>>>>>>>> Performance Domains (PDs) get the same calculated value for a given task
>>>>>>>> placement, but in more precised scale, they might differ. This rounding
>>>>>>>> error has to be addressed. This patch prepares EAS code for better
>>>>>>>> precision in the coming EM improvements.
>>>>>>>
>>>>>>> Could you explain why 32bits results are not enough and you need to
>>>>>>> move to 64bits ?
>>>>>>>
>>>>>>> Right now the result is in the range [0..2^32[ mW. If you need more
>>>>>>> precision and you want to return uW instead, you will have a result in
>>>>>>> the range  [0..4kW[ which seems to be still enough
>>>>>>>
>>>>>>
>>>>>> Currently we have the max value limit for 'power' in EM which is
>>>>>> EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power
>>>>>> values ~64k mW (~64Watts) for an OPP. Then based on 'power' we
>>>>>> pre-calculate 'cost' fields:
>>>>>> cost[i] = power[i] * freq_max / freq[i]
>>>>>> So, for max freq the cost == power. Let's use that in the example.
>>>>>>
>>>>>> Then the em_cpu_energy() calculates as follow:
>>>>>> cost * sum_util / scale_cpu
>>>>>> We are interested in the first part - the value of multiplication.
>>>>>
>>>>> But all these are internal computations of the energy model. At the
>>>>> end, the computed energy that is returned by compute_energy() and
>>>>> em_cpu_energy(), fits in a long
>>>>
>>>> Let's take a look at existing *10000 precision for x CPUs:
>>>> cost * sum_util / scale_cpu =
>>>> (64k *10000) * (x * 800) / 1024
>>>> which is:
>>>> x * ~500mln
>>>>
>>>> So to be close to overflowing u32 the 'x' has to be > (?=) 8
>>>> (depends on sum_util).
>>>
>>> Sorry but I don't get your point.
>>> This patch is about the return type of compute_energy() and
>>> em_cpu_energy(). And even if we decide to return uW instead of mW,
>>> there is still a lot of margin.
>>>
>>> It's not because you need u64 for computing intermediate value that
>>> you must returns u64
>>
>> The example above shows the need of u64 return value for platforms
>> which are:
>> - 32bit
>> - have e.g. 16 CPUs
>> - has big power value e.g. ~64k mW
>> Then let's to the calc:
>> (64k * 10000) * (16 * 800) / 1024 = ~8000mln = ~8bln
> 
> so you return a power consumption of 8kW !!!
> 

No. It's in 0.1uW scale, so 800Watts. Which is 16 CPUs * 64Watts
each at max freq and 80% load.

Max power can be < 64Watts, which is 64k milli-Watts (< EM_MAX_POWER)
64k mW * 10000 --> is the 0.1uW precision


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-07-07 10:06                 ` Lukasz Luba
@ 2021-07-07 10:11                   ` Vincent Guittot
  2021-07-07 10:29                     ` Lukasz Luba
  0 siblings, 1 reply; 33+ messages in thread
From: Vincent Guittot @ 2021-07-07 10:11 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, Chris Redpath, Dietmar Eggemann, Morten Rasmussen,
	Quentin Perret, open list:THERMAL, Peter Zijlstra,
	Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Steven Rostedt, segall, Mel Gorman, Daniel Bristot de Oliveira,
	CCj.Yeh

On Wed, 7 Jul 2021 at 12:06, Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 7/7/21 10:56 AM, Vincent Guittot wrote:
> > On Wed, 7 Jul 2021 at 11:48, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>
> >>
> >>
> >> On 7/7/21 10:37 AM, Vincent Guittot wrote:
> >>> On Wed, 7 Jul 2021 at 10:23, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 7/7/21 9:00 AM, Vincent Guittot wrote:
> >>>>> On Wed, 7 Jul 2021 at 09:49, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 7/7/21 8:07 AM, Vincent Guittot wrote:
> >>>>>>> On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>>>>>>>
> >>>>>>>> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up
> >>>>>>>> task. It probes many possibilities and compares the estimated energy values
> >>>>>>>> for different scenarios. For calculating those energy values it relies on
> >>>>>>>> Energy Model (EM) data and em_cpu_energy(). The precision which is used in
> >>>>>>>> EM data is in milli-Watts (or abstract scale), which sometimes is not
> >>>>>>>> sufficient. In some cases it might happen that two CPUs from different
> >>>>>>>> Performance Domains (PDs) get the same calculated value for a given task
> >>>>>>>> placement, but in more precised scale, they might differ. This rounding
> >>>>>>>> error has to be addressed. This patch prepares EAS code for better
> >>>>>>>> precision in the coming EM improvements.
> >>>>>>>
> >>>>>>> Could you explain why 32bits results are not enough and you need to
> >>>>>>> move to 64bits ?
> >>>>>>>
> >>>>>>> Right now the result is in the range [0..2^32[ mW. If you need more
> >>>>>>> precision and you want to return uW instead, you will have a result in
> >>>>>>> the range  [0..4kW[ which seems to be still enough
> >>>>>>>
> >>>>>>
> >>>>>> Currently we have the max value limit for 'power' in EM which is
> >>>>>> EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power
> >>>>>> values ~64k mW (~64Watts) for an OPP. Then based on 'power' we
> >>>>>> pre-calculate 'cost' fields:
> >>>>>> cost[i] = power[i] * freq_max / freq[i]
> >>>>>> So, for max freq the cost == power. Let's use that in the example.
> >>>>>>
> >>>>>> Then the em_cpu_energy() calculates as follow:
> >>>>>> cost * sum_util / scale_cpu
> >>>>>> We are interested in the first part - the value of multiplication.
> >>>>>
> >>>>> But all these are internal computations of the energy model. At the
> >>>>> end, the computed energy that is returned by compute_energy() and
> >>>>> em_cpu_energy(), fits in a long
> >>>>
> >>>> Let's take a look at existing *10000 precision for x CPUs:
> >>>> cost * sum_util / scale_cpu =
> >>>> (64k *10000) * (x * 800) / 1024
> >>>> which is:
> >>>> x * ~500mln
> >>>>
> >>>> So to be close to overflowing u32 the 'x' has to be > (?=) 8
> >>>> (depends on sum_util).
> >>>
> >>> Sorry but I don't get your point.
> >>> This patch is about the return type of compute_energy() and
> >>> em_cpu_energy(). And even if we decide to return uW instead of mW,
> >>> there is still a lot of margin.
> >>>
> >>> It's not because you need u64 for computing intermediate value that
> >>> you must returns u64
> >>
> >> The example above shows the need of u64 return value for platforms
> >> which are:
> >> - 32bit
> >> - have e.g. 16 CPUs
> >> - has big power value e.g. ~64k mW
> >> Then let's to the calc:
> >> (64k * 10000) * (16 * 800) / 1024 = ~8000mln = ~8bln
> >
> > so you return a power consumption of 8kW !!!
> >
>
> No. It's in 0.1uW scale, so 800Watts. Which is 16 CPUs * 64Watts

Oh! you want 0.1uW precision .... This doesn't seem realistic at all.
I'm not even sure that the power model can even reach an accuracy of
1mW

> each at max freq and 80% load.
>
> Max power can be < 64Watts, which is 64k milli-Watts (< EM_MAX_POWER)
> 64k mW * 10000 --> is the 0.1uW precision
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] PM: EM: Make em_cpu_energy() able to return bigger values
  2021-07-07 10:01       ` Peter Zijlstra
@ 2021-07-07 10:23         ` Lukasz Luba
  0 siblings, 0 replies; 33+ messages in thread
From: Lukasz Luba @ 2021-07-07 10:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Chris.Redpath, dietmar.eggemann, morten.rasmussen,
	qperret, linux-pm, rjw, viresh.kumar, vincent.guittot, mingo,
	juri.lelli, rostedt, segall, mgorman, bristot, CCj.Yeh



On 7/7/21 11:01 AM, Peter Zijlstra wrote:
> On Wed, Jul 07, 2021 at 09:09:08AM +0100, Lukasz Luba wrote:
>> For now we would live with this simple code which improves
>> all recent 64bit platforms and is easy to take it into Android
>> common kernel. The next step would be more scattered across
>> other subsystems, so harder to backport to Android 5.4 and others.
> 
> Ah, you *do* only care about 64bit :-) So one option is to only increase
> precision for 64BIT builds, just like we do for scale_load() and
> friends.
> 

Your suggestion is potentially a good compromise :-)
We could leave the 32bit alone and they would have old code and
precision.

Thank you for the comments. Let me discuss this internally with my team.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-07-07 10:11                   ` Vincent Guittot
@ 2021-07-07 10:29                     ` Lukasz Luba
  2021-07-07 10:32                       ` Vincent Guittot
  0 siblings, 1 reply; 33+ messages in thread
From: Lukasz Luba @ 2021-07-07 10:29 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Chris Redpath, Dietmar Eggemann, Morten Rasmussen,
	Quentin Perret, open list:THERMAL, Peter Zijlstra,
	Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Steven Rostedt, segall, Mel Gorman, Daniel Bristot de Oliveira,
	CCj.Yeh



On 7/7/21 11:11 AM, Vincent Guittot wrote:
> On Wed, 7 Jul 2021 at 12:06, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>

[snip]

>> No. It's in 0.1uW scale, so 800Watts. Which is 16 CPUs * 64Watts
> 
> Oh! you want 0.1uW precision .... This doesn't seem realistic at all.
> I'm not even sure that the power model can even reach an accuracy of
> 1mW
> 

True, the EM is registering platform with 1mW precision, but 1uW
precision makes more sense for internal EAS calculation. I don't
force platforms to report 1uW power, I just want to operate on
it internally. PowerCap and DTPM also operate internally on 1uW,
so it's not that unrealistic that some kernel components want
better resolution.

But as Peter suggested, we might skip 32bit platforms for this issue.
I have to discussed this internally.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-07-07 10:29                     ` Lukasz Luba
@ 2021-07-07 10:32                       ` Vincent Guittot
  2021-07-07 10:41                         ` Lukasz Luba
  0 siblings, 1 reply; 33+ messages in thread
From: Vincent Guittot @ 2021-07-07 10:32 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, Chris Redpath, Dietmar Eggemann, Morten Rasmussen,
	Quentin Perret, open list:THERMAL, Peter Zijlstra,
	Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Steven Rostedt, segall, Mel Gorman, Daniel Bristot de Oliveira,
	CCj.Yeh

On Wed, 7 Jul 2021 at 12:29, Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 7/7/21 11:11 AM, Vincent Guittot wrote:
> > On Wed, 7 Jul 2021 at 12:06, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>
>
> [snip]
>
> >> No. It's in 0.1uW scale, so 800Watts. Which is 16 CPUs * 64Watts
> >
> > Oh! you want 0.1uW precision .... This doesn't seem realistic at all.
> > I'm not even sure that the power model can even reach an accuracy of
> > 1mW
> >
>
> True, the EM is registering platform with 1mW precision, but 1uW

Do you mean 1uW or 0.1uW ?

> precision makes more sense for internal EAS calculation. I don't
> force platforms to report 1uW power, I just want to operate on
> it internally. PowerCap and DTPM also operate internally on 1uW,
> so it's not that unrealistic that some kernel components want
> better resolution.
>
> But as Peter suggested, we might skip 32bit platforms for this issue.
> I have to discussed this internally.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-07-07 10:32                       ` Vincent Guittot
@ 2021-07-07 10:41                         ` Lukasz Luba
  2021-07-07 10:50                           ` Vincent Guittot
  0 siblings, 1 reply; 33+ messages in thread
From: Lukasz Luba @ 2021-07-07 10:41 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Chris Redpath, Dietmar Eggemann, Morten Rasmussen,
	Quentin Perret, open list:THERMAL, Peter Zijlstra,
	Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Steven Rostedt, segall, Mel Gorman, Daniel Bristot de Oliveira,
	CCj.Yeh



On 7/7/21 11:32 AM, Vincent Guittot wrote:
> On Wed, 7 Jul 2021 at 12:29, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>>
>>
>> On 7/7/21 11:11 AM, Vincent Guittot wrote:
>>> On Wed, 7 Jul 2021 at 12:06, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>
>>
>> [snip]
>>
>>>> No. It's in 0.1uW scale, so 800Watts. Which is 16 CPUs * 64Watts
>>>
>>> Oh! you want 0.1uW precision .... This doesn't seem realistic at all.
>>> I'm not even sure that the power model can even reach an accuracy of
>>> 1mW
>>>
>>
>> True, the EM is registering platform with 1mW precision, but 1uW
> 
> Do you mean 1uW or 0.1uW ?

In this patch set I've proposed 0.1uW, but I'm open to drop one
order of magnitude. The 1uW still be good.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-07-07 10:41                         ` Lukasz Luba
@ 2021-07-07 10:50                           ` Vincent Guittot
  2021-07-07 11:02                             ` Lukasz Luba
  0 siblings, 1 reply; 33+ messages in thread
From: Vincent Guittot @ 2021-07-07 10:50 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, Chris Redpath, Dietmar Eggemann, Morten Rasmussen,
	Quentin Perret, open list:THERMAL, Peter Zijlstra,
	Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Steven Rostedt, segall, Mel Gorman, Daniel Bristot de Oliveira,
	CCj.Yeh

On Wed, 7 Jul 2021 at 12:41, Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 7/7/21 11:32 AM, Vincent Guittot wrote:
> > On Wed, 7 Jul 2021 at 12:29, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>
> >>
> >>
> >> On 7/7/21 11:11 AM, Vincent Guittot wrote:
> >>> On Wed, 7 Jul 2021 at 12:06, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>>>
> >>
> >> [snip]
> >>
> >>>> No. It's in 0.1uW scale, so 800Watts. Which is 16 CPUs * 64Watts
> >>>
> >>> Oh! you want 0.1uW precision .... This doesn't seem realistic at all.
> >>> I'm not even sure that the power model can even reach an accuracy of
> >>> 1mW
> >>>
> >>
> >> True, the EM is registering platform with 1mW precision, but 1uW
> >
> > Do you mean 1uW or 0.1uW ?
>
> In this patch set I've proposed 0.1uW, but I'm open to drop one
> order of magnitude. The 1uW still be good.

I don't want to underestimate the capabilities of the power model but
I don't see which benefit you will get with 0.1uW precision
With a 1uW precision the long type currently used for the returned
value is fine for 32bits machine AFAICT

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-07-07 10:50                           ` Vincent Guittot
@ 2021-07-07 11:02                             ` Lukasz Luba
  2021-07-07 13:53                               ` Vincent Guittot
  0 siblings, 1 reply; 33+ messages in thread
From: Lukasz Luba @ 2021-07-07 11:02 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Chris Redpath, Dietmar Eggemann, Morten Rasmussen,
	Quentin Perret, open list:THERMAL, Peter Zijlstra,
	Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Steven Rostedt, segall, Mel Gorman, Daniel Bristot de Oliveira,
	CCj.Yeh



On 7/7/21 11:50 AM, Vincent Guittot wrote:
> On Wed, 7 Jul 2021 at 12:41, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>>
>>
>> On 7/7/21 11:32 AM, Vincent Guittot wrote:
>>> On Wed, 7 Jul 2021 at 12:29, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>
>>>>
>>>>
>>>> On 7/7/21 11:11 AM, Vincent Guittot wrote:
>>>>> On Wed, 7 Jul 2021 at 12:06, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>>>
>>>>
>>>> [snip]
>>>>
>>>>>> No. It's in 0.1uW scale, so 800Watts. Which is 16 CPUs * 64Watts
>>>>>
>>>>> Oh! you want 0.1uW precision .... This doesn't seem realistic at all.
>>>>> I'm not even sure that the power model can even reach an accuracy of
>>>>> 1mW
>>>>>
>>>>
>>>> True, the EM is registering platform with 1mW precision, but 1uW
>>>
>>> Do you mean 1uW or 0.1uW ?
>>
>> In this patch set I've proposed 0.1uW, but I'm open to drop one
>> order of magnitude. The 1uW still be good.
> 
> I don't want to underestimate the capabilities of the power model but
> I don't see which benefit you will get with 0.1uW precision
> With a 1uW precision the long type currently used for the returned
> value is fine for 32bits machine AFAICT
> 

For 1uW and 1.2Watts for one core, 4 CPUs in cluster we get:
(1200 * 1000) * (4 * 1024) = ~4.9bln
so it would need div 64 version

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-07-07 11:02                             ` Lukasz Luba
@ 2021-07-07 13:53                               ` Vincent Guittot
  2021-07-07 14:25                                 ` Lukasz Luba
  0 siblings, 1 reply; 33+ messages in thread
From: Vincent Guittot @ 2021-07-07 13:53 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, Chris Redpath, Dietmar Eggemann, Morten Rasmussen,
	Quentin Perret, open list:THERMAL, Peter Zijlstra,
	Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Steven Rostedt, segall, Mel Gorman, Daniel Bristot de Oliveira,
	CCj.Yeh

On Wed, 7 Jul 2021 at 13:02, Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 7/7/21 11:50 AM, Vincent Guittot wrote:
> > On Wed, 7 Jul 2021 at 12:41, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>
> >>
> >>
> >> On 7/7/21 11:32 AM, Vincent Guittot wrote:
> >>> On Wed, 7 Jul 2021 at 12:29, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 7/7/21 11:11 AM, Vincent Guittot wrote:
> >>>>> On Wed, 7 Jul 2021 at 12:06, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>>>>>
> >>>>
> >>>> [snip]
> >>>>
> >>>>>> No. It's in 0.1uW scale, so 800Watts. Which is 16 CPUs * 64Watts
> >>>>>
> >>>>> Oh! you want 0.1uW precision .... This doesn't seem realistic at all.
> >>>>> I'm not even sure that the power model can even reach an accuracy of
> >>>>> 1mW
> >>>>>
> >>>>
> >>>> True, the EM is registering platform with 1mW precision, but 1uW
> >>>
> >>> Do you mean 1uW or 0.1uW ?
> >>
> >> In this patch set I've proposed 0.1uW, but I'm open to drop one
> >> order of magnitude. The 1uW still be good.
> >
> > I don't want to underestimate the capabilities of the power model but
> > I don't see which benefit you will get with 0.1uW precision
> > With a 1uW precision the long type currently used for the returned
> > value is fine for 32bits machine AFAICT
> >
>
> For 1uW and 1.2Watts for one core, 4 CPUs in cluster we get:
> (1200 * 1000) * (4 * 1024) = ~4.9bln
> so it would need div 64 version

But as stated before, this is an internal computation step and doesn't
have to be reflected in the returned value which can stay a long

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy
  2021-07-07 13:53                               ` Vincent Guittot
@ 2021-07-07 14:25                                 ` Lukasz Luba
  0 siblings, 0 replies; 33+ messages in thread
From: Lukasz Luba @ 2021-07-07 14:25 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Chris Redpath, Dietmar Eggemann, Morten Rasmussen,
	Quentin Perret, open list:THERMAL, Peter Zijlstra,
	Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Steven Rostedt, segall, Mel Gorman, Daniel Bristot de Oliveira,
	CCj.Yeh



On 7/7/21 2:53 PM, Vincent Guittot wrote:
> On Wed, 7 Jul 2021 at 13:02, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>>
>>
>> On 7/7/21 11:50 AM, Vincent Guittot wrote:
>>> On Wed, 7 Jul 2021 at 12:41, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>
>>>>
>>>>
>>>> On 7/7/21 11:32 AM, Vincent Guittot wrote:
>>>>> On Wed, 7 Jul 2021 at 12:29, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 7/7/21 11:11 AM, Vincent Guittot wrote:
>>>>>>> On Wed, 7 Jul 2021 at 12:06, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>>>>>
>>>>>>
>>>>>> [snip]
>>>>>>
>>>>>>>> No. It's in 0.1uW scale, so 800Watts. Which is 16 CPUs * 64Watts
>>>>>>>
>>>>>>> Oh! you want 0.1uW precision .... This doesn't seem realistic at all.
>>>>>>> I'm not even sure that the power model can even reach an accuracy of
>>>>>>> 1mW
>>>>>>>
>>>>>>
>>>>>> True, the EM is registering platform with 1mW precision, but 1uW
>>>>>
>>>>> Do you mean 1uW or 0.1uW ?
>>>>
>>>> In this patch set I've proposed 0.1uW, but I'm open to drop one
>>>> order of magnitude. The 1uW still be good.
>>>
>>> I don't want to underestimate the capabilities of the power model but
>>> I don't see which benefit you will get with 0.1uW precision
>>> With a 1uW precision the long type currently used for the returned
>>> value is fine for 32bits machine AFAICT
>>>
>>
>> For 1uW and 1.2Watts for one core, 4 CPUs in cluster we get:
>> (1200 * 1000) * (4 * 1024) = ~4.9bln
>> so it would need div 64 version
> 
> But as stated before, this is an internal computation step and doesn't
> have to be reflected in the returned value which can stay a long
> 

I agree, we might scale down the result if it's too big, before the
return. We could figure this out at the EM registration point, so a
proper shift might be applied for such platform. It might enable both
32bit and 64bit platforms to avoid the rounding error. Let me experiment
with some code to check all the cases.
Thank you for the comments!


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2021-07-07 14:25 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-25 15:26 [PATCH 0/3] Improve EAS energy estimation and increase precision Lukasz Luba
2021-06-25 15:26 ` [PATCH 1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy Lukasz Luba
2021-06-30 17:01   ` Rafael J. Wysocki
2021-06-30 17:28     ` Lukasz Luba
2021-07-02 19:07       ` Lukasz Luba
2021-07-07  7:07   ` Vincent Guittot
2021-07-07  7:49     ` Lukasz Luba
2021-07-07  8:00       ` Vincent Guittot
2021-07-07  8:23         ` Lukasz Luba
2021-07-07  9:37           ` Vincent Guittot
2021-07-07  9:48             ` Lukasz Luba
2021-07-07  9:56               ` Vincent Guittot
2021-07-07 10:06                 ` Lukasz Luba
2021-07-07 10:11                   ` Vincent Guittot
2021-07-07 10:29                     ` Lukasz Luba
2021-07-07 10:32                       ` Vincent Guittot
2021-07-07 10:41                         ` Lukasz Luba
2021-07-07 10:50                           ` Vincent Guittot
2021-07-07 11:02                             ` Lukasz Luba
2021-07-07 13:53                               ` Vincent Guittot
2021-07-07 14:25                                 ` Lukasz Luba
2021-07-07  9:45           ` Dietmar Eggemann
2021-07-07  9:54             ` Lukasz Luba
2021-06-25 15:26 ` [PATCH 2/3] PM: EM: Make em_cpu_energy() able to return bigger values Lukasz Luba
2021-07-05 12:44   ` Dietmar Eggemann
2021-07-06 19:44     ` Lukasz Luba
2021-07-07  7:07   ` Peter Zijlstra
2021-07-07  8:09     ` Lukasz Luba
2021-07-07 10:01       ` Peter Zijlstra
2021-07-07 10:23         ` Lukasz Luba
2021-06-25 15:26 ` [PATCH 3/3] PM: EM: Increase energy calculation precision Lukasz Luba
2021-07-05 12:45   ` Dietmar Eggemann
2021-07-06 19:51     ` Lukasz Luba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).