linux-kernel.vger.kernel.org archive mirror
* [PATCH v7 00/23] Introduce runtime modifiable Energy Model
@ 2024-01-17  9:56 Lukasz Luba
From: Lukasz Luba @ 2024-01-17  9:56 UTC
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

Hi all,

This patch set adds a new feature which allows modifying the Energy Model (EM)
power values at runtime. It makes it possible to better reflect the power model
of recent SoCs and silicon. Different characteristics of the power usage
can be leveraged, and thus better decisions can be made during task placement in EAS.

It also optimizes the EAS hot code path by removing 2 divisions and 1
multiplication from em_cpu_energy(). Speed-up results: em_cpu_energy()
should run 1.43x faster on the Big CPU and 1.69x faster on the Little CPU
(mainline board RockPi 4B).
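
To illustrate where savings of this kind can come from, here is a simplified
user-space C sketch; it is not the kernel code. It assumes one CPU type with
scale_cpu = 1024, illustrative field names, and a hypothetical COST_RES
resolution factor. The frequency-based path maps utilization to a frequency
(1 mul + 1 div) and rescales the energy (1 mul + 1 div); the perf-based path
precomputes a per-state capacity ('perf') and a cost that already folds in
1/perf, so the hot path needs a single multiplication.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical resolution factor for the perf-based cost (an assumption). */
#define COST_RES 1024ULL

/*
 * Simplified model of one performance state. Field names are illustrative
 * only; this is not the kernel's struct em_perf_state.
 */
struct perf_state {
	uint64_t freq;      /* kHz, ascending across the table */
	uint64_t power;     /* mW */
	uint64_t perf;      /* capacity at this freq: scale_cpu * freq / fmax */
	uint64_t cost_freq; /* power * fmax / freq (frequency-based cost) */
	uint64_t cost_perf; /* power * COST_RES / perf (perf-based cost) */
};

/* Slow path: run once when the table is built or updated. */
static void precompute(struct perf_state *t, int n, uint64_t fmax,
		       uint64_t scale_cpu)
{
	for (int i = 0; i < n; i++) {
		t[i].perf = scale_cpu * t[i].freq / fmax;
		t[i].cost_freq = t[i].power * fmax / t[i].freq;
		t[i].cost_perf = t[i].power * COST_RES / t[i].perf;
	}
}

/* Frequency-based hot path: map util to a frequency, then rescale. */
static uint64_t energy_freq_based(const struct perf_state *t, int n,
				  uint64_t max_util, uint64_t sum_util,
				  uint64_t fmax, uint64_t scale_cpu)
{
	uint64_t freq = max_util * fmax / scale_cpu;	/* 1 mul + 1 div */
	int i;

	assert(n > 0);
	for (i = 0; i < n - 1; i++)	/* falls back to the highest state */
		if (t[i].freq >= freq)
			break;

	return t[i].cost_freq * sum_util / scale_cpu;	/* 1 mul + 1 div */
}

/* Perf-based hot path: compare util with 'perf' directly. */
static uint64_t energy_perf_based(const struct perf_state *t, int n,
				  uint64_t max_util, uint64_t sum_util)
{
	int i;

	assert(n > 0);
	for (i = 0; i < n - 1; i++)	/* falls back to the highest state */
		if (t[i].perf >= max_util)
			break;

	return t[i].cost_perf * sum_util;		/* 1 mul */
}
```

The perf-based result comes out COST_RES times larger than the
frequency-based one. Since EAS compares energy estimates against each other,
a common constant factor does not change which placement wins, as long as
every performance domain uses the same factor.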

This patch set is part of a feature set known as Dynamic Energy Model. It has been
presented and discussed recently at OSPM2023 [3].


The concepts:
1. The CPU power usage can vary due to the workload that it's running or due
to the temperature of the SoC. The same workload can use more power when the
temperature of the silicon has increased (e.g. due to a hot GPU or ISP).
In such a situation the EM can be adjusted to reflect the increased
power usage. That power increase is due to static power
(sometimes simply called leakage). The CPUs in recent SoCs also differ from
each other: we have heterogeneous SoCs with 3 (or even 4) different
microarchitectures. They are also built differently, with High Performance (HP)
cells or Low Power (LP) cells, and are affected by a temperature increase
differently: HP cells have higher leakage. The SW model can leverage that
knowledge.

2. It is also possible to change the EM to better reflect the currently
running workload. Usually the EM is derived from average power values
measured in experiments with a benchmark (e.g. Dhrystone). A model derived
from such a scenario might not properly represent the workloads usually running
on the device. Therefore, runtime modification of the EM allows switching to
a different model when needed.

3. The EM can be adjusted after boot, when all the modules are loaded and
more information about the SoC is available, e.g. chip binning. This helps
to better reflect the silicon characteristics. The new EM modification
API makes this possible; in the past the EM had to be 'set in stone'.

An example of such a runtime modification after boot can be found in a
follow-up patch set [5]. It adds the OPP API and its usage in the Exynos5 SoC
driver, after the voltage values have been adjusted and the power has changed.

A more detailed explanation and background can be found in the LPC2022
presentations [1][2] or in the documentation patches.

Some test results:
The EM can be updated to better fit the workload type. In the case below, the EM
has been updated for the Jankbench test on Pixel6 (running v5.18 with mainline
backports for the scheduler bits). Jankbench was run 10 times for each of the two
configurations to get more reliable data.

1. Janky frames percentage
+--------+-----------------+---------------------+-------+-----------+
| metric |    variable     |       kernel        | value | perc_diff |
+--------+-----------------+---------------------+-------+-----------+
| gmean  | jank_percentage | EM_default          |  2.0  |   0.0%    |
| gmean  | jank_percentage | EM_modified_runtime |  1.3  |  -35.33%  |
+--------+-----------------+---------------------+-------+-----------+

2. Avg frame render time duration
+--------+---------------------+---------------------+-------+-----------+
| metric |      variable       |       kernel        | value | perc_diff |
+--------+---------------------+---------------------+-------+-----------+
| gmean  | mean_frame_duration | EM_default          | 10.5  |   0.0%    |
| gmean  | mean_frame_duration | EM_modified_runtime |  9.6  |  -8.52%   |
+--------+---------------------+---------------------+-------+-----------+

3. Max frame render time duration
+--------+--------------------+---------------------+-------+-----------+
| metric |      variable      |       kernel        | value | perc_diff |
+--------+--------------------+---------------------+-------+-----------+
| gmean  | max_frame_duration | EM_default          | 251.6 |   0.0%    |
| gmean  | max_frame_duration | EM_modified_runtime | 115.5 |  -54.09%  |
+--------+--------------------+---------------------+-------+-----------+

4. OS overutilized state percentage (time during which EAS is not active)
+--------------+---------------------+------+------------+------------+
|    metric    |       kernel        | time | total_time | percentage |
+--------------+---------------------+------+------------+------------+
| overutilized | EM_default          | 1.65 |   253.38   |    0.65    |
| overutilized | EM_modified_runtime | 1.4  |   277.5    |    0.51    |
+--------------+---------------------+------+------------+------------+

5. All CPUs (Little+Mid+Big) power values in mW
+------------+--------+---------------------+-------+-----------+
|  channel   | metric |       kernel        | value | perc_diff |
+------------+--------+---------------------+-------+-----------+
|    CPU     | gmean  | EM_default          | 142.1 |   0.0%    |
|    CPU     | gmean  | EM_modified_runtime | 131.8 |  -7.27%   |
+------------+--------+---------------------+-------+-----------+

The time cost to update the EM decreased in v5 compared to v4 (v4 vs v5):
big: 5us vs 2us -> 2.6x faster
mid: 9us vs 3us -> 3x faster
little: 16us vs 16us -> no change

We still have to update the inefficiency information in the cpufreq framework,
so some overhead remains.

This series is based on the linux-next tree, tag 'next-20240112', since there
are changes and a revert there which touch em_cpu_energy().

Changelog:
v7:
- dropped em_table_get/put() (Rafael)
- renamed memory function to em_table_alloc/free() (Rafael)
- use explicit rcu_read_lock/unlock() instead of wrappers and aligned
  frameworks & drivers using EM (Rafael)
- adjusted documentation to the new functions
- fixed doxygen comments (Rafael)
- renamed 'refcount' to 'kref' (Rafael)
- changed patch headers according to comments (Rafael)
- rebased on 'next-20240112' to get Ingo's revert affecting energy_model.h
v6 [6]:
- renamed 'runtime_table' to 'em_table' (Dietmar, Rafael)
- dropped kref increment during allocation (Qais)
- renamed em_inc/dec_usage() to em_table_inc/dec() (Qais)
- fixed comment description and left old comment block with small
  adjustment in em_cpu_energy() patch 15/23 (Dietmar)
- added platform name which was used for speed-up testing (Dietmar)
- changed the patch header description to keep it short and not repeat the
  in-code comment describing 'cost' in em_cpu_energy() patch 15/23 (Dietmar)
- added check and warning in em_cpu_energy() about RCU lock held (Qais, Xuewen)
- changed nr_perf_states usage in the patch 7/23 (Dietmar)
- changed documentation according to comments (Dietmar)
- changed in-code comment in patch 11/23 according to comments (Dietmar)
- changed the 'ctx' argument of the example driver function in the
  documentation (Xuewen)
- changed the example driver in the documentation: dropped module_exit and
  added an explicit em_free_table() call in the update function
- fixed comments in various patch headers (Dietmar)
- fixed Doxygen comment s/@state/@table patch 4/23 (Dietmar)
- added information in the cover letter about:
-- optimization in EAS hot code path
-- follow-up patch set which adds OPP support and modifies EM for Exynos5
- rebased on 'next-20240104' to avoid collision with other code touching
  em_cpu_energy()
v5 changes are here [4]

Regards,
Lukasz Luba

[1] https://lpc.events/event/16/contributions/1341/attachments/955/1873/Dynamic_Energy_Model_to_handle_leakage_power.pdf
[2] https://lpc.events/event/16/contributions/1194/attachments/1114/2139/LPC2022_Energy_model_accuracy.pdf
[3] https://www.youtube.com/watch?v=2C-5uikSbtM&list=PL0fKordpLTjKsBOUcZqnzlHShri4YBL1H
[4] https://lore.kernel.org/lkml/20231129110853.94344-1-lukasz.luba@arm.com/
[5] https://lore.kernel.org/lkml/20231220110339.1065505-1-lukasz.luba@arm.com/
[6] https://lore.kernel.org/lkml/20240104171553.2080674-1-lukasz.luba@arm.com/


Lukasz Luba (23):
  PM: EM: Add missing newline for the message log
  PM: EM: Extend em_cpufreq_update_efficiencies() argument list
  PM: EM: Find first CPU active while updating OPP efficiency
  PM: EM: Refactor em_pd_get_efficient_state() to be more flexible
  PM: EM: Introduce em_compute_costs()
  PM: EM: Check if the get_cost() callback is present in
    em_compute_costs()
  PM: EM: Split the allocation and initialization of the EM table
  PM: EM: Introduce runtime modifiable table
  PM: EM: Use runtime modified EM for CPUs energy estimation in EAS
  PM: EM: Add functions for memory allocations for new EM tables
  PM: EM: Introduce em_dev_update_perf_domain() for EM updates
  PM: EM: Add em_perf_state_from_pd() to get performance states table
  PM: EM: Add performance field to struct em_perf_state and optimize
  PM: EM: Support late CPUs booting and capacity adjustment
  PM: EM: Optimize em_cpu_energy() and remove division
  powercap/dtpm_cpu: Use new Energy Model interface to get table
  powercap/dtpm_devfreq: Use new Energy Model interface to get table
  drivers/thermal/cpufreq_cooling: Use new Energy Model interface
  drivers/thermal/devfreq_cooling: Use new Energy Model interface
  PM: EM: Change debugfs configuration to use runtime EM table data
  PM: EM: Remove old table
  PM: EM: Add em_dev_compute_costs()
  Documentation: EM: Update with runtime modification design

 Documentation/power/energy-model.rst | 183 ++++++++++-
 drivers/powercap/dtpm_cpu.c          |  39 ++-
 drivers/powercap/dtpm_devfreq.c      |  34 +-
 drivers/thermal/cpufreq_cooling.c    |  45 ++-
 drivers/thermal/devfreq_cooling.c    |  49 ++-
 include/linux/energy_model.h         | 165 ++++++----
 kernel/power/energy_model.c          | 472 +++++++++++++++++++++++----
 7 files changed, 819 insertions(+), 168 deletions(-)

-- 
2.25.1



* [PATCH v7 01/23] PM: EM: Add missing newline for the message log
From: Lukasz Luba @ 2024-01-17  9:56 UTC
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

Fix the missing newline in the log string in the error code path.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 kernel/power/energy_model.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 7b44f5b89fa1..8b9dd4a39f63 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -250,7 +250,7 @@ static void em_cpufreq_update_efficiencies(struct device *dev)
 
 	policy = cpufreq_cpu_get(cpumask_first(em_span_cpus(pd)));
 	if (!policy) {
-		dev_warn(dev, "EM: Access to CPUFreq policy failed");
+		dev_warn(dev, "EM: Access to CPUFreq policy failed\n");
 		return;
 	}
 
-- 
2.25.1



* [PATCH v7 02/23] PM: EM: Extend em_cpufreq_update_efficiencies() argument list
From: Lukasz Luba @ 2024-01-17  9:56 UTC
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

In order to prepare the code for the modifiable EM perf_state table,
make em_cpufreq_update_efficiencies() take a pointer to the EM table
as its second argument and modify it to use that new argument instead
of the 'table' member of dev->em_pd.

No functional impact.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 kernel/power/energy_model.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 8b9dd4a39f63..42486674b834 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -237,10 +237,10 @@ static int em_create_pd(struct device *dev, int nr_states,
 	return 0;
 }
 
-static void em_cpufreq_update_efficiencies(struct device *dev)
+static void
+em_cpufreq_update_efficiencies(struct device *dev, struct em_perf_state *table)
 {
 	struct em_perf_domain *pd = dev->em_pd;
-	struct em_perf_state *table;
 	struct cpufreq_policy *policy;
 	int found = 0;
 	int i;
@@ -254,8 +254,6 @@ static void em_cpufreq_update_efficiencies(struct device *dev)
 		return;
 	}
 
-	table = pd->table;
-
 	for (i = 0; i < pd->nr_perf_states; i++) {
 		if (!(table[i].flags & EM_PERF_STATE_INEFFICIENT))
 			continue;
@@ -397,7 +395,7 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 
 	dev->em_pd->flags |= flags;
 
-	em_cpufreq_update_efficiencies(dev);
+	em_cpufreq_update_efficiencies(dev, dev->em_pd->table);
 
 	em_debug_create_pd(dev);
 	dev_info(dev, "EM: created perf domain\n");
-- 
2.25.1



* [PATCH v7 03/23] PM: EM: Find first CPU active while updating OPP efficiency
From: Lukasz Luba @ 2024-01-17  9:56 UTC
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

The Energy Model might be updated at runtime and the energy efficiency
of each OPP may change. Thus, the cpufreq framework also needs to be
updated to stay aligned with the new values. To do that, use the first
active CPU from the Performance Domain. This is needed because the first
CPU in the cpumask might be offline when this code path runs.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 kernel/power/energy_model.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 42486674b834..aa7c89f9e115 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -243,12 +243,19 @@ em_cpufreq_update_efficiencies(struct device *dev, struct em_perf_state *table)
 	struct em_perf_domain *pd = dev->em_pd;
 	struct cpufreq_policy *policy;
 	int found = 0;
-	int i;
+	int i, cpu;
 
 	if (!_is_cpu_device(dev) || !pd)
 		return;
 
-	policy = cpufreq_cpu_get(cpumask_first(em_span_cpus(pd)));
+	/* Try to get a CPU which is active and in this PD */
+	cpu = cpumask_first_and(em_span_cpus(pd), cpu_active_mask);
+	if (cpu >= nr_cpu_ids) {
+		dev_warn(dev, "EM: No online CPU for CPUFreq policy\n");
+		return;
+	}
+
+	policy = cpufreq_cpu_get(cpu);
 	if (!policy) {
 		dev_warn(dev, "EM: Access to CPUFreq policy failed\n");
 		return;
-- 
2.25.1
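
The CPU selection added in this patch can be modeled in user space as
follows. This is a hedged sketch: cpumask_first_and() and cpu_active_mask
are replaced by plain 64-bit bitmasks, and NR_CPU_IDS is a hypothetical
constant standing in for nr_cpu_ids.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPU_IDS 8	/* hypothetical number of possible CPUs */

static_assert(NR_CPU_IDS <= 64, "masks are single 64-bit words here");

/*
 * Stand-in for cpumask_first_and(): return the lowest CPU id set in both
 * masks, or NR_CPU_IDS if the intersection is empty.
 */
static int first_cpu_and(uint64_t span, uint64_t active)
{
	uint64_t both = span & active;

	for (int cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (both & (1ULL << cpu))
			return cpu;

	return NR_CPU_IDS;
}

/* Pick the CPU whose cpufreq policy should be queried, or -1 if none. */
static int pick_policy_cpu(uint64_t pd_span, uint64_t active_mask)
{
	int cpu = first_cpu_and(pd_span, active_mask);

	if (cpu >= NR_CPU_IDS)
		return -1;	/* no active CPU in this performance domain */

	return cpu;
}
```

With a performance domain spanning CPUs 4-7 and CPU 4 hotplugged out,
pick_policy_cpu(0xF0, 0xEF) returns 5, whereas taking only the first CPU of
the span would have picked the offline CPU 4.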



* [PATCH v7 04/23] PM: EM: Refactor em_pd_get_efficient_state() to be more flexible
From: Lukasz Luba @ 2024-01-17  9:56 UTC
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

The Energy Model (EM) is going to support runtime modification. There
are going to be 2 EM tables which store information. This patch aims
to prepare the code to be generic and use one of the tables. The function
will no longer get a pointer to 'struct em_perf_domain' (the EM) but
instead a pointer to 'struct em_perf_state' (which is one of the EM's
tables).

Prepare em_pd_get_efficient_state() for the upcoming changes and
make it reusable. Return the index of the best performance state
for a given EM table. The newly introduced function arguments
allow it to work on different performance state arrays. The caller of
em_pd_get_efficient_state() can use the index on either
the default or the modifiable EM table.

Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index c19e7effe764..b01277b17946 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -175,33 +175,35 @@ void em_dev_unregister_perf_domain(struct device *dev);
 
 /**
  * em_pd_get_efficient_state() - Get an efficient performance state from the EM
- * @pd   : Performance domain for which we want an efficient frequency
- * @freq : Frequency to map with the EM
+ * @table:		List of performance states, in ascending order
+ * @nr_perf_states:	Number of performance states
+ * @freq:		Frequency to map with the EM
+ * @pd_flags:		Performance Domain flags
  *
  * It is called from the scheduler code quite frequently and as a consequence
  * doesn't implement any check.
  *
- * Return: An efficient performance state, high enough to meet @freq
+ * Return: An efficient performance state id, high enough to meet @freq
  * requirement.
  */
-static inline
-struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
-						unsigned long freq)
+static inline int
+em_pd_get_efficient_state(struct em_perf_state *table, int nr_perf_states,
+			  unsigned long freq, unsigned long pd_flags)
 {
 	struct em_perf_state *ps;
 	int i;
 
-	for (i = 0; i < pd->nr_perf_states; i++) {
-		ps = &pd->table[i];
+	for (i = 0; i < nr_perf_states; i++) {
+		ps = &table[i];
 		if (ps->frequency >= freq) {
-			if (pd->flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES &&
+			if (pd_flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES &&
 			    ps->flags & EM_PERF_STATE_INEFFICIENT)
 				continue;
-			break;
+			return i;
 		}
 	}
 
-	return ps;
+	return nr_perf_states - 1;
 }
 
 /**
@@ -226,7 +228,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 {
 	unsigned long freq, ref_freq, scale_cpu;
 	struct em_perf_state *ps;
-	int cpu;
+	int cpu, i;
 
 	if (!sum_util)
 		return 0;
@@ -251,7 +253,9 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 * Find the lowest performance state of the Energy Model above the
 	 * requested frequency.
 	 */
-	ps = em_pd_get_efficient_state(pd, freq);
+	i = em_pd_get_efficient_state(pd->table, pd->nr_perf_states, freq,
+				      pd->flags);
+	ps = &pd->table[i];
 
 	/*
 	 * The capacity of a CPU in the domain at the performance state (ps)
-- 
2.25.1
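
A user-space port of the refactored helper from the diff above, with the
kernel types replaced by minimal stand-ins. EM_PERF_STATE_INEFFICIENT is
BIT(0) as in energy_model.h; the value used for
EM_PERF_DOMAIN_SKIP_INEFFICIENCIES is an assumption, since it is not shown
in this thread.

```c
#include <assert.h>

#define EM_PERF_STATE_INEFFICIENT		(1UL << 0)
#define EM_PERF_DOMAIN_SKIP_INEFFICIENCIES	(1UL << 1)	/* assumed value */

/* Minimal stand-in for struct em_perf_state (only the fields used here). */
struct em_perf_state {
	unsigned long frequency;
	unsigned long flags;
};

/* Returns an index into 'table' rather than a pointer into pd->table. */
static int
em_pd_get_efficient_state(struct em_perf_state *table, int nr_perf_states,
			  unsigned long freq, unsigned long pd_flags)
{
	struct em_perf_state *ps;
	int i;

	for (i = 0; i < nr_perf_states; i++) {
		ps = &table[i];
		if (ps->frequency >= freq) {
			/* Optionally skip OPPs marked as inefficient. */
			if ((pd_flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES) &&
			    (ps->flags & EM_PERF_STATE_INEFFICIENT))
				continue;
			return i;
		}
	}

	/* No suitable state high enough: fall back to the highest one. */
	return nr_perf_states - 1;
}
```

Because the result is an index, the caller can apply it to any table with
the same layout, which is what lets the same lookup serve both the default
and the runtime-modifiable EM table.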



* [PATCH v7 05/23] PM: EM: Introduce em_compute_costs()
From: Lukasz Luba @ 2024-01-17  9:56 UTC
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

Move the EM costs computation code into a new dedicated function,
em_compute_costs(), that can be reused in other places in the future.

This change is not expected to alter the general functionality.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 kernel/power/energy_model.c | 72 ++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 29 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index aa7c89f9e115..3bea930410c6 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -103,14 +103,52 @@ static void em_debug_create_pd(struct device *dev) {}
 static void em_debug_remove_pd(struct device *dev) {}
 #endif
 
+static int em_compute_costs(struct device *dev, struct em_perf_state *table,
+			    struct em_data_callback *cb, int nr_states,
+			    unsigned long flags)
+{
+	unsigned long prev_cost = ULONG_MAX;
+	u64 fmax;
+	int i, ret;
+
+	/* Compute the cost of each performance state. */
+	fmax = (u64) table[nr_states - 1].frequency;
+	for (i = nr_states - 1; i >= 0; i--) {
+		unsigned long power_res, cost;
+
+		if (flags & EM_PERF_DOMAIN_ARTIFICIAL) {
+			ret = cb->get_cost(dev, table[i].frequency, &cost);
+			if (ret || !cost || cost > EM_MAX_POWER) {
+				dev_err(dev, "EM: invalid cost %lu %d\n",
+					cost, ret);
+				return -EINVAL;
+			}
+		} else {
+			power_res = table[i].power;
+			cost = div64_u64(fmax * power_res, table[i].frequency);
+		}
+
+		table[i].cost = cost;
+
+		if (table[i].cost >= prev_cost) {
+			table[i].flags = EM_PERF_STATE_INEFFICIENT;
+			dev_dbg(dev, "EM: OPP:%lu is inefficient\n",
+				table[i].frequency);
+		} else {
+			prev_cost = table[i].cost;
+		}
+	}
+
+	return 0;
+}
+
 static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 				int nr_states, struct em_data_callback *cb,
 				unsigned long flags)
 {
-	unsigned long power, freq, prev_freq = 0, prev_cost = ULONG_MAX;
+	unsigned long power, freq, prev_freq = 0;
 	struct em_perf_state *table;
 	int i, ret;
-	u64 fmax;
 
 	table = kcalloc(nr_states, sizeof(*table), GFP_KERNEL);
 	if (!table)
@@ -154,33 +192,9 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 		table[i].frequency = prev_freq = freq;
 	}
 
-	/* Compute the cost of each performance state. */
-	fmax = (u64) table[nr_states - 1].frequency;
-	for (i = nr_states - 1; i >= 0; i--) {
-		unsigned long power_res, cost;
-
-		if (flags & EM_PERF_DOMAIN_ARTIFICIAL) {
-			ret = cb->get_cost(dev, table[i].frequency, &cost);
-			if (ret || !cost || cost > EM_MAX_POWER) {
-				dev_err(dev, "EM: invalid cost %lu %d\n",
-					cost, ret);
-				goto free_ps_table;
-			}
-		} else {
-			power_res = table[i].power;
-			cost = div64_u64(fmax * power_res, table[i].frequency);
-		}
-
-		table[i].cost = cost;
-
-		if (table[i].cost >= prev_cost) {
-			table[i].flags = EM_PERF_STATE_INEFFICIENT;
-			dev_dbg(dev, "EM: OPP:%lu is inefficient\n",
-				table[i].frequency);
-		} else {
-			prev_cost = table[i].cost;
-		}
-	}
+	ret = em_compute_costs(dev, table, cb, nr_states, flags);
+	if (ret)
+		goto free_ps_table;
 
 	pd->table = table;
 	pd->nr_perf_states = nr_states;
-- 
2.25.1
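
The non-artificial branch of em_compute_costs() can be exercised in user
space with the sketch below. It mirrors the loop in the diff (cost =
fmax * power / freq, scanned from the highest OPP down, marking an OPP
inefficient when its cost is not lower than the cheapest faster OPP), but
compute_costs() and the trimmed struct are hypothetical stand-ins and the
get_cost() callback path is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define EM_PERF_STATE_INEFFICIENT (1UL << 0)	/* BIT(0), as in energy_model.h */

/* Minimal stand-in for struct em_perf_state. */
struct em_perf_state {
	unsigned long frequency;	/* kHz */
	unsigned long power;
	unsigned long cost;
	unsigned long flags;
};

static void compute_costs(struct em_perf_state *table, int nr_states)
{
	unsigned long prev_cost = ULONG_MAX;
	uint64_t fmax = table[nr_states - 1].frequency;

	for (int i = nr_states - 1; i >= 0; i--) {
		table[i].cost = (unsigned long)(fmax * table[i].power /
						table[i].frequency);

		/* prev_cost is the lowest cost seen among faster OPPs. */
		if (table[i].cost >= prev_cost)
			table[i].flags = EM_PERF_STATE_INEFFICIENT;
		else
			prev_cost = table[i].cost;
	}
}
```

With made-up numbers, a table with power 100/250/600/1000 at
0.5/1.0/1.5/2.0 GHz gives costs 400/500/800/1000 and no inefficient OPP.
Adding a hypothetical constant 100 of static power to every OPP gives costs
800/700/933/1100: because cost scales with power / freq, a constant leakage
term penalizes the lowest OPP most, and here it becomes inefficient.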



* [PATCH v7 06/23] PM: EM: Check if the get_cost() callback is present in em_compute_costs()
From: Lukasz Luba @ 2024-01-17  9:56 UTC
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

Subsequent changes will introduce a case in which 'cb->get_cost' may
not be set in em_compute_costs(), so add a check to ensure that it is
not NULL before attempting to dereference it.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 kernel/power/energy_model.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 3bea930410c6..3c8542443dd4 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -116,7 +116,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
 	for (i = nr_states - 1; i >= 0; i--) {
 		unsigned long power_res, cost;
 
-		if (flags & EM_PERF_DOMAIN_ARTIFICIAL) {
+		if ((flags & EM_PERF_DOMAIN_ARTIFICIAL) && cb->get_cost) {
 			ret = cb->get_cost(dev, table[i].frequency, &cost);
 			if (ret || !cost || cost > EM_MAX_POWER) {
 				dev_err(dev, "EM: invalid cost %lu %d\n",
-- 
2.25.1
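
The effect of the added NULL check can be shown with a small user-space
sketch of the optional-callback pattern. struct cost_callbacks, state_cost(),
fixed_cost() and FLAG_ARTIFICIAL are hypothetical names, the device argument
of the real get_cost() callback is dropped, and the flag value is an
assumption.

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_ARTIFICIAL (1UL << 2)	/* hypothetical flag value */

/* Hypothetical stand-in for struct em_data_callback: only get_cost matters. */
struct cost_callbacks {
	int (*get_cost)(unsigned long freq, unsigned long *cost);
};

/*
 * Use the callback only when the domain is artificial AND the callback is
 * set; otherwise fall back to the power-based formula instead of
 * dereferencing a NULL function pointer.
 */
static int state_cost(const struct cost_callbacks *cb, unsigned long flags,
		      unsigned long fmax, unsigned long freq,
		      unsigned long power, unsigned long *cost)
{
	if ((flags & FLAG_ARTIFICIAL) && cb->get_cost)
		return cb->get_cost(freq, cost);

	*cost = fmax * power / freq;
	return 0;
}

/* Demo callback returning an arbitrary fixed cost. */
static int fixed_cost(unsigned long freq, unsigned long *cost)
{
	(void)freq;
	*cost = 42;
	return 0;
}
```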



* [PATCH v7 07/23] PM: EM: Split the allocation and initialization of the EM table
From: Lukasz Luba @ 2024-01-17  9:56 UTC
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

Split the allocation and the data initialization of the EM table into
separate steps. The upcoming changes for the modifiable EM will use this.

This change is not expected to alter the general functionality.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 kernel/power/energy_model.c | 55 ++++++++++++++++++++++---------------
 1 file changed, 33 insertions(+), 22 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 3c8542443dd4..e7826403ae1d 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -142,18 +142,26 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
 	return 0;
 }
 
+static int em_allocate_perf_table(struct em_perf_domain *pd,
+				  int nr_states)
+{
+	pd->table = kcalloc(nr_states, sizeof(struct em_perf_state),
+			    GFP_KERNEL);
+	if (!pd->table)
+		return -ENOMEM;
+
+	return 0;
+}
+
 static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
-				int nr_states, struct em_data_callback *cb,
+				struct em_perf_state *table,
+				struct em_data_callback *cb,
 				unsigned long flags)
 {
 	unsigned long power, freq, prev_freq = 0;
-	struct em_perf_state *table;
+	int nr_states = pd->nr_perf_states;
 	int i, ret;
 
-	table = kcalloc(nr_states, sizeof(*table), GFP_KERNEL);
-	if (!table)
-		return -ENOMEM;
-
 	/* Build the list of performance states for this performance domain */
 	for (i = 0, freq = 0; i < nr_states; i++, freq++) {
 		/*
@@ -165,7 +173,7 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 		if (ret) {
 			dev_err(dev, "EM: invalid perf. state: %d\n",
 				ret);
-			goto free_ps_table;
+			return -EINVAL;
 		}
 
 		/*
@@ -175,7 +183,7 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 		if (freq <= prev_freq) {
 			dev_err(dev, "EM: non-increasing freq: %lu\n",
 				freq);
-			goto free_ps_table;
+			return -EINVAL;
 		}
 
 		/*
@@ -185,7 +193,7 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 		if (!power || power > EM_MAX_POWER) {
 			dev_err(dev, "EM: invalid power: %lu\n",
 				power);
-			goto free_ps_table;
+			return -EINVAL;
 		}
 
 		table[i].power = power;
@@ -194,16 +202,9 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 
 	ret = em_compute_costs(dev, table, cb, nr_states, flags);
 	if (ret)
-		goto free_ps_table;
-
-	pd->table = table;
-	pd->nr_perf_states = nr_states;
+		return -EINVAL;
 
 	return 0;
-
-free_ps_table:
-	kfree(table);
-	return -EINVAL;
 }
 
 static int em_create_pd(struct device *dev, int nr_states,
@@ -234,11 +235,15 @@ static int em_create_pd(struct device *dev, int nr_states,
 			return -ENOMEM;
 	}
 
-	ret = em_create_perf_table(dev, pd, nr_states, cb, flags);
-	if (ret) {
-		kfree(pd);
-		return ret;
-	}
+	pd->nr_perf_states = nr_states;
+
+	ret = em_allocate_perf_table(pd, nr_states);
+	if (ret)
+		goto free_pd;
+
+	ret = em_create_perf_table(dev, pd, pd->table, cb, flags);
+	if (ret)
+		goto free_pd_table;
 
 	if (_is_cpu_device(dev))
 		for_each_cpu(cpu, cpus) {
@@ -249,6 +254,12 @@ static int em_create_pd(struct device *dev, int nr_states,
 	dev->em_pd = pd;
 
 	return 0;
+
+free_pd_table:
+	kfree(pd->table);
+free_pd:
+	kfree(pd);
+	return -EINVAL;
 }
 
 static void
-- 
2.25.1
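
The resulting structure (allocate, then initialize into a caller-provided
table, with goto-based unwinding in reverse order of setup) can be sketched
in user space as below. The trimmed structs and the functions
allocate_perf_table(), fill_perf_table() and create_pd() are hypothetical
stand-ins for the kernel ones, calloc()/free() replace
kcalloc()/kfree(), and this sketch propagates the original error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-ins for the EM structures. */
struct perf_state {
	unsigned long frequency;
	unsigned long power;
};

struct perf_domain {
	int nr_perf_states;
	struct perf_state *table;
};

/* Step 1: allocation only. */
static int allocate_perf_table(struct perf_domain *pd, int nr_states)
{
	pd->table = calloc(nr_states, sizeof(*pd->table));
	return pd->table ? 0 : -ENOMEM;
}

/* Step 2: initialization only, into a caller-provided table. */
static int fill_perf_table(struct perf_state *table, int nr_states,
			   const unsigned long *freqs, const unsigned long *powers)
{
	unsigned long prev_freq = 0;

	for (int i = 0; i < nr_states; i++) {
		/* Same sanity rules as the diff: increasing freq, non-zero power. */
		if (freqs[i] <= prev_freq || !powers[i])
			return -EINVAL;	/* no cleanup needed at this level */
		table[i].frequency = prev_freq = freqs[i];
		table[i].power = powers[i];
	}
	return 0;
}

/* The caller owns the unwinding, in reverse order of setup. */
static struct perf_domain *create_pd(int nr_states, const unsigned long *freqs,
				     const unsigned long *powers, int *err)
{
	struct perf_domain *pd = calloc(1, sizeof(*pd));
	int ret;

	if (!pd) {
		*err = -ENOMEM;
		return NULL;
	}
	pd->nr_perf_states = nr_states;

	ret = allocate_perf_table(pd, nr_states);
	if (ret)
		goto free_pd;

	ret = fill_perf_table(pd->table, nr_states, freqs, powers);
	if (ret)
		goto free_pd_table;

	*err = 0;
	return pd;

free_pd_table:
	free(pd->table);
free_pd:
	free(pd);
	*err = ret;
	return NULL;
}
```

Keeping the fill step free of allocation is what allows it to be reused
later on a table obtained elsewhere, such as a newly allocated runtime table.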



* [PATCH v7 08/23] PM: EM: Introduce runtime modifiable table
From: Lukasz Luba @ 2024-01-17  9:56 UTC
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

The new runtime table can be populated with new power data to better
reflect the actual efficiency of a device, e.g. a CPU. The power can vary
over time, e.g. due to changes in SoC temperature: a higher temperature
can increase the power values. In longer-running scenarios, such as gaming
or camera use, where other devices (e.g. GPU, ISP) are also active, the
CPU power can change. The new EM framework addresses this issue and can
change the EM data safely at runtime.
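A minimal user-space sketch of the allocation pattern behind em_allocate_table() and em_create_runtime_table(): one allocation holds a small header plus a flexible array of states, and the runtime copy is seeded from the boot-time table. All demo_* names are hypothetical, the struct fields are trimmed, and the RCU publication and deferred freeing are not modeled here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed-down stand-in for struct em_perf_state. */
struct demo_perf_state {
	unsigned long frequency;	/* kHz */
	unsigned long power;
	unsigned long cost;
};

/*
 * Stand-in for struct em_perf_table: a header followed by a flexible
 * array of states, so one allocation holds the whole table. The kernel
 * struct embeds a struct rcu_head here instead of nr_states.
 */
struct demo_perf_table {
	int nr_states;
	struct demo_perf_state state[];
};

/* Same sizing rule as em_allocate_table(): header + nr_states entries. */
static struct demo_perf_table *demo_table_alloc(int nr_states)
{
	struct demo_perf_table *t;

	assert(nr_states > 0);
	t = calloc(1, sizeof(*t) + sizeof(struct demo_perf_state) * nr_states);
	if (t)
		t->nr_states = nr_states;
	return t;
}

/* Mirrors em_create_runtime_table(): seed the runtime copy from boot data. */
static struct demo_perf_table *
demo_create_runtime_table(const struct demo_perf_state *boot, int nr_states)
{
	struct demo_perf_table *t = demo_table_alloc(nr_states);

	if (!t)
		return NULL;
	memcpy(t->state, boot, sizeof(struct demo_perf_state) * nr_states);
	return t;
}
```

Because the runtime table is a separate copy, later edits to it leave the boot-time data untouched.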

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h | 12 ++++++++
 kernel/power/energy_model.c  | 53 ++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index b01277b17946..585c5ffc898b 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -36,9 +36,20 @@ struct em_perf_state {
  */
 #define EM_PERF_STATE_INEFFICIENT BIT(0)
 
+/**
+ * struct em_perf_table - Performance states table
+ * @rcu:	RCU used for safe access and destruction
+ * @state:	List of performance states, in ascending order
+ */
+struct em_perf_table {
+	struct rcu_head rcu;
+	struct em_perf_state state[];
+};
+
 /**
  * struct em_perf_domain - Performance domain
  * @table:		List of performance states, in ascending order
+ * @em_table:		Pointer to the runtime modifiable em_perf_table
  * @nr_perf_states:	Number of performance states
  * @flags:		See "em_perf_domain flags"
  * @cpus:		Cpumask covering the CPUs of the domain. It's here
@@ -54,6 +65,7 @@ struct em_perf_state {
  */
 struct em_perf_domain {
 	struct em_perf_state *table;
+	struct em_perf_table __rcu *em_table;
 	int nr_perf_states;
 	unsigned long flags;
 	unsigned long cpus[];
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index e7826403ae1d..c03010084208 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -23,6 +23,9 @@
  */
 static DEFINE_MUTEX(em_pd_mutex);
 
+static void em_cpufreq_update_efficiencies(struct device *dev,
+					   struct em_perf_state *table);
+
 static bool _is_cpu_device(struct device *dev)
 {
 	return (dev->bus == &cpu_subsys);
@@ -103,6 +106,31 @@ static void em_debug_create_pd(struct device *dev) {}
 static void em_debug_remove_pd(struct device *dev) {}
 #endif
 
+static void em_destroy_table_rcu(struct rcu_head *rp)
+{
+	struct em_perf_table __rcu *table;
+
+	table = container_of(rp, struct em_perf_table, rcu);
+	kfree(table);
+}
+
+static void em_free_table(struct em_perf_table __rcu *table)
+{
+	call_rcu(&table->rcu, em_destroy_table_rcu);
+}
+
+static struct em_perf_table __rcu *
+em_allocate_table(struct em_perf_domain *pd)
+{
+	struct em_perf_table __rcu *table;
+	int table_size;
+
+	table_size = sizeof(struct em_perf_state) * pd->nr_perf_states;
+
+	table = kzalloc(sizeof(*table) + table_size, GFP_KERNEL);
+	return table;
+}
+
 static int em_compute_costs(struct device *dev, struct em_perf_state *table,
 			    struct em_data_callback *cb, int nr_states,
 			    unsigned long flags)
@@ -153,6 +181,24 @@ static int em_allocate_perf_table(struct em_perf_domain *pd,
 	return 0;
 }
 
+static int em_create_runtime_table(struct em_perf_domain *pd)
+{
+	struct em_perf_table __rcu *table;
+	int table_size;
+
+	table = em_allocate_table(pd);
+	if (!table)
+		return -ENOMEM;
+
+	/* Initialize runtime table with existing data */
+	table_size = sizeof(struct em_perf_state) * pd->nr_perf_states;
+	memcpy(table->state, pd->table, table_size);
+
+	rcu_assign_pointer(pd->em_table, table);
+
+	return 0;
+}
+
 static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 				struct em_perf_state *table,
 				struct em_data_callback *cb,
@@ -245,6 +291,10 @@ static int em_create_pd(struct device *dev, int nr_states,
 	if (ret)
 		goto free_pd_table;
 
+	ret = em_create_runtime_table(pd);
+	if (ret)
+		goto free_pd_table;
+
 	if (_is_cpu_device(dev))
 		for_each_cpu(cpu, cpus) {
 			cpu_dev = get_cpu_device(cpu);
@@ -461,6 +511,9 @@ void em_dev_unregister_perf_domain(struct device *dev)
 	em_debug_remove_pd(dev);
 
 	kfree(dev->em_pd->table);
+
+	em_free_table(dev->em_pd->em_table);
+
 	kfree(dev->em_pd);
 	dev->em_pd = NULL;
 	mutex_unlock(&em_pd_mutex);
-- 
2.25.1



* [PATCH v7 09/23] PM: EM: Use runtime modified EM for CPUs energy estimation in EAS
  2024-01-17  9:56 [PATCH v7 00/23] Introduce runtime modifiable Energy Model Lukasz Luba
                   ` (7 preceding siblings ...)
  2024-01-17  9:56 ` [PATCH v7 08/23] PM: EM: Introduce runtime modifiable table Lukasz Luba
@ 2024-01-17  9:57 ` Lukasz Luba
  2024-01-17  9:57 ` [PATCH v7 10/23] PM: EM: Add functions for memory allocations for new EM tables Lukasz Luba
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Lukasz Luba @ 2024-01-17  9:57 UTC
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

The new Energy Model (EM) supports runtime modification of the performance
state table to better model the power used by the SoC. Use this new
feature to improve energy estimation and therefore task placement in the
Energy Aware Scheduler (EAS).
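The reader/writer split this relies on can be sketched in user space with C11 atomics, assuming release/acquire ordering as a rough stand-in for rcu_assign_pointer()/rcu_dereference(). The demo_* names are hypothetical, and the sketch does not model RCU grace periods, which is what makes freeing an old table safe in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct demo_state {
	unsigned long frequency;	/* kHz */
	unsigned long power;
};

struct demo_table {
	int nr_states;
	struct demo_state state[];
};

/* Stand-in for pd->em_table: a pointer that writers publish atomically. */
static _Atomic(struct demo_table *) demo_em_table;

/* Writer side, analogous to rcu_assign_pointer(): release ordering makes
 * the table contents visible before the pointer itself. */
static void demo_publish(struct demo_table *t)
{
	atomic_store_explicit(&demo_em_table, t, memory_order_release);
}

/* Reader side, analogous to rcu_dereference(): acquire ordering pairs
 * with the release store above. */
static struct demo_table *demo_deref(void)
{
	return atomic_load_explicit(&demo_em_table, memory_order_acquire);
}

/* Hot-path reader in the style of em_cpu_energy(): load the pointer once
 * and take every value from that single snapshot. */
static unsigned long demo_power_at(int idx)
{
	struct demo_table *t = demo_deref();

	assert(idx >= 0 && idx < t->nr_states);
	return t->state[idx].power;
}
```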

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 585c5ffc898b..fcd8de1a2dbd 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -239,9 +239,14 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 				unsigned long allowed_cpu_cap)
 {
 	unsigned long freq, ref_freq, scale_cpu;
+	struct em_perf_table *em_table;
 	struct em_perf_state *ps;
 	int cpu, i;
 
+#ifdef CONFIG_SCHED_DEBUG
+	WARN_ONCE(!rcu_read_lock_held(), "EM: rcu read lock needed\n");
+#endif
+
 	if (!sum_util)
 		return 0;
 
@@ -265,9 +270,10 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 * Find the lowest performance state of the Energy Model above the
 	 * requested frequency.
 	 */
-	i = em_pd_get_efficient_state(pd->table, pd->nr_perf_states, freq,
-				      pd->flags);
-	ps = &pd->table[i];
+	em_table = rcu_dereference(pd->em_table);
+	i = em_pd_get_efficient_state(em_table->state, pd->nr_perf_states,
+				      freq, pd->flags);
+	ps = &em_table->state[i];
 
 	/*
 	 * The capacity of a CPU in the domain at the performance state (ps)
-- 
2.25.1



* [PATCH v7 10/23] PM: EM: Add functions for memory allocations for new EM tables
  2024-01-17  9:56 [PATCH v7 00/23] Introduce runtime modifiable Energy Model Lukasz Luba
                   ` (8 preceding siblings ...)
  2024-01-17  9:57 ` [PATCH v7 09/23] PM: EM: Use runtime modified EM for CPUs energy estimation in EAS Lukasz Luba
@ 2024-01-17  9:57 ` Lukasz Luba
  2024-01-17  9:57 ` [PATCH v7 11/23] PM: EM: Introduce em_dev_update_perf_domain() for EM updates Lukasz Luba
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Lukasz Luba @ 2024-01-17  9:57 UTC
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

The runtime-modified EM table can be provided by drivers. Create a
mechanism which allows device drivers to safely allocate and free the
table. The same table can be used by EAS in task scheduler code paths,
so make sure the memory is not freed when the device driver module is
unloaded.
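The ownership rule added here (a new table starts with one reference, each owner drops its own, the last put frees) can be sketched in user space with a C11 atomic counter playing the role of struct kref. The demo_* names are hypothetical; the kernel defers the final kfree() through call_rcu(), which this single-threaded sketch replaces with an immediate free().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct demo_table {
	atomic_int refcount;		/* plays the role of struct kref */
	int nr_states;
	unsigned long power[];		/* trimmed-down per-state data */
};

static int demo_tables_freed;		/* instrumentation for the example only */

/* Like em_table_alloc(): a new table starts with one owner (kref_init()). */
static struct demo_table *demo_table_alloc(int nr_states)
{
	struct demo_table *t;

	t = calloc(1, sizeof(*t) + sizeof(unsigned long) * (size_t)nr_states);
	if (!t)
		return NULL;
	atomic_init(&t->refcount, 1);
	t->nr_states = nr_states;
	return t;
}

/* Like kref_get(): the caller must already hold a reference. */
static void demo_table_get(struct demo_table *t)
{
	atomic_fetch_add(&t->refcount, 1);
}

/* Like em_table_free() -> kref_put(): free only when the last owner drops
 * its reference. */
static void demo_table_put(struct demo_table *t)
{
	if (atomic_fetch_sub(&t->refcount, 1) == 1) {
		free(t);
		demo_tables_freed++;
	}
}
```

A driver that allocates a table, hands it to the framework (which takes its own reference) and then drops its reference leaves the table alive until the framework releases it too.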

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h | 11 +++++++++++
 kernel/power/energy_model.c  | 38 +++++++++++++++++++++++++++++++-----
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index fcd8de1a2dbd..e44c5080407f 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -5,6 +5,7 @@
 #include <linux/device.h>
 #include <linux/jump_label.h>
 #include <linux/kobject.h>
+#include <linux/kref.h>
 #include <linux/rcupdate.h>
 #include <linux/sched/cpufreq.h>
 #include <linux/sched/topology.h>
@@ -39,10 +40,12 @@ struct em_perf_state {
 /**
  * struct em_perf_table - Performance states table
  * @rcu:	RCU used for safe access and destruction
+ * @kref:	Reference counter to track the users
  * @state:	List of performance states, in ascending order
  */
 struct em_perf_table {
 	struct rcu_head rcu;
+	struct kref kref;
 	struct em_perf_state state[];
 };
 
@@ -184,6 +187,8 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 				struct em_data_callback *cb, cpumask_t *span,
 				bool microwatts);
 void em_dev_unregister_perf_domain(struct device *dev);
+struct em_perf_table __rcu *em_table_alloc(struct em_perf_domain *pd);
+void em_table_free(struct em_perf_table __rcu *table);
 
 /**
  * em_pd_get_efficient_state() - Get an efficient performance state from the EM
@@ -366,6 +371,12 @@ static inline int em_pd_nr_perf_states(struct em_perf_domain *pd)
 {
 	return 0;
 }
+static inline
+struct em_perf_table __rcu *em_table_alloc(struct em_perf_domain *pd)
+{
+	return NULL;
+}
+static inline void em_table_free(struct em_perf_table __rcu *table) {}
 #endif
 
 #endif
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index c03010084208..ffe94614f004 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -114,13 +114,36 @@ static void em_destroy_table_rcu(struct rcu_head *rp)
 	kfree(table);
 }
 
-static void em_free_table(struct em_perf_table __rcu *table)
+static void em_release_table_kref(struct kref *kref)
 {
+	struct em_perf_table __rcu *table;
+
+	/* It was the last owner of this table so we can free */
+	table = container_of(kref, struct em_perf_table, kref);
+
 	call_rcu(&table->rcu, em_destroy_table_rcu);
 }
 
-static struct em_perf_table __rcu *
-em_allocate_table(struct em_perf_domain *pd)
+/**
+ * em_table_free() - Handles safe free of the EM table when needed
+ * @table : EM table which is going to be freed
+ *
+ * No return values.
+ */
+void em_table_free(struct em_perf_table __rcu *table)
+{
+	kref_put(&table->kref, em_release_table_kref);
+}
+
+/**
+ * em_table_alloc() - Allocate a new EM table
+ * @pd		: EM performance domain for which this must be done
+ *
+ * Allocate a new EM table and initialize its kref to indicate that it
+ * has a user.
+ * Returns allocated table or NULL.
+ */
+struct em_perf_table __rcu *em_table_alloc(struct em_perf_domain *pd)
 {
 	struct em_perf_table __rcu *table;
 	int table_size;
@@ -128,6 +151,11 @@ em_allocate_table(struct em_perf_domain *pd)
 	table_size = sizeof(struct em_perf_state) * pd->nr_perf_states;
 
 	table = kzalloc(sizeof(*table) + table_size, GFP_KERNEL);
+	if (!table)
+		return NULL;
+
+	kref_init(&table->kref);
+
 	return table;
 }
 
@@ -186,7 +214,7 @@ static int em_create_runtime_table(struct em_perf_domain *pd)
 	struct em_perf_table __rcu *table;
 	int table_size;
 
-	table = em_allocate_table(pd);
+	table = em_table_alloc(pd);
 	if (!table)
 		return -ENOMEM;
 
@@ -512,7 +540,7 @@ void em_dev_unregister_perf_domain(struct device *dev)
 
 	kfree(dev->em_pd->table);
 
-	em_free_table(dev->em_pd->em_table);
+	em_table_free(dev->em_pd->em_table);
 
 	kfree(dev->em_pd);
 	dev->em_pd = NULL;
-- 
2.25.1



* [PATCH v7 11/23] PM: EM: Introduce em_dev_update_perf_domain() for EM updates
  2024-01-17  9:56 [PATCH v7 00/23] Introduce runtime modifiable Energy Model Lukasz Luba
                   ` (9 preceding siblings ...)
  2024-01-17  9:57 ` [PATCH v7 10/23] PM: EM: Add functions for memory allocations for new EM tables Lukasz Luba
@ 2024-01-17  9:57 ` Lukasz Luba
  2024-01-17  9:57 ` [PATCH v7 12/23] PM: EM: Add em_perf_state_from_pd() to get performance states table Lukasz Luba
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Lukasz Luba @ 2024-01-17  9:57 UTC
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

Add API function em_dev_update_perf_domain() which allows the EM to be
changed safely.

Concurrent updaters are serialized with a mutex and the removal of memory
that will not be used any more is carried out with the help of RCU.
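The update sequence can be sketched in user space as follows, assuming a pthread mutex in place of em_pd_mutex and a plain pointer store in place of rcu_assign_pointer(). The demo_* names are hypothetical, em_cpufreq_update_efficiencies() is left out, and the NULL check on the new table is an extra guard added only in this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct demo_table {
	atomic_int refcount;		/* plays the role of struct kref */
	unsigned long max_power;	/* trimmed-down table payload */
};

struct demo_pd {
	struct demo_table *em_table;	/* stand-in for pd->em_table */
};

/* Serializes updaters, like em_pd_mutex. */
static pthread_mutex_t demo_pd_mutex = PTHREAD_MUTEX_INITIALIZER;

static struct demo_table *demo_table_new(unsigned long max_power)
{
	struct demo_table *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	atomic_init(&t->refcount, 1);
	t->max_power = max_power;
	return t;
}

static void demo_table_put(struct demo_table *t)
{
	if (atomic_fetch_sub(&t->refcount, 1) == 1)
		free(t);	/* kernel: kref release -> call_rcu() -> kfree() */
}

/* Same order as em_dev_update_perf_domain(): take a reference on the new
 * table, swap the pointer, then drop the reference held on the old one. */
static int demo_update_perf_domain(struct demo_pd *pd,
				   struct demo_table *new_table)
{
	struct demo_table *old_table;

	if (!pd || !new_table)
		return -1;	/* the kernel returns -EINVAL for a NULL device */

	pthread_mutex_lock(&demo_pd_mutex);
	atomic_fetch_add(&new_table->refcount, 1);
	old_table = pd->em_table;
	pd->em_table = new_table;	/* kernel: rcu_assign_pointer() */
	if (old_table)
		demo_table_put(old_table);
	pthread_mutex_unlock(&demo_pd_mutex);

	return 0;
}
```

After a successful update the caller still owns its own reference and is expected to drop it, as em_adjust_new_capacity() does later in this series.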

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h |  8 +++++++
 kernel/power/energy_model.c  | 44 ++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index e44c5080407f..494df6942cf7 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -183,6 +183,8 @@ struct em_data_callback {
 
 struct em_perf_domain *em_cpu_get(int cpu);
 struct em_perf_domain *em_pd_get(struct device *dev);
+int em_dev_update_perf_domain(struct device *dev,
+			      struct em_perf_table __rcu *new_table);
 int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 				struct em_data_callback *cb, cpumask_t *span,
 				bool microwatts);
@@ -377,6 +379,12 @@ struct em_perf_table __rcu *em_table_alloc(struct em_perf_domain *pd)
 	return NULL;
 }
 static inline void em_table_free(struct em_perf_table __rcu *table) {}
+static inline
+int em_dev_update_perf_domain(struct device *dev,
+			      struct em_perf_table __rcu *new_table)
+{
+	return -EINVAL;
+}
 #endif
 
 #endif
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index ffe94614f004..190042640935 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -209,6 +209,50 @@ static int em_allocate_perf_table(struct em_perf_domain *pd,
 	return 0;
 }
 
+/**
+ * em_dev_update_perf_domain() - Update runtime EM table for a device
+ * @dev		: Device for which the EM is to be updated
+ * @new_table	: The new EM table that is going to be used from now
+ *
+ * Update the EM runtime modifiable table for @dev using @new_table.
+ *
+ * This function uses a mutex to serialize writers, so it must not be called
+ * from a non-sleeping context.
+ *
+ * Return 0 on success or an error code on failure.
+ */
+int em_dev_update_perf_domain(struct device *dev,
+			      struct em_perf_table __rcu *new_table)
+{
+	struct em_perf_table __rcu *old_table;
+	struct em_perf_domain *pd;
+
+	if (!dev)
+		return -EINVAL;
+
+	/* Serialize update/unregister or concurrent updates */
+	mutex_lock(&em_pd_mutex);
+
+	if (!dev->em_pd) {
+		mutex_unlock(&em_pd_mutex);
+		return -EINVAL;
+	}
+	pd = dev->em_pd;
+
+	kref_get(&new_table->kref);
+
+	old_table = pd->em_table;
+	rcu_assign_pointer(pd->em_table, new_table);
+
+	em_cpufreq_update_efficiencies(dev, new_table->state);
+
+	em_table_free(old_table);
+
+	mutex_unlock(&em_pd_mutex);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(em_dev_update_perf_domain);
+
 static int em_create_runtime_table(struct em_perf_domain *pd)
 {
 	struct em_perf_table __rcu *table;
-- 
2.25.1



* [PATCH v7 12/23] PM: EM: Add em_perf_state_from_pd() to get performance states table
  2024-01-17  9:56 [PATCH v7 00/23] Introduce runtime modifiable Energy Model Lukasz Luba
                   ` (10 preceding siblings ...)
  2024-01-17  9:57 ` [PATCH v7 11/23] PM: EM: Introduce em_dev_update_perf_domain() for EM updates Lukasz Luba
@ 2024-01-17  9:57 ` Lukasz Luba
  2024-01-29 18:13   ` Dietmar Eggemann
  2024-01-17  9:57 ` [PATCH v7 13/23] PM: EM: Add performance field to struct em_perf_state and optimize Lukasz Luba
                   ` (12 subsequent siblings)
  24 siblings, 1 reply; 40+ messages in thread
From: Lukasz Luba @ 2024-01-17  9:57 UTC
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

Introduce a wrapper to get the performance states table of the performance
domain. The function should be called within an RCU read-side critical
section.
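The calling contract can be illustrated in user space with hypothetical demo_read_lock()/demo_read_unlock() helpers that only count nesting depth, so the accessor can assert it is used inside a read-side section. This is a sketch of the usage pattern only; it does not implement RCU, and all demo_* names are hypothetical.

```c
#include <assert.h>

struct demo_state {
	unsigned long performance;
	unsigned long power;
};

struct demo_table {
	int nr_states;
	struct demo_state state[4];	/* fixed size to keep the sketch short */
};

struct demo_pd {
	struct demo_table *em_table;
};

/* Hypothetical stand-ins for rcu_read_lock()/rcu_read_unlock(): they only
 * track nesting depth so the accessor can check its calling context. */
static int demo_read_depth;

static void demo_read_lock(void)
{
	demo_read_depth++;
}

static void demo_read_unlock(void)
{
	assert(demo_read_depth > 0);
	demo_read_depth--;
}

/* Like em_perf_state_from_pd(): only valid inside a read-side section. */
static struct demo_state *demo_perf_state_from_pd(struct demo_pd *pd)
{
	assert(demo_read_depth > 0);
	return pd->em_table->state;
}

/* Caller pattern used by em_check_capacity_update(): copy the needed value
 * out, then leave the section without keeping the table pointer. */
static unsigned long demo_max_perf(struct demo_pd *pd)
{
	struct demo_state *table;
	unsigned long max_perf;

	demo_read_lock();
	table = demo_perf_state_from_pd(pd);
	max_perf = table[pd->em_table->nr_states - 1].performance;
	demo_read_unlock();

	return max_perf;
}
```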

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 494df6942cf7..5ebe9dbec8e1 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -339,6 +339,23 @@ static inline int em_pd_nr_perf_states(struct em_perf_domain *pd)
 	return pd->nr_perf_states;
 }
 
+/**
+ * em_perf_state_from_pd() - Get the performance states table of perf.
+ *				domain
+ * @pd		: performance domain for which this must be done
+ *
+ * To use this function the rcu_read_lock() should be held. After the usage
+ * of the performance states table is finished, the rcu_read_unlock() should
+ * be called.
+ *
+ * Return: the pointer to performance states table of the performance domain
+ */
+static inline
+struct em_perf_state *em_perf_state_from_pd(struct em_perf_domain *pd)
+{
+	return rcu_dereference(pd->em_table)->state;
+}
+
 #else
 struct em_data_callback {};
 #define EM_ADV_DATA_CB(_active_power_cb, _cost_cb) { }
@@ -385,6 +402,11 @@ int em_dev_update_perf_domain(struct device *dev,
 {
 	return -EINVAL;
 }
+static inline
+struct em_perf_state *em_perf_state_from_pd(struct em_perf_domain *pd)
+{
+	return NULL;
+}
 #endif
 
 #endif
-- 
2.25.1



* [PATCH v7 13/23] PM: EM: Add performance field to struct em_perf_state and optimize
  2024-01-17  9:56 [PATCH v7 00/23] Introduce runtime modifiable Energy Model Lukasz Luba
                   ` (11 preceding siblings ...)
  2024-01-17  9:57 ` [PATCH v7 12/23] PM: EM: Add em_perf_state_from_pd() to get performance states table Lukasz Luba
@ 2024-01-17  9:57 ` Lukasz Luba
  2024-01-29 18:13   ` Dietmar Eggemann
  2024-01-17  9:57 ` [PATCH v7 14/23] PM: EM: Support late CPUs booting and capacity adjustment Lukasz Luba
                   ` (11 subsequent siblings)
  24 siblings, 1 reply; 40+ messages in thread
From: Lukasz Luba @ 2024-01-17  9:57 UTC
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

Performance doesn't scale linearly with frequency. It may also differ
between workloads. Some CPUs are designed to be particularly good at
certain applications, e.g. image or video processing, and other CPUs at
different ones. When such different types of CPUs are combined in one SoC,
they should be modeled properly to get the most out of the HW in the
Energy Aware Scheduler (EAS). The Energy Model (EM) provides the
power vs. performance curves to EAS, but assumes that CPU capacity
is fixed and scales linearly with frequency. This patch allows the
curve to be adjusted on the 'performance' axis as well.

Code speed optimization:
Removing map_util_freq() avoids one division and one multiplication
in the EAS hot code path.
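The initial linear performance values and the utilization-based state lookup can be sketched as below. The formula matches em_init_performance() (performance = max_cap * freq / fmax); the lookup mirrors em_pd_get_efficient_state() without the inefficiency-skipping flags, and falling back to the highest state when nothing is high enough is a choice made in this sketch. The demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct demo_state {
	unsigned long performance;
	unsigned long frequency;	/* kHz, ascending */
};

/* Same linear formula as em_init_performance():
 * performance_i = max_cap * frequency_i / frequency_max. */
static void demo_init_performance(struct demo_state *table, int nr,
				  uint64_t max_cap)
{
	uint64_t fmax = table[nr - 1].frequency;

	for (int i = 0; i < nr; i++)
		table[i].performance =
			(unsigned long)(max_cap * table[i].frequency / fmax);
}

/* Lowest state whose performance covers max_util; the highest state if
 * none does (fallback chosen for this sketch). */
static int demo_find_state(const struct demo_state *table, int nr,
			   unsigned long max_util)
{
	for (int i = 0; i < nr; i++)
		if (table[i].performance >= max_util)
			return i;
	return nr - 1;
}
```

With frequencies of 408, 816, 1200 and 1800 MHz and a capacity of 1024, this yields performance values 232, 464, 682 and 1024, so a max_util of 500 selects the 1200 MHz state without converting utilization into a frequency first.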

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h | 24 ++++++++++++------------
 kernel/power/energy_model.c  | 27 +++++++++++++++++++++++++++
 2 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 5ebe9dbec8e1..689d71f6b56f 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -13,6 +13,7 @@
 
 /**
  * struct em_perf_state - Performance state of a performance domain
+ * @performance:	CPU performance (capacity) at a given frequency
  * @frequency:	The frequency in KHz, for consistency with CPUFreq
  * @power:	The power consumed at this level (by 1 CPU or by a registered
  *		device). It can be a total power: static and dynamic.
@@ -21,6 +22,7 @@
  * @flags:	see "em_perf_state flags" description below.
  */
 struct em_perf_state {
+	unsigned long performance;
 	unsigned long frequency;
 	unsigned long power;
 	unsigned long cost;
@@ -196,25 +198,25 @@ void em_table_free(struct em_perf_table __rcu *table);
  * em_pd_get_efficient_state() - Get an efficient performance state from the EM
  * @table:		List of performance states, in ascending order
  * @nr_perf_states:	Number of performance states
- * @freq:		Frequency to map with the EM
+ * @max_util:		Max utilization to map with the EM
  * @pd_flags:		Performance Domain flags
  *
  * It is called from the scheduler code quite frequently and as a consequence
  * doesn't implement any check.
  *
- * Return: An efficient performance state id, high enough to meet @freq
+ * Return: An efficient performance state id, high enough to meet @max_util
  * requirement.
  */
 static inline int
 em_pd_get_efficient_state(struct em_perf_state *table, int nr_perf_states,
-			  unsigned long freq, unsigned long pd_flags)
+			  unsigned long max_util, unsigned long pd_flags)
 {
 	struct em_perf_state *ps;
 	int i;
 
 	for (i = 0; i < nr_perf_states; i++) {
 		ps = &table[i];
-		if (ps->frequency >= freq) {
+		if (ps->performance >= max_util) {
 			if (pd_flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES &&
 			    ps->flags & EM_PERF_STATE_INEFFICIENT)
 				continue;
@@ -245,9 +247,9 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 				unsigned long max_util, unsigned long sum_util,
 				unsigned long allowed_cpu_cap)
 {
-	unsigned long freq, ref_freq, scale_cpu;
 	struct em_perf_table *em_table;
 	struct em_perf_state *ps;
+	unsigned long scale_cpu;
 	int cpu, i;
 
 #ifdef CONFIG_SCHED_DEBUG
@@ -260,26 +262,24 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	/*
 	 * In order to predict the performance state, map the utilization of
 	 * the most utilized CPU of the performance domain to a requested
-	 * frequency, like schedutil. Take also into account that the real
-	 * frequency might be set lower (due to thermal capping). Thus, clamp
+	 * performance, like schedutil. Take also into account that the real
+	 * performance might be set lower (due to thermal capping). Thus, clamp
 	 * max utilization to the allowed CPU capacity before calculating
-	 * effective frequency.
+	 * effective performance.
 	 */
 	cpu = cpumask_first(to_cpumask(pd->cpus));
 	scale_cpu = arch_scale_cpu_capacity(cpu);
-	ref_freq = arch_scale_freq_ref(cpu);
 
 	max_util = map_util_perf(max_util);
 	max_util = min(max_util, allowed_cpu_cap);
-	freq = map_util_freq(max_util, ref_freq, scale_cpu);
 
 	/*
 	 * Find the lowest performance state of the Energy Model above the
-	 * requested frequency.
+	 * requested performance.
 	 */
 	em_table = rcu_dereference(pd->em_table);
 	i = em_pd_get_efficient_state(em_table->state, pd->nr_perf_states,
-				      freq, pd->flags);
+				      max_util, pd->flags);
 	ps = &em_table->state[i];
 
 	/*
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 190042640935..2a817b92804b 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -46,6 +46,7 @@ static void em_debug_create_ps(struct em_perf_state *ps, struct dentry *pd)
 	debugfs_create_ulong("frequency", 0444, d, &ps->frequency);
 	debugfs_create_ulong("power", 0444, d, &ps->power);
 	debugfs_create_ulong("cost", 0444, d, &ps->cost);
+	debugfs_create_ulong("performance", 0444, d, &ps->performance);
 	debugfs_create_ulong("inefficient", 0444, d, &ps->flags);
 }
 
@@ -159,6 +160,30 @@ struct em_perf_table __rcu *em_table_alloc(struct em_perf_domain *pd)
 	return table;
 }
 
+static void em_init_performance(struct device *dev, struct em_perf_domain *pd,
+				struct em_perf_state *table, int nr_states)
+{
+	u64 fmax, max_cap;
+	int i, cpu;
+
+	/* This is needed only for CPUs; EAS skips other devices */
+	if (!_is_cpu_device(dev))
+		return;
+
+	cpu = cpumask_first(em_span_cpus(pd));
+
+	/*
+	 * Calculate the performance value for each frequency with
+	 * linear relationship. The final CPU capacity might not be ready at
+	 * boot time, but the EM will be updated a bit later with correct one.
+	 */
+	fmax = (u64) table[nr_states - 1].frequency;
+	max_cap = (u64) arch_scale_cpu_capacity(cpu);
+	for (i = 0; i < nr_states; i++)
+		table[i].performance = div64_u64(max_cap * table[i].frequency,
+						 fmax);
+}
+
 static int em_compute_costs(struct device *dev, struct em_perf_state *table,
 			    struct em_data_callback *cb, int nr_states,
 			    unsigned long flags)
@@ -318,6 +343,8 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 		table[i].frequency = prev_freq = freq;
 	}
 
+	em_init_performance(dev, pd, table, nr_states);
+
 	ret = em_compute_costs(dev, table, cb, nr_states, flags);
 	if (ret)
 		return -EINVAL;
-- 
2.25.1



* [PATCH v7 14/23] PM: EM: Support late CPUs booting and capacity adjustment
  2024-01-17  9:56 [PATCH v7 00/23] Introduce runtime modifiable Energy Model Lukasz Luba
                   ` (12 preceding siblings ...)
  2024-01-17  9:57 ` [PATCH v7 13/23] PM: EM: Add performance field to struct em_perf_state and optimize Lukasz Luba
@ 2024-01-17  9:57 ` Lukasz Luba
  2024-01-17  9:57 ` [PATCH v7 15/23] PM: EM: Optimize em_cpu_energy() and remove division Lukasz Luba
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Lukasz Luba @ 2024-01-17  9:57 UTC
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

This patch adds the infrastructure needed to handle late CPU boot, which
might change the capacity values of previously registered CPUs. With these
changes, new CPUs which register an EM will trigger the needed
re-calculation of the other CPUs' EMs. Thanks to that, the
em_perf_state::performance values will be aligned with the CPU capacity
information after all CPUs finish booting and registering their EMs.
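The core check in em_check_capacity_update() (rebuild only when the top performance value no longer matches the CPU's current capacity) can be sketched for a single domain as below. The demo_* names are hypothetical, demo_capacity stands in for arch_scale_cpu_capacity(), and the sketch rescales in place, whereas the kernel builds a new table and swaps it in with em_dev_update_perf_domain().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NR_STATES 3

struct demo_pd {
	unsigned long frequency[DEMO_NR_STATES];	/* kHz, ascending */
	unsigned long performance[DEMO_NR_STATES];
};

/* Hypothetical capacity source standing in for arch_scale_cpu_capacity(). */
static unsigned long demo_capacity;

/* Linear rescale, as em_init_performance() does. */
static void demo_rescale(struct demo_pd *pd, uint64_t max_cap)
{
	uint64_t fmax = pd->frequency[DEMO_NR_STATES - 1];

	for (int i = 0; i < DEMO_NR_STATES; i++)
		pd->performance[i] =
			(unsigned long)(max_cap * pd->frequency[i] / fmax);
}

/* Returns true if the domain had to be updated. */
static bool demo_check_capacity_update(struct demo_pd *pd)
{
	unsigned long em_max_perf = pd->performance[DEMO_NR_STATES - 1];

	if (em_max_perf == demo_capacity)
		return false;	/* already in sync, nothing to do */

	demo_rescale(pd, demo_capacity);
	return true;
}
```

Running the check again after an update is a no-op, which is what makes it safe to call on every CPU registration.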

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 kernel/power/energy_model.c | 124 ++++++++++++++++++++++++++++++++++++
 1 file changed, 124 insertions(+)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 2a817b92804b..548d54e55b08 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -25,6 +25,9 @@ static DEFINE_MUTEX(em_pd_mutex);
 
 static void em_cpufreq_update_efficiencies(struct device *dev,
 					   struct em_perf_state *table);
+static void em_check_capacity_update(void);
+static void em_update_workfn(struct work_struct *work);
+static DECLARE_DELAYED_WORK(em_update_work, em_update_workfn);
 
 static bool _is_cpu_device(struct device *dev)
 {
@@ -583,6 +586,10 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 
 unlock:
 	mutex_unlock(&em_pd_mutex);
+
+	if (_is_cpu_device(dev))
+		em_check_capacity_update();
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(em_dev_register_perf_domain);
@@ -618,3 +625,120 @@ void em_dev_unregister_perf_domain(struct device *dev)
 	mutex_unlock(&em_pd_mutex);
 }
 EXPORT_SYMBOL_GPL(em_dev_unregister_perf_domain);
+
+/*
+ * Adjustment of CPU performance values after boot, when all CPU capacities
+ * are correctly calculated.
+ */
+static void em_adjust_new_capacity(struct device *dev,
+				   struct em_perf_domain *pd,
+				   u64 max_cap)
+{
+	struct em_perf_state *table, *new_table;
+	struct em_perf_table __rcu *em_table;
+	int ret, table_size;
+
+	em_table = em_table_alloc(pd);
+	if (!em_table) {
+		dev_warn(dev, "EM: allocation failed\n");
+		return;
+	}
+
+	new_table = em_table->state;
+
+	rcu_read_lock();
+	table = em_perf_state_from_pd(pd);
+	/* Initialize data based on old table */
+	table_size = sizeof(struct em_perf_state) * pd->nr_perf_states;
+	memcpy(new_table, table, table_size);
+
+	rcu_read_unlock();
+
+	em_init_performance(dev, pd, new_table, pd->nr_perf_states);
+	ret = em_compute_costs(dev, new_table, NULL, pd->nr_perf_states,
+			       pd->flags);
+	if (ret) {
+		dev_warn(dev, "EM: compute costs failed\n");
+		return;
+	}
+
+	ret = em_dev_update_perf_domain(dev, em_table);
+	if (ret)
+		dev_warn(dev, "EM: update failed %d\n", ret);
+
+	/*
+	 * This is one-time-update, so give up the ownership in this updater.
+	 * The EM framework has incremented the usage counter and from now
+	 * will keep the reference (then free the memory when needed).
+	 */
+	em_table_free(em_table);
+}
+
+static void em_check_capacity_update(void)
+{
+	cpumask_var_t cpu_done_mask;
+	struct em_perf_state *table;
+	struct em_perf_domain *pd;
+	unsigned long cpu_capacity;
+	int cpu;
+
+	if (!zalloc_cpumask_var(&cpu_done_mask, GFP_KERNEL)) {
+		pr_warn("no free memory\n");
+		return;
+	}
+
+	/* Check if the CPU capacity has changed, then update the EM */
+	for_each_possible_cpu(cpu) {
+		struct cpufreq_policy *policy;
+		unsigned long em_max_perf;
+		struct device *dev;
+		int nr_states;
+
+		if (cpumask_test_cpu(cpu, cpu_done_mask))
+			continue;
+
+		policy = cpufreq_cpu_get(cpu);
+		if (!policy) {
+			pr_debug("Accessing cpu%d policy failed\n", cpu);
+			schedule_delayed_work(&em_update_work,
+					      msecs_to_jiffies(1000));
+			break;
+		}
+		cpufreq_cpu_put(policy);
+
+		pd = em_cpu_get(cpu);
+		if (!pd || em_is_artificial(pd))
+			continue;
+
+		cpumask_or(cpu_done_mask, cpu_done_mask,
+			   em_span_cpus(pd));
+
+		nr_states = pd->nr_perf_states;
+		cpu_capacity = arch_scale_cpu_capacity(cpu);
+
+		rcu_read_lock();
+		table = em_perf_state_from_pd(pd);
+		em_max_perf = table[pd->nr_perf_states - 1].performance;
+		rcu_read_unlock();
+
+		/*
+		 * Check if the CPU capacity has been adjusted during boot
+		 * and trigger the update for new performance values.
+		 */
+		if (em_max_perf == cpu_capacity)
+			continue;
+
+		pr_debug("updating cpu%d cpu_cap=%lu old capacity=%lu\n",
+			 cpu, cpu_capacity, em_max_perf);
+
+		dev = get_cpu_device(cpu);
+		em_adjust_new_capacity(dev, pd, cpu_capacity);
+	}
+
+	free_cpumask_var(cpu_done_mask);
+}
+
+static void em_update_workfn(struct work_struct *work)
+{
+	em_check_capacity_update();
+}
-- 
2.25.1



* [PATCH v7 15/23] PM: EM: Optimize em_cpu_energy() and remove division
  2024-01-17  9:56 [PATCH v7 00/23] Introduce runtime modifiable Energy Model Lukasz Luba
                   ` (13 preceding siblings ...)
  2024-01-17  9:57 ` [PATCH v7 14/23] PM: EM: Support late CPUs booting and capacity adjustment Lukasz Luba
@ 2024-01-17  9:57 ` Lukasz Luba
  2024-02-07 11:40   ` Hongyan Xia
  2024-01-17  9:57 ` [PATCH v7 16/23] powercap/dtpm_cpu: Use new Energy Model interface to get table Lukasz Luba
                   ` (9 subsequent siblings)
  24 siblings, 1 reply; 40+ messages in thread
From: Lukasz Luba @ 2024-01-17  9:57 UTC
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

The Energy Model (EM) can be modified at runtime which brings new
possibilities. The em_cpu_energy() is called by the Energy Aware Scheduler
(EAS) in its hot path. The energy calculation uses power value for
a given performance state (ps) and the CPU busy time as percentage for that
given frequency.

It is possible to avoid the division by 'scale_cpu' at runtime, because
EM is updated whenever new max capacity CPU is set in the system.

Use that feature and do the needed division when computing the
coefficient 'ps->cost'. The resulting 'ps->cost' value can then simply
be multiplied by the utilization:

pd_nrg = ps->cost * \Sum cpu_util

to get the energy needed by the whole Performance Domain (PD).

With this optimization and the earlier removal of map_util_freq(),
em_cpu_energy() should run 1.43x faster on the Big CPU and 1.69x faster
on the Little CPU (RockPi 4B board).

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h | 54 ++++++++++--------------------------
 kernel/power/energy_model.c  |  7 ++---
 2 files changed, 17 insertions(+), 44 deletions(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 689d71f6b56f..aabfc26fcd31 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -115,27 +115,6 @@ struct em_perf_domain {
 #define EM_MAX_NUM_CPUS 16
 #endif
 
-/*
- * To avoid an overflow on 32bit machines while calculating the energy
- * use a different order in the operation. First divide by the 'cpu_scale'
- * which would reduce big value stored in the 'cost' field, then multiply by
- * the 'sum_util'. This would allow to handle existing platforms, which have
- * e.g. power ~1.3 Watt at max freq, so the 'cost' value > 1mln micro-Watts.
- * In such scenario, where there are 4 CPUs in the Perf. Domain the 'sum_util'
- * could be 4096, then multiplication: 'cost' * 'sum_util'  would overflow.
- * This reordering of operations has some limitations, we lose small
- * precision in the estimation (comparing to 64bit platform w/o reordering).
- *
- * We are safe on 64bit machine.
- */
-#ifdef CONFIG_64BIT
-#define em_estimate_energy(cost, sum_util, scale_cpu) \
-	(((cost) * (sum_util)) / (scale_cpu))
-#else
-#define em_estimate_energy(cost, sum_util, scale_cpu) \
-	(((cost) / (scale_cpu)) * (sum_util))
-#endif
-
 struct em_data_callback {
 	/**
 	 * active_power() - Provide power at the next performance state of
@@ -249,8 +228,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 {
 	struct em_perf_table *em_table;
 	struct em_perf_state *ps;
-	unsigned long scale_cpu;
-	int cpu, i;
+	int i;
 
 #ifdef CONFIG_SCHED_DEBUG
 	WARN_ONCE(!rcu_read_lock_held(), "EM: rcu read lock needed\n");
@@ -267,9 +245,6 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 * max utilization to the allowed CPU capacity before calculating
 	 * effective performance.
 	 */
-	cpu = cpumask_first(to_cpumask(pd->cpus));
-	scale_cpu = arch_scale_cpu_capacity(cpu);
-
 	max_util = map_util_perf(max_util);
 	max_util = min(max_util, allowed_cpu_cap);
 
@@ -283,12 +258,12 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	ps = &em_table->state[i];
 
 	/*
-	 * The capacity of a CPU in the domain at the performance state (ps)
-	 * can be computed as:
+	 * The performance (capacity) of a CPU in the domain at the performance
+	 * state (ps) can be computed as:
 	 *
-	 *             ps->freq * scale_cpu
-	 *   ps->cap = --------------------                          (1)
-	 *                 cpu_max_freq
+	 *                     ps->freq * scale_cpu
+	 *   ps->performance = --------------------                  (1)
+	 *                         cpu_max_freq
 	 *
 	 * So, ignoring the costs of idle states (which are not available in
 	 * the EM), the energy consumed by this CPU at that performance state
@@ -296,9 +271,10 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 *
 	 *             ps->power * cpu_util
 	 *   cpu_nrg = --------------------                          (2)
-	 *                   ps->cap
+	 *               ps->performance
 	 *
-	 * since 'cpu_util / ps->cap' represents its percentage of busy time.
+	 * since 'cpu_util / ps->performance' represents its percentage of busy
+	 * time.
 	 *
 	 *   NOTE: Although the result of this computation actually is in
 	 *         units of power, it can be manipulated as an energy value
@@ -308,9 +284,9 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 * By injecting (1) in (2), 'cpu_nrg' can be re-expressed as a product
 	 * of two terms:
 	 *
-	 *             ps->power * cpu_max_freq   cpu_util
-	 *   cpu_nrg = ------------------------ * ---------          (3)
-	 *                    ps->freq            scale_cpu
+	 *             ps->power * cpu_max_freq
+	 *   cpu_nrg = ------------------------ * cpu_util           (3)
+	 *               ps->freq * scale_cpu
 	 *
 	 * The first term is static, and is stored in the em_perf_state struct
 	 * as 'ps->cost'.
@@ -320,11 +296,9 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 * total energy of the domain (which is the simple sum of the energy of
 	 * all of its CPUs) can be factorized as:
 	 *
-	 *            ps->cost * \Sum cpu_util
-	 *   pd_nrg = ------------------------                       (4)
-	 *                  scale_cpu
+	 *   pd_nrg = ps->cost * \Sum cpu_util                       (4)
 	 */
-	return em_estimate_energy(ps->cost, sum_util, scale_cpu);
+	return ps->cost * sum_util;
 }
 
 /**
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 548d54e55b08..4529a0469353 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -192,11 +192,9 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
 			    unsigned long flags)
 {
 	unsigned long prev_cost = ULONG_MAX;
-	u64 fmax;
 	int i, ret;
 
 	/* Compute the cost of each performance state. */
-	fmax = (u64) table[nr_states - 1].frequency;
 	for (i = nr_states - 1; i >= 0; i--) {
 		unsigned long power_res, cost;
 
@@ -208,8 +206,9 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
 				return -EINVAL;
 			}
 		} else {
-			power_res = table[i].power;
-			cost = div64_u64(fmax * power_res, table[i].frequency);
+			/* increase resolution of 'cost' precision */
+			power_res = table[i].power * 10;
+			cost = power_res / table[i].performance;
 		}
 
 		table[i].cost = cost;
-- 
2.25.1



* [PATCH v7 16/23] powercap/dtpm_cpu: Use new Energy Model interface to get table
  2024-01-17  9:56 [PATCH v7 00/23] Introduce runtime modifiable Energy Model Lukasz Luba
                   ` (14 preceding siblings ...)
  2024-01-17  9:57 ` [PATCH v7 15/23] PM: EM: Optimize em_cpu_energy() and remove division Lukasz Luba
@ 2024-01-17  9:57 ` Lukasz Luba
  2024-01-29 18:14   ` Dietmar Eggemann
  2024-01-17  9:57 ` [PATCH v7 17/23] powercap/dtpm_devfreq: " Lukasz Luba
                   ` (8 subsequent siblings)
  24 siblings, 1 reply; 40+ messages in thread
From: Lukasz Luba @ 2024-01-17  9:57 UTC (permalink / raw)
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

The Energy Model framework supports runtime modification of the power
values. Use the new EM table API, which is protected by RCU. Rework the
code so that the RCU read-side critical section stays short.

This change is not expected to alter the general functionality.
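
The shape this patch gives to set_pd_power_limit() can be sketched in userspace C as below. The struct and function names are hypothetical; the kernel-only calls (rcu_read_lock(), em_perf_state_from_pd(), rcu_read_unlock()) appear only as comments marking where the read-side section would begin and end. The point is that everything needed after the lookup is copied into locals before the section ends, so the later freq_qos_update_request() call works on copies only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the em_perf_state fields used here. */
struct ps_sketch {
	unsigned long frequency;	/* kHz */
	unsigned long power;		/* uW, per CPU */
};

/* Pick the highest state whose power * nr_cpus fits into power_limit. */
static uint64_t pick_power_limit(const struct ps_sketch *table, int nr_states,
				 int nr_cpus, uint64_t power_limit,
				 unsigned long *freq_out)
{
	int i;

	assert(nr_states > 0 && nr_cpus > 0);

	/* kernel: rcu_read_lock(); table = em_perf_state_from_pd(pd); */
	for (i = 0; i < nr_states; i++) {
		if ((uint64_t)table[i].power * nr_cpus > power_limit)
			break;
	}

	/* Sketch-only guard: use the lowest state if even it exceeds the limit. */
	if (i == 0)
		i = 1;

	/* Copy out everything needed later while the table is still protected. */
	*freq_out = table[i - 1].frequency;
	power_limit = (uint64_t)table[i - 1].power * nr_cpus;
	/* kernel: rcu_read_unlock(); only the copies are used from here on */

	return power_limit;
}
```

With states of 100000, 300000 and 600000 uW, two online CPUs and a 700000 uW budget, the sketch picks the middle state and reports a granted limit of 600000 uW.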

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/powercap/dtpm_cpu.c | 39 +++++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c
index 9193c3b8edeb..ee0d1aa3e023 100644
--- a/drivers/powercap/dtpm_cpu.c
+++ b/drivers/powercap/dtpm_cpu.c
@@ -42,6 +42,7 @@ static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
 {
 	struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm);
 	struct em_perf_domain *pd = em_cpu_get(dtpm_cpu->cpu);
+	struct em_perf_state *table;
 	struct cpumask cpus;
 	unsigned long freq;
 	u64 power;
@@ -50,20 +51,22 @@ static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
 	cpumask_and(&cpus, cpu_online_mask, to_cpumask(pd->cpus));
 	nr_cpus = cpumask_weight(&cpus);
 
+	rcu_read_lock();
+	table = em_perf_state_from_pd(pd);
 	for (i = 0; i < pd->nr_perf_states; i++) {
 
-		power = pd->table[i].power * nr_cpus;
+		power = table[i].power * nr_cpus;
 
 		if (power > power_limit)
 			break;
 	}
 
-	freq = pd->table[i - 1].frequency;
+	freq = table[i - 1].frequency;
+	power_limit = table[i - 1].power * nr_cpus;
+	rcu_read_unlock();
 
 	freq_qos_update_request(&dtpm_cpu->qos_req, freq);
 
-	power_limit = pd->table[i - 1].power * nr_cpus;
-
 	return power_limit;
 }
 
@@ -87,9 +90,11 @@ static u64 scale_pd_power_uw(struct cpumask *pd_mask, u64 power)
 static u64 get_pd_power_uw(struct dtpm *dtpm)
 {
 	struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm);
+	struct em_perf_state *table;
 	struct em_perf_domain *pd;
 	struct cpumask *pd_mask;
 	unsigned long freq;
+	u64 power = 0;
 	int i;
 
 	pd = em_cpu_get(dtpm_cpu->cpu);
@@ -98,33 +103,43 @@ static u64 get_pd_power_uw(struct dtpm *dtpm)
 
 	freq = cpufreq_quick_get(dtpm_cpu->cpu);
 
+	rcu_read_lock();
+	table = em_perf_state_from_pd(pd);
 	for (i = 0; i < pd->nr_perf_states; i++) {
 
-		if (pd->table[i].frequency < freq)
+		if (table[i].frequency < freq)
 			continue;
 
-		return scale_pd_power_uw(pd_mask, pd->table[i].power);
+		power = scale_pd_power_uw(pd_mask, table[i].power);
+		break;
 	}
+	rcu_read_unlock();
 
-	return 0;
+	return power;
 }
 
 static int update_pd_power_uw(struct dtpm *dtpm)
 {
 	struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm);
 	struct em_perf_domain *em = em_cpu_get(dtpm_cpu->cpu);
+	struct em_perf_state *table;
 	struct cpumask cpus;
 	int nr_cpus;
 
 	cpumask_and(&cpus, cpu_online_mask, to_cpumask(em->cpus));
 	nr_cpus = cpumask_weight(&cpus);
 
-	dtpm->power_min = em->table[0].power;
+	rcu_read_lock();
+	table = em_perf_state_from_pd(em);
+
+	dtpm->power_min = table[0].power;
 	dtpm->power_min *= nr_cpus;
 
-	dtpm->power_max = em->table[em->nr_perf_states - 1].power;
+	dtpm->power_max = table[em->nr_perf_states - 1].power;
 	dtpm->power_max *= nr_cpus;
 
+	rcu_read_unlock();
+
 	return 0;
 }
 
@@ -180,6 +195,7 @@ static int __dtpm_cpu_setup(int cpu, struct dtpm *parent)
 {
 	struct dtpm_cpu *dtpm_cpu;
 	struct cpufreq_policy *policy;
+	struct em_perf_state *table;
 	struct em_perf_domain *pd;
 	char name[CPUFREQ_NAME_LEN];
 	int ret = -ENOMEM;
@@ -216,9 +232,12 @@ static int __dtpm_cpu_setup(int cpu, struct dtpm *parent)
 	if (ret)
 		goto out_kfree_dtpm_cpu;
 
+	rcu_read_lock();
+	table = em_perf_state_from_pd(pd);
 	ret = freq_qos_add_request(&policy->constraints,
 				   &dtpm_cpu->qos_req, FREQ_QOS_MAX,
-				   pd->table[pd->nr_perf_states - 1].frequency);
+				   table[pd->nr_perf_states - 1].frequency);
+	rcu_read_unlock();
 	if (ret)
 		goto out_dtpm_unregister;
 
-- 
2.25.1



* [PATCH v7 17/23] powercap/dtpm_devfreq: Use new Energy Model interface to get table
  2024-01-17  9:56 [PATCH v7 00/23] Introduce runtime modifiable Energy Model Lukasz Luba
                   ` (15 preceding siblings ...)
  2024-01-17  9:57 ` [PATCH v7 16/23] powercap/dtpm_cpu: Use new Energy Model interface to get table Lukasz Luba
@ 2024-01-17  9:57 ` Lukasz Luba
  2024-01-17  9:57 ` [PATCH v7 18/23] drivers/thermal/cpufreq_cooling: Use new Energy Model interface Lukasz Luba
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Lukasz Luba @ 2024-01-17  9:57 UTC (permalink / raw)
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

The Energy Model framework supports runtime modification of the power
values. Use the new EM table API, which is protected by RCU. Rework the
code so that the RCU read-side critical section stays short.

This change is not expected to alter the general functionality.
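
The reworked get_pd_power_uw() loop can be sketched as follows. The names are hypothetical, and the sketch assumes the busy time has already been normalized to a 0..1024 range (which is what the '>>= 10' in the patch implies). It also shows why the patch replaces the 'return' inside the loop with 'break': the function must leave the loop, and therefore the read-side section, through a single exit that unlocks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the em_perf_state fields used here. */
struct ps_sketch {
	unsigned long frequency;	/* kHz */
	unsigned long power;		/* uW */
};

/*
 * Power of the first state whose frequency is >= freq, scaled by the load.
 * busy_1024 is assumed to be normalized to 0..1024. Returns 0 if no state
 * matches.
 */
static uint64_t devfreq_power_sketch(const struct ps_sketch *table, int nr_states,
				     unsigned long freq, uint64_t busy_1024)
{
	uint64_t power = 0;
	int i;

	assert(busy_1024 <= 1024);

	/* kernel: rcu_read_lock(); table = em_perf_state_from_pd(pd); */
	for (i = 0; i < nr_states; i++) {
		if (table[i].frequency < freq)
			continue;

		power = table[i].power;
		power *= busy_1024;
		power >>= 10;
		break;	/* single exit, so the unlock below always runs */
	}
	/* kernel: rcu_read_unlock(); */

	return power;
}
```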

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/powercap/dtpm_devfreq.c | 34 ++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/drivers/powercap/dtpm_devfreq.c b/drivers/powercap/dtpm_devfreq.c
index 612c3b59dd5b..f40bce8176df 100644
--- a/drivers/powercap/dtpm_devfreq.c
+++ b/drivers/powercap/dtpm_devfreq.c
@@ -37,11 +37,16 @@ static int update_pd_power_uw(struct dtpm *dtpm)
 	struct devfreq *devfreq = dtpm_devfreq->devfreq;
 	struct device *dev = devfreq->dev.parent;
 	struct em_perf_domain *pd = em_pd_get(dev);
+	struct em_perf_state *table;
 
-	dtpm->power_min = pd->table[0].power;
+	rcu_read_lock();
+	table = em_perf_state_from_pd(pd);
 
-	dtpm->power_max = pd->table[pd->nr_perf_states - 1].power;
+	dtpm->power_min = table[0].power;
 
+	dtpm->power_max = table[pd->nr_perf_states - 1].power;
+
+	rcu_read_unlock();
 	return 0;
 }
 
@@ -51,20 +56,23 @@ static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
 	struct devfreq *devfreq = dtpm_devfreq->devfreq;
 	struct device *dev = devfreq->dev.parent;
 	struct em_perf_domain *pd = em_pd_get(dev);
+	struct em_perf_state *table;
 	unsigned long freq;
 	int i;
 
+	rcu_read_lock();
+	table = em_perf_state_from_pd(pd);
 	for (i = 0; i < pd->nr_perf_states; i++) {
-		if (pd->table[i].power > power_limit)
+		if (table[i].power > power_limit)
 			break;
 	}
 
-	freq = pd->table[i - 1].frequency;
+	freq = table[i - 1].frequency;
+	power_limit = table[i - 1].power;
+	rcu_read_unlock();
 
 	dev_pm_qos_update_request(&dtpm_devfreq->qos_req, freq);
 
-	power_limit = pd->table[i - 1].power;
-
 	return power_limit;
 }
 
@@ -89,8 +97,9 @@ static u64 get_pd_power_uw(struct dtpm *dtpm)
 	struct device *dev = devfreq->dev.parent;
 	struct em_perf_domain *pd = em_pd_get(dev);
 	struct devfreq_dev_status status;
+	struct em_perf_state *table;
 	unsigned long freq;
-	u64 power;
+	u64 power = 0;
 	int i;
 
 	mutex_lock(&devfreq->lock);
@@ -100,19 +109,22 @@ static u64 get_pd_power_uw(struct dtpm *dtpm)
 	freq = DIV_ROUND_UP(status.current_frequency, HZ_PER_KHZ);
 	_normalize_load(&status);
 
+	rcu_read_lock();
+	table = em_perf_state_from_pd(pd);
 	for (i = 0; i < pd->nr_perf_states; i++) {
 
-		if (pd->table[i].frequency < freq)
+		if (table[i].frequency < freq)
 			continue;
 
-		power = pd->table[i].power;
+		power = table[i].power;
 		power *= status.busy_time;
 		power >>= 10;
 
-		return power;
+		break;
 	}
+	rcu_read_unlock();
 
-	return 0;
+	return power;
 }
 
 static void pd_release(struct dtpm *dtpm)
-- 
2.25.1



* [PATCH v7 18/23] drivers/thermal/cpufreq_cooling: Use new Energy Model interface
  2024-01-17  9:56 [PATCH v7 00/23] Introduce runtime modifiable Energy Model Lukasz Luba
                   ` (16 preceding siblings ...)
  2024-01-17  9:57 ` [PATCH v7 17/23] powercap/dtpm_devfreq: " Lukasz Luba
@ 2024-01-17  9:57 ` Lukasz Luba
  2024-01-17  9:57 ` [PATCH v7 19/23] drivers/thermal/devfreq_cooling: " Lukasz Luba
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Lukasz Luba @ 2024-01-17  9:57 UTC (permalink / raw)
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

The Energy Model framework supports runtime modification of the power
values. Use the new EM table, which is protected by RCU. Rework the
code so that the RCU read-side critical section stays short.

This change is not expected to alter the general functionality.
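
The two lookups that cpufreq_cooling performs against the EM table can be sketched in userspace C. The names and the UW_PER_MW constant are hypothetical (UW_PER_MW stands in for MICROWATT_PER_MILLIWATT), and max_level is assumed to equal nr_perf_states - 1. As in cpu_power_to_freq() after this patch, the resulting frequency is copied into a local before the (commented) unlock.

```c
#include <assert.h>
#include <stdint.h>

#define UW_PER_MW 1000UL	/* hypothetical stand-in for MICROWATT_PER_MILLIWATT */

/* Hypothetical stand-in for the em_perf_state fields used here. */
struct ps_sketch {
	unsigned long frequency;	/* kHz */
	unsigned long power;		/* uW */
};

/* Like cpu_power_to_freq(): highest state whose power (in mW) fits the budget. */
static unsigned long power_to_freq_sketch(const struct ps_sketch *table,
					  int max_level, uint32_t power_mw)
{
	unsigned long em_power_mw, freq;
	int i;

	assert(max_level >= 0);

	/* kernel: rcu_read_lock(); table = em_perf_state_from_pd(em); */
	for (i = max_level; i > 0; i--) {
		/* Convert uW to mW so both sides use the same unit. */
		em_power_mw = table[i].power / UW_PER_MW;
		if (power_mw >= em_power_mw)
			break;
	}
	freq = table[i].frequency;	/* copy out before unlocking */
	/* kernel: rcu_read_unlock(); */

	return freq;
}

/* Like get_state_freq(): cooling state 0 maps to the highest table index. */
static unsigned long state_to_freq_sketch(const struct ps_sketch *table,
					  int max_level, unsigned long state)
{
	assert(state <= (unsigned long)max_level);
	return table[max_level - state].frequency;
}
```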

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/thermal/cpufreq_cooling.c | 45 +++++++++++++++++++++++++------
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index e2cc7bd30862..9d1b1459700d 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -91,12 +91,16 @@ struct cpufreq_cooling_device {
 static unsigned long get_level(struct cpufreq_cooling_device *cpufreq_cdev,
 			       unsigned int freq)
 {
+	struct em_perf_state *table;
 	int i;
 
+	rcu_read_lock();
+	table = em_perf_state_from_pd(cpufreq_cdev->em);
 	for (i = cpufreq_cdev->max_level - 1; i >= 0; i--) {
-		if (freq > cpufreq_cdev->em->table[i].frequency)
+		if (freq > table[i].frequency)
 			break;
 	}
+	rcu_read_unlock();
 
 	return cpufreq_cdev->max_level - i - 1;
 }
@@ -104,16 +108,20 @@ static unsigned long get_level(struct cpufreq_cooling_device *cpufreq_cdev,
 static u32 cpu_freq_to_power(struct cpufreq_cooling_device *cpufreq_cdev,
 			     u32 freq)
 {
+	struct em_perf_state *table;
 	unsigned long power_mw;
 	int i;
 
+	rcu_read_lock();
+	table = em_perf_state_from_pd(cpufreq_cdev->em);
 	for (i = cpufreq_cdev->max_level - 1; i >= 0; i--) {
-		if (freq > cpufreq_cdev->em->table[i].frequency)
+		if (freq > table[i].frequency)
 			break;
 	}
 
-	power_mw = cpufreq_cdev->em->table[i + 1].power;
+	power_mw = table[i + 1].power;
 	power_mw /= MICROWATT_PER_MILLIWATT;
+	rcu_read_unlock();
 
 	return power_mw;
 }
@@ -121,18 +129,24 @@ static u32 cpu_freq_to_power(struct cpufreq_cooling_device *cpufreq_cdev,
 static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_cdev,
 			     u32 power)
 {
+	struct em_perf_state *table;
 	unsigned long em_power_mw;
+	u32 freq;
 	int i;
 
+	rcu_read_lock();
+	table = em_perf_state_from_pd(cpufreq_cdev->em);
 	for (i = cpufreq_cdev->max_level; i > 0; i--) {
 		/* Convert EM power to milli-Watts to make safe comparison */
-		em_power_mw = cpufreq_cdev->em->table[i].power;
+		em_power_mw = table[i].power;
 		em_power_mw /= MICROWATT_PER_MILLIWATT;
 		if (power >= em_power_mw)
 			break;
 	}
+	freq = table[i].frequency;
+	rcu_read_unlock();
 
-	return cpufreq_cdev->em->table[i].frequency;
+	return freq;
 }
 
 /**
@@ -262,8 +276,9 @@ static int cpufreq_get_requested_power(struct thermal_cooling_device *cdev,
 static int cpufreq_state2power(struct thermal_cooling_device *cdev,
 			       unsigned long state, u32 *power)
 {
-	unsigned int freq, num_cpus, idx;
 	struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
+	unsigned int freq, num_cpus, idx;
+	struct em_perf_state *table;
 
 	/* Request state should be less than max_level */
 	if (state > cpufreq_cdev->max_level)
@@ -272,7 +287,12 @@ static int cpufreq_state2power(struct thermal_cooling_device *cdev,
 	num_cpus = cpumask_weight(cpufreq_cdev->policy->cpus);
 
 	idx = cpufreq_cdev->max_level - state;
-	freq = cpufreq_cdev->em->table[idx].frequency;
+
+	rcu_read_lock();
+	table = em_perf_state_from_pd(cpufreq_cdev->em);
+	freq = table[idx].frequency;
+	rcu_read_unlock();
+
 	*power = cpu_freq_to_power(cpufreq_cdev, freq) * num_cpus;
 
 	return 0;
@@ -378,8 +398,17 @@ static unsigned int get_state_freq(struct cpufreq_cooling_device *cpufreq_cdev,
 #ifdef CONFIG_THERMAL_GOV_POWER_ALLOCATOR
 	/* Use the Energy Model table if available */
 	if (cpufreq_cdev->em) {
+		struct em_perf_state *table;
+		unsigned int freq;
+
 		idx = cpufreq_cdev->max_level - state;
-		return cpufreq_cdev->em->table[idx].frequency;
+
+		rcu_read_lock();
+		table = em_perf_state_from_pd(cpufreq_cdev->em);
+		freq = table[idx].frequency;
+		rcu_read_unlock();
+
+		return freq;
 	}
 #endif
 
-- 
2.25.1



* [PATCH v7 19/23] drivers/thermal/devfreq_cooling: Use new Energy Model interface
  2024-01-17  9:56 [PATCH v7 00/23] Introduce runtime modifiable Energy Model Lukasz Luba
                   ` (17 preceding siblings ...)
  2024-01-17  9:57 ` [PATCH v7 18/23] drivers/thermal/cpufreq_cooling: Use new Energy Model interface Lukasz Luba
@ 2024-01-17  9:57 ` Lukasz Luba
  2024-01-17  9:57 ` [PATCH v7 20/23] PM: EM: Change debugfs configuration to use runtime EM table data Lukasz Luba
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Lukasz Luba @ 2024-01-17  9:57 UTC (permalink / raw)
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

The Energy Model framework supports runtime modification of the power
values. Use the new EM table, which is protected by RCU. Rework the
code so that the RCU read-side critical section stays short.

This change is not expected to alter the general functionality.
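
A userspace sketch of the reworked get_perf_idx() and of the state-to-frequency mapping in devfreq_cooling_set_cur_state(). The names are hypothetical, and the kernel-only RCU calls appear as comments. The search result is kept in a local ('idx') instead of being returned from inside the loop, so the read-side section has a single exit. The EM stores frequencies in kHz while devfreq works in Hz, hence the multiplication by 1000.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the em_perf_state fields used here. */
struct ps_sketch {
	unsigned long frequency;	/* kHz */
	unsigned long power;		/* uW */
};

/* Like get_perf_idx(): exact-match search, result kept in a local. */
static int perf_idx_sketch(const struct ps_sketch *table, int nr_states,
			   unsigned long freq_khz)
{
	int i, idx = -EINVAL;

	assert(nr_states > 0);

	/* kernel: rcu_read_lock(); table = em_perf_state_from_pd(em_pd); */
	for (i = 0; i < nr_states; i++) {
		if (table[i].frequency != freq_khz)
			continue;

		idx = i;
		break;
	}
	/* kernel: rcu_read_unlock(); */

	return idx;
}

/* Like devfreq_cooling_set_cur_state(): cooling state to frequency in Hz. */
static unsigned long state_to_hz_sketch(const struct ps_sketch *table,
					int max_state, int state)
{
	assert(state >= 0 && state <= max_state);
	return table[max_state - state].frequency * 1000;
}
```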

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/thermal/devfreq_cooling.c | 49 +++++++++++++++++++++++++------
 1 file changed, 40 insertions(+), 9 deletions(-)

diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index 262e62ab6cf2..50dec24e967a 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -87,6 +87,7 @@ static int devfreq_cooling_set_cur_state(struct thermal_cooling_device *cdev,
 	struct devfreq_cooling_device *dfc = cdev->devdata;
 	struct devfreq *df = dfc->devfreq;
 	struct device *dev = df->dev.parent;
+	struct em_perf_state *table;
 	unsigned long freq;
 	int perf_idx;
 
@@ -100,7 +101,11 @@ static int devfreq_cooling_set_cur_state(struct thermal_cooling_device *cdev,
 
 	if (dfc->em_pd) {
 		perf_idx = dfc->max_state - state;
-		freq = dfc->em_pd->table[perf_idx].frequency * 1000;
+
+		rcu_read_lock();
+		table = em_perf_state_from_pd(dfc->em_pd);
+		freq = table[perf_idx].frequency * 1000;
+		rcu_read_unlock();
 	} else {
 		freq = dfc->freq_table[state];
 	}
@@ -123,14 +128,21 @@ static int devfreq_cooling_set_cur_state(struct thermal_cooling_device *cdev,
  */
 static int get_perf_idx(struct em_perf_domain *em_pd, unsigned long freq)
 {
-	int i;
+	struct em_perf_state *table;
+	int i, idx = -EINVAL;
 
+	rcu_read_lock();
+	table = em_perf_state_from_pd(em_pd);
 	for (i = 0; i < em_pd->nr_perf_states; i++) {
-		if (em_pd->table[i].frequency == freq)
-			return i;
+		if (table[i].frequency != freq)
+			continue;
+
+		idx = i;
+		break;
 	}
+	rcu_read_unlock();
 
-	return -EINVAL;
+	return idx;
 }
 
 static unsigned long get_voltage(struct devfreq *df, unsigned long freq)
@@ -181,6 +193,7 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 	struct devfreq_cooling_device *dfc = cdev->devdata;
 	struct devfreq *df = dfc->devfreq;
 	struct devfreq_dev_status status;
+	struct em_perf_state *table;
 	unsigned long state;
 	unsigned long freq;
 	unsigned long voltage;
@@ -204,7 +217,11 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 			state = dfc->capped_state;
 
 			/* Convert EM power into milli-Watts first */
-			dfc->res_util = dfc->em_pd->table[state].power;
+			rcu_read_lock();
+			table = em_perf_state_from_pd(dfc->em_pd);
+			dfc->res_util = table[state].power;
+			rcu_read_unlock();
+
 			dfc->res_util /= MICROWATT_PER_MILLIWATT;
 
 			dfc->res_util *= SCALE_ERROR_MITIGATION;
@@ -225,7 +242,11 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 		_normalize_load(&status);
 
 		/* Convert EM power into milli-Watts first */
-		*power = dfc->em_pd->table[perf_idx].power;
+		rcu_read_lock();
+		table = em_perf_state_from_pd(dfc->em_pd);
+		*power = table[perf_idx].power;
+		rcu_read_unlock();
+
 		*power /= MICROWATT_PER_MILLIWATT;
 		/* Scale power for utilization */
 		*power *= status.busy_time;
@@ -245,13 +266,19 @@ static int devfreq_cooling_state2power(struct thermal_cooling_device *cdev,
 				       unsigned long state, u32 *power)
 {
 	struct devfreq_cooling_device *dfc = cdev->devdata;
+	struct em_perf_state *table;
 	int perf_idx;
 
 	if (state > dfc->max_state)
 		return -EINVAL;
 
 	perf_idx = dfc->max_state - state;
-	*power = dfc->em_pd->table[perf_idx].power;
+
+	rcu_read_lock();
+	table = em_perf_state_from_pd(dfc->em_pd);
+	*power = table[perf_idx].power;
+	rcu_read_unlock();
+
 	*power /= MICROWATT_PER_MILLIWATT;
 
 	return 0;
@@ -264,6 +291,7 @@ static int devfreq_cooling_power2state(struct thermal_cooling_device *cdev,
 	struct devfreq *df = dfc->devfreq;
 	struct devfreq_dev_status status;
 	unsigned long freq, em_power_mw;
+	struct em_perf_state *table;
 	s32 est_power;
 	int i;
 
@@ -288,13 +316,16 @@ static int devfreq_cooling_power2state(struct thermal_cooling_device *cdev,
 	 * Find the first cooling state that is within the power
 	 * budget. The EM power table is sorted ascending.
 	 */
+	rcu_read_lock();
+	table = em_perf_state_from_pd(dfc->em_pd);
 	for (i = dfc->max_state; i > 0; i--) {
 		/* Convert EM power to milli-Watts to make safe comparison */
-		em_power_mw = dfc->em_pd->table[i].power;
+		em_power_mw = table[i].power;
 		em_power_mw /= MICROWATT_PER_MILLIWATT;
 		if (est_power >= em_power_mw)
 			break;
 	}
+	rcu_read_unlock();
 
 	*state = dfc->max_state - i;
 	dfc->capped_state = *state;
-- 
2.25.1



* [PATCH v7 20/23] PM: EM: Change debugfs configuration to use runtime EM table data
  2024-01-17  9:56 [PATCH v7 00/23] Introduce runtime modifiable Energy Model Lukasz Luba
                   ` (18 preceding siblings ...)
  2024-01-17  9:57 ` [PATCH v7 19/23] drivers/thermal/devfreq_cooling: " Lukasz Luba
@ 2024-01-17  9:57 ` Lukasz Luba
  2024-01-17  9:57 ` [PATCH v7 21/23] PM: EM: Remove old table Lukasz Luba
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Lukasz Luba @ 2024-01-17  9:57 UTC (permalink / raw)
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

Dump the runtime EM table values, which can change over time. To do
that, allocate a chunk of debug memory, which is later freed
automatically thanks to devm_kcalloc().

This design handles the fact that the EM table memory can change after
an EM update, so the debug code cannot keep using the pointer obtained
during the initialization phase.
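
The idea behind DEFINE_EM_DBG_SHOW() can be sketched in userspace C: each debugfs file stores a (domain, index) pair rather than a pointer into a particular table, and a token-pasting macro generates one reader per field that looks the value up again on every read. The types and names below are hypothetical, snprintf() stands in for seq_printf(), and the kernel-only RCU calls appear as comments.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for em_perf_state, em_perf_domain and em_dbg_info. */
struct ps_sketch {
	unsigned long frequency;
	unsigned long power;
};

struct pd_sketch {
	struct ps_sketch *table;	/* may be replaced by an EM update */
};

struct dbg_info_sketch {
	struct pd_sketch *pd;
	int ps_id;
};

/*
 * One reader per field. The value is re-read through pd->table on each call,
 * so a replaced table is picked up; a pointer to the field taken at creation
 * time (what debugfs_create_ulong() was given) would go stale.
 */
#define DEFINE_SHOW_SKETCH(name)					\
static int show_##name(const struct dbg_info_sketch *dbg,		\
		       char *buf, size_t len)				\
{									\
	unsigned long val;						\
									\
	assert(dbg && dbg->pd && dbg->pd->table);			\
	/* kernel: rcu_read_lock(); table = em_perf_state_from_pd() */	\
	val = dbg->pd->table[dbg->ps_id].name;				\
	/* kernel: rcu_read_unlock(); */				\
	return snprintf(buf, len, "%lu\n", val);			\
}

DEFINE_SHOW_SKETCH(frequency)
DEFINE_SHOW_SKETCH(power)
```

After 'pd.table' is switched to a new array, show_power() reports the new value without recreating the file, which is the property the debugfs rework relies on.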

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 kernel/power/energy_model.c | 67 ++++++++++++++++++++++++++++++++-----
 1 file changed, 59 insertions(+), 8 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 4529a0469353..76aab2801bf0 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -37,20 +37,65 @@ static bool _is_cpu_device(struct device *dev)
 #ifdef CONFIG_DEBUG_FS
 static struct dentry *rootdir;
 
-static void em_debug_create_ps(struct em_perf_state *ps, struct dentry *pd)
+struct em_dbg_info {
+	struct em_perf_domain *pd;
+	int ps_id;
+};
+
+#define DEFINE_EM_DBG_SHOW(name, fname)					\
+static int em_debug_##fname##_show(struct seq_file *s, void *unused)	\
+{									\
+	struct em_dbg_info *em_dbg = s->private;			\
+	struct em_perf_state *table;					\
+	unsigned long val;						\
+									\
+	rcu_read_lock();						\
+	table = em_perf_state_from_pd(em_dbg->pd);			\
+	val = table[em_dbg->ps_id].name;				\
+	rcu_read_unlock();						\
+									\
+	seq_printf(s, "%lu\n", val);					\
+	return 0;							\
+}									\
+DEFINE_SHOW_ATTRIBUTE(em_debug_##fname)
+
+DEFINE_EM_DBG_SHOW(frequency, frequency);
+DEFINE_EM_DBG_SHOW(power, power);
+DEFINE_EM_DBG_SHOW(cost, cost);
+DEFINE_EM_DBG_SHOW(performance, performance);
+DEFINE_EM_DBG_SHOW(flags, inefficiency);
+
+static void em_debug_create_ps(struct em_perf_domain *em_pd,
+			       struct em_dbg_info *em_dbg, int i,
+			       struct dentry *pd)
 {
+	struct em_perf_state *table;
+	unsigned long freq;
 	struct dentry *d;
 	char name[24];
 
-	snprintf(name, sizeof(name), "ps:%lu", ps->frequency);
+	em_dbg[i].pd = em_pd;
+	em_dbg[i].ps_id = i;
+
+	rcu_read_lock();
+	table = em_perf_state_from_pd(em_pd);
+	freq = table[i].frequency;
+	rcu_read_unlock();
+
+	snprintf(name, sizeof(name), "ps:%lu", freq);
 
 	/* Create per-ps directory */
 	d = debugfs_create_dir(name, pd);
-	debugfs_create_ulong("frequency", 0444, d, &ps->frequency);
-	debugfs_create_ulong("power", 0444, d, &ps->power);
-	debugfs_create_ulong("cost", 0444, d, &ps->cost);
-	debugfs_create_ulong("performance", 0444, d, &ps->performance);
-	debugfs_create_ulong("inefficient", 0444, d, &ps->flags);
+	debugfs_create_file("frequency", 0444, d, &em_dbg[i],
+			    &em_debug_frequency_fops);
+	debugfs_create_file("power", 0444, d, &em_dbg[i],
+			    &em_debug_power_fops);
+	debugfs_create_file("cost", 0444, d, &em_dbg[i],
+			    &em_debug_cost_fops);
+	debugfs_create_file("performance", 0444, d, &em_dbg[i],
+			    &em_debug_performance_fops);
+	debugfs_create_file("inefficient", 0444, d, &em_dbg[i],
+			    &em_debug_inefficiency_fops);
 }
 
 static int em_debug_cpus_show(struct seq_file *s, void *unused)
@@ -73,6 +118,7 @@ DEFINE_SHOW_ATTRIBUTE(em_debug_flags);
 
 static void em_debug_create_pd(struct device *dev)
 {
+	struct em_dbg_info *em_dbg;
 	struct dentry *d;
 	int i;
 
@@ -86,9 +132,14 @@ static void em_debug_create_pd(struct device *dev)
 	debugfs_create_file("flags", 0444, d, dev->em_pd,
 			    &em_debug_flags_fops);
 
+	em_dbg = devm_kcalloc(dev, dev->em_pd->nr_perf_states,
+			      sizeof(*em_dbg), GFP_KERNEL);
+	if (!em_dbg)
+		return;
+
 	/* Create a sub-directory for each performance state */
 	for (i = 0; i < dev->em_pd->nr_perf_states; i++)
-		em_debug_create_ps(&dev->em_pd->table[i], d);
+		em_debug_create_ps(dev->em_pd, em_dbg, i, d);
 
 }
 
-- 
2.25.1



* [PATCH v7 21/23] PM: EM: Remove old table
  2024-01-17  9:56 [PATCH v7 00/23] Introduce runtime modifiable Energy Model Lukasz Luba
                   ` (19 preceding siblings ...)
  2024-01-17  9:57 ` [PATCH v7 20/23] PM: EM: Change debugfs configuration to use runtime EM table data Lukasz Luba
@ 2024-01-17  9:57 ` Lukasz Luba
  2024-01-17  9:57 ` [PATCH v7 22/23] PM: EM: Add em_dev_compute_costs() Lukasz Luba
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Lukasz Luba @ 2024-01-17  9:57 UTC (permalink / raw)
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

Remove the old EM table, which could not be modified at runtime. Remove
the functions that are no longer needed and refactor the code a bit.
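
After this patch em_create_pd() follows the order: allocate the runtime table (em_table_alloc()), fill em_table->state (em_create_perf_table()), then publish it (rcu_assign_pointer()). A rough userspace analogue is sketched below. The types and names are hypothetical; a C11 release store stands in for rcu_assign_pointer() and an acquire load for rcu_dereference(). Grace periods, kref and freeing of replaced tables are not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct ps_sketch {
	unsigned long frequency;
	unsigned long power;
};

/* Hypothetical stand-in for struct em_perf_table (rcu_head/kref omitted). */
struct table_sketch {
	int nr_states;
	struct ps_sketch state[];	/* flexible array, like em_table->state */
};

struct pd_sketch {
	struct table_sketch *_Atomic em_table;
};

/* One allocation holds the header and all states (cf. em_table_alloc()). */
static struct table_sketch *table_alloc_sketch(int nr_states)
{
	struct table_sketch *t;

	assert(nr_states > 0);
	t = calloc(1, sizeof(*t) + nr_states * sizeof(t->state[0]));
	if (t)
		t->nr_states = nr_states;
	return t;
}

/* Publish only after the states are filled in (cf. rcu_assign_pointer()). */
static void publish_sketch(struct pd_sketch *pd, struct table_sketch *t)
{
	atomic_store_explicit(&pd->em_table, t, memory_order_release);
}

/* Reader side; the kernel uses rcu_dereference() inside rcu_read_lock(). */
static struct table_sketch *read_sketch(struct pd_sketch *pd)
{
	return atomic_load_explicit(&pd->em_table, memory_order_acquire);
}
```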

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h |  2 --
 kernel/power/energy_model.c  | 46 ++++++------------------------------
 2 files changed, 7 insertions(+), 41 deletions(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index aabfc26fcd31..92866a81abe4 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -53,7 +53,6 @@ struct em_perf_table {
 
 /**
  * struct em_perf_domain - Performance domain
- * @table:		List of performance states, in ascending order
  * @em_table:		Pointer to the runtime modifiable em_perf_table
  * @nr_perf_states:	Number of performance states
  * @flags:		See "em_perf_domain flags"
@@ -69,7 +68,6 @@ struct em_perf_table {
  * field is unused.
  */
 struct em_perf_domain {
-	struct em_perf_state *table;
 	struct em_perf_table __rcu *em_table;
 	int nr_perf_states;
 	unsigned long flags;
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 76aab2801bf0..e91c8efb5361 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -276,17 +276,6 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
 	return 0;
 }
 
-static int em_allocate_perf_table(struct em_perf_domain *pd,
-				  int nr_states)
-{
-	pd->table = kcalloc(nr_states, sizeof(struct em_perf_state),
-			    GFP_KERNEL);
-	if (!pd->table)
-		return -ENOMEM;
-
-	return 0;
-}
-
 /**
  * em_dev_update_perf_domain() - Update runtime EM table for a device
  * @dev		: Device for which the EM is to be updated
@@ -331,24 +320,6 @@ int em_dev_update_perf_domain(struct device *dev,
 }
 EXPORT_SYMBOL_GPL(em_dev_update_perf_domain);
 
-static int em_create_runtime_table(struct em_perf_domain *pd)
-{
-	struct em_perf_table __rcu *table;
-	int table_size;
-
-	table = em_table_alloc(pd);
-	if (!table)
-		return -ENOMEM;
-
-	/* Initialize runtime table with existing data */
-	table_size = sizeof(struct em_perf_state) * pd->nr_perf_states;
-	memcpy(table->state, pd->table, table_size);
-
-	rcu_assign_pointer(pd->em_table, table);
-
-	return 0;
-}
-
 static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 				struct em_perf_state *table,
 				struct em_data_callback *cb,
@@ -409,6 +380,7 @@ static int em_create_pd(struct device *dev, int nr_states,
 			struct em_data_callback *cb, cpumask_t *cpus,
 			unsigned long flags)
 {
+	struct em_perf_table __rcu *em_table;
 	struct em_perf_domain *pd;
 	struct device *cpu_dev;
 	int cpu, ret, num_cpus;
@@ -435,17 +407,15 @@ static int em_create_pd(struct device *dev, int nr_states,
 
 	pd->nr_perf_states = nr_states;
 
-	ret = em_allocate_perf_table(pd, nr_states);
-	if (ret)
+	em_table = em_table_alloc(pd);
+	if (!em_table)
 		goto free_pd;
 
-	ret = em_create_perf_table(dev, pd, pd->table, cb, flags);
+	ret = em_create_perf_table(dev, pd, em_table->state, cb, flags);
 	if (ret)
 		goto free_pd_table;
 
-	ret = em_create_runtime_table(pd);
-	if (ret)
-		goto free_pd_table;
+	rcu_assign_pointer(pd->em_table, em_table);
 
 	if (_is_cpu_device(dev))
 		for_each_cpu(cpu, cpus) {
@@ -458,7 +428,7 @@ static int em_create_pd(struct device *dev, int nr_states,
 	return 0;
 
 free_pd_table:
-	kfree(pd->table);
+	kfree(em_table);
 free_pd:
 	kfree(pd);
 	return -EINVAL;
@@ -629,7 +599,7 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 
 	dev->em_pd->flags |= flags;
 
-	em_cpufreq_update_efficiencies(dev, dev->em_pd->table);
+	em_cpufreq_update_efficiencies(dev, dev->em_pd->em_table->state);
 
 	em_debug_create_pd(dev);
 	dev_info(dev, "EM: created perf domain\n");
@@ -666,8 +636,6 @@ void em_dev_unregister_perf_domain(struct device *dev)
 	mutex_lock(&em_pd_mutex);
 	em_debug_remove_pd(dev);
 
-	kfree(dev->em_pd->table);
-
 	em_table_free(dev->em_pd->em_table);
 
 	kfree(dev->em_pd);
-- 
2.25.1



* [PATCH v7 22/23] PM: EM: Add em_dev_compute_costs()
  2024-01-17  9:56 [PATCH v7 00/23] Introduce runtime modifiable Energy Model Lukasz Luba
                   ` (20 preceding siblings ...)
  2024-01-17  9:57 ` [PATCH v7 21/23] PM: EM: Remove old table Lukasz Luba
@ 2024-01-17  9:57 ` Lukasz Luba
  2024-01-29 18:15   ` Dietmar Eggemann
  2024-01-17  9:57 ` [PATCH v7 23/23] Documentation: EM: Update with runtime modification design Lukasz Luba
                   ` (2 subsequent siblings)
  24 siblings, 1 reply; 40+ messages in thread
From: Lukasz Luba @ 2024-01-17  9:57 UTC (permalink / raw)
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

Device drivers can modify the EM at runtime by providing a new EM table.
The EM is used by EAS, and em_perf_state::cost stores a pre-calculated
value to avoid runtime overhead. This patch provides an API for device
drivers to calculate the cost values correctly (without duplicating the
same code).
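The cost calculation and the inefficiency marking can be sketched in user space as follows. This is a simplified model, not em_compute_costs() itself. It assumes cost is the power scaled by 10 and divided by the state's performance, and it uses a hypothetical STATE_INEFFICIENT flag. The inefficiency rule walks the ascending table from the highest state down and flags any state whose cost is not lower than that of some higher state, because the higher state would do the same work for no more energy.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define STATE_INEFFICIENT (1UL << 0)  /* hypothetical flag value */

/* Simplified stand-in for struct em_perf_state (assumption). */
struct perf_state {
	unsigned long performance;  /* capacity delivered at this state */
	unsigned long power;        /* power drawn at this state */
	unsigned long cost;         /* filled in below */
	unsigned long flags;        /* filled in below */
};

/*
 * Compute the cost of every state of an ascending table and flag the
 * states for which some higher state is at least as cheap.
 */
static int compute_costs(struct perf_state *table, int nr_states)
{
	unsigned long prev_cost = ULONG_MAX;

	assert(nr_states > 0);
	for (int i = nr_states - 1; i >= 0; i--) {
		if (!table[i].performance)
			return -EINVAL;  /* would divide by zero */

		/* Multiply by 10 first to keep some precision. */
		table[i].cost = table[i].power * 10 / table[i].performance;
		table[i].flags = 0;

		if (table[i].cost >= prev_cost)
			table[i].flags |= STATE_INEFFICIENT;
		else
			prev_cost = table[i].cost;  /* cheapest seen so far */
	}
	return 0;
}
```

With performance {100, 200, 400} and power {50, 80, 300}, the costs come out as {5, 4, 7}. The lowest state is therefore flagged inefficient, because the middle state does the same work more cheaply.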

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h |  8 ++++++++
 kernel/power/energy_model.c  | 18 ++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 92866a81abe4..770755df852f 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -170,6 +170,8 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 void em_dev_unregister_perf_domain(struct device *dev);
 struct em_perf_table __rcu *em_table_alloc(struct em_perf_domain *pd);
 void em_table_free(struct em_perf_table __rcu *table);
+int em_dev_compute_costs(struct device *dev, struct em_perf_state *table,
+			 int nr_states);
 
 /**
  * em_pd_get_efficient_state() - Get an efficient performance state from the EM
@@ -379,6 +381,12 @@ struct em_perf_state *em_perf_state_from_pd(struct em_perf_domain *pd)
 {
 	return NULL;
 }
+static inline
+int em_dev_compute_costs(struct device *dev, struct em_perf_state *table,
+			 int nr_states)
+{
+	return -EINVAL;
+}
 #endif
 
 #endif
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index e91c8efb5361..104cc2e2aa84 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -276,6 +276,24 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
 	return 0;
 }
 
+/**
+ * em_dev_compute_costs() - Calculate cost values for new runtime EM table
+ * @dev		: Device for which the EM table is to be updated
+ * @table	: The new EM table that is going to get the costs calculated
+ *
+ * Calculate the em_perf_state::cost values for new runtime EM table. The
+ * values are used for EAS during task placement. It also calculates and sets
+ * the efficiency flag for each performance state. When the function finishes
+ * successfully, the EM table is ready to be updated and used by EAS.
+ *
+ * Return 0 on success or a proper error in case of failure.
+ */
+int em_dev_compute_costs(struct device *dev, struct em_perf_state *table,
+			 int nr_states)
+{
+	return em_compute_costs(dev, table, NULL, nr_states, 0);
+}
+
 /**
  * em_dev_update_perf_domain() - Update runtime EM table for a device
  * @dev		: Device for which the EM is to be updated
-- 
2.25.1



* [PATCH v7 23/23] Documentation: EM: Update with runtime modification design
  2024-01-17  9:56 [PATCH v7 00/23] Introduce runtime modifiable Energy Model Lukasz Luba
                   ` (21 preceding siblings ...)
  2024-01-17  9:57 ` [PATCH v7 22/23] PM: EM: Add em_dev_compute_costs() Lukasz Luba
@ 2024-01-17  9:57 ` Lukasz Luba
  2024-01-29 18:16 ` [PATCH v7 00/23] Introduce runtime modifiable Energy Model Dietmar Eggemann
  2024-02-07  9:15 ` Lukasz Luba
  24 siblings, 0 replies; 40+ messages in thread
From: Lukasz Luba @ 2024-01-17  9:57 UTC (permalink / raw)
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, dietmar.eggemann, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94

Add a new description which covers the runtime modifiable EM. It
contains the design decisions and describes the models and how they
reflect reality. Remove the description of the default EM. Add example
driver code which modifies the EM. Add API documentation for the new
feature which allows the EM to be modified at runtime.
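The ownership rules described in the documentation below can be modeled in plain C. Each table is reference counted and freed once no owner is left. This sketch is a single-threaded approximation. The struct, table_put() and pd_update() are hypothetical names, and there is no locking. The RCU grace period that the kernel waits for before freeing is not modeled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space model: no RCU grace period, no locking. */
struct table {
	int refcount;
};

static int tables_freed;  /* counts releases, for illustration */

static struct table *table_alloc(void)
{
	struct table *t = calloc(1, sizeof(*t));

	if (t)
		t->refcount = 1;  /* reference owned by the caller */
	return t;
}

/* Drop one reference; release the table when the last one goes away. */
static void table_put(struct table *t)
{
	if (t && --t->refcount == 0) {
		/* The kernel would defer this free until readers are done. */
		free(t);
		tables_freed++;
	}
}

struct perf_domain {
	struct table *cur;  /* stands in for the __rcu em_table pointer */
};

/* Take a reference on the new table, publish it, drop the old one. */
static void pd_update(struct perf_domain *pd, struct table *new_table)
{
	struct table *old = pd->cur;

	new_table->refcount++;
	pd->cur = new_table;
	table_put(old);
}
```

Suppose a driver allocates a table, passes it to pd_update(), and then drops its own reference. The framework is then the only remaining owner, which matches the 'drop the usage counter' step in the example driver.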

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 Documentation/power/energy-model.rst | 183 ++++++++++++++++++++++++++-
 1 file changed, 179 insertions(+), 4 deletions(-)

diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst
index 13225965c9a4..ada4938c37e5 100644
--- a/Documentation/power/energy-model.rst
+++ b/Documentation/power/energy-model.rst
@@ -71,6 +71,31 @@ whose performance is scaled together. Performance domains generally have a
 required to have the same micro-architecture. CPUs in different performance
 domains can have different micro-architectures.
 
+To better reflect power variation due to static power (leakage), the EM
+supports runtime modifications of the power values. The mechanism relies on
+RCU to free the modifiable EM perf_state table memory. Its user, the task
+scheduler, also uses RCU to access this memory. The EM framework provides an
+API for allocating/freeing the new memory for the modifiable EM table.
+The old memory is freed automatically using the RCU callback mechanism when
+there are no owners left for the given EM runtime table instance. This is
+tracked using the kref mechanism. The device driver which provided the new EM
+at runtime should call the EM API to free it safely when it is no longer
+needed. The EM framework will handle the clean-up when it is possible.
+
+The kernel code which wants to modify the EM values is protected from
+concurrent access using a mutex. Therefore, the device driver code must run
+in a context that can sleep when it modifies the EM.
+
+With the runtime modifiable EM we switch from a 'single EM, static for the
+entire runtime' (system property) design to a 'single EM which can be
+changed at runtime, e.g. according to the workload' (system and workload
+property) design.
+
+It is also possible to modify the CPU performance values for each EM
+performance state. Thus, the full power and performance profile (which
+is an exponential curve) can be changed, e.g. according to the workload
+or system property.
+
 
 2. Core APIs
 ------------
@@ -175,10 +200,82 @@ CPUfreq governor is in use in case of CPU device. Currently this calculation is
 not provided for other type of devices.
 
 More details about the above APIs can be found in ``<linux/energy_model.h>``
-or in Section 2.4
+or in Section 2.5
+
+
+2.4 Runtime modifications
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Drivers willing to update the EM at runtime should use the following dedicated
+function to allocate a new instance of the modified EM. The API is listed
+below::
+
+  struct em_perf_table __rcu *em_table_alloc(struct em_perf_domain *pd);
+
+This allocates a structure which contains the new EM table together with
+the RCU and kref fields needed by the EM framework. The 'struct em_perf_table'
+contains the array 'struct em_perf_state state[]', which is a list of
+performance states in ascending order. That list must be populated by the
+device driver which wants to update the EM. The list of frequencies can be
+taken from the existing EM (created during boot). The content of each
+'struct em_perf_state' must be populated by the driver as well.
+
+This is the API which does the EM update, using RCU pointers swap::
+
+  int em_dev_update_perf_domain(struct device *dev,
+			struct em_perf_table __rcu *new_table);
+
+Drivers must provide a pointer to the allocated and initialized new EM
+'struct em_perf_table'. That new EM will be safely used inside the EM framework
+and will be visible to other sub-systems in the kernel (thermal, powercap).
+The main design goal for this API is to be fast and avoid extra calculations
+or memory allocations at runtime. When pre-computed EMs are available in the
+device driver, it should be possible to simply re-use them with low
+performance overhead.
+
+In order to free the EM provided earlier by the driver (e.g. when the module
+is unloaded), call the API::
+
+  void em_table_free(struct em_perf_table __rcu *table);
+
+This allows the EM framework to safely remove the memory once no other
+sub-system (e.g. EAS) is still using it.
+
+To use the power values in other sub-systems (like thermal, powercap), call
+the API which protects the reader and provides consistency of the EM table
+data::
+
+  struct em_perf_state *em_perf_state_from_pd(struct em_perf_domain *pd);
+
+It returns the 'struct em_perf_state' pointer which is an array of performance
+states in ascending order.
+This function must be called inside an RCU read-side critical section (after
+rcu_read_lock()). When the EM table is no longer needed, call
+rcu_read_unlock(). In this way the EM safely uses the RCU read section
+and protects the users. It also allows the EM framework to manage the memory
+and free it. More details on how to use it can be found in the example driver
+in Section 3.2.
+
+There is a dedicated API for device drivers to calculate em_perf_state::cost
+values::
+
+  int em_dev_compute_costs(struct device *dev, struct em_perf_state *table,
+                           int nr_states);
+
+These 'cost' values from the EM are used by EAS. The new EM table should be
+passed together with the number of entries and the device pointer. When the
+cost values are computed successfully, the function returns 0.
+The function also sets the inefficiency flag correctly for each performance
+state, updating em_perf_state::flags accordingly.
+The new EM prepared this way can then be passed to the
+em_dev_update_perf_domain() function, which puts it into use.
+
+More details about the above APIs can be found in ``<linux/energy_model.h>``
+or in Section 3.2, which contains example code showing a simple
+implementation of the updating mechanism in a device driver.
 
 
-2.4 Description details of this API
+2.5 Description details of this API
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 .. kernel-doc:: include/linux/energy_model.h
    :internal:
@@ -187,8 +284,11 @@ or in Section 2.4
    :export:
 
 
-3. Example driver
------------------
+3. Examples
+-----------
+
+3.1 Example driver with EM registration
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The CPUFreq framework supports dedicated callback for registering
 the EM for a given CPU(s) 'policy' object: cpufreq_driver::register_em().
@@ -242,3 +342,78 @@ EM framework::
   39	static struct cpufreq_driver foo_cpufreq_driver = {
   40		.register_em = foo_cpufreq_register_em,
   41	};
+
+
+3.2 Example driver with EM modification
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This section provides a simple example of a thermal driver modifying the EM.
+The driver implements a foo_thermal_em_update() function. The driver is woken
+up periodically to check the temperature and modify the EM data::
+
+  -> drivers/soc/example/example_em_mod.c
+
+  01	static void foo_get_new_em(struct foo_context *ctx)
+  02	{
+  03		struct em_perf_table __rcu *em_table;
+  04		struct em_perf_state *table, *new_table;
+  05		struct device *dev = ctx->dev;
+  06		struct em_perf_domain *pd;
+  07		unsigned long freq;
+  08		int i, ret;
+  09
+  10		pd = em_pd_get(dev);
+  11		if (!pd)
+  12			return;
+  13
+  14		em_table = em_table_alloc(pd);
+  15		if (!em_table)
+  16			return;
+  17
+  18		new_table = em_table->state;
+  19
+  20		rcu_read_lock();
+  21		table = em_perf_state_from_pd(pd);
+  22		for (i = 0; i < pd->nr_perf_states; i++) {
+  23			freq = table[i].frequency;
+  24			foo_get_power_perf_values(dev, freq, &new_table[i]);
+  25		}
+  26		rcu_read_unlock();
+  27
+  28		/* Calculate 'cost' values for EAS */
+  29		ret = em_dev_compute_costs(dev, new_table, pd->nr_perf_states);
+  30		if (ret) {
+  31			dev_warn(dev, "EM: compute costs failed %d\n", ret);
+  32			em_table_free(em_table);
+  33			return;
+  34		}
+  35
+  36		ret = em_dev_update_perf_domain(dev, em_table);
+  37		if (ret) {
+  38			dev_warn(dev, "EM: update failed %d\n", ret);
+  39			em_table_free(em_table);
+  40			return;
+  41		}
+  42
+  43		/*
+  44		 * Since it's one-time-update drop the usage counter.
+  45		 * The EM framework will later free the table when needed.
+  46		 */
+  47		em_table_free(em_table);
+  48	}
+  49
+  50	/*
+  51	 * Function called periodically to check the temperature and
+  52	 * update the EM if needed
+  53	 */
+  54	static void foo_thermal_em_update(struct foo_context *ctx)
+  55	{
+  56		struct device *dev = ctx->dev;
+  57		int cpu;
+  58
+  59		ctx->temperature = foo_get_temp(dev, ctx);
+  60		if (ctx->temperature < FOO_EM_UPDATE_TEMP_THRESHOLD)
+  61			return;
+  62
+  63		foo_get_new_em(ctx);
+  64	}
-- 
2.25.1



* Re: [PATCH v7 01/23] PM: EM: Add missing newline for the message log
  2024-01-17  9:56 ` [PATCH v7 01/23] PM: EM: Add missing newline for the message log Lukasz Luba
@ 2024-01-17 11:02   ` Hongyan Xia
  0 siblings, 0 replies; 40+ messages in thread
From: Hongyan Xia @ 2024-01-17 11:02 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm, rafael
  Cc: dietmar.eggemann, rui.zhang, amit.kucheria, amit.kachhap,
	daniel.lezcano, viresh.kumar, len.brown, pavel, mhiramat,
	qyousef, wvw, xuewen.yan94

On 17/01/2024 09:56, Lukasz Luba wrote:
> Fix missing newline for the string long in the error code path.
> 
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
>   kernel/power/energy_model.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index 7b44f5b89fa1..8b9dd4a39f63 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -250,7 +250,7 @@ static void em_cpufreq_update_efficiencies(struct device *dev)
>   
>   	policy = cpufreq_cpu_get(cpumask_first(em_span_cpus(pd)));
>   	if (!policy) {
> -		dev_warn(dev, "EM: Access to CPUFreq policy failed");
> +		dev_warn(dev, "EM: Access to CPUFreq policy failed\n");
>   		return;
>   	}
>   

Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com>


* Re: [PATCH v7 02/23] PM: EM: Extend em_cpufreq_update_efficiencies() argument list
  2024-01-17  9:56 ` [PATCH v7 02/23] PM: EM: Extend em_cpufreq_update_efficiencies() argument list Lukasz Luba
@ 2024-01-17 11:10   ` Hongyan Xia
  0 siblings, 0 replies; 40+ messages in thread
From: Hongyan Xia @ 2024-01-17 11:10 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm, rafael
  Cc: dietmar.eggemann, rui.zhang, amit.kucheria, amit.kachhap,
	daniel.lezcano, viresh.kumar, len.brown, pavel, mhiramat,
	qyousef, wvw, xuewen.yan94

On 17/01/2024 09:56, Lukasz Luba wrote:
> In order to prepare the code for the modifiable EM perf_state table,
> make em_cpufreq_update_efficiencies() take a pointer to the EM table
> as its second argument and modify it to use that new argument instead
> of the 'table' member of dev->em_pd.
> 
> No functional impact.
> 
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
>   kernel/power/energy_model.c | 8 +++-----
>   1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index 8b9dd4a39f63..42486674b834 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -237,10 +237,10 @@ static int em_create_pd(struct device *dev, int nr_states,
>   	return 0;
>   }
>   
> -static void em_cpufreq_update_efficiencies(struct device *dev)
> +static void
> +em_cpufreq_update_efficiencies(struct device *dev, struct em_perf_state *table)
>   {
>   	struct em_perf_domain *pd = dev->em_pd;
> -	struct em_perf_state *table;
>   	struct cpufreq_policy *policy;
>   	int found = 0;
>   	int i;
> @@ -254,8 +254,6 @@ static void em_cpufreq_update_efficiencies(struct device *dev)
>   		return;
>   	}

NIT: It's not shown here, but in the check above this line

	if (!_is_cpu_device(dev) || !pd)

The !pd check should be removed because em_cpufreq_update_efficiencies() 
is only called after doing

	dev->em_pd->flags |= flags;

So the compiler will optimize the !pd check out anyway. But this is not
directly related to the PR, so it is just a NIT.

>   
> -	table = pd->table;
> -
>   	for (i = 0; i < pd->nr_perf_states; i++) {
>   		if (!(table[i].flags & EM_PERF_STATE_INEFFICIENT))
>   			continue;
> @@ -397,7 +395,7 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
>   
>   	dev->em_pd->flags |= flags;
>   
> -	em_cpufreq_update_efficiencies(dev);
> +	em_cpufreq_update_efficiencies(dev, dev->em_pd->table);
>   
>   	em_debug_create_pd(dev);
>   	dev_info(dev, "EM: created perf domain\n");

Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com>


* Re: [PATCH v7 03/23] PM: EM: Find first CPU active while updating OPP efficiency
  2024-01-17  9:56 ` [PATCH v7 03/23] PM: EM: Find first CPU active while updating OPP efficiency Lukasz Luba
@ 2024-01-17 12:05   ` Hongyan Xia
  0 siblings, 0 replies; 40+ messages in thread
From: Hongyan Xia @ 2024-01-17 12:05 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm, rafael
  Cc: dietmar.eggemann, rui.zhang, amit.kucheria, amit.kachhap,
	daniel.lezcano, viresh.kumar, len.brown, pavel, mhiramat,
	qyousef, wvw, xuewen.yan94

On 17/01/2024 09:56, Lukasz Luba wrote:
> The Energy Model might be updated at runtime and the energy efficiency
> for each OPP may change. Thus, there is a need to update also the
> cpufreq framework and make it aligned to the new values. In order to
> do that, use a first active CPU from the Performance Domain. This is
> needed since the first CPU in the cpumask might be offline when we
> run this code path.
> 
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
>   kernel/power/energy_model.c | 11 +++++++++--
>   1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index 42486674b834..aa7c89f9e115 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -243,12 +243,19 @@ em_cpufreq_update_efficiencies(struct device *dev, struct em_perf_state *table)
>   	struct em_perf_domain *pd = dev->em_pd;
>   	struct cpufreq_policy *policy;
>   	int found = 0;
> -	int i;
> +	int i, cpu;
>   
>   	if (!_is_cpu_device(dev) || !pd)
>   		return;
>   
> -	policy = cpufreq_cpu_get(cpumask_first(em_span_cpus(pd)));
> +	/* Try to get a CPU which is active and in this PD */
> +	cpu = cpumask_first_and(em_span_cpus(pd), cpu_active_mask);
> +	if (cpu >= nr_cpu_ids) {
> +		dev_warn(dev, "EM: No online CPU for CPUFreq policy\n");
> +		return;
> +	}
> +
> +	policy = cpufreq_cpu_get(cpu);
>   	if (!policy) {
>   		dev_warn(dev, "EM: Access to CPUFreq policy failed\n");
>   		return;

Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com>
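The selection done by cpumask_first_and() in this patch can be illustrated with 64-bit masks. The result is the lowest CPU that is set in both the PD span and the active mask. When no such CPU exists, the result is a value of at least nr_cpu_ids, which is what the new check tests for. The NR_CPU_IDS constant and first_cpu_and() are hypothetical stand-ins, limited to 64 CPUs.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPU_IDS 64  /* hypothetical: one bit per CPU, at most 64 CPUs */

/*
 * Return the lowest CPU present in both masks, or NR_CPU_IDS when the
 * intersection is empty (the "not found" convention checked in the patch).
 */
static int first_cpu_and(uint64_t span, uint64_t active)
{
	uint64_t both = span & active;

	for (int cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (both & ((uint64_t)1 << cpu))
			return cpu;
	return NR_CPU_IDS;
}
```

Take a PD spanning CPUs 4-7 with CPU 4 offline (span 0xF0, active 0xEF). The sketch returns 5 rather than the offline CPU 4, and it returns NR_CPU_IDS when the whole PD is offline.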


* Re: [PATCH v7 04/23] PM: EM: Refactor em_pd_get_efficient_state() to be more flexible
  2024-01-17  9:56 ` [PATCH v7 04/23] PM: EM: Refactor em_pd_get_efficient_state() to be more flexible Lukasz Luba
@ 2024-01-17 12:45   ` Hongyan Xia
  2024-02-06 13:53     ` Lukasz Luba
  0 siblings, 1 reply; 40+ messages in thread
From: Hongyan Xia @ 2024-01-17 12:45 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm, rafael
  Cc: dietmar.eggemann, rui.zhang, amit.kucheria, amit.kachhap,
	daniel.lezcano, viresh.kumar, len.brown, pavel, mhiramat,
	qyousef, wvw, xuewen.yan94

On 17/01/2024 09:56, Lukasz Luba wrote:
> The Energy Model (EM) is going to support runtime modification. There
> are going to be 2 EM tables which store information. This patch aims
> to prepare the code to be generic and use one of the tables. The function
> will no longer get a pointer to 'struct em_perf_domain' (the EM) but
> instead a pointer to 'struct em_perf_state' (which is one of the EM's
> tables).
> 
> Prepare em_pd_get_efficient_state() for the upcoming changes and
> make it possible to be re-used. Return an index for the best performance
> state for a given EM table. The function arguments that are introduced
> should allow to work on different performance state arrays. The caller of
> em_pd_get_efficient_state() should be able to use the index either
> on the default or the modifiable EM table.
> 
> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
>   include/linux/energy_model.h | 30 +++++++++++++++++-------------
>   1 file changed, 17 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index c19e7effe764..b01277b17946 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -175,33 +175,35 @@ void em_dev_unregister_perf_domain(struct device *dev);
>   
>   /**
>    * em_pd_get_efficient_state() - Get an efficient performance state from the EM
> - * @pd   : Performance domain for which we want an efficient frequency
> - * @freq : Frequency to map with the EM
> + * @table:		List of performance states, in ascending order
> + * @nr_perf_states:	Number of performance states
> + * @freq:		Frequency to map with the EM
> + * @pd_flags:		Performance Domain flags
>    *
>    * It is called from the scheduler code quite frequently and as a consequence
>    * doesn't implement any check.
>    *
> - * Return: An efficient performance state, high enough to meet @freq
> + * Return: An efficient performance state id, high enough to meet @freq
>    * requirement.
>    */
> -static inline
> -struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
> -						unsigned long freq)
> +static inline int
> +em_pd_get_efficient_state(struct em_perf_state *table, int nr_perf_states,
> +			  unsigned long freq, unsigned long pd_flags)
>   {
>   	struct em_perf_state *ps;
>   	int i;
>   
> -	for (i = 0; i < pd->nr_perf_states; i++) {
> -		ps = &pd->table[i];
> +	for (i = 0; i < nr_perf_states; i++) {
> +		ps = &table[i];
>   		if (ps->frequency >= freq) {
> -			if (pd->flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES &&
> +			if (pd_flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES &&
>   			    ps->flags & EM_PERF_STATE_INEFFICIENT)
>   				continue;
> -			break;
> +			return i;
>   		}
>   	}
>   
> -	return ps;
> +	return nr_perf_states - 1;
>   }
>   
>   /**
> @@ -226,7 +228,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>   {
>   	unsigned long freq, ref_freq, scale_cpu;
>   	struct em_perf_state *ps;
> -	int cpu;
> +	int cpu, i;
>   
>   	if (!sum_util)
>   		return 0;
> @@ -251,7 +253,9 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>   	 * Find the lowest performance state of the Energy Model above the
>   	 * requested frequency.
>   	 */
> -	ps = em_pd_get_efficient_state(pd, freq);
> +	i = em_pd_get_efficient_state(pd->table, pd->nr_perf_states, freq,
> +				      pd->flags);
> +	ps = &pd->table[i];
>   
>   	/*
>   	 * The capacity of a CPU in the domain at the performance state (ps)

Reviewed-by: Hongyan Xia <hongyan.xia@arm.com>
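The refactored lookup is simple enough to exercise outside the kernel. Below is a user-space transcription of the loop quoted above. Only the control flow follows the patch; the struct layout and the flag values are assumptions made for the sketch.

```c
#include <assert.h>

/* Hypothetical flag values for this sketch. */
#define PD_SKIP_INEFFICIENCIES (1UL << 0)
#define PS_INEFFICIENT         (1UL << 0)

/* Simplified stand-in for struct em_perf_state (assumption). */
struct perf_state {
	unsigned long frequency;  /* kHz, ascending across the table */
	unsigned long flags;
};

/* Return the index of the lowest suitable state at or above freq. */
static int get_efficient_state(const struct perf_state *table,
			       int nr_perf_states, unsigned long freq,
			       unsigned long pd_flags)
{
	for (int i = 0; i < nr_perf_states; i++) {
		const struct perf_state *ps = &table[i];

		if (ps->frequency >= freq) {
			if ((pd_flags & PD_SKIP_INEFFICIENCIES) &&
			    (ps->flags & PS_INEFFICIENT))
				continue;
			return i;
		}
	}
	/* Nothing matched: fall back to the highest state. */
	return nr_perf_states - 1;
}
```

Consider states at 500, 1000 and 1500 MHz with the middle one flagged inefficient. A request for 800 MHz returns index 1 normally and index 2 when inefficiencies are skipped. A request above the top frequency falls back to the last index.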


* Re: [PATCH v7 12/23] PM: EM: Add em_perf_state_from_pd() to get performance states table
  2024-01-17  9:57 ` [PATCH v7 12/23] PM: EM: Add em_perf_state_from_pd() to get performance states table Lukasz Luba
@ 2024-01-29 18:13   ` Dietmar Eggemann
  2024-02-06 13:55     ` Lukasz Luba
  0 siblings, 1 reply; 40+ messages in thread
From: Dietmar Eggemann @ 2024-01-29 18:13 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm, rafael
  Cc: rui.zhang, amit.kucheria, amit.kachhap, daniel.lezcano,
	viresh.kumar, len.brown, pavel, mhiramat, qyousef, wvw,
	xuewen.yan94

On 17/01/2024 10:57, Lukasz Luba wrote:

[...]

> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index 494df6942cf7..5ebe9dbec8e1 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -339,6 +339,23 @@ static inline int em_pd_nr_perf_states(struct em_perf_domain *pd)
>  	return pd->nr_perf_states;
>  }
>  
> +/**
> + * em_perf_state_from_pd() - Get the performance states table of perf.
> + *				domain
> + * @pd		: performance domain for which this must be done
> + *
> + * To use this function the rcu_read_lock() should be hold. After the usage
> + * of the performance states table is finished, the rcu_read_unlock() should
> + * be called.
> + *
> + * Return: the pointer to performance states table of the performance domain
> + */
> +static inline
> +struct em_perf_state *em_perf_state_from_pd(struct em_perf_domain *pd)

This is IMHO hard to get since:

  struct em_perf_table {
    struct rcu_head rcu;
    struct kref kref;
    struct em_perf_state state[];
  };

So very often a 'struct em_perf_table' is named 'table' and 'struct
em_perf_table::state' as well. E.g. in em_adjust_new_capacity().

  struct em_perf_state *new_table;

  new_table = em_table->state;

In older EM code, we used 'struct em_perf_state *ps' to avoid this
confusion, I guess.

And what you get from the PD is actually a state vector so maybe:

struct em_perf_state *em_get_perf_states(struct em_perf_domain *pd)

The 'from_pd' seems obvious because of the parameter?

[...]


* Re: [PATCH v7 13/23] PM: EM: Add performance field to struct em_perf_state and optimize
  2024-01-17  9:57 ` [PATCH v7 13/23] PM: EM: Add performance field to struct em_perf_state and optimize Lukasz Luba
@ 2024-01-29 18:13   ` Dietmar Eggemann
  0 siblings, 0 replies; 40+ messages in thread
From: Dietmar Eggemann @ 2024-01-29 18:13 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm, rafael
  Cc: rui.zhang, amit.kucheria, amit.kachhap, daniel.lezcano,
	viresh.kumar, len.brown, pavel, mhiramat, qyousef, wvw,
	xuewen.yan94

On 17/01/2024 09:57, Lukasz Luba wrote:

[...]

>  include/linux/energy_model.h | 24 ++++++++++++------------
>  kernel/power/energy_model.c  | 27 +++++++++++++++++++++++++++
>  2 files changed, 39 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index 5ebe9dbec8e1..689d71f6b56f 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -13,6 +13,7 @@
>  
>  /**
>   * struct em_perf_state - Performance state of a performance domain
> + * @performance:	CPU performance (capacity) at a given frequency

I guess this is what we called the 'current CPU capacity' in older
Android versions.

[...]

> @@ -260,26 +262,24 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>  	/*
>  	 * In order to predict the performance state, map the utilization of
>  	 * the most utilized CPU of the performance domain to a requested 
> -	 * frequency, like schedutil. Take also into account that the real
> -	 * frequency might be set lower (due to thermal capping). Thus, clamp
> +	 * performance, like schedutil. Take also into account that the real
> +	 * performance might be set lower (due to thermal capping). Thus, clamp
>  	 * max utilization to the allowed CPU capacity before calculating
> -	 * effective frequency.
> +	 * effective performance.
>  	 */
>  	cpu = cpumask_first(to_cpumask(pd->cpus));
>  	scale_cpu = arch_scale_cpu_capacity(cpu);
> -	ref_freq = arch_scale_freq_ref(cpu);
>  
>  	max_util = map_util_perf(max_util);

Didn't apply cleanly on tip sched/core for me.

Looks like it's missing:

9c0b4bb7f630 - sched/cpufreq: Rework schedutil governor performance
estimation (2023-11-23 Vincent Guittot)

>  	max_util = min(max_util, allowed_cpu_cap);
> -	freq = map_util_freq(max_util, ref_freq, scale_cpu);

Since you're removing this here, shouldn't you also remove

* In order to predict the performance state, map the utilization of
* the most utilized CPU of the performance domain to a requested

Looks like with 9c0b4bb7f630 there is no mapping anymore?

[...]

>  static int em_compute_costs(struct device *dev, struct em_perf_state *table,
>  			    struct em_data_callback *cb, int nr_states,
>  			    unsigned long flags)
> @@ -318,6 +343,8 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
>  		table[i].frequency = prev_freq = freq;
>  	}
>  
> +	em_init_performance(dev, pd, table, nr_states);

Looks like pd already has 'pd->nr_perf_states' initialized, so just
passing pd seems to be sufficient, like for em_table_alloc() and
em_create_perf_table().

[...]
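The effect of the new performance field on the hot path can be sketched as follows. The sketch assumes that performance scales linearly with frequency up to the CPU's maximum capacity, and that each state's cost has already been computed. The struct and helper names are hypothetical. Once a state is chosen, the energy estimate reduces to one multiplication, with no per-call frequency division.

```c
#include <assert.h>

/* Simplified stand-in for struct em_perf_state (assumption). */
struct perf_state {
	unsigned long frequency;    /* kHz, ascending across the table */
	unsigned long performance;  /* capacity at this frequency */
	unsigned long cost;         /* assumed precomputed elsewhere */
};

/* performance = max_cap * freq / fmax, done once per table, not per call */
static void init_performance(struct perf_state *t, int n,
			     unsigned long max_cap)
{
	unsigned long fmax;

	assert(n > 0);
	fmax = t[n - 1].frequency;
	assert(fmax > 0);
	for (int i = 0; i < n; i++)
		t[i].performance = max_cap * t[i].frequency / fmax;
}

/* Hot path: a single multiplication once the state has been chosen. */
static unsigned long pd_energy(const struct perf_state *ps,
			       unsigned long sum_util)
{
	return ps->cost * sum_util;
}
```

For a CPU with maximum capacity 1024 and states at 0.5, 1 and 2 GHz, performance comes out as 256, 512 and 1024.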


* Re: [PATCH v7 16/23] powercap/dtpm_cpu: Use new Energy Model interface to get table
  2024-01-17  9:57 ` [PATCH v7 16/23] powercap/dtpm_cpu: Use new Energy Model interface to get table Lukasz Luba
@ 2024-01-29 18:14   ` Dietmar Eggemann
  0 siblings, 0 replies; 40+ messages in thread
From: Dietmar Eggemann @ 2024-01-29 18:14 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm, rafael
  Cc: rui.zhang, amit.kucheria, amit.kachhap, daniel.lezcano,
	viresh.kumar, len.brown, pavel, mhiramat, qyousef, wvw,
	xuewen.yan94

On 17/01/2024 10:57, Lukasz Luba wrote:

[...]

>  drivers/powercap/dtpm_cpu.c | 39 +++++++++++++++++++++++++++----------
>  1 file changed, 29 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c
> index 9193c3b8edeb..ee0d1aa3e023 100644
> --- a/drivers/powercap/dtpm_cpu.c
> +++ b/drivers/powercap/dtpm_cpu.c
> @@ -42,6 +42,7 @@ static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
>  {
>  	struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm);
>  	struct em_perf_domain *pd = em_cpu_get(dtpm_cpu->cpu);
> +	struct em_perf_state *table;
>  	struct cpumask cpus;
>  	unsigned long freq;
>  	u64 power;
> @@ -50,20 +51,22 @@ static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
>  	cpumask_and(&cpus, cpu_online_mask, to_cpumask(pd->cpus));
>  	nr_cpus = cpumask_weight(&cpus);
>  
> +	rcu_read_lock();
> +	table = em_perf_state_from_pd(pd);

'table' vs. 'perf state(s)' ... another example (compare to comment in
12/23).

[...]


* Re: [PATCH v7 22/23] PM: EM: Add em_dev_compute_costs()
  2024-01-17  9:57 ` [PATCH v7 22/23] PM: EM: Add em_dev_compute_costs() Lukasz Luba
@ 2024-01-29 18:15   ` Dietmar Eggemann
  0 siblings, 0 replies; 40+ messages in thread
From: Dietmar Eggemann @ 2024-01-29 18:15 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm, rafael
  Cc: rui.zhang, amit.kucheria, amit.kachhap, daniel.lezcano,
	viresh.kumar, len.brown, pavel, mhiramat, qyousef, wvw,
	xuewen.yan94

On 17/01/2024 10:57, Lukasz Luba wrote:

[...]

> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index e91c8efb5361..104cc2e2aa84 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -276,6 +276,24 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
>  	return 0;
>  }
>  
> +/**
> + * em_dev_compute_costs() - Calculate cost values for new runtime EM table
> + * @dev		: Device for which the EM table is to be updated
> + * @table	: The new EM table that is going to get the costs calculated
> + *
> + * Calculate the em_perf_state::cost values for new runtime EM table. The
> + * values are used for EAS during task placement. It also calculates and sets
> + * the efficiency flag for each performance state. When the function finish
> + * successfully the EM table is ready to be updated and used by EAS.
> + *
> + * Return 0 on success or a proper error in case of failure.
> + */
> +int em_dev_compute_costs(struct device *dev, struct em_perf_state *table,
> +			 int nr_states)
> +{
> +	return em_compute_costs(dev, table, NULL, nr_states, 0);
> +}
> +

There is still no user of this function in this patch set, so it could be
introduced with the follow-up patch 'OPP: Add API to update EM after
adjustment of voltage for OPPs', especially now that Viresh and you
have agreed that this should be part of the EM code as well:

https://lkml.kernel.org/r/a42ae8dd-383c-43c0-88b4-101303d6f548@arm.com
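The doc comment above says em_dev_compute_costs() fills in em_perf_state::cost and the efficiency flag. A minimal userspace sketch of that calculation follows. It assumes the v7 formula from patch 15/23 (cost = power * 10 / performance) and a walk from the highest to the lowest state that marks any state whose cost is not lower than that of a faster state. The struct, the function and the flag value are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define PERF_STATE_INEFFICIENT (1UL << 0) /* hypothetical flag value */

struct perf_state {
	unsigned long performance; /* must be non-zero */
	unsigned long power;       /* uW */
	unsigned long cost;
	unsigned long flags;
};

static void compute_costs(struct perf_state *table, int nr_states)
{
	unsigned long prev_cost = ULONG_MAX;

	/* Walk from the fastest state down to the slowest one. */
	for (int i = nr_states - 1; i >= 0; i--) {
		assert(table[i].performance > 0);
		/* Scale power by 10 to keep some precision after truncation. */
		table[i].cost = table[i].power * 10 / table[i].performance;

		/* A slower state that is not cheaper than a faster one is inefficient. */
		if (table[i].cost >= prev_cost) {
			table[i].flags |= PERF_STATE_INEFFICIENT;
		} else {
			table[i].flags &= ~PERF_STATE_INEFFICIENT;
			prev_cost = table[i].cost;
		}
	}
}
```

For performance 256/512/768/1024 and power 100000/260000/380000/700000 uW the costs come out as 3906/5078/4947/6835, so index 1 is marked inefficient because index 2 delivers more performance at a lower cost.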


* Re: [PATCH v7 00/23] Introduce runtime modifiable Energy Model
  2024-01-17  9:56 [PATCH v7 00/23] Introduce runtime modifiable Energy Model Lukasz Luba
                   ` (22 preceding siblings ...)
  2024-01-17  9:57 ` [PATCH v7 23/23] Documentation: EM: Update with runtime modification design Lukasz Luba
@ 2024-01-29 18:16 ` Dietmar Eggemann
  2024-02-06 13:54   ` Lukasz Luba
  2024-02-07  9:15 ` Lukasz Luba
  24 siblings, 1 reply; 40+ messages in thread
From: Dietmar Eggemann @ 2024-01-29 18:16 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm, rafael
  Cc: rui.zhang, amit.kucheria, amit.kachhap, daniel.lezcano,
	viresh.kumar, len.brown, pavel, mhiramat, qyousef, wvw,
	xuewen.yan94

On 17/01/2024 10:56, Lukasz Luba wrote:

[...]

> Changelog:
> v7:
> - dropped em_table_get/put() (Rafael)
> - renamed memory function to em_table_alloc/free() (Rafael)
> - use explicit rcu_read_lock/unlock() instead of wrappers and aligned
>   frameworks & drivers using EM (Rafael)
> - adjusted documentation to the new functions
> - fixed doxygen comments (Rafael)
> - renamed 'refcount' to 'kref' (Rafael)
> - changed patch headers according to comments (Rafael)
> - rebased on 'next-20240112' to get Ingo's revert affecting energy_model.h
> v6 [6]:
> - renamed 'runtime_table' to 'em_table' (Dietmar, Rafael)
> - dropped kref increment during allocation (Qais)
> - renamed em_inc/dec_usage() to em_table_inc/dec() (Qais)
> - fixed comment description and left old comment block with small
>   adjustment in em_cpu_energy() patch 15/23 (Dietmar)
> - added platform name which was used for speed-up testing (Dietmar)
> - changed patch header description keep it small not repeating the in-code
>   comment describing 'cost' in em_cpu_energy() patch 15/23 (Dietmar)
> - added check and warning in em_cpu_energy() about RCU lock held (Qais, Xuewen)
> - changed nr_perf_states usage in the patch 7/23 (Dietmar)
> - changed documentation according to comments (Dietmar)
> - changed in-code comment in patch 11/23 according to comments (Dietmar)
> - changed example driver function 'ctx' argument in the documentation (Xuewen)
> - changed the example driver in documentation, dropped module_exit and
>   added em_free_table() explicit in the update function
> - fixed comments in various patch headers (Dietmar)
> - fixed Doxygen comment s/@state/@table patch 4/23 (Dietmar)
> - added information in the cover letter about:
> -- optimization in EAS hot code path
> -- follow-up patch set which adds OPP support and modifies EM for Exynos5
> - rebased on 'next-20240104' to avoid collision with other code touching
>   em_cpu_energy()

LGTM now. I see that my comments from v5 have been addressed. The minor
points that still exist for me I have commented on in the individual patches.

For the whole series:

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

(with a simple test driver updating the EM for CPU0 on Arm64 Juno-r0)


* Re: [PATCH v7 04/23] PM: EM: Refactor em_pd_get_efficient_state() to be more flexible
  2024-01-17 12:45   ` Hongyan Xia
@ 2024-02-06 13:53     ` Lukasz Luba
  0 siblings, 0 replies; 40+ messages in thread
From: Lukasz Luba @ 2024-02-06 13:53 UTC (permalink / raw)
  To: Hongyan Xia
  Cc: dietmar.eggemann, rui.zhang, amit.kucheria, linux-kernel,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, wvw, xuewen.yan94, linux-pm, rafael



On 1/17/24 12:45, Hongyan Xia wrote:
> On 17/01/2024 09:56, Lukasz Luba wrote:
>> The Energy Model (EM) is going to support runtime modification. There
>> are going to be 2 EM tables which store information. This patch aims
>> to prepare the code to be generic and use one of the tables. The function
>> will no longer get a pointer to 'struct em_perf_domain' (the EM) but
>> instead a pointer to 'struct em_perf_state' (which is one of the EM's
>> tables).
>>
>> Prepare em_pd_get_efficient_state() for the upcoming changes and
>> make it possible to be re-used. Return an index for the best performance
>> state for a given EM table. The function arguments that are introduced
>> should allow to work on different performance state arrays. The caller of
>> em_pd_get_efficient_state() should be able to use the index either
>> on the default or the modifiable EM table.
>>
>> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
>> ---
>>   include/linux/energy_model.h | 30 +++++++++++++++++-------------
>>   1 file changed, 17 insertions(+), 13 deletions(-)
>>
>> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
>> index c19e7effe764..b01277b17946 100644
>> --- a/include/linux/energy_model.h
>> +++ b/include/linux/energy_model.h
>> @@ -175,33 +175,35 @@ void em_dev_unregister_perf_domain(struct device *dev);
>>   /**
>>    * em_pd_get_efficient_state() - Get an efficient performance state from the EM
>> - * @pd   : Performance domain for which we want an efficient frequency
>> - * @freq : Frequency to map with the EM
>> + * @table:        List of performance states, in ascending order
>> + * @nr_perf_states:    Number of performance states
>> + * @freq:        Frequency to map with the EM
>> + * @pd_flags:        Performance Domain flags
>>    *
>>    * It is called from the scheduler code quite frequently and as a consequence
>>    * doesn't implement any check.
>>    *
>> - * Return: An efficient performance state, high enough to meet @freq
>> + * Return: An efficient performance state id, high enough to meet @freq
>>    * requirement.
>>    */
>> -static inline
>> -struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
>> -                        unsigned long freq)
>> +static inline int
>> +em_pd_get_efficient_state(struct em_perf_state *table, int nr_perf_states,
>> +              unsigned long freq, unsigned long pd_flags)
>>   {
>>       struct em_perf_state *ps;
>>       int i;
>> -    for (i = 0; i < pd->nr_perf_states; i++) {
>> -        ps = &pd->table[i];
>> +    for (i = 0; i < nr_perf_states; i++) {
>> +        ps = &table[i];
>>           if (ps->frequency >= freq) {
>> -            if (pd->flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES &&
>> +            if (pd_flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES &&
>>                   ps->flags & EM_PERF_STATE_INEFFICIENT)
>>                   continue;
>> -            break;
>> +            return i;
>>           }
>>       }
>> -    return ps;
>> +    return nr_perf_states - 1;
>>   }
>>   /**
>> @@ -226,7 +228,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>>   {
>>       unsigned long freq, ref_freq, scale_cpu;
>>       struct em_perf_state *ps;
>> -    int cpu;
>> +    int cpu, i;
>>       if (!sum_util)
>>           return 0;
>> @@ -251,7 +253,9 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>>        * Find the lowest performance state of the Energy Model above the
>>        * requested frequency.
>>        */
>> -    ps = em_pd_get_efficient_state(pd, freq);
>> +    i = em_pd_get_efficient_state(pd->table, pd->nr_perf_states, freq,
>> +                      pd->flags);
>> +    ps = &pd->table[i];
>>       /*
>>        * The capacity of a CPU in the domain at the performance state (ps)
> 
> Reviewed-by: Hongyan Xia <hongyan.xia@arm.com>
> 
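Because the refactored helper takes a plain array plus a count and returns an index, the same code can be run against either the default or a modified table. Below is a userspace transcription for experimenting; the flag names and values are hypothetical stand-ins for EM_PERF_DOMAIN_SKIP_INEFFICIENCIES and EM_PERF_STATE_INEFFICIENT.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel flags. */
#define PD_SKIP_INEFFICIENCIES (1UL << 0)
#define PS_INEFFICIENT         (1UL << 0)

struct perf_state {
	unsigned long frequency; /* kHz, ascending */
	unsigned long flags;
};

/*
 * Return the index of the lowest state meeting @freq, skipping inefficient
 * states when the domain asks for it; fall back to the highest state.
 */
static int get_efficient_state(const struct perf_state *table, int nr_perf_states,
			       unsigned long freq, unsigned long pd_flags)
{
	assert(nr_perf_states > 0);
	for (int i = 0; i < nr_perf_states; i++) {
		const struct perf_state *ps = &table[i];

		if (ps->frequency >= freq) {
			if ((pd_flags & PD_SKIP_INEFFICIENCIES) &&
			    (ps->flags & PS_INEFFICIENT))
				continue;
			return i;
		}
	}
	return nr_perf_states - 1;
}
```

The caller then indexes whichever table it passed in, e.g. ps = &table[i].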

Thank you Hongyan for the reviews!
I might address your NIT comment on patch 2/24 when
I rebase and send v8 (if there is a need).

Regards,
Lukasz


* Re: [PATCH v7 00/23] Introduce runtime modifiable Energy Model
  2024-01-29 18:16 ` [PATCH v7 00/23] Introduce runtime modifiable Energy Model Dietmar Eggemann
@ 2024-02-06 13:54   ` Lukasz Luba
  0 siblings, 0 replies; 40+ messages in thread
From: Lukasz Luba @ 2024-02-06 13:54 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: rui.zhang, amit.kucheria, amit.kachhap, daniel.lezcano,
	viresh.kumar, len.brown, pavel, mhiramat, qyousef, wvw,
	xuewen.yan94, linux-kernel, linux-pm, rafael

Hi Dietmar,

On 1/29/24 18:16, Dietmar Eggemann wrote:
> On 17/01/2024 10:56, Lukasz Luba wrote:
> 
> [...]
> 
> LGTM now. I see that my comments from v5 have been addressed. The minor
> points that still exist for me I have commented on in the individual patches.
> 
> For the whole series:
> 
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> 
> (with a simple test driver updating the EM for CPU0 on Arm64 Juno-r0)

Thank you for the review and testing!

I'll probably have to rebase v7 on top of a current PM
branch.

Regards,
Lukasz


* Re: [PATCH v7 12/23] PM: EM: Add em_perf_state_from_pd() to get performance states table
  2024-01-29 18:13   ` Dietmar Eggemann
@ 2024-02-06 13:55     ` Lukasz Luba
  0 siblings, 0 replies; 40+ messages in thread
From: Lukasz Luba @ 2024-02-06 13:55 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: rui.zhang, amit.kucheria, amit.kachhap, daniel.lezcano,
	viresh.kumar, len.brown, pavel, mhiramat, qyousef, wvw,
	xuewen.yan94, linux-kernel, linux-pm, rafael



On 1/29/24 18:13, Dietmar Eggemann wrote:
> On 17/01/2024 10:57, Lukasz Luba wrote:
> 
> [...]
> 
>> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
>> index 494df6942cf7..5ebe9dbec8e1 100644
>> --- a/include/linux/energy_model.h
>> +++ b/include/linux/energy_model.h
>> @@ -339,6 +339,23 @@ static inline int em_pd_nr_perf_states(struct em_perf_domain *pd)
>>   	return pd->nr_perf_states;
>>   }
>>   
>> +/**
>> + * em_perf_state_from_pd() - Get the performance states table of perf.
>> + *				domain
>> + * @pd		: performance domain for which this must be done
>> + *
>> + * To use this function the rcu_read_lock() should be hold. After the usage
>> + * of the performance states table is finished, the rcu_read_unlock() should
>> + * be called.
>> + *
>> + * Return: the pointer to performance states table of the performance domain
>> + */
>> +static inline
>> +struct em_perf_state *em_perf_state_from_pd(struct em_perf_domain *pd)
> 
> This is IMHO hard to get since:
> 
>    struct em_perf_table {
>      struct rcu_head rcu;
>      struct kref kref;
>      struct em_perf_state state[];
>    };
> 
> So very often a 'struct em_perf_table' is named 'table' and 'struct
> em_perf_table::state' as well. E.g. in em_adjust_new_capacity().
> 
>    struct em_perf_state *new_table;
> 
>    new_table = em_table->state;
> 
> In older EM code, we used 'struct em_perf_state *ps' to avoid this
> confusion, I guess.
> 
> And what you get from the PD is actually a state vector so maybe:
> 
> struct em_perf_state *em_get_perf_states(struct em_perf_domain *pd)
> 
> The 'from_pd' seems obvious because of the parameter?
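To make the naming point concrete: the container is the em_perf_table, while the accessor hands back the em_perf_state array embedded in it, so naming the result 'ps' keeps the two apart. Below is a userspace model of that layout. The kernel-only rcu_head and kref members are replaced by a plain counter, the names perf_table_alloc() and get_perf_states() are hypothetical, and RCU protection is deliberately omitted.

```c
#include <assert.h>
#include <stdlib.h>

struct perf_state {
	unsigned long frequency; /* kHz */
	unsigned long power;     /* uW */
};

/* Simplified em_perf_table: rcu_head/kref replaced by a plain counter. */
struct perf_table {
	unsigned int refs;
	struct perf_state state[]; /* flexible array of perf states */
};

struct perf_domain {
	struct perf_table *em_table;
	int nr_perf_states;
};

/* Allocate the container and room for n states in one block. */
static struct perf_table *perf_table_alloc(int n)
{
	struct perf_table *t = calloc(1, sizeof(*t) + (size_t)n * sizeof(t->state[0]));

	if (t)
		t->refs = 1;
	return t;
}

/* Return the state vector, not the container: callers name it 'ps'. */
static struct perf_state *get_perf_states(struct perf_domain *pd)
{
	return pd->em_table->state;
}
```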

Rafael proposed that function name in his review comments, so I
followed it. I might address your comment about using
'struct em_perf_state *ps'
when I rebase and send v8.



* Re: [PATCH v7 00/23] Introduce runtime modifiable Energy Model
  2024-01-17  9:56 [PATCH v7 00/23] Introduce runtime modifiable Energy Model Lukasz Luba
                   ` (23 preceding siblings ...)
  2024-01-29 18:16 ` [PATCH v7 00/23] Introduce runtime modifiable Energy Model Dietmar Eggemann
@ 2024-02-07  9:15 ` Lukasz Luba
  2024-02-07 10:31   ` Rafael J. Wysocki
  24 siblings, 1 reply; 40+ messages in thread
From: Lukasz Luba @ 2024-02-07  9:15 UTC (permalink / raw)
  To: rafael
  Cc: dietmar.eggemann, linux-pm, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, linux-kernel, wvw, xuewen.yan94

Hi Rafael,

On 1/17/24 09:56, Lukasz Luba wrote:
> Hi all,
> 
> This patch set adds a new feature which allows to modify Energy Model (EM)
> power values at runtime. It will allow to better reflect power model of
> a recent SoCs and silicon. Different characteristics of the power usage
> can be leveraged and thus better decisions made during task placement in EAS.
> 

[snip]

> 
> 
> Lukasz Luba (23):
>    PM: EM: Add missing newline for the message log
>    PM: EM: Extend em_cpufreq_update_efficiencies() argument list
>    PM: EM: Find first CPU active while updating OPP efficiency
>    PM: EM: Refactor em_pd_get_efficient_state() to be more flexible
>    PM: EM: Introduce em_compute_costs()
>    PM: EM: Check if the get_cost() callback is present in
>      em_compute_costs()
>    PM: EM: Split the allocation and initialization of the EM table
>    PM: EM: Introduce runtime modifiable table
>    PM: EM: Use runtime modified EM for CPUs energy estimation in EAS
>    PM: EM: Add functions for memory allocations for new EM tables
>    PM: EM: Introduce em_dev_update_perf_domain() for EM updates
>    PM: EM: Add em_perf_state_from_pd() to get performance states table
>    PM: EM: Add performance field to struct em_perf_state and optimize
>    PM: EM: Support late CPUs booting and capacity adjustment
>    PM: EM: Optimize em_cpu_energy() and remove division
>    powercap/dtpm_cpu: Use new Energy Model interface to get table
>    powercap/dtpm_devfreq: Use new Energy Model interface to get table
>    drivers/thermal/cpufreq_cooling: Use new Energy Model interface
>    drivers/thermal/devfreq_cooling: Use new Energy Model interface
>    PM: EM: Change debugfs configuration to use runtime EM table data
>    PM: EM: Remove old table
>    PM: EM: Add em_dev_compute_costs()
>    Documentation: EM: Update with runtime modification design
> 
>   Documentation/power/energy-model.rst | 183 ++++++++++-
>   drivers/powercap/dtpm_cpu.c          |  39 ++-
>   drivers/powercap/dtpm_devfreq.c      |  34 +-
>   drivers/thermal/cpufreq_cooling.c    |  45 ++-
>   drivers/thermal/devfreq_cooling.c    |  49 ++-
>   include/linux/energy_model.h         | 165 ++++++----
>   kernel/power/energy_model.c          | 472 +++++++++++++++++++++++----
>   7 files changed, 819 insertions(+), 168 deletions(-)
> 

The patch set went through decent review. If you don't have any issues,
I will collect the tags and send the v8 which will be re-based on some
recent linux next (or please tell me your preferred branch).

Regards,
Lukasz


* Re: [PATCH v7 00/23] Introduce runtime modifiable Energy Model
  2024-02-07  9:15 ` Lukasz Luba
@ 2024-02-07 10:31   ` Rafael J. Wysocki
  2024-02-07 11:49     ` Lukasz Luba
  0 siblings, 1 reply; 40+ messages in thread
From: Rafael J. Wysocki @ 2024-02-07 10:31 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: rafael, dietmar.eggemann, linux-pm, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, linux-kernel, wvw, xuewen.yan94

Hi Lukasz,

On Wed, Feb 7, 2024 at 10:15 AM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
> Hi Rafael,
>
> [snip]
>
> The patch set went through decent review. If you don't have any issues,
> I will collect the tags and send the v8 which will be re-based on some
> recent linux next (or please tell me your preferred branch).

Please base it on 6.8-rc3.


* Re: [PATCH v7 15/23] PM: EM: Optimize em_cpu_energy() and remove division
  2024-01-17  9:57 ` [PATCH v7 15/23] PM: EM: Optimize em_cpu_energy() and remove division Lukasz Luba
@ 2024-02-07 11:40   ` Hongyan Xia
  0 siblings, 0 replies; 40+ messages in thread
From: Hongyan Xia @ 2024-02-07 11:40 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm, rafael
  Cc: dietmar.eggemann, rui.zhang, amit.kucheria, amit.kachhap,
	daniel.lezcano, viresh.kumar, len.brown, pavel, mhiramat,
	qyousef, wvw, xuewen.yan94

On 17/01/2024 09:57, Lukasz Luba wrote:
> The Energy Model (EM) can be modified at runtime which brings new
> possibilities. The em_cpu_energy() is called by the Energy Aware Scheduler
> (EAS) in its hot path. The energy calculation uses power value for
> a given performance state (ps) and the CPU busy time as percentage for that
> given frequency.
> 
> It is possible to avoid the division by 'scale_cpu' at runtime, because
> EM is updated whenever new max capacity CPU is set in the system.
> 
> Use that feature and do the needed division during the calculation of the
> coefficient 'ps->cost'. That enhanced 'ps->cost' value can be then just
> multiplied simply by utilization:
> 
> pd_nrg = ps->cost * \Sum cpu_util
> 
> to get the needed energy for whole Performance Domain (PD).
> 
> With this optimization and earlier removal of map_util_freq(), the
> em_cpu_energy() should run faster on the Big CPU by 1.43x and on the Little
> CPU by 1.69x (RockPi 4B board).
> 
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
>   include/linux/energy_model.h | 54 ++++++++++--------------------------
>   kernel/power/energy_model.c  |  7 ++---
>   2 files changed, 17 insertions(+), 44 deletions(-)
> 
> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index 689d71f6b56f..aabfc26fcd31 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> [...]
> @@ -208,8 +206,9 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
>   				return -EINVAL;
>   			}
>   		} else {
> -			power_res = table[i].power;
> -			cost = div64_u64(fmax * power_res, table[i].frequency);
> +			/* increase resolution of 'cost' precision */
> +			power_res = table[i].power * 10;

NIT: Does this have to be 10, or would something simpler like << 3 (* 8)
also do the job?

Although compilers these days are often clever enough to convert x * 10
into (x << 3) + (x << 1), and this is not on the hot path anyway, so
just a NIT.
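The multiplier only adds fractional resolution to the truncated integer division, so 8 and 10 serve the same purpose as long as every cost uses the same factor. A small sketch with hypothetical helper names: for power = 150 and performance = 100 the true ratio is 1.5, which an unscaled division truncates to 1, while both k = 8 and k = 10 keep the half. It also expresses x * 10 as (x << 3) + (x << 1).

```c
#include <assert.h>

/* cost = power * k / performance, truncated like the kernel's integer division */
static unsigned long scaled_cost(unsigned long power, unsigned long perf,
				 unsigned long k)
{
	return power * k / perf;
}

/* x * 10 written with shifts (8x + 2x), as a compiler may lower it */
static unsigned long times10_shifts(unsigned long x)
{
	return (x << 3) + (x << 1);
}
```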

> +			cost = power_res / table[i].performance;
>   		}
>   
>   		table[i].cost = cost;

Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com>
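A quick numeric check of the claim that the division by 'scale_cpu' can be moved out of the hot path. The sketch compares the old scheme (cost = fmax * power / freq, energy = cost * sum_util / scale_cpu) with the new one (cost = power * 10 / performance, energy = cost * sum_util), assuming performance = scale_cpu * freq / fmax. The function names are hypothetical; the new value should come out at about 10x the old one, with small differences from integer truncation.

```c
#include <assert.h>
#include <stdint.h>

/* Old scheme: cost precomputed per state, division by scale_cpu per call. */
static uint64_t energy_old(uint64_t power, uint64_t freq, uint64_t fmax,
			   uint64_t sum_util, uint64_t scale_cpu)
{
	uint64_t cost = fmax * power / freq;

	return cost * sum_util / scale_cpu;
}

/*
 * New scheme: the scale_cpu factor is folded into 'performance', so the
 * per-call work is a single multiplication.
 */
static uint64_t energy_new(uint64_t power, uint64_t freq, uint64_t fmax,
			   uint64_t sum_util, uint64_t scale_cpu)
{
	uint64_t performance = scale_cpu * freq / fmax;
	uint64_t cost = power * 10 / performance;

	return cost * sum_util;
}
```

With power = 380000, freq = 1.5 GHz, fmax = 2 GHz, scale_cpu = 1024 and sum_util = 1500, the old path gives 742186 and the new one 7420500, within 0.02% of 10x. Since EAS compares the estimates of candidate placements against each other, a constant factor like this should not change its decisions.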


* Re: [PATCH v7 00/23] Introduce runtime modifiable Energy Model
  2024-02-07 10:31   ` Rafael J. Wysocki
@ 2024-02-07 11:49     ` Lukasz Luba
  0 siblings, 0 replies; 40+ messages in thread
From: Lukasz Luba @ 2024-02-07 11:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: dietmar.eggemann, linux-pm, rui.zhang, amit.kucheria,
	amit.kachhap, daniel.lezcano, viresh.kumar, len.brown, pavel,
	mhiramat, qyousef, linux-kernel, wvw, xuewen.yan94



On 2/7/24 10:31, Rafael J. Wysocki wrote:
> Hi Lukasz,
> 
> On Wed, Feb 7, 2024 at 10:15 AM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>> Hi Rafael,
>>
>> [snip]
>>
>> The patch set went through decent review. If you don't have any issues,
>> I will collect the tags and send the v8 which will be re-based on some
>> recent linux next (or please tell me your preferred branch).
> 
> Please base it on 6.8-rc3.

OK, thanks


end of thread, other threads:[~2024-02-07 11:49 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
2024-01-17  9:56 [PATCH v7 00/23] Introduce runtime modifiable Energy Model Lukasz Luba
2024-01-17  9:56 ` [PATCH v7 01/23] PM: EM: Add missing newline for the message log Lukasz Luba
2024-01-17 11:02   ` Hongyan Xia
2024-01-17  9:56 ` [PATCH v7 02/23] PM: EM: Extend em_cpufreq_update_efficiencies() argument list Lukasz Luba
2024-01-17 11:10   ` Hongyan Xia
2024-01-17  9:56 ` [PATCH v7 03/23] PM: EM: Find first CPU active while updating OPP efficiency Lukasz Luba
2024-01-17 12:05   ` Hongyan Xia
2024-01-17  9:56 ` [PATCH v7 04/23] PM: EM: Refactor em_pd_get_efficient_state() to be more flexible Lukasz Luba
2024-01-17 12:45   ` Hongyan Xia
2024-02-06 13:53     ` Lukasz Luba
2024-01-17  9:56 ` [PATCH v7 05/23] PM: EM: Introduce em_compute_costs() Lukasz Luba
2024-01-17  9:56 ` [PATCH v7 06/23] PM: EM: Check if the get_cost() callback is present in em_compute_costs() Lukasz Luba
2024-01-17  9:56 ` [PATCH v7 07/23] PM: EM: Split the allocation and initialization of the EM table Lukasz Luba
2024-01-17  9:56 ` [PATCH v7 08/23] PM: EM: Introduce runtime modifiable table Lukasz Luba
2024-01-17  9:57 ` [PATCH v7 09/23] PM: EM: Use runtime modified EM for CPUs energy estimation in EAS Lukasz Luba
2024-01-17  9:57 ` [PATCH v7 10/23] PM: EM: Add functions for memory allocations for new EM tables Lukasz Luba
2024-01-17  9:57 ` [PATCH v7 11/23] PM: EM: Introduce em_dev_update_perf_domain() for EM updates Lukasz Luba
2024-01-17  9:57 ` [PATCH v7 12/23] PM: EM: Add em_perf_state_from_pd() to get performance states table Lukasz Luba
2024-01-29 18:13   ` Dietmar Eggemann
2024-02-06 13:55     ` Lukasz Luba
2024-01-17  9:57 ` [PATCH v7 13/23] PM: EM: Add performance field to struct em_perf_state and optimize Lukasz Luba
2024-01-29 18:13   ` Dietmar Eggemann
2024-01-17  9:57 ` [PATCH v7 14/23] PM: EM: Support late CPUs booting and capacity adjustment Lukasz Luba
2024-01-17  9:57 ` [PATCH v7 15/23] PM: EM: Optimize em_cpu_energy() and remove division Lukasz Luba
2024-02-07 11:40   ` Hongyan Xia
2024-01-17  9:57 ` [PATCH v7 16/23] powercap/dtpm_cpu: Use new Energy Model interface to get table Lukasz Luba
2024-01-29 18:14   ` Dietmar Eggemann
2024-01-17  9:57 ` [PATCH v7 17/23] powercap/dtpm_devfreq: " Lukasz Luba
2024-01-17  9:57 ` [PATCH v7 18/23] drivers/thermal/cpufreq_cooling: Use new Energy Model interface Lukasz Luba
2024-01-17  9:57 ` [PATCH v7 19/23] drivers/thermal/devfreq_cooling: " Lukasz Luba
2024-01-17  9:57 ` [PATCH v7 20/23] PM: EM: Change debugfs configuration to use runtime EM table data Lukasz Luba
2024-01-17  9:57 ` [PATCH v7 21/23] PM: EM: Remove old table Lukasz Luba
2024-01-17  9:57 ` [PATCH v7 22/23] PM: EM: Add em_dev_compute_costs() Lukasz Luba
2024-01-29 18:15   ` Dietmar Eggemann
2024-01-17  9:57 ` [PATCH v7 23/23] Documentation: EM: Update with runtime modification design Lukasz Luba
2024-01-29 18:16 ` [PATCH v7 00/23] Introduce runtime modifiable Energy Model Dietmar Eggemann
2024-02-06 13:54   ` Lukasz Luba
2024-02-07  9:15 ` Lukasz Luba
2024-02-07 10:31   ` Rafael J. Wysocki
2024-02-07 11:49     ` Lukasz Luba
