Linux-OMAP Archive on lore.kernel.org
* [PATCH v7 00/15] Add support for devices in the Energy Model
@ 2020-05-11 11:18 Lukasz Luba
  2020-05-11 11:18 ` [PATCH v7 01/15] PM / EM: change naming convention from 'capacity' to 'performance' Lukasz Luba
                   ` (15 more replies)
  0 siblings, 16 replies; 28+ messages in thread
From: Lukasz Luba @ 2020-05-11 11:18 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx
  Cc: Dietmar.Eggemann, cw00.choi, b.zolnierkie, rjw, sudeep.holla,
	viresh.kumar, nm, sboyd, rui.zhang, amit.kucheria,
	daniel.lezcano, mingo, peterz, juri.lelli, vincent.guittot,
	rostedt, qperret, bsegall, mgorman, shawnguo, s.hauer, festevam,
	kernel, khilman, agross, bjorn.andersson, robh, matthias.bgg,
	steven.price, tomeu.vizoso, alyssa.rosenzweig, airlied, daniel,
	liviu.dudau, lorenzo.pieralisi, lukasz.luba, patrick.bellasi,
	orjan.eide, rdunlap, mka

Hi all,

This patch set introduces support for devices in the Energy Model (EM)
framework. It unifies the power model for the thermal subsystem and
makes it simpler to add support for new devices that want to use more
advanced features (like Intelligent Power Allocation). It should now
require less knowledge and effort from a driver developer to add, e.g.,
a GPU driver with a simple energy model. A more sophisticated energy
model in the thermal framework is also possible; the driver needs to
provide a dedicated callback function. More information can be found in
the updated documentation file.
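
To illustrate the callback mentioned above, here is a minimal userspace
sketch; it is not code from this series. It assumes the contract described
in the em_data_callback comments: the callback ceils 'freq' to the lowest
performance state at or above it and reports that state's power. The
table demo_opps, all demo_* names and the omitted cpu/device argument are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical OPP table: frequency in kHz, power in mW, ascending order. */
struct demo_opp {
	unsigned long freq_khz;
	unsigned long power_mw;
};

static const struct demo_opp demo_opps[] = {
	{ 200000,  50 },
	{ 400000, 120 },
	{ 800000, 300 },
};

#define DEMO_NR_OPPS (sizeof(demo_opps) / sizeof(demo_opps[0]))

/*
 * Ceil '*freq' to the lowest state at or above it and report its power.
 * Returns 0 on success, -1 if no state is high enough.
 */
static int demo_active_power(unsigned long *power, unsigned long *freq)
{
	size_t i;

	for (i = 0; i < DEMO_NR_OPPS; i++) {
		if (demo_opps[i].freq_khz >= *freq) {
			*freq = demo_opps[i].freq_khz;
			*power = demo_opps[i].power_mw;
			return 0;
		}
	}
	return -1;
}

/*
 * Walk the states the way a table-building loop can: start at 0 and ask
 * for "previous frequency + 1" on each iteration.
 * Returns the number of states copied into 'out'.
 */
static size_t demo_build_table(struct demo_opp *out, size_t max)
{
	unsigned long power, freq = 0;
	size_t i;

	assert(out != NULL);
	for (i = 0; i < max; i++, freq++) {
		if (demo_active_power(&power, &freq))
			break;
		out[i].freq_khz = freq;
		out[i].power_mw = power;
	}
	return i;
}
```

Because the callback rounds up, the caller never needs to know the exact
frequencies in advance; asking for one kHz more than the last answer is
enough to reach the next state.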

The first 7 patches refactor the Energy Model framework to add support
for devices other than CPUs. They change:
- the naming convention from 'capacity' to 'performance' state,
- the API arguments, adding a device pointer instead of relying only on
  a cpumask,
- the naming where 'cpu' was used, which is now 'device',
- the internal structure used to maintain registered devices,
- the users of the API, updating them to the new one.
Patch 8 updates the OPP framework helper function to be more generic, not
CPU specific.
Patches 9-14 change devfreq cooling, dropping part of the old power model
and adding registration with the Energy Model via an exported GPL function.
The last patch is a simple change for the Panfrost GPU driver.

The patch set is based on linux-next tag next-20200508.

Changes:
v7:
- all the EM changes are for patch 4/10 from v6
-- added em_perf_domain structure into struct device in order to simplify
  code (suggested by Daniel)
-- removed kref, em_pd, debug_dir from em_device (suggested by Daniel)
-- refactored code and removed unused functions
-- refactored checking if EM exists for a CPU and deleted em_cpus_pd_exist()
-- simplified em_pd_get() and em_cpu_get()
-- removed em_debug_type_show()
-- removed a few unused debug prints
-- changed debug dir name in order to remove em_debug_type_show() and
  em_debug_dev_show() functions
-- removed em_dev_list and em_device since it is now possible to use
  struct device and rely on it
- thermal: split patch 9/10 from v6 as requested by Daniel and created 6 new:
-- patch v7 9/15 with only tracing change - exactly the same code so I keep
   'Reviewed-by # for tracing code' from Steven Rostedt
-- patches 10/15 and 11/15 take a more consistent state of the devfreq device
-- patch 12/15 which adds Energy Model register/unregister functions
-- patch 13/15 which uses the new Energy Model
-- patch 14/15 converts the license header to SPDX
v6 [6]:
- split patch 1/5 from v5 into smaller patches as requested by Daniel
  and dropped the ACK from Quentin which was on the old patch
- added function em_dev_register_perf_domain as suggested by Daniel, which
  would help transition into the new API
- changed 'cs' (capacity state) in different places into 'ps' (performance state),
  since now there are many smaller patches (previously skipped because
  of too big size of the patch with main features and left to do later)
- changed cpumask_equal() to cpumask_intersects() when checking if the 'cpus'
  passed as an argument to the registration function might overlap with already
  known ones; this shouldn't be an issue when the cpufreq policy is OK, but a
  check doesn't hurt
- added Reviewed-by from Alyssa into Panfrost related patch
- dropped Matthias patch with PM QoS from the series since it's in the next now
v5 [5]:
- devfreq cooling: rebased on top of pending patch introducing PM QoS limits
- devfreq cooling: added Matthias's patch to make this series build check pass
- devfreq cooling: removed OPP disable code and switched to PM QoS
- devfreq cooling: since thermal code always used a pointer to devfreq_dev_status,
  switched to work on a local copy and avoid potential race when either busy_time or
  total_time could change in the background
- devfreq cooling: added _normalize_load() and handle all scenarios when
  busy_time and total_time could have odd values (even raw counters)
- Energy Model patch 2/4: removed prints from cpufreq drivers and added print inside
  dev_pm_opp_of_register_em()
- update patch 2/4 description to better reflect upcoming changes
- collected ACK from Quentin for patch 1/4 and Reviewed-by from Steven for 4/4
v4 [4]:
- devfreq cooling: added two new registration functions, which will take care
  of registering EM for the device and simplify drivers code
  (suggested by Robin and Rob)
- Energy Model: changed unregistering code, added kref to track usage, added
  code freeing tables, added helper function
- added return value to function dev_pm_opp_of_register_em() and updated
  CPUFreq drivers code, added debug prints in case of failure
- updated comments in devfreq cooling removing statement that only
  simple_ondemand devfreq governor is supported to work with power extensions
- fixed spelling in the documentation (reported by Randy)
v3 [3]:
- added back the cpumask 'cpus' in the em_perf_domain due to potential cache misses
- removed _is_cpu_em() since there is no need for it
- changed function name from em_pd_energy() to em_cpu_energy(), which is
  optimized for usage from the scheduler making some assumptions and not
  validating arguments to speed-up, there is a comment stressing that it should
  be used only for CPUs em_perf_domain
- changed em_get_pd() to em_pd_get() which is now aligned with em_cpu_get()
  naming
- Energy Model: add code which checks if the EM is already registered for the
  devfreq device
- extended comment in em_cpu_get() describing the need for this function
- fixed build warning reported on x86 by kbuild test robot in devfreq_cooling.c
- updated documentation in the energy-model.rst
- changed print messages from 'energy_model' to 'EM'
- changed dev_warn to dev_dbg, should calm down test scripts in case the
  platform has OPPs less efficient in the OPP table (some of them are there for
  cooling reasons, we shouldn't warn in this case, debug info is enough)
v2 [2]:
- changed EM API em_register_perf_domain() adding cpumask_t pointer
  as last argument (which was discussed with Dietmar and Quentin)
- removed dependency on PM_OPP, thanks to the cpumask_t argument
- removed enum em_type and em->type dependent code
- em_get_pd() can handle CPU device as well as devfreq device
- updated EM documentation
- in devfreq cooling added code which prevents a race condition with
  devfreq governors which are trying to use OPPs while thermal is in the middle
  of disabling them.
- in devfreq cooling added code which updates state of the devfreq device to
  avoid working on stale data when governor has not updated it for a long time
- in devfreq cooling added backward compatibility frequency table for drivers
  which did not provide EM
- added Steven's Reviewed-by to trace code in thermal
- added another CPUFreq driver which needs to be updated to the new API
The v1 can be found here [1].

Regards,
Lukasz Luba

[1] https://lkml.org/lkml/2020/1/16/619
[2] https://lkml.org/lkml/2020/2/6/377
[3] https://lkml.org/lkml/2020/2/21/1910
[4] https://lkml.org/lkml/2020/3/9/471
[5] https://lkml.org/lkml/2020/3/18/351
[6] https://lkml.org/lkml/2020/4/10/108


Lukasz Luba (15):
  PM / EM: change naming convention from 'capacity' to 'performance'
  PM / EM: introduce em_dev_register_perf_domain function
  PM / EM: update callback structure and add device pointer
  PM / EM: add support for other devices than CPUs in Energy Model
  PM / EM: remove em_register_perf_domain
  PM / EM: change name of em_pd_energy to em_cpu_energy
  Documentation: power: update Energy Model description
  OPP: refactor dev_pm_opp_of_register_em() and update related drivers
  thermal: devfreq_cooling: change tracing function and arguments
  thermal: devfreq_cooling: get device load and frequency directly
  thermal: devfreq_cooling: work on a copy of device status
  thermal: devfreq_cooling: add new registration functions with Energy
    Model
  thermal: devfreq_cooling: remove old power model and use EM
  thermal: devfreq_cooling: update license to use SPDX
  drm/panfrost: Register devfreq cooling and attempt to add Energy Model

 Documentation/power/energy-model.rst        | 135 ++++---
 drivers/cpufreq/cpufreq-dt.c                |   2 +-
 drivers/cpufreq/imx6q-cpufreq.c             |   2 +-
 drivers/cpufreq/mediatek-cpufreq.c          |   2 +-
 drivers/cpufreq/omap-cpufreq.c              |   2 +-
 drivers/cpufreq/qcom-cpufreq-hw.c           |   2 +-
 drivers/cpufreq/scmi-cpufreq.c              |  11 +-
 drivers/cpufreq/scpi-cpufreq.c              |   2 +-
 drivers/cpufreq/vexpress-spc-cpufreq.c      |   2 +-
 drivers/gpu/drm/panfrost/panfrost_devfreq.c |   2 +-
 drivers/opp/of.c                            |  76 ++--
 drivers/thermal/cpufreq_cooling.c           |  12 +-
 drivers/thermal/devfreq_cooling.c           | 426 ++++++++++----------
 include/linux/devfreq_cooling.h             |  51 ++-
 include/linux/device.h                      |   5 +
 include/linux/energy_model.h                | 148 ++++---
 include/linux/pm_opp.h                      |  15 +-
 include/trace/events/thermal.h              |  19 +-
 kernel/power/energy_model.c                 | 283 ++++++++-----
 kernel/sched/fair.c                         |   2 +-
 kernel/sched/topology.c                     |  20 +-
 21 files changed, 692 insertions(+), 527 deletions(-)

-- 
2.17.1



* [PATCH v7 01/15] PM / EM: change naming convention from 'capacity' to 'performance'
  2020-05-11 11:18 [PATCH v7 00/15] Add support for devices in the Energy Model Lukasz Luba
@ 2020-05-11 11:18 ` Lukasz Luba
  2020-05-11 11:46   ` Quentin Perret
  2020-05-11 11:18 ` [PATCH v7 02/15] PM / EM: introduce em_dev_register_perf_domain function Lukasz Luba
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 28+ messages in thread
From: Lukasz Luba @ 2020-05-11 11:18 UTC (permalink / raw)

The Energy Model uses the concepts of performance domain and capacity
state in order to calculate the power used by CPUs. Changing the naming
convention from capacity state to performance state enables wider usage
in the future, e.g. the upcoming support for devices other than CPUs.
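
For readers unfamiliar with how a performance state table turns into an
energy estimate, here is a minimal userspace sketch. It is not the kernel
code: struct demo_perf_state only mirrors the three fields of struct
em_perf_state, the demo_* names are hypothetical, and the requested
frequency is passed in directly instead of being derived from utilization.
It follows the two formulas from the header comments: cost = power *
max_frequency / frequency, and pd_nrg = cost * sum_util / scale_cpu.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace mirror of the three fields in struct em_perf_state. */
struct demo_perf_state {
	unsigned long frequency;	/* kHz */
	unsigned long power;		/* mW */
	unsigned long cost;		/* power * max_frequency / frequency */
};

/* Fill in 'cost' for each state; the table is in ascending frequency order. */
static void demo_compute_costs(struct demo_perf_state *table, size_t nr)
{
	unsigned long long fmax;
	size_t i;

	assert(table != NULL && nr > 0);
	fmax = table[nr - 1].frequency;
	for (i = 0; i < nr; i++)
		table[i].cost = (unsigned long)(fmax * table[i].power /
						table[i].frequency);
}

/*
 * Estimate the energy of a domain: pick the lowest state at or above the
 * requested frequency (falling back to the highest state), then scale its
 * cost by the summed utilization over the CPU capacity.
 */
static unsigned long demo_pd_energy(const struct demo_perf_state *table,
				    size_t nr, unsigned long freq,
				    unsigned long sum_util,
				    unsigned long scale_cpu)
{
	const struct demo_perf_state *ps;
	size_t i;

	assert(table != NULL && nr > 0 && scale_cpu > 0);
	ps = &table[nr - 1];
	for (i = 0; i < nr; i++) {
		ps = &table[i];
		if (ps->frequency >= freq)
			break;
	}
	return ps->cost * sum_util / scale_cpu;
}
```

Precomputing 'cost' once per state keeps the per-estimate work down to a
table scan, one multiplication and one division.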

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/thermal/cpufreq_cooling.c | 12 ++---
 include/linux/energy_model.h      | 86 +++++++++++++++++--------------
 kernel/power/energy_model.c       | 44 ++++++++--------
 kernel/sched/topology.c           | 20 +++----
 4 files changed, 84 insertions(+), 78 deletions(-)

diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index e297e135c031..ad8971e26538 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -333,18 +333,18 @@ static inline bool em_is_sane(struct cpufreq_cooling_device *cpufreq_cdev,
 		return false;
 
 	policy = cpufreq_cdev->policy;
-	if (!cpumask_equal(policy->related_cpus, to_cpumask(em->cpus))) {
+	if (!cpumask_equal(policy->related_cpus, em_span_cpus(em))) {
 		pr_err("The span of pd %*pbl is misaligned with cpufreq policy %*pbl\n",
-			cpumask_pr_args(to_cpumask(em->cpus)),
+			cpumask_pr_args(em_span_cpus(em)),
 			cpumask_pr_args(policy->related_cpus));
 		return false;
 	}
 
 	nr_levels = cpufreq_cdev->max_level + 1;
-	if (em->nr_cap_states != nr_levels) {
-		pr_err("The number of cap states in pd %*pbl (%u) doesn't match the number of cooling levels (%u)\n",
-			cpumask_pr_args(to_cpumask(em->cpus)),
-			em->nr_cap_states, nr_levels);
+	if (em_pd_nr_perf_states(em) != nr_levels) {
+		pr_err("The number of performance states in pd %*pbl (%u) doesn't match the number of cooling levels (%u)\n",
+			cpumask_pr_args(em_span_cpus(em)),
+			em_pd_nr_perf_states(em), nr_levels);
 		return false;
 	}
 
diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index ade6486a3382..fe336a9eb5d4 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -10,13 +10,13 @@
 #include <linux/types.h>
 
 /**
- * em_cap_state - Capacity state of a performance domain
+ * em_perf_state - Performance state of a performance domain
  * @frequency:	The CPU frequency in KHz, for consistency with CPUFreq
  * @power:	The power consumed by 1 CPU at this level, in milli-watts
  * @cost:	The cost coefficient associated with this level, used during
  *		energy calculation. Equal to: power * max_frequency / frequency
  */
-struct em_cap_state {
+struct em_perf_state {
 	unsigned long frequency;
 	unsigned long power;
 	unsigned long cost;
@@ -24,8 +24,8 @@ struct em_cap_state {
 
 /**
  * em_perf_domain - Performance domain
- * @table:		List of capacity states, in ascending order
- * @nr_cap_states:	Number of capacity states
+ * @table:		List of performance states, in ascending order
+ * @nr_perf_states:	Number of performance states
  * @cpus:		Cpumask covering the CPUs of the domain
  *
  * A "performance domain" represents a group of CPUs whose performance is
@@ -34,22 +34,27 @@ struct em_cap_state {
  * CPUFreq policies.
  */
 struct em_perf_domain {
-	struct em_cap_state *table;
-	int nr_cap_states;
+	struct em_perf_state *table;
+	int nr_perf_states;
 	unsigned long cpus[];
 };
 
+#define em_span_cpus(em) (to_cpumask((em)->cpus))
+
 #ifdef CONFIG_ENERGY_MODEL
 #define EM_CPU_MAX_POWER 0xFFFF
 
 struct em_data_callback {
 	/**
-	 * active_power() - Provide power at the next capacity state of a CPU
-	 * @power	: Active power at the capacity state in mW (modified)
-	 * @freq	: Frequency at the capacity state in kHz (modified)
+	 * active_power() - Provide power at the next performance state of
+	 *		a CPU
+	 * @power	: Active power at the performance state in mW
+	 *		(modified)
+	 * @freq	: Frequency at the performance state in kHz
+	 *		(modified)
 	 * @cpu		: CPU for which we do this operation
 	 *
-	 * active_power() must find the lowest capacity state of 'cpu' above
+	 * active_power() must find the lowest performance state of 'cpu' above
 	 * 'freq' and update 'power' and 'freq' to the matching active power
 	 * and frequency.
 	 *
@@ -80,46 +85,46 @@ static inline unsigned long em_pd_energy(struct em_perf_domain *pd,
 				unsigned long max_util, unsigned long sum_util)
 {
 	unsigned long freq, scale_cpu;
-	struct em_cap_state *cs;
+	struct em_perf_state *ps;
 	int i, cpu;
 
 	/*
-	 * In order to predict the capacity state, map the utilization of the
-	 * most utilized CPU of the performance domain to a requested frequency,
-	 * like schedutil.
+	 * In order to predict the performance state, map the utilization of
+	 * the most utilized CPU of the performance domain to a requested
+	 * frequency, like schedutil.
 	 */
 	cpu = cpumask_first(to_cpumask(pd->cpus));
 	scale_cpu = arch_scale_cpu_capacity(cpu);
-	cs = &pd->table[pd->nr_cap_states - 1];
-	freq = map_util_freq(max_util, cs->frequency, scale_cpu);
+	ps = &pd->table[pd->nr_perf_states - 1];
+	freq = map_util_freq(max_util, ps->frequency, scale_cpu);
 
 	/*
-	 * Find the lowest capacity state of the Energy Model above the
+	 * Find the lowest performance state of the Energy Model above the
 	 * requested frequency.
 	 */
-	for (i = 0; i < pd->nr_cap_states; i++) {
-		cs = &pd->table[i];
-		if (cs->frequency >= freq)
+	for (i = 0; i < pd->nr_perf_states; i++) {
+		ps = &pd->table[i];
+		if (ps->frequency >= freq)
 			break;
 	}
 
 	/*
-	 * The capacity of a CPU in the domain at that capacity state (cs)
+	 * The capacity of a CPU in the domain at the performance state (ps)
 	 * can be computed as:
 	 *
-	 *             cs->freq * scale_cpu
-	 *   cs->cap = --------------------                          (1)
+	 *             ps->freq * scale_cpu
+	 *   ps->cap = --------------------                          (1)
 	 *                 cpu_max_freq
 	 *
 	 * So, ignoring the costs of idle states (which are not available in
-	 * the EM), the energy consumed by this CPU at that capacity state is
-	 * estimated as:
+	 * the EM), the energy consumed by this CPU at that performance state
+	 * is estimated as:
 	 *
-	 *             cs->power * cpu_util
+	 *             ps->power * cpu_util
 	 *   cpu_nrg = --------------------                          (2)
-	 *                   cs->cap
+	 *                   ps->cap
 	 *
-	 * since 'cpu_util / cs->cap' represents its percentage of busy time.
+	 * since 'cpu_util / ps->cap' represents its percentage of busy time.
 	 *
 	 *   NOTE: Although the result of this computation actually is in
 	 *         units of power, it can be manipulated as an energy value
@@ -129,34 +134,35 @@ static inline unsigned long em_pd_energy(struct em_perf_domain *pd,
 	 * By injecting (1) in (2), 'cpu_nrg' can be re-expressed as a product
 	 * of two terms:
 	 *
-	 *             cs->power * cpu_max_freq   cpu_util
+	 *             ps->power * cpu_max_freq   cpu_util
 	 *   cpu_nrg = ------------------------ * ---------          (3)
-	 *                    cs->freq            scale_cpu
+	 *                    ps->freq            scale_cpu
 	 *
-	 * The first term is static, and is stored in the em_cap_state struct
-	 * as 'cs->cost'.
+	 * The first term is static, and is stored in the em_perf_state struct
+	 * as 'ps->cost'.
 	 *
 	 * Since all CPUs of the domain have the same micro-architecture, they
-	 * share the same 'cs->cost', and the same CPU capacity. Hence, the
+	 * share the same 'ps->cost', and the same CPU capacity. Hence, the
 	 * total energy of the domain (which is the simple sum of the energy of
 	 * all of its CPUs) can be factorized as:
 	 *
-	 *            cs->cost * \Sum cpu_util
+	 *            ps->cost * \Sum cpu_util
 	 *   pd_nrg = ------------------------                       (4)
 	 *                  scale_cpu
 	 */
-	return cs->cost * sum_util / scale_cpu;
+	return ps->cost * sum_util / scale_cpu;
 }
 
 /**
- * em_pd_nr_cap_states() - Get the number of capacity states of a perf. domain
+ * em_pd_nr_perf_states() - Get the number of performance states of a perf.
+ *				domain
  * @pd		: performance domain for which this must be done
  *
- * Return: the number of capacity states in the performance domain table
+ * Return: the number of performance states in the performance domain table
  */
-static inline int em_pd_nr_cap_states(struct em_perf_domain *pd)
+static inline int em_pd_nr_perf_states(struct em_perf_domain *pd)
 {
-	return pd->nr_cap_states;
+	return pd->nr_perf_states;
 }
 
 #else
@@ -177,7 +183,7 @@ static inline unsigned long em_pd_energy(struct em_perf_domain *pd,
 {
 	return 0;
 }
-static inline int em_pd_nr_cap_states(struct em_perf_domain *pd)
+static inline int em_pd_nr_perf_states(struct em_perf_domain *pd)
 {
 	return 0;
 }
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 0a9326f5f421..9892d548a0fa 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -27,18 +27,18 @@ static DEFINE_MUTEX(em_pd_mutex);
 #ifdef CONFIG_DEBUG_FS
 static struct dentry *rootdir;
 
-static void em_debug_create_cs(struct em_cap_state *cs, struct dentry *pd)
+static void em_debug_create_ps(struct em_perf_state *ps, struct dentry *pd)
 {
 	struct dentry *d;
 	char name[24];
 
-	snprintf(name, sizeof(name), "cs:%lu", cs->frequency);
+	snprintf(name, sizeof(name), "ps:%lu", ps->frequency);
 
-	/* Create per-cs directory */
+	/* Create per-ps directory */
 	d = debugfs_create_dir(name, pd);
-	debugfs_create_ulong("frequency", 0444, d, &cs->frequency);
-	debugfs_create_ulong("power", 0444, d, &cs->power);
-	debugfs_create_ulong("cost", 0444, d, &cs->cost);
+	debugfs_create_ulong("frequency", 0444, d, &ps->frequency);
+	debugfs_create_ulong("power", 0444, d, &ps->power);
+	debugfs_create_ulong("cost", 0444, d, &ps->cost);
 }
 
 static int em_debug_cpus_show(struct seq_file *s, void *unused)
@@ -62,9 +62,9 @@ static void em_debug_create_pd(struct em_perf_domain *pd, int cpu)
 
 	debugfs_create_file("cpus", 0444, d, pd->cpus, &em_debug_cpus_fops);
 
-	/* Create a sub-directory for each capacity state */
-	for (i = 0; i < pd->nr_cap_states; i++)
-		em_debug_create_cs(&pd->table[i], d);
+	/* Create a sub-directory for each performance state */
+	for (i = 0; i < pd->nr_perf_states; i++)
+		em_debug_create_ps(&pd->table[i], d);
 }
 
 static int __init em_debug_init(void)
@@ -84,7 +84,7 @@ static struct em_perf_domain *em_create_pd(cpumask_t *span, int nr_states,
 	unsigned long opp_eff, prev_opp_eff = ULONG_MAX;
 	unsigned long power, freq, prev_freq = 0;
 	int i, ret, cpu = cpumask_first(span);
-	struct em_cap_state *table;
+	struct em_perf_state *table;
 	struct em_perf_domain *pd;
 	u64 fmax;
 
@@ -99,26 +99,26 @@ static struct em_perf_domain *em_create_pd(cpumask_t *span, int nr_states,
 	if (!table)
 		goto free_pd;
 
-	/* Build the list of capacity states for this performance domain */
+	/* Build the list of performance states for this performance domain */
 	for (i = 0, freq = 0; i < nr_states; i++, freq++) {
 		/*
 		 * active_power() is a driver callback which ceils 'freq' to
-		 * lowest capacity state of 'cpu' above 'freq' and updates
+		 * lowest performance state of 'cpu' above 'freq' and updates
 		 * 'power' and 'freq' accordingly.
 		 */
 		ret = cb->active_power(&power, &freq, cpu);
 		if (ret) {
-			pr_err("pd%d: invalid cap. state: %d\n", cpu, ret);
-			goto free_cs_table;
+			pr_err("pd%d: invalid perf. state: %d\n", cpu, ret);
+			goto free_ps_table;
 		}
 
 		/*
 		 * We expect the driver callback to increase the frequency for
-		 * higher capacity states.
+		 * higher performance states.
 		 */
 		if (freq <= prev_freq) {
 			pr_err("pd%d: non-increasing freq: %lu\n", cpu, freq);
-			goto free_cs_table;
+			goto free_ps_table;
 		}
 
 		/*
@@ -127,7 +127,7 @@ static struct em_perf_domain *em_create_pd(cpumask_t *span, int nr_states,
 		 */
 		if (!power || power > EM_CPU_MAX_POWER) {
 			pr_err("pd%d: invalid power: %lu\n", cpu, power);
-			goto free_cs_table;
+			goto free_ps_table;
 		}
 
 		table[i].power = power;
@@ -141,12 +141,12 @@ static struct em_perf_domain *em_create_pd(cpumask_t *span, int nr_states,
 		 */
 		opp_eff = freq / power;
 		if (opp_eff >= prev_opp_eff)
-			pr_warn("pd%d: hertz/watts ratio non-monotonically decreasing: em_cap_state %d >= em_cap_state%d\n",
+			pr_warn("pd%d: hertz/watts ratio non-monotonically decreasing: em_perf_state %d >= em_perf_state%d\n",
 					cpu, i, i - 1);
 		prev_opp_eff = opp_eff;
 	}
 
-	/* Compute the cost of each capacity_state. */
+	/* Compute the cost of each performance state. */
 	fmax = (u64) table[nr_states - 1].frequency;
 	for (i = 0; i < nr_states; i++) {
 		table[i].cost = div64_u64(fmax * table[i].power,
@@ -154,14 +154,14 @@ static struct em_perf_domain *em_create_pd(cpumask_t *span, int nr_states,
 	}
 
 	pd->table = table;
-	pd->nr_cap_states = nr_states;
+	pd->nr_perf_states = nr_states;
 	cpumask_copy(to_cpumask(pd->cpus), span);
 
 	em_debug_create_pd(pd, cpu);
 
 	return pd;
 
-free_cs_table:
+free_ps_table:
 	kfree(table);
 free_pd:
 	kfree(pd);
@@ -185,7 +185,7 @@ EXPORT_SYMBOL_GPL(em_cpu_get);
 /**
  * em_register_perf_domain() - Register the Energy Model of a performance domain
  * @span	: Mask of CPUs in the performance domain
- * @nr_states	: Number of capacity states to register
+ * @nr_states	: Number of performance states to register
  * @cb		: Callback functions providing the data of the Energy Model
  *
  * Create Energy Model tables for a performance domain using the callbacks
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index ba81187bb7af..2f91d3126365 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -272,10 +272,10 @@ static void perf_domain_debug(const struct cpumask *cpu_map,
 	printk(KERN_DEBUG "root_domain %*pbl:", cpumask_pr_args(cpu_map));
 
 	while (pd) {
-		printk(KERN_CONT " pd%d:{ cpus=%*pbl nr_cstate=%d }",
+		printk(KERN_CONT " pd%d:{ cpus=%*pbl nr_pstate=%d }",
 				cpumask_first(perf_domain_span(pd)),
 				cpumask_pr_args(perf_domain_span(pd)),
-				em_pd_nr_cap_states(pd->em_pd));
+				em_pd_nr_perf_states(pd->em_pd));
 		pd = pd->next;
 	}
 
@@ -313,26 +313,26 @@ static void sched_energy_set(bool has_eas)
  *
  * The complexity of the Energy Model is defined as:
  *
- *              C = nr_pd * (nr_cpus + nr_cs)
+ *              C = nr_pd * (nr_cpus + nr_ps)
  *
  * with parameters defined as:
  *  - nr_pd:    the number of performance domains
  *  - nr_cpus:  the number of CPUs
- *  - nr_cs:    the sum of the number of capacity states of all performance
+ *  - nr_ps:    the sum of the number of performance states of all performance
  *              domains (for example, on a system with 2 performance domains,
- *              with 10 capacity states each, nr_cs = 2 * 10 = 20).
+ *              with 10 performance states each, nr_ps = 2 * 10 = 20).
  *
  * It is generally not a good idea to use such a model in the wake-up path on
  * very complex platforms because of the associated scheduling overheads. The
  * arbitrary constraint below prevents that. It makes EAS usable up to 16 CPUs
- * with per-CPU DVFS and less than 8 capacity states each, for example.
+ * with per-CPU DVFS and less than 8 performance states each, for example.
  */
 #define EM_MAX_COMPLEXITY 2048
 
 extern struct cpufreq_governor schedutil_gov;
 static bool build_perf_domains(const struct cpumask *cpu_map)
 {
-	int i, nr_pd = 0, nr_cs = 0, nr_cpus = cpumask_weight(cpu_map);
+	int i, nr_pd = 0, nr_ps = 0, nr_cpus = cpumask_weight(cpu_map);
 	struct perf_domain *pd = NULL, *tmp;
 	int cpu = cpumask_first(cpu_map);
 	struct root_domain *rd = cpu_rq(cpu)->rd;
@@ -384,15 +384,15 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
 		pd = tmp;
 
 		/*
-		 * Count performance domains and capacity states for the
+		 * Count performance domains and performance states for the
 		 * complexity check.
 		 */
 		nr_pd++;
-		nr_cs += em_pd_nr_cap_states(pd->em_pd);
+		nr_ps += em_pd_nr_perf_states(pd->em_pd);
 	}
 
 	/* Bail out if the Energy Model complexity is too high. */
-	if (nr_pd * (nr_cs + nr_cpus) > EM_MAX_COMPLEXITY) {
+	if (nr_pd * (nr_ps + nr_cpus) > EM_MAX_COMPLEXITY) {
 		WARN(1, "rd %*pbl: Failed to start EAS, EM complexity is too high\n",
 						cpumask_pr_args(cpu_map));
 		goto free;
-- 
2.17.1



* [PATCH v7 02/15] PM / EM: introduce em_dev_register_perf_domain function
  2020-05-11 11:18 [PATCH v7 00/15] Add support for devices in the Energy Model Lukasz Luba
  2020-05-11 11:18 ` [PATCH v7 01/15] PM / EM: change naming convention from 'capacity' to 'performance' Lukasz Luba
@ 2020-05-11 11:18 ` Lukasz Luba
  2020-05-11 11:51   ` Quentin Perret
  2020-05-11 11:19 ` [PATCH v7 03/15] PM / EM: update callback structure and add device pointer Lukasz Luba
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 28+ messages in thread
From: Lukasz Luba @ 2020-05-11 11:18 UTC (permalink / raw)

Add a new function to the Energy Model framework which is going to support
new devices. This function will help in the transition and make it smoother.
For now it still checks whether the cpumask is a valid pointer; this check
will be removed later, when the new structures and infrastructure are ready.
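
The transition pattern can be sketched in userspace as follows. This is
only an illustration under stated assumptions: the demo_* types are
hypothetical stand-ins for struct device, cpumask_t and struct
em_data_callback, and the extra guards in the old-style wrapper (NULL or
empty mask) are additions of this sketch, not something the patch claims
to do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only what the sketch needs. */
struct demo_device { int id; };
struct demo_cpumask { unsigned long bits; };
struct demo_callback {
	int (*active_power)(unsigned long *power, unsigned long *freq, int cpu);
};

#define DEMO_NR_CPUS 4

static struct demo_device demo_cpu_devices[DEMO_NR_CPUS] = {
	{ 0 }, { 1 }, { 2 }, { 3 },
};

/* Lowest set bit in the mask, or -1 if the mask is empty. */
static int demo_first_cpu(const struct demo_cpumask *mask)
{
	int cpu;

	assert(mask != NULL);
	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		if (mask->bits & (1UL << cpu))
			return cpu;
	return -1;
}

/* New-style entry point: takes the device and, for now, still the mask. */
static int demo_dev_register(struct demo_device *dev, unsigned int nr_states,
			     struct demo_callback *cb,
			     struct demo_cpumask *span)
{
	if (!dev || !span || !nr_states || !cb)
		return -EINVAL;

	/* Building the performance state table would happen here. */
	return 0;
}

/* Old-style entry point kept for the transition: look up and forward. */
static int demo_register(struct demo_cpumask *span, unsigned int nr_states,
			 struct demo_callback *cb)
{
	int cpu;

	if (!span)
		return -EINVAL;

	cpu = demo_first_cpu(span);
	if (cpu < 0)
		return -EINVAL;

	return demo_dev_register(&demo_cpu_devices[cpu], nr_states, cb, span);
}
```

Existing callers keep working through the wrapper while new callers can
move to the device-based function one at a time.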

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h | 13 ++++++++++--
 kernel/power/energy_model.c  | 40 ++++++++++++++++++++++++++++++------
 2 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index fe336a9eb5d4..7c048df98447 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -2,6 +2,7 @@
 #ifndef _LINUX_ENERGY_MODEL_H
 #define _LINUX_ENERGY_MODEL_H
 #include <linux/cpumask.h>
+#include <linux/device.h>
 #include <linux/jump_label.h>
 #include <linux/kobject.h>
 #include <linux/rcupdate.h>
@@ -42,7 +43,7 @@ struct em_perf_domain {
 #define em_span_cpus(em) (to_cpumask((em)->cpus))
 
 #ifdef CONFIG_ENERGY_MODEL
-#define EM_CPU_MAX_POWER 0xFFFF
+#define EM_MAX_POWER 0xFFFF
 
 struct em_data_callback {
 	/**
@@ -59,7 +60,7 @@ struct em_data_callback {
 	 * and frequency.
 	 *
 	 * The power is the one of a single CPU in the domain, expressed in
-	 * milli-watts. It is expected to fit in the [0, EM_CPU_MAX_POWER]
+	 * milli-watts. It is expected to fit in the [0, EM_MAX_POWER]
 	 * range.
 	 *
 	 * Return 0 on success.
@@ -71,6 +72,8 @@ struct em_data_callback {
 struct em_perf_domain *em_cpu_get(int cpu);
 int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
 						struct em_data_callback *cb);
+int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
+				struct em_data_callback *cb, cpumask_t *span);
 
 /**
  * em_pd_energy() - Estimates the energy consumed by the CPUs of a perf. domain
@@ -174,6 +177,12 @@ static inline int em_register_perf_domain(cpumask_t *span,
 {
 	return -EINVAL;
 }
+static inline
+int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
+				struct em_data_callback *cb, cpumask_t *span)
+{
+	return -EINVAL;
+}
 static inline struct em_perf_domain *em_cpu_get(int cpu)
 {
 	return NULL;
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 9892d548a0fa..875b163e54ab 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -125,7 +125,7 @@ static struct em_perf_domain *em_create_pd(cpumask_t *span, int nr_states,
 		 * The power returned by active_state() is expected to be
 		 * positive, in milli-watts and to fit into 16 bits.
 		 */
-		if (!power || power > EM_CPU_MAX_POWER) {
+		if (!power || power > EM_MAX_POWER) {
 			pr_err("pd%d: invalid power: %lu\n", cpu, power);
 			goto free_ps_table;
 		}
@@ -183,10 +183,13 @@ struct em_perf_domain *em_cpu_get(int cpu)
 EXPORT_SYMBOL_GPL(em_cpu_get);
 
 /**
- * em_register_perf_domain() - Register the Energy Model of a performance domain
- * @span	: Mask of CPUs in the performance domain
+ * em_dev_register_perf_domain() - Register the Energy Model (EM) for a device
+ * @dev		: Device for which the EM is to register
  * @nr_states	: Number of performance states to register
  * @cb		: Callback functions providing the data of the Energy Model
+ * @span	: Pointer to cpumask_t, which in case of a CPU device is
+ *		obligatory. It can be taken from i.e. 'policy->cpus'. For other
+ *		type of devices this should be set to NULL.
  *
  * Create Energy Model tables for a performance domain using the callbacks
  * defined in cb.
@@ -196,14 +199,14 @@ EXPORT_SYMBOL_GPL(em_cpu_get);
  *
  * Return 0 on success
  */
-int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
-						struct em_data_callback *cb)
+int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
+				struct em_data_callback *cb, cpumask_t *span)
 {
 	unsigned long cap, prev_cap = 0;
 	struct em_perf_domain *pd;
 	int cpu, ret = 0;
 
-	if (!span || !nr_states || !cb)
+	if (!dev || !span || !nr_states || !cb)
 		return -EINVAL;
 
 	/*
@@ -255,4 +258,29 @@ int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(em_dev_register_perf_domain);
+
+/**
+ * em_register_perf_domain() - Register the Energy Model of a performance domain
+ * @span	: Mask of CPUs in the performance domain
+ * @nr_states	: Number of capacity states to register
+ * @cb		: Callback functions providing the data of the Energy Model
+ *
+ * Create Energy Model tables for a performance domain using the callbacks
+ * defined in cb.
+ *
+ * If multiple clients register the same performance domain, all but the first
+ * registration will be ignored.
+ *
+ * Return 0 on success
+ */
+int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
+						struct em_data_callback *cb)
+{
+	struct device *cpu_dev;
+
+	cpu_dev = get_cpu_device(cpumask_first(span));
+
+	return em_dev_register_perf_domain(cpu_dev, nr_states, cb, span);
+}
 EXPORT_SYMBOL_GPL(em_register_perf_domain);
-- 
2.17.1



* [PATCH v7 03/15] PM / EM: update callback structure and add device pointer
  2020-05-11 11:18 [PATCH v7 00/15] Add support for devices in the Energy Model Lukasz Luba
  2020-05-11 11:18 ` [PATCH v7 01/15] PM / EM: change naming convention from 'capacity' to 'performance' Lukasz Luba
  2020-05-11 11:18 ` [PATCH v7 02/15] PM / EM: introduce em_dev_register_perf_domain function Lukasz Luba
@ 2020-05-11 11:19 ` Lukasz Luba
  2020-05-11 11:57   ` Quentin Perret
  2020-05-11 11:19 ` [PATCH v7 04/15] PM / EM: add support for other devices than CPUs in Energy Model Lukasz Luba
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 28+ messages in thread
From: Lukasz Luba @ 2020-05-11 11:19 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx
  Cc: Dietmar.Eggemann, cw00.choi, b.zolnierkie, rjw, sudeep.holla,
	viresh.kumar, nm, sboyd, rui.zhang, amit.kucheria,
	daniel.lezcano, mingo, peterz, juri.lelli, vincent.guittot,
	rostedt, qperret, bsegall, mgorman, shawnguo, s.hauer, festevam,
	kernel, khilman, agross, bjorn.andersson, robh, matthias.bgg,
	steven.price, tomeu.vizoso, alyssa.rosenzweig, airlied, daniel,
	liviu.dudau, lorenzo.pieralisi, lukasz.luba, patrick.bellasi,
	orjan.eide, rdunlap, mka

The Energy Model framework is going to support devices other than CPUs. To
make this possible, change the callback function so that it takes a pointer
to a device as an argument.

Update the related users to use the new function and the new callback from
the Energy Model.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/cpufreq/scmi-cpufreq.c | 11 +++--------
 drivers/opp/of.c               |  9 ++-------
 include/linux/energy_model.h   | 15 ++++++++-------
 kernel/power/energy_model.c    |  9 +++++----
 4 files changed, 18 insertions(+), 26 deletions(-)
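
For illustration only (not part of the patch): a minimal userspace sketch of
an active_power() callback written against the new signature. The 'struct
device' stand-in, the demo_opps table and demo_active_power() are
hypothetical names invented for this note. Only the callback prototype and
the EM_DATA_CB() macro mirror what the patch defines.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct device (hypothetical). */
struct device {
	const char *name;
};

/* Callback type after this patch: 'int cpu' became 'struct device *dev'. */
struct em_data_callback {
	int (*active_power)(unsigned long *power, unsigned long *freq,
			    struct device *dev);
};
#define EM_DATA_CB(_active_power_cb) { .active_power = &_active_power_cb }

/* Hypothetical operating points: frequency in kHz, power in mW. */
struct demo_opp {
	unsigned long khz;
	unsigned long mw;
};

static const struct demo_opp demo_opps[] = {
	{ 200000,  50 },
	{ 400000, 120 },
	{ 800000, 300 },
};

/*
 * Ceil '*freq' to the lowest operating point at or above it and report
 * that point's frequency and power through the two pointers.
 */
static int demo_active_power(unsigned long *power, unsigned long *freq,
			     struct device *dev)
{
	size_t i;

	if (!dev || !power || !freq)
		return -EINVAL;

	for (i = 0; i < sizeof(demo_opps) / sizeof(demo_opps[0]); i++) {
		if (demo_opps[i].khz >= *freq) {
			*freq = demo_opps[i].khz;
			*power = demo_opps[i].mw;
			return 0;
		}
	}

	return -EINVAL;	/* no operating point at or above '*freq' */
}
```

A caller would walk the states the way the framework loop does: start with
freq = 0, call the callback, then increment the returned frequency by one
before asking for the next state.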

diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
index 61623e2ff149..11ee24e06d12 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -103,17 +103,12 @@ scmi_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask)
 }
 
 static int __maybe_unused
-scmi_get_cpu_power(unsigned long *power, unsigned long *KHz, int cpu)
+scmi_get_cpu_power(unsigned long *power, unsigned long *KHz,
+		   struct device *cpu_dev)
 {
-	struct device *cpu_dev = get_cpu_device(cpu);
 	unsigned long Hz;
 	int ret, domain;
 
-	if (!cpu_dev) {
-		pr_err("failed to get cpu%d device\n", cpu);
-		return -ENODEV;
-	}
-
 	domain = handle->perf_ops->device_domain_id(cpu_dev);
 	if (domain < 0)
 		return domain;
@@ -200,7 +195,7 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
 
 	policy->fast_switch_possible = true;
 
-	em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
+	em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus);
 
 	return 0;
 
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 9cd8f0adacae..5b75829a915d 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -1047,9 +1047,8 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_get_of_node);
  * calculation failed because of missing parameters, 0 otherwise.
  */
 static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz,
-					 int cpu)
+					 struct device *cpu_dev)
 {
-	struct device *cpu_dev;
 	struct dev_pm_opp *opp;
 	struct device_node *np;
 	unsigned long mV, Hz;
@@ -1057,10 +1056,6 @@ static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz,
 	u64 tmp;
 	int ret;
 
-	cpu_dev = get_cpu_device(cpu);
-	if (!cpu_dev)
-		return -ENODEV;
-
 	np = of_node_get(cpu_dev->of_node);
 	if (!np)
 		return -EINVAL;
@@ -1128,6 +1123,6 @@ void dev_pm_opp_of_register_em(struct cpumask *cpus)
 	if (ret || !cap)
 		return;
 
-	em_register_perf_domain(cpus, nr_opp, &em_cb);
+	em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, cpus);
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_of_register_em);
diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 7c048df98447..7076cb22b247 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -48,24 +48,25 @@ struct em_perf_domain {
 struct em_data_callback {
 	/**
 	 * active_power() - Provide power at the next performance state of
-	 *		a CPU
+	 *		a device
 	 * @power	: Active power at the performance state in mW
 	 *		(modified)
 	 * @freq	: Frequency at the performance state in kHz
 	 *		(modified)
-	 * @cpu		: CPU for which we do this operation
+	 * @dev		: Device for which we do this operation (can be a CPU)
 	 *
-	 * active_power() must find the lowest performance state of 'cpu' above
+	 * active_power() must find the lowest performance state of 'dev' above
 	 * 'freq' and update 'power' and 'freq' to the matching active power
 	 * and frequency.
 	 *
-	 * The power is the one of a single CPU in the domain, expressed in
-	 * milli-watts. It is expected to fit in the [0, EM_MAX_POWER]
-	 * range.
+	 * In case of CPUs, the power is the one of a single CPU in the domain,
+	 * expressed in milli-watts. It is expected to fit in the
+	 * [0, EM_MAX_POWER] range.
 	 *
 	 * Return 0 on success.
 	 */
-	int (*active_power)(unsigned long *power, unsigned long *freq, int cpu);
+	int (*active_power)(unsigned long *power, unsigned long *freq,
+			    struct device *dev);
 };
 #define EM_DATA_CB(_active_power_cb) { .active_power = &_active_power_cb }
 
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 875b163e54ab..5b8a1566526a 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -78,8 +78,9 @@ core_initcall(em_debug_init);
 #else /* CONFIG_DEBUG_FS */
 static void em_debug_create_pd(struct em_perf_domain *pd, int cpu) {}
 #endif
-static struct em_perf_domain *em_create_pd(cpumask_t *span, int nr_states,
-						struct em_data_callback *cb)
+static struct em_perf_domain *
+em_create_pd(struct device *dev, int nr_states, struct em_data_callback *cb,
+	     cpumask_t *span)
 {
 	unsigned long opp_eff, prev_opp_eff = ULONG_MAX;
 	unsigned long power, freq, prev_freq = 0;
@@ -106,7 +107,7 @@ static struct em_perf_domain *em_create_pd(cpumask_t *span, int nr_states,
 		 * lowest performance state of 'cpu' above 'freq' and updates
 		 * 'power' and 'freq' accordingly.
 		 */
-		ret = cb->active_power(&power, &freq, cpu);
+		ret = cb->active_power(&power, &freq, dev);
 		if (ret) {
 			pr_err("pd%d: invalid perf. state: %d\n", cpu, ret);
 			goto free_ps_table;
@@ -237,7 +238,7 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 	}
 
 	/* Create the performance domain and add it to the Energy Model. */
-	pd = em_create_pd(span, nr_states, cb);
+	pd = em_create_pd(dev, nr_states, cb, span);
 	if (!pd) {
 		ret = -EINVAL;
 		goto unlock;
-- 
2.17.1



* [PATCH v7 04/15] PM / EM: add support for other devices than CPUs in Energy Model
  2020-05-11 11:18 [PATCH v7 00/15] Add support for devices in the Energy Model Lukasz Luba
                   ` (2 preceding siblings ...)
  2020-05-11 11:19 ` [PATCH v7 03/15] PM / EM: update callback structure and add device pointer Lukasz Luba
@ 2020-05-11 11:19 ` Lukasz Luba
  2020-05-11 13:43   ` Quentin Perret
  2020-05-11 11:19 ` [PATCH v7 05/15] PM / EM: remove em_register_perf_domain Lukasz Luba
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 28+ messages in thread
From: Lukasz Luba @ 2020-05-11 11:19 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx
  Cc: Dietmar.Eggemann, cw00.choi, b.zolnierkie, rjw, sudeep.holla,
	viresh.kumar, nm, sboyd, rui.zhang, amit.kucheria,
	daniel.lezcano, mingo, peterz, juri.lelli, vincent.guittot,
	rostedt, qperret, bsegall, mgorman, shawnguo, s.hauer, festevam,
	kernel, khilman, agross, bjorn.andersson, robh, matthias.bgg,
	steven.price, tomeu.vizoso, alyssa.rosenzweig, airlied, daniel,
	liviu.dudau, lorenzo.pieralisi, lukasz.luba, patrick.bellasi,
	orjan.eide, rdunlap, mka

Add support for devices other than CPUs. The registration function no longer
requires a valid cpumask pointer for non-CPU devices and is ready to handle
new devices. Some of the internal structures have been reorganized in order
to keep a consistent view (like removing per_cpu pd pointers). To track usage
of the Energy Model structures, they are protected with kref.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/device.h       |   5 +
 include/linux/energy_model.h |  28 +++--
 kernel/power/energy_model.c  | 237 ++++++++++++++++++++++++-----------
 3 files changed, 187 insertions(+), 83 deletions(-)
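
For illustration only (not part of the patch): a userspace sketch of the
checks that the table-building step applies to each performance state, plus
the cost coefficient documented as "power * max_frequency / frequency". The
function demo_build_table() is a hypothetical name. It takes a pre-filled
table instead of invoking a driver callback, which is a simplification of
what em_create_perf_table() does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define EM_MAX_POWER 0xFFFF

struct em_perf_state {
	unsigned long frequency;	/* kHz */
	unsigned long power;		/* mW */
	unsigned long cost;		/* power * max_frequency / frequency */
};

/*
 * Validate a table of performance states and fill in the cost field.
 * Frequencies must be strictly increasing and each power value must be
 * non-zero and no larger than EM_MAX_POWER.
 */
static int demo_build_table(struct em_perf_state *table, int nr_states)
{
	unsigned long prev_freq = 0;
	uint64_t fmax;
	int i;

	if (!table || nr_states <= 0)
		return -EINVAL;

	for (i = 0; i < nr_states; i++) {
		if (table[i].frequency <= prev_freq)
			return -EINVAL;	/* non-increasing frequency */
		if (!table[i].power || table[i].power > EM_MAX_POWER)
			return -EINVAL;	/* power outside the valid range */
		prev_freq = table[i].frequency;
	}

	/* The highest frequency is the last entry of the ascending table. */
	fmax = table[nr_states - 1].frequency;
	for (i = 0; i < nr_states; i++)
		table[i].cost = (unsigned long)(fmax * table[i].power /
						table[i].frequency);

	return 0;
}
```

With states of (200000 kHz, 50 mW), (400000 kHz, 120 mW) and (800000 kHz,
300 mW), the resulting cost values are 200, 240 and 300.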

diff --git a/include/linux/device.h b/include/linux/device.h
index ac8e37cd716a..7023d3ea189b 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -13,6 +13,7 @@
 #define _DEVICE_H_
 
 #include <linux/dev_printk.h>
+#include <linux/energy_model.h>
 #include <linux/ioport.h>
 #include <linux/kobject.h>
 #include <linux/klist.h>
@@ -559,6 +560,10 @@ struct device {
 	struct dev_pm_info	power;
 	struct dev_pm_domain	*pm_domain;
 
+#ifdef CONFIG_ENERGY_MODEL
+	struct em_perf_domain	*em_pd;
+#endif
+
 #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
 	struct irq_domain	*msi_domain;
 #endif
diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 7076cb22b247..f5a88486d1c4 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -12,8 +12,10 @@
 
 /**
  * em_perf_state - Performance state of a performance domain
- * @frequency:	The CPU frequency in KHz, for consistency with CPUFreq
- * @power:	The power consumed by 1 CPU at this level, in milli-watts
+ * @frequency:	The frequency in KHz, for consistency with CPUFreq
+ * @power:	The power consumed at this level, in milli-watts (by 1 CPU or
+ *		by a registered device). It can be a total power: static and
+ *		dynamic.
  * @cost:	The cost coefficient associated with this level, used during
  *		energy calculation. Equal to: power * max_frequency / frequency
  */
@@ -27,12 +29,15 @@ struct em_perf_state {
  * em_perf_domain - Performance domain
  * @table:		List of performance states, in ascending order
  * @nr_perf_states:	Number of performance states
- * @cpus:		Cpumask covering the CPUs of the domain
+ * @cpus:		Cpumask covering the CPUs of the domain. It's here
+ *			for performance reasons to avoid potential cache
+ *			misses during energy calculations in the scheduler
  *
- * A "performance domain" represents a group of CPUs whose performance is
- * scaled together. All CPUs of a performance domain must have the same
- * micro-architecture. Performance domains often have a 1-to-1 mapping with
- * CPUFreq policies.
+ * In case of CPU device, a "performance domain" represents a group of CPUs
+ * whose performance is scaled together. All CPUs of a performance domain
+ * must have the same micro-architecture. Performance domains often have
+ * a 1-to-1 mapping with CPUFreq policies. In case of other devices the 'cpus'
+ * field is unused.
  */
 struct em_perf_domain {
 	struct em_perf_state *table;
@@ -71,10 +76,12 @@ struct em_data_callback {
 #define EM_DATA_CB(_active_power_cb) { .active_power = &_active_power_cb }
 
 struct em_perf_domain *em_cpu_get(int cpu);
+struct em_perf_domain *em_pd_get(struct device *dev);
 int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
 						struct em_data_callback *cb);
 int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 				struct em_data_callback *cb, cpumask_t *span);
+void em_dev_unregister_perf_domain(struct device *dev);
 
 /**
  * em_pd_energy() - Estimates the energy consumed by the CPUs of a perf. domain
@@ -184,10 +191,17 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 {
 	return -EINVAL;
 }
+static inline void em_dev_unregister_perf_domain(struct device *dev)
+{
+}
 static inline struct em_perf_domain *em_cpu_get(int cpu)
 {
 	return NULL;
 }
+static inline struct em_perf_domain *em_pd_get(struct device *dev)
+{
+	return NULL;
+}
 static inline unsigned long em_pd_energy(struct em_perf_domain *pd,
 			unsigned long max_util, unsigned long sum_util)
 {
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 5b8a1566526a..9cc7f2973600 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -2,8 +2,9 @@
 /*
  * Energy Model of CPUs
  *
- * Copyright (c) 2018, Arm ltd.
+ * Copyright (c) 2018-2020, Arm ltd.
  * Written by: Quentin Perret, Arm ltd.
+ * Improvements provided by: Lukasz Luba, Arm ltd.
  */
 
 #define pr_fmt(fmt) "energy_model: " fmt
@@ -15,15 +16,17 @@
 #include <linux/sched/topology.h>
 #include <linux/slab.h>
 
-/* Mapping of each CPU to the performance domain to which it belongs. */
-static DEFINE_PER_CPU(struct em_perf_domain *, em_data);
-
 /*
  * Mutex serializing the registrations of performance domains and letting
  * callbacks defined by drivers sleep.
  */
 static DEFINE_MUTEX(em_pd_mutex);
 
+static bool _is_cpu_device(struct device *dev)
+{
+	return (dev->bus == &cpu_subsys);
+}
+
 #ifdef CONFIG_DEBUG_FS
 static struct dentry *rootdir;
 
@@ -49,22 +52,30 @@ static int em_debug_cpus_show(struct seq_file *s, void *unused)
 }
 DEFINE_SHOW_ATTRIBUTE(em_debug_cpus);
 
-static void em_debug_create_pd(struct em_perf_domain *pd, int cpu)
+static void em_debug_create_pd(struct device *dev)
 {
 	struct dentry *d;
-	char name[8];
 	int i;
 
-	snprintf(name, sizeof(name), "pd%d", cpu);
-
 	/* Create the directory of the performance domain */
-	d = debugfs_create_dir(name, rootdir);
+	d = debugfs_create_dir(dev_name(dev), rootdir);
 
-	debugfs_create_file("cpus", 0444, d, pd->cpus, &em_debug_cpus_fops);
+	if (_is_cpu_device(dev))
+		debugfs_create_file("cpus", 0444, d, dev->em_pd->cpus,
+				    &em_debug_cpus_fops);
 
 	/* Create a sub-directory for each performance state */
-	for (i = 0; i < pd->nr_perf_states; i++)
-		em_debug_create_ps(&pd->table[i], d);
+	for (i = 0; i < dev->em_pd->nr_perf_states; i++)
+		em_debug_create_ps(&dev->em_pd->table[i], d);
+
+}
+
+static void em_debug_remove_pd(struct device *dev)
+{
+	struct dentry *debug_dir;
+
+	debug_dir = debugfs_lookup(dev_name(dev), rootdir);
+	debugfs_remove_recursive(debug_dir);
 }
 
 static int __init em_debug_init(void)
@@ -76,40 +87,34 @@ static int __init em_debug_init(void)
 }
 core_initcall(em_debug_init);
 #else /* CONFIG_DEBUG_FS */
-static void em_debug_create_pd(struct em_perf_domain *pd, int cpu) {}
+static void em_debug_create_pd(struct device *dev) {}
+static void em_debug_remove_pd(struct device *dev) {}
 #endif
-static struct em_perf_domain *
-em_create_pd(struct device *dev, int nr_states, struct em_data_callback *cb,
-	     cpumask_t *span)
+
+static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
+				int nr_states, struct em_data_callback *cb)
 {
 	unsigned long opp_eff, prev_opp_eff = ULONG_MAX;
 	unsigned long power, freq, prev_freq = 0;
-	int i, ret, cpu = cpumask_first(span);
 	struct em_perf_state *table;
-	struct em_perf_domain *pd;
+	int i, ret;
 	u64 fmax;
 
-	if (!cb->active_power)
-		return NULL;
-
-	pd = kzalloc(sizeof(*pd) + cpumask_size(), GFP_KERNEL);
-	if (!pd)
-		return NULL;
-
 	table = kcalloc(nr_states, sizeof(*table), GFP_KERNEL);
 	if (!table)
-		goto free_pd;
+		return -ENOMEM;
 
 	/* Build the list of performance states for this performance domain */
 	for (i = 0, freq = 0; i < nr_states; i++, freq++) {
 		/*
 		 * active_power() is a driver callback which ceils 'freq' to
-		 * lowest performance state of 'cpu' above 'freq' and updates
+		 * lowest performance state of 'dev' above 'freq' and updates
 		 * 'power' and 'freq' accordingly.
 		 */
 		ret = cb->active_power(&power, &freq, dev);
 		if (ret) {
-			pr_err("pd%d: invalid perf. state: %d\n", cpu, ret);
+			dev_err(dev, "EM: invalid perf. state: %d\n",
+				ret);
 			goto free_ps_table;
 		}
 
@@ -118,7 +123,8 @@ em_create_pd(struct device *dev, int nr_states, struct em_data_callback *cb,
 		 * higher performance states.
 		 */
 		if (freq <= prev_freq) {
-			pr_err("pd%d: non-increasing freq: %lu\n", cpu, freq);
+			dev_err(dev, "EM: non-increasing freq: %lu\n",
+				freq);
 			goto free_ps_table;
 		}
 
@@ -127,7 +133,8 @@ em_create_pd(struct device *dev, int nr_states, struct em_data_callback *cb,
 		 * positive, in milli-watts and to fit into 16 bits.
 		 */
 		if (!power || power > EM_MAX_POWER) {
-			pr_err("pd%d: invalid power: %lu\n", cpu, power);
+			dev_err(dev, "EM: invalid power: %lu\n",
+				power);
 			goto free_ps_table;
 		}
 
@@ -142,8 +149,8 @@ em_create_pd(struct device *dev, int nr_states, struct em_data_callback *cb,
 		 */
 		opp_eff = freq / power;
 		if (opp_eff >= prev_opp_eff)
-			pr_warn("pd%d: hertz/watts ratio non-monotonically decreasing: em_perf_state %d >= em_perf_state%d\n",
-					cpu, i, i - 1);
+			dev_dbg(dev, "EM: hertz/watts ratio non-monotonically decreasing: em_perf_state %d >= em_perf_state%d\n",
+					i, i - 1);
 		prev_opp_eff = opp_eff;
 	}
 
@@ -156,30 +163,82 @@ em_create_pd(struct device *dev, int nr_states, struct em_data_callback *cb,
 
 	pd->table = table;
 	pd->nr_perf_states = nr_states;
-	cpumask_copy(to_cpumask(pd->cpus), span);
-
-	em_debug_create_pd(pd, cpu);
 
-	return pd;
+	return 0;
 
 free_ps_table:
 	kfree(table);
-free_pd:
-	kfree(pd);
+	return -EINVAL;
+}
+
+static int em_create_pd(struct device *dev, int nr_states,
+			struct em_data_callback *cb, cpumask_t *cpus)
+{
+	struct em_perf_domain *pd;
+	struct device *cpu_dev;
+	int cpu, ret;
+
+	if (_is_cpu_device(dev)) {
+		pd = kzalloc(sizeof(*pd) + cpumask_size(), GFP_KERNEL);
+		if (!pd)
+			return -ENOMEM;
+
+		cpumask_copy(em_span_cpus(pd), cpus);
+	} else {
+		pd = kzalloc(sizeof(*pd), GFP_KERNEL);
+		if (!pd)
+			return -ENOMEM;
+	}
+
+	ret = em_create_perf_table(dev, pd, nr_states, cb);
+	if (ret) {
+		kfree(pd);
+		return ret;
+	}
 
-	return NULL;
+	if (_is_cpu_device(dev))
+		for_each_cpu(cpu, cpus) {
+			cpu_dev = get_cpu_device(cpu);
+			cpu_dev->em_pd = pd;
+		}
+	else
+		dev->em_pd = pd;
+
+	return 0;
 }
 
+/**
+ * em_pd_get() - Return the performance domain for a device
+ * @dev : Device to find the performance domain for
+ *
+ * Returns the performance domain to which 'dev' belongs, or NULL if it doesn't
+ * exist.
+ */
+struct em_perf_domain *em_pd_get(struct device *dev)
+{
+	if (IS_ERR_OR_NULL(dev))
+		return NULL;
+
+	return dev->em_pd;
+}
+EXPORT_SYMBOL_GPL(em_pd_get);
+
 /**
  * em_cpu_get() - Return the performance domain for a CPU
  * @cpu : CPU to find the performance domain for
  *
- * Return: the performance domain to which 'cpu' belongs, or NULL if it doesn't
+ * Returns the performance domain to which 'cpu' belongs, or NULL if it doesn't
  * exist.
  */
 struct em_perf_domain *em_cpu_get(int cpu)
 {
-	return READ_ONCE(per_cpu(em_data, cpu));
+	struct device *cpu_dev;
+
+	cpu_dev = get_cpu_device(cpu);
+	if (!cpu_dev)
+		return NULL;
+
+	return em_pd_get(cpu_dev);
 }
 EXPORT_SYMBOL_GPL(em_cpu_get);
 
@@ -188,7 +247,7 @@ EXPORT_SYMBOL_GPL(em_cpu_get);
  * @dev		: Device for which the EM is to register
  * @nr_states	: Number of performance states to register
  * @cb		: Callback functions providing the data of the Energy Model
- * @span	: Pointer to cpumask_t, which in case of a CPU device is
+ * @cpus	: Pointer to cpumask_t, which in case of a CPU device is
  *		obligatory. It can be taken from i.e. 'policy->cpus'. For other
  *		type of devices this should be set to NULL.
  *
@@ -201,13 +260,12 @@ EXPORT_SYMBOL_GPL(em_cpu_get);
  * Return 0 on success
  */
 int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
-				struct em_data_callback *cb, cpumask_t *span)
+				struct em_data_callback *cb, cpumask_t *cpus)
 {
 	unsigned long cap, prev_cap = 0;
-	struct em_perf_domain *pd;
-	int cpu, ret = 0;
+	int cpu, ret;
 
-	if (!dev || !span || !nr_states || !cb)
+	if (!dev || !nr_states || !cb)
 		return -EINVAL;
 
 	/*
@@ -216,47 +274,50 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 	 */
 	mutex_lock(&em_pd_mutex);
 
-	for_each_cpu(cpu, span) {
-		/* Make sure we don't register again an existing domain. */
-		if (READ_ONCE(per_cpu(em_data, cpu))) {
-			ret = -EEXIST;
-			goto unlock;
-		}
+	if (dev->em_pd) {
+		ret = -EEXIST;
+		goto unlock;
+	}
 
-		/*
-		 * All CPUs of a domain must have the same micro-architecture
-		 * since they all share the same table.
-		 */
-		cap = arch_scale_cpu_capacity(cpu);
-		if (prev_cap && prev_cap != cap) {
-			pr_err("CPUs of %*pbl must have the same capacity\n",
-							cpumask_pr_args(span));
+	if (_is_cpu_device(dev)) {
+		if (!cpus) {
+			dev_err(dev, "EM: invalid CPU mask\n");
 			ret = -EINVAL;
 			goto unlock;
 		}
-		prev_cap = cap;
+
+		for_each_cpu(cpu, cpus) {
+			if (em_cpu_get(cpu)) {
+				dev_err(dev, "EM: exists for CPU%d\n", cpu);
+				ret = -EEXIST;
+				goto unlock;
+			}
+			/*
+			 * All CPUs of a domain must have the same
+			 * micro-architecture since they all share the same
+			 * table.
+			 */
+			cap = arch_scale_cpu_capacity(cpu);
+			if (prev_cap && prev_cap != cap) {
+				dev_err(dev, "EM: CPUs of %*pbl must have the same capacity\n",
+					cpumask_pr_args(cpus));
+
+				ret = -EINVAL;
+				goto unlock;
+			}
+			prev_cap = cap;
+		}
 	}
 
-	/* Create the performance domain and add it to the Energy Model. */
-	pd = em_create_pd(dev, nr_states, cb, span);
-	if (!pd) {
-		ret = -EINVAL;
+	ret = em_create_pd(dev, nr_states, cb, cpus);
+	if (ret)
 		goto unlock;
-	}
 
-	for_each_cpu(cpu, span) {
-		/*
-		 * The per-cpu array can be read concurrently from em_cpu_get().
-		 * The barrier enforces the ordering needed to make sure readers
-		 * can only access well formed em_perf_domain structs.
-		 */
-		smp_store_release(per_cpu_ptr(&em_data, cpu), pd);
-	}
+	em_debug_create_pd(dev);
+	dev_info(dev, "EM: created perf domain\n");
 
-	pr_debug("Created perf domain %*pbl\n", cpumask_pr_args(span));
 unlock:
 	mutex_unlock(&em_pd_mutex);
-
 	return ret;
 }
 EXPORT_SYMBOL_GPL(em_dev_register_perf_domain);
@@ -285,3 +346,27 @@ int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
 	return em_dev_register_perf_domain(cpu_dev, nr_states, cb, span);
 }
 EXPORT_SYMBOL_GPL(em_register_perf_domain);
+
+/**
+ * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
+ * @dev		: Device for which the EM is registered
+ *
+ * Try to unregister the EM for the specified device (but not a CPU).
+ */
+void em_dev_unregister_perf_domain(struct device *dev)
+{
+	if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
+		return;
+
+	if (_is_cpu_device(dev))
+		return;
+
+	mutex_lock(&em_pd_mutex);
+	em_debug_remove_pd(dev);
+
+	kfree(dev->em_pd->table);
+	kfree(dev->em_pd);
+	dev->em_pd = NULL;
+	mutex_unlock(&em_pd_mutex);
+}
+EXPORT_SYMBOL_GPL(em_dev_unregister_perf_domain);
-- 
2.17.1



* [PATCH v7 05/15] PM / EM: remove em_register_perf_domain
  2020-05-11 11:18 [PATCH v7 00/15] Add support for devices in the Energy Model Lukasz Luba
                   ` (3 preceding siblings ...)
  2020-05-11 11:19 ` [PATCH v7 04/15] PM / EM: add support for other devices than CPUs in Energy Model Lukasz Luba
@ 2020-05-11 11:19 ` Lukasz Luba
  2020-05-11 13:44   ` Quentin Perret
  2020-05-11 11:19 ` [PATCH v7 06/15] PM / EM: change name of em_pd_energy to em_cpu_energy Lukasz Luba
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 28+ messages in thread
From: Lukasz Luba @ 2020-05-11 11:19 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx
  Cc: Dietmar.Eggemann, cw00.choi, b.zolnierkie, rjw, sudeep.holla,
	viresh.kumar, nm, sboyd, rui.zhang, amit.kucheria,
	daniel.lezcano, mingo, peterz, juri.lelli, vincent.guittot,
	rostedt, qperret, bsegall, mgorman, shawnguo, s.hauer, festevam,
	kernel, khilman, agross, bjorn.andersson, robh, matthias.bgg,
	steven.price, tomeu.vizoso, alyssa.rosenzweig, airlied, daniel,
	liviu.dudau, lorenzo.pieralisi, lukasz.luba, patrick.bellasi,
	orjan.eide, rdunlap, mka

Remove the old function em_register_perf_domain(), which is no longer needed.
em_dev_register_perf_domain() covers both the old use cases and the new
ones.

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h |  7 -------
 kernel/power/energy_model.c  | 25 -------------------------
 2 files changed, 32 deletions(-)
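
For illustration only (not part of the patch): a userspace model of the
argument rules that the single remaining registration entry point enforces,
as introduced earlier in this series. All names here (demo_device,
demo_register(), demo_unregister()) are hypothetical. The model stores one
pointer per device and leaves out the table creation and the per-CPU
assignment done by the real code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two device fields the rules depend on. */
struct demo_device {
	bool is_cpu;	/* models _is_cpu_device(dev) */
	void *em_pd;	/* models dev->em_pd */
};

static int demo_marker;	/* placeholder for an allocated perf domain */

static int demo_register(struct demo_device *dev, unsigned int nr_states,
			 const void *cb, const unsigned long *cpus)
{
	if (!dev || !nr_states || !cb)
		return -EINVAL;

	if (dev->em_pd)
		return -EEXIST;	/* already registered */

	if (dev->is_cpu && !cpus)
		return -EINVAL;	/* the cpumask is obligatory for CPUs */

	dev->em_pd = &demo_marker;
	return 0;
}

static void demo_unregister(struct demo_device *dev)
{
	if (!dev || !dev->em_pd)
		return;

	if (dev->is_cpu)
		return;	/* CPU devices are never unregistered */

	dev->em_pd = NULL;
}
```

A former em_register_perf_domain(span, nr_states, cb) caller maps onto the
CPU case above: it passes the CPU device together with the same span. A
non-CPU device passes NULL for the mask.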

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index f5a88486d1c4..6b56ae6c6a89 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -77,8 +77,6 @@ struct em_data_callback {
 
 struct em_perf_domain *em_cpu_get(int cpu);
 struct em_perf_domain *em_pd_get(struct device *dev);
-int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
-						struct em_data_callback *cb);
 int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 				struct em_data_callback *cb, cpumask_t *span);
 void em_dev_unregister_perf_domain(struct device *dev);
@@ -180,11 +178,6 @@ static inline int em_pd_nr_perf_states(struct em_perf_domain *pd)
 struct em_data_callback {};
 #define EM_DATA_CB(_active_power_cb) { }
 
-static inline int em_register_perf_domain(cpumask_t *span,
-			unsigned int nr_states, struct em_data_callback *cb)
-{
-	return -EINVAL;
-}
 static inline
 int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 				struct em_data_callback *cb, cpumask_t *span)
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 9cc7f2973600..3c5c579ad2c2 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -322,31 +322,6 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 }
 EXPORT_SYMBOL_GPL(em_dev_register_perf_domain);
 
-/**
- * em_register_perf_domain() - Register the Energy Model of a performance domain
- * @span	: Mask of CPUs in the performance domain
- * @nr_states	: Number of capacity states to register
- * @cb		: Callback functions providing the data of the Energy Model
- *
- * Create Energy Model tables for a performance domain using the callbacks
- * defined in cb.
- *
- * If multiple clients register the same performance domain, all but the first
- * registration will be ignored.
- *
- * Return 0 on success
- */
-int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
-						struct em_data_callback *cb)
-{
-	struct device *cpu_dev;
-
-	cpu_dev = get_cpu_device(cpumask_first(span));
-
-	return em_dev_register_perf_domain(cpu_dev, nr_states, cb, span);
-}
-EXPORT_SYMBOL_GPL(em_register_perf_domain);
-
 /**
  * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
  * @dev		: Device for which the EM is registered
-- 
2.17.1



* [PATCH v7 06/15] PM / EM: change name of em_pd_energy to em_cpu_energy
  2020-05-11 11:18 [PATCH v7 00/15] Add support for devices in the Energy Model Lukasz Luba
                   ` (4 preceding siblings ...)
  2020-05-11 11:19 ` [PATCH v7 05/15] PM / EM: remove em_register_perf_domain Lukasz Luba
@ 2020-05-11 11:19 ` Lukasz Luba
  2020-05-11 13:45   ` Quentin Perret
  2020-05-11 11:19 ` [PATCH v7 07/15] Documentation: power: update Energy Model description Lukasz Luba
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 28+ messages in thread
From: Lukasz Luba @ 2020-05-11 11:19 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx
  Cc: Dietmar.Eggemann, cw00.choi, b.zolnierkie, rjw, sudeep.holla,
	viresh.kumar, nm, sboyd, rui.zhang, amit.kucheria,
	daniel.lezcano, mingo, peterz, juri.lelli, vincent.guittot,
	rostedt, qperret, bsegall, mgorman, shawnguo, s.hauer, festevam,
	kernel, khilman, agross, bjorn.andersson, robh, matthias.bgg,
	steven.price, tomeu.vizoso, alyssa.rosenzweig, airlied, daniel,
	liviu.dudau, lorenzo.pieralisi, lukasz.luba, patrick.bellasi,
	orjan.eide, rdunlap, mka

The Energy Model framework now supports devices other than CPUs. Refactor
some of the functions in order to prevent wrong usage. The old function
em_pd_energy has too generic a name. It must not be used without a proper
cpumask pointer, which is possible only for CPU devices. Thus, rename it
and add a proper description to warn of potential wrong usage for other
devices.

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h | 11 ++++++++---
 kernel/sched/fair.c          |  2 +-
 2 files changed, 9 insertions(+), 4 deletions(-)
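
For illustration only (not part of the patch): a simplified userspace sketch
of the kind of estimate em_cpu_energy() produces. The function body is not
shown in this diff, so the steps below are an assumption: map the highest
utilization to a frequency, pick the lowest performance state at or above
it, then scale that state's cost by the summed utilization. The in-tree
helper may apply an additional margin when mapping utilization to
frequency. demo_perf_domain and demo_cpu_energy() are hypothetical names,
and 'scale_cpu' is passed in rather than read from the architecture.

```c
#include <assert.h>
#include <stddef.h>

struct em_perf_state {
	unsigned long frequency;	/* kHz */
	unsigned long power;		/* mW */
	unsigned long cost;		/* power * max_frequency / frequency */
};

/* Hypothetical, reduced view of a CPU performance domain. */
struct demo_perf_domain {
	const struct em_perf_state *table;	/* ascending by frequency */
	int nr_perf_states;
};

static unsigned long demo_cpu_energy(const struct demo_perf_domain *pd,
				     unsigned long max_util,
				     unsigned long sum_util,
				     unsigned long scale_cpu)
{
	const struct em_perf_state *ps;
	unsigned long freq;
	int i;

	/* Guard added for the sketch; the kernel helper skips such checks. */
	if (!pd || pd->nr_perf_states <= 0 || !scale_cpu)
		return 0;

	/* Map the highest utilization onto the frequency range. */
	ps = &pd->table[pd->nr_perf_states - 1];
	freq = ps->frequency * max_util / scale_cpu;

	/* Lowest performance state whose frequency satisfies 'freq'. */
	for (i = 0; i < pd->nr_perf_states; i++) {
		ps = &pd->table[i];
		if (ps->frequency >= freq)
			break;
	}

	return ps->cost * sum_util / scale_cpu;
}
```

The sketch needs CPU-specific inputs (a capacity scale and per-CPU
utilization), which is why the renamed function is documented as valid for
CPU devices only.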

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 6b56ae6c6a89..c92e3ee15245 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -82,15 +82,20 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 void em_dev_unregister_perf_domain(struct device *dev);
 
 /**
- * em_pd_energy() - Estimates the energy consumed by the CPUs of a perf. domain
+ * em_cpu_energy() - Estimates the energy consumed by the CPUs of a
+ *		performance domain
  * @pd		: performance domain for which energy has to be estimated
  * @max_util	: highest utilization among CPUs of the domain
  * @sum_util	: sum of the utilization of all CPUs in the domain
  *
+ * This function must be used only for CPU devices. There is no validation,
+ * e.g. of whether the EM is a CPU type and has a cpumask allocated. It is
+ * called from the scheduler code quite frequently, hence there are no checks.
+ *
  * Return: the sum of the energy consumed by the CPUs of the domain assuming
  * a capacity state satisfying the max utilization of the domain.
  */
-static inline unsigned long em_pd_energy(struct em_perf_domain *pd,
+static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 				unsigned long max_util, unsigned long sum_util)
 {
 	unsigned long freq, scale_cpu;
@@ -195,7 +200,7 @@ static inline struct em_perf_domain *em_pd_get(struct device *dev)
 {
 	return NULL;
 }
-static inline unsigned long em_pd_energy(struct em_perf_domain *pd,
+static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 			unsigned long max_util, unsigned long sum_util)
 {
 	return 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b3bb4d6e49c3..7700e97af52f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6468,7 +6468,7 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
 		max_util = max(max_util, cpu_util);
 	}
 
-	return em_pd_energy(pd->em_pd, max_util, sum_util);
+	return em_cpu_energy(pd->em_pd, max_util, sum_util);
 }
 
 /*
-- 
2.17.1
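
For readers following the rename, the estimate that the em_cpu_energy()
kerneldoc describes (energy of the domain at the lowest performance state
that satisfies max_util) can be sketched in plain userspace C. This is a
simplified model and not the kernel code: struct perf_state_sketch,
struct perf_domain_sketch and cpu_energy_sketch() are hypothetical names,
and both the per-state 'cost' coefficient and the 25% frequency headroom
are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct perf_state_sketch {
	unsigned long frequency;	/* kHz */
	unsigned long cost;		/* assumed: power * max_freq / frequency */
};

struct perf_domain_sketch {
	const struct perf_state_sketch *table;	/* ascending frequency */
	size_t nr_states;
};

/*
 * Estimate the energy of a CPU performance domain. scale_cpu is the
 * capacity of one CPU of the domain (1024 for the biggest CPU).
 */
static unsigned long cpu_energy_sketch(const struct perf_domain_sketch *pd,
				       unsigned long max_util,
				       unsigned long sum_util,
				       unsigned long scale_cpu)
{
	const struct perf_state_sketch *ps;
	unsigned long max_freq, freq;
	size_t i;

	assert(pd && pd->nr_states > 0 && scale_cpu > 0);

	/* Map the highest utilization to a frequency request. */
	ps = &pd->table[pd->nr_states - 1];
	max_freq = ps->frequency;
	freq = (max_freq + (max_freq >> 2)) * max_util / scale_cpu;

	/* Pick the lowest state that satisfies the request, else the last. */
	for (i = 0; i < pd->nr_states; i++) {
		ps = &pd->table[i];
		if (ps->frequency >= freq)
			break;
	}

	return ps->cost * sum_util / scale_cpu;
}
```

Note that nothing in the sketch can tell whether the domain really belongs
to CPUs, which mirrors the "no validation" warning added by this patch.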



* [PATCH v7 07/15] Documentation: power: update Energy Model description
  2020-05-11 11:18 [PATCH v7 00/15] Add support for devices in the Energy Model Lukasz Luba
                   ` (5 preceding siblings ...)
  2020-05-11 11:19 ` [PATCH v7 06/15] PM / EM: change name of em_pd_energy to em_cpu_energy Lukasz Luba
@ 2020-05-11 11:19 ` Lukasz Luba
  2020-05-11 13:48   ` Quentin Perret
  2020-05-11 11:19 ` [PATCH v7 08/15] OPP: refactor dev_pm_opp_of_register_em() and update related drivers Lukasz Luba
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 28+ messages in thread
From: Lukasz Luba @ 2020-05-11 11:19 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx
  Cc: Dietmar.Eggemann, cw00.choi, b.zolnierkie, rjw, sudeep.holla,
	viresh.kumar, nm, sboyd, rui.zhang, amit.kucheria,
	daniel.lezcano, mingo, peterz, juri.lelli, vincent.guittot,
	rostedt, qperret, bsegall, mgorman, shawnguo, s.hauer, festevam,
	kernel, khilman, agross, bjorn.andersson, robh, matthias.bgg,
	steven.price, tomeu.vizoso, alyssa.rosenzweig, airlied, daniel,
	liviu.dudau, lorenzo.pieralisi, lukasz.luba, patrick.bellasi,
	orjan.eide, rdunlap, mka

The Energy Model framework also supports devices other than CPUs. Update
the related information and add a description of the new usage.

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 Documentation/power/energy-model.rst | 135 +++++++++++++++------------
 1 file changed, 75 insertions(+), 60 deletions(-)

diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst
index 90a345d57ae9..a6fb986abe3c 100644
--- a/Documentation/power/energy-model.rst
+++ b/Documentation/power/energy-model.rst
@@ -1,15 +1,17 @@
-====================
-Energy Model of CPUs
-====================
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+Energy Model of devices
+=======================
 
 1. Overview
 -----------
 
 The Energy Model (EM) framework serves as an interface between drivers knowing
-the power consumed by CPUs at various performance levels, and the kernel
+the power consumed by devices at various performance levels, and the kernel
 subsystems willing to use that information to make energy-aware decisions.
 
-The source of the information about the power consumed by CPUs can vary greatly
+The source of the information about the power consumed by devices can vary greatly
 from one platform to another. These power costs can be estimated using
 devicetree data in some cases. In others, the firmware will know better.
 Alternatively, userspace might be best positioned. And so on. In order to avoid
@@ -25,7 +27,7 @@ framework, and interested clients reading the data from it::
        +---------------+  +-----------------+  +---------------+
        | Thermal (IPA) |  | Scheduler (EAS) |  |     Other     |
        +---------------+  +-----------------+  +---------------+
-               |                   | em_pd_energy()    |
+               |                   | em_cpu_energy()   |
                |                   | em_cpu_get()      |
                +---------+         |         +---------+
                          |         |         |
@@ -35,7 +37,7 @@ framework, and interested clients reading the data from it::
                         |     Framework       |
                         +---------------------+
                            ^       ^       ^
-                           |       |       | em_register_perf_domain()
+                           |       |       | em_dev_register_perf_domain()
                 +----------+       |       +---------+
                 |                  |                 |
         +---------------+  +---------------+  +--------------+
@@ -47,12 +49,12 @@ framework, and interested clients reading the data from it::
         | Device Tree  |   |   Firmware    |  |      ?       |
         +--------------+   +---------------+  +--------------+
 
-The EM framework manages power cost tables per 'performance domain' in the
-system. A performance domain is a group of CPUs whose performance is scaled
-together. Performance domains generally have a 1-to-1 mapping with CPUFreq
-policies. All CPUs in a performance domain are required to have the same
-micro-architecture. CPUs in different performance domains can have different
-micro-architectures.
+In case of CPU devices the EM framework manages power cost tables per
+'performance domain' in the system. A performance domain is a group of CPUs
+whose performance is scaled together. Performance domains generally have a
+1-to-1 mapping with CPUFreq policies. All CPUs in a performance domain are
+required to have the same micro-architecture. CPUs in different performance
+domains can have different micro-architectures.
 
 
 2. Core APIs
@@ -70,14 +72,16 @@ CONFIG_ENERGY_MODEL must be enabled to use the EM framework.
 Drivers are expected to register performance domains into the EM framework by
 calling the following API::
 
-  int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
-			      struct em_data_callback *cb);
+  int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
+		struct em_data_callback *cb, cpumask_t *cpus);
 
-Drivers must specify the CPUs of the performance domains using the cpumask
-argument, and provide a callback function returning <frequency, power> tuples
-for each capacity state. The callback function provided by the driver is free
+Drivers must provide a callback function returning <frequency, power> tuples
+for each performance state. The callback function provided by the driver is free
 to fetch data from any relevant location (DT, firmware, ...), and by any mean
-deemed necessary. See Section 3. for an example of driver implementing this
+deemed necessary. Only for CPU devices, drivers must specify the CPUs of the
+performance domains using cpumask. For other devices than CPUs the last
+argument must be set to NULL.
+See Section 3. for an example of driver implementing this
 callback, and kernel/power/energy_model.c for further documentation on this
 API.
 
@@ -85,13 +89,20 @@ API.
 2.3 Accessing performance domains
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
+There are two API functions which provide the access to the energy model:
+em_cpu_get() which takes CPU id as an argument and em_pd_get() with device
+pointer as an argument. It depends on the subsystem which interface it is
+going to use, but in case of CPU devices both functions return the same
+performance domain.
+
 Subsystems interested in the energy model of a CPU can retrieve it using the
 em_cpu_get() API. The energy model tables are allocated once upon creation of
 the performance domains, and kept in memory untouched.
 
 The energy consumed by a performance domain can be estimated using the
-em_pd_energy() API. The estimation is performed assuming that the schedutil
-CPUfreq governor is in use.
+em_cpu_energy() API. The estimation is performed assuming that the schedutil
+CPUfreq governor is in use in the case of a CPU device. Currently this
+calculation is not provided for other types of devices.
 
 More details about the above APIs can be found in include/linux/energy_model.h.
 
@@ -106,42 +117,46 @@ EM framework::
 
   -> drivers/cpufreq/foo_cpufreq.c
 
-  01	static int est_power(unsigned long *mW, unsigned long *KHz, int cpu)
-  02	{
-  03		long freq, power;
-  04
-  05		/* Use the 'foo' protocol to ceil the frequency */
-  06		freq = foo_get_freq_ceil(cpu, *KHz);
-  07		if (freq < 0);
-  08			return freq;
-  09
-  10		/* Estimate the power cost for the CPU at the relevant freq. */
-  11		power = foo_estimate_power(cpu, freq);
-  12		if (power < 0);
-  13			return power;
-  14
-  15		/* Return the values to the EM framework */
-  16		*mW = power;
-  17		*KHz = freq;
-  18
-  19		return 0;
-  20	}
-  21
-  22	static int foo_cpufreq_init(struct cpufreq_policy *policy)
-  23	{
-  24		struct em_data_callback em_cb = EM_DATA_CB(est_power);
-  25		int nr_opp, ret;
-  26
-  27		/* Do the actual CPUFreq init work ... */
-  28		ret = do_foo_cpufreq_init(policy);
-  29		if (ret)
-  30			return ret;
-  31
-  32		/* Find the number of OPPs for this policy */
-  33		nr_opp = foo_get_nr_opp(policy);
-  34
-  35		/* And register the new performance domain */
-  36		em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
-  37
-  38	        return 0;
-  39	}
+  01	static int est_power(unsigned long *mW, unsigned long *KHz,
+  02			struct device *dev)
+  03	{
+  04		long freq, power;
+  05
+  06		/* Use the 'foo' protocol to ceil the frequency */
+  07		freq = foo_get_freq_ceil(dev, *KHz);
+  08		if (freq < 0)
+  09			return freq;
+  10
+  11		/* Estimate the power cost for the dev at the relevant freq. */
+  12		power = foo_estimate_power(dev, freq);
+  13		if (power < 0)
+  14			return power;
+  15
+  16		/* Return the values to the EM framework */
+  17		*mW = power;
+  18		*KHz = freq;
+  19
+  20		return 0;
+  21	}
+  22
+  23	static int foo_cpufreq_init(struct cpufreq_policy *policy)
+  24	{
+  25		struct em_data_callback em_cb = EM_DATA_CB(est_power);
+  26		struct device *cpu_dev;
+  27		int nr_opp, ret;
+  28
+  29		cpu_dev = get_cpu_device(cpumask_first(policy->cpus));
+  30
+  31		/* Do the actual CPUFreq init work ... */
+  32		ret = do_foo_cpufreq_init(policy);
+  33		if (ret)
+  34			return ret;
+  35
+  36		/* Find the number of OPPs for this policy */
+  37		nr_opp = foo_get_nr_opp(policy);
+  38
+  39		/* And register the new performance domain */
+  40		em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus);
+  41
+  42		return 0;
+  43	}
-- 
2.17.1
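
To illustrate the callback contract described in the updated document
(the driver returns <frequency, power> tuples, one per performance state),
here is a minimal userspace sketch for a non-CPU device. Everything in it
is hypothetical: the foo_opps table, foo_est_power() and
build_table_sketch() are made-up names, and the way the table is walked
(asking for the ceiling of the previous frequency plus one) is an
assumption about how a framework could consume such a callback, not a
statement about the EM implementation.

```c
#include <errno.h>
#include <stddef.h>

struct device;	/* opaque here; a non-CPU device such as a GPU */

struct em_state_sketch {
	unsigned long kHz;
	unsigned long mW;
};

typedef int (*active_power_fn)(unsigned long *mW, unsigned long *kHz,
			       struct device *dev);

/* Hypothetical operating points of the device, ascending frequency. */
static const struct em_state_sketch foo_opps[] = {
	{ 200000, 150 }, { 400000, 400 }, { 600000, 900 },
};

/* Ceil *kHz to the next operating point and report its power. */
static int foo_est_power(unsigned long *mW, unsigned long *kHz,
			 struct device *dev)
{
	size_t i;

	(void)dev;	/* a real driver would look up per-device data */

	for (i = 0; i < sizeof(foo_opps) / sizeof(foo_opps[0]); i++) {
		if (foo_opps[i].kHz >= *kHz) {
			*kHz = foo_opps[i].kHz;
			*mW = foo_opps[i].mW;
			return 0;
		}
	}

	return -EINVAL;
}

/* Fill nr_states entries by repeatedly asking for the next frequency. */
static int build_table_sketch(struct device *dev, active_power_fn cb,
			      struct em_state_sketch *table, size_t nr_states)
{
	unsigned long freq = 0, power = 0;
	size_t i;

	for (i = 0; i < nr_states; i++, freq++) {
		int ret = cb(&power, &freq, dev);

		if (ret)
			return ret;
		table[i].kHz = freq;
		table[i].mW = power;
	}

	return 0;
}
```

Asking for more states than the device has makes the callback fail, so the
sketch returns an error instead of producing a partial table.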



* [PATCH v7 08/15] OPP: refactor dev_pm_opp_of_register_em() and update related drivers
  2020-05-11 11:18 [PATCH v7 00/15] Add support for devices in the Energy Model Lukasz Luba
                   ` (6 preceding siblings ...)
  2020-05-11 11:19 ` [PATCH v7 07/15] Documentation: power: update Energy Model description Lukasz Luba
@ 2020-05-11 11:19 ` Lukasz Luba
  2020-05-11 11:19 ` [PATCH v7 09/15] thermal: devfreq_cooling: change tracing function and arguments Lukasz Luba
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 28+ messages in thread
From: Lukasz Luba @ 2020-05-11 11:19 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx
  Cc: Dietmar.Eggemann, cw00.choi, b.zolnierkie, rjw, sudeep.holla,
	viresh.kumar, nm, sboyd, rui.zhang, amit.kucheria,
	daniel.lezcano, mingo, peterz, juri.lelli, vincent.guittot,
	rostedt, qperret, bsegall, mgorman, shawnguo, s.hauer, festevam,
	kernel, khilman, agross, bjorn.andersson, robh, matthias.bgg,
	steven.price, tomeu.vizoso, alyssa.rosenzweig, airlied, daniel,
	liviu.dudau, lorenzo.pieralisi, lukasz.luba, patrick.bellasi,
	orjan.eide, rdunlap, mka

The Energy Model framework no longer supports only CPU devices. Drop the
CPU-specific interface based solely on a cpumask and add a struct device
argument. Also add a return value that callers may check. The new interface
provides an easy way to create a simple Energy Model, which can then be
used by e.g. the thermal subsystem.

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/cpufreq/cpufreq-dt.c           |  2 +-
 drivers/cpufreq/imx6q-cpufreq.c        |  2 +-
 drivers/cpufreq/mediatek-cpufreq.c     |  2 +-
 drivers/cpufreq/omap-cpufreq.c         |  2 +-
 drivers/cpufreq/qcom-cpufreq-hw.c      |  2 +-
 drivers/cpufreq/scpi-cpufreq.c         |  2 +-
 drivers/cpufreq/vexpress-spc-cpufreq.c |  2 +-
 drivers/opp/of.c                       | 71 ++++++++++++++++----------
 include/linux/pm_opp.h                 | 15 +++++-
 9 files changed, 65 insertions(+), 35 deletions(-)

diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index 26fe8dfb9ce6..f9f03fd49b83 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -275,7 +275,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
 	policy->cpuinfo.transition_latency = transition_latency;
 	policy->dvfs_possible_from_any_cpu = true;
 
-	dev_pm_opp_of_register_em(policy->cpus);
+	dev_pm_opp_of_register_em(cpu_dev, policy->cpus);
 
 	return 0;
 
diff --git a/drivers/cpufreq/imx6q-cpufreq.c b/drivers/cpufreq/imx6q-cpufreq.c
index fdb2ffffbd15..ef7b34c1fd2b 100644
--- a/drivers/cpufreq/imx6q-cpufreq.c
+++ b/drivers/cpufreq/imx6q-cpufreq.c
@@ -193,7 +193,7 @@ static int imx6q_cpufreq_init(struct cpufreq_policy *policy)
 	policy->clk = clks[ARM].clk;
 	cpufreq_generic_init(policy, freq_table, transition_latency);
 	policy->suspend_freq = max_freq;
-	dev_pm_opp_of_register_em(policy->cpus);
+	dev_pm_opp_of_register_em(cpu_dev, policy->cpus);
 
 	return 0;
 }
diff --git a/drivers/cpufreq/mediatek-cpufreq.c b/drivers/cpufreq/mediatek-cpufreq.c
index 0c98dd08273d..7d1212c9b7c8 100644
--- a/drivers/cpufreq/mediatek-cpufreq.c
+++ b/drivers/cpufreq/mediatek-cpufreq.c
@@ -448,7 +448,7 @@ static int mtk_cpufreq_init(struct cpufreq_policy *policy)
 	policy->driver_data = info;
 	policy->clk = info->cpu_clk;
 
-	dev_pm_opp_of_register_em(policy->cpus);
+	dev_pm_opp_of_register_em(info->cpu_dev, policy->cpus);
 
 	return 0;
 }
diff --git a/drivers/cpufreq/omap-cpufreq.c b/drivers/cpufreq/omap-cpufreq.c
index 8d14b42a8c6f..3694bb030df3 100644
--- a/drivers/cpufreq/omap-cpufreq.c
+++ b/drivers/cpufreq/omap-cpufreq.c
@@ -131,7 +131,7 @@ static int omap_cpu_init(struct cpufreq_policy *policy)
 
 	/* FIXME: what's the actual transition time? */
 	cpufreq_generic_init(policy, freq_table, 300 * 1000);
-	dev_pm_opp_of_register_em(policy->cpus);
+	dev_pm_opp_of_register_em(mpu_dev, policy->cpus);
 
 	return 0;
 }
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index fc92a8842e25..0a04b6f03b9a 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -238,7 +238,7 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy)
 		goto error;
 	}
 
-	dev_pm_opp_of_register_em(policy->cpus);
+	dev_pm_opp_of_register_em(cpu_dev, policy->cpus);
 
 	policy->fast_switch_possible = true;
 
diff --git a/drivers/cpufreq/scpi-cpufreq.c b/drivers/cpufreq/scpi-cpufreq.c
index 20d1f85d5f5a..b0f5388b8854 100644
--- a/drivers/cpufreq/scpi-cpufreq.c
+++ b/drivers/cpufreq/scpi-cpufreq.c
@@ -167,7 +167,7 @@ static int scpi_cpufreq_init(struct cpufreq_policy *policy)
 
 	policy->fast_switch_possible = false;
 
-	dev_pm_opp_of_register_em(policy->cpus);
+	dev_pm_opp_of_register_em(cpu_dev, policy->cpus);
 
 	return 0;
 
diff --git a/drivers/cpufreq/vexpress-spc-cpufreq.c b/drivers/cpufreq/vexpress-spc-cpufreq.c
index 83c85d3d67e3..4e8b1dee7c9a 100644
--- a/drivers/cpufreq/vexpress-spc-cpufreq.c
+++ b/drivers/cpufreq/vexpress-spc-cpufreq.c
@@ -450,7 +450,7 @@ static int ve_spc_cpufreq_init(struct cpufreq_policy *policy)
 	policy->freq_table = freq_table[cur_cluster];
 	policy->cpuinfo.transition_latency = 1000000; /* 1 ms */
 
-	dev_pm_opp_of_register_em(policy->cpus);
+	dev_pm_opp_of_register_em(cpu_dev, policy->cpus);
 
 	if (is_bL_switching_enabled())
 		per_cpu(cpu_last_req_freq, policy->cpu) =
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 5b75829a915d..4500ce46d476 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -1036,18 +1036,18 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_get_of_node);
 
 /*
  * Callback function provided to the Energy Model framework upon registration.
- * This computes the power estimated by @CPU at @kHz if it is the frequency
+ * This computes the power estimated by @dev at @kHz if it is the frequency
  * of an existing OPP, or at the frequency of the first OPP above @kHz otherwise
  * (see dev_pm_opp_find_freq_ceil()). This function updates @kHz to the ceiled
  * frequency and @mW to the associated power. The power is estimated as
- * P = C * V^2 * f with C being the CPU's capacitance and V and f respectively
- * the voltage and frequency of the OPP.
+ * P = C * V^2 * f with C being the device's capacitance and V and f
+ * respectively the voltage and frequency of the OPP.
  *
- * Returns -ENODEV if the CPU device cannot be found, -EINVAL if the power
- * calculation failed because of missing parameters, 0 otherwise.
+ * Returns -EINVAL if the power calculation failed because of missing
+ * parameters, 0 otherwise.
  */
-static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz,
-					 struct device *cpu_dev)
+static int __maybe_unused _get_power(unsigned long *mW, unsigned long *kHz,
+				     struct device *dev)
 {
 	struct dev_pm_opp *opp;
 	struct device_node *np;
@@ -1056,7 +1056,7 @@ static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz,
 	u64 tmp;
 	int ret;
 
-	np = of_node_get(cpu_dev->of_node);
+	np = of_node_get(dev->of_node);
 	if (!np)
 		return -EINVAL;
 
@@ -1066,7 +1066,7 @@ static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz,
 		return -EINVAL;
 
 	Hz = *kHz * 1000;
-	opp = dev_pm_opp_find_freq_ceil(cpu_dev, &Hz);
+	opp = dev_pm_opp_find_freq_ceil(dev, &Hz);
 	if (IS_ERR(opp))
 		return -EINVAL;
 
@@ -1086,30 +1086,38 @@ static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz,
 
 /**
  * dev_pm_opp_of_register_em() - Attempt to register an Energy Model
- * @cpus	: CPUs for which an Energy Model has to be registered
+ * @dev		: Device for which an Energy Model has to be registered
+ * @cpus	: CPUs for which an Energy Model has to be registered. For
+ *		other types of devices it should be set to NULL.
  *
  * This checks whether the "dynamic-power-coefficient" devicetree property has
  * been specified, and tries to register an Energy Model with it if it has.
+ * Having this property means the voltages are known for OPPs and the EM
+ * might be calculated.
  */
-void dev_pm_opp_of_register_em(struct cpumask *cpus)
+int dev_pm_opp_of_register_em(struct device *dev, struct cpumask *cpus)
 {
-	struct em_data_callback em_cb = EM_DATA_CB(_get_cpu_power);
-	int ret, nr_opp, cpu = cpumask_first(cpus);
-	struct device *cpu_dev;
+	struct em_data_callback em_cb = EM_DATA_CB(_get_power);
 	struct device_node *np;
+	int ret, nr_opp;
 	u32 cap;
 
-	cpu_dev = get_cpu_device(cpu);
-	if (!cpu_dev)
-		return;
+	if (IS_ERR_OR_NULL(dev)) {
+		ret = -EINVAL;
+		goto failed;
+	}
 
-	nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
-	if (nr_opp <= 0)
-		return;
+	nr_opp = dev_pm_opp_get_opp_count(dev);
+	if (nr_opp <= 0) {
+		ret = -EINVAL;
+		goto failed;
+	}
 
-	np = of_node_get(cpu_dev->of_node);
-	if (!np)
-		return;
+	np = of_node_get(dev->of_node);
+	if (!np) {
+		ret = -EINVAL;
+		goto failed;
+	}
 
 	/*
 	 * Register an EM only if the 'dynamic-power-coefficient' property is
@@ -1120,9 +1128,20 @@ void dev_pm_opp_of_register_em(struct cpumask *cpus)
 	 */
 	ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap);
 	of_node_put(np);
-	if (ret || !cap)
-		return;
+	if (ret || !cap) {
+		dev_dbg(dev, "Couldn't find proper 'dynamic-power-coefficient' in DT\n");
+		ret = -EINVAL;
+		goto failed;
+	}
 
-	em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, cpus);
+	ret = em_dev_register_perf_domain(dev, nr_opp, &em_cb, cpus);
+	if (ret)
+		goto failed;
+
+	return 0;
+
+failed:
+	dev_dbg(dev, "Couldn't register Energy Model %d\n", ret);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_of_register_em);
diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
index 747861816f4f..8df8b82054e5 100644
--- a/include/linux/pm_opp.h
+++ b/include/linux/pm_opp.h
@@ -11,6 +11,7 @@
 #ifndef __LINUX_OPP_H__
 #define __LINUX_OPP_H__
 
+#include <linux/energy_model.h>
 #include <linux/err.h>
 #include <linux/notifier.h>
 
@@ -360,7 +361,11 @@ int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpuma
 struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev);
 struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp);
 int of_get_required_opp_performance_state(struct device_node *np, int index);
-void dev_pm_opp_of_register_em(struct cpumask *cpus);
+int dev_pm_opp_of_register_em(struct device *dev, struct cpumask *cpus);
+static inline void dev_pm_opp_of_unregister_em(struct device *dev)
+{
+	em_dev_unregister_perf_domain(dev);
+}
 #else
 static inline int dev_pm_opp_of_add_table(struct device *dev)
 {
@@ -400,7 +405,13 @@ static inline struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp)
 	return NULL;
 }
 
-static inline void dev_pm_opp_of_register_em(struct cpumask *cpus)
+static inline int dev_pm_opp_of_register_em(struct device *dev,
+					    struct cpumask *cpus)
+{
+	return -ENOTSUPP;
+}
+
+static inline void dev_pm_opp_of_unregister_em(struct device *dev)
 {
 }
 
-- 
2.17.1
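
As a worked example of the P = C * V^2 * f estimate mentioned in the
_get_power() comment, here is a small userspace sketch of the arithmetic.
The function name dyn_power_mw_sketch() is hypothetical, and the scaling
constants assume that the coefficient is expressed in uW/MHz/V^2, the
voltage in microvolts and the frequency in Hz; treat these units as
assumptions rather than as a description of the patch.

```c
#include <stdint.h>

/*
 * Estimate dynamic power in mW from:
 *   cap - dynamic power coefficient, assumed to be in uW/MHz/V^2
 *   uV  - OPP voltage in microvolts
 *   Hz  - OPP frequency in Hz
 */
static unsigned long dyn_power_mw_sketch(uint32_t cap, unsigned long uV,
					 unsigned long Hz)
{
	unsigned long mV = uV / 1000;
	/* cap * mV^2 * MHz is in units of 1e-9 mW, hence the division. */
	uint64_t tmp = (uint64_t)cap * mV * mV * (Hz / 1000000);

	return (unsigned long)(tmp / 1000000000);
}
```

With cap = 100, 1 V and 1 GHz this gives 100 * 1000 * 1000 * 1000 / 10^9 =
100 mW; at 0.8 V and 500 MHz the same device would be estimated at 32 mW.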



* [PATCH v7 09/15] thermal: devfreq_cooling: change tracing function and arguments
  2020-05-11 11:18 [PATCH v7 00/15] Add support for devices in the Energy Model Lukasz Luba
                   ` (7 preceding siblings ...)
  2020-05-11 11:19 ` [PATCH v7 08/15] OPP: refactor dev_pm_opp_of_register_em() and update related drivers Lukasz Luba
@ 2020-05-11 11:19 ` Lukasz Luba
  2020-05-11 11:19 ` [PATCH v7 10/15] thermal: devfreq_cooling: get device load and frequency directly Lukasz Luba
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 28+ messages in thread
From: Lukasz Luba @ 2020-05-11 11:19 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx
  Cc: Dietmar.Eggemann, cw00.choi, b.zolnierkie, rjw, sudeep.holla,
	viresh.kumar, nm, sboyd, rui.zhang, amit.kucheria,
	daniel.lezcano, mingo, peterz, juri.lelli, vincent.guittot,
	rostedt, qperret, bsegall, mgorman, shawnguo, s.hauer, festevam,
	kernel, khilman, agross, bjorn.andersson, robh, matthias.bgg,
	steven.price, tomeu.vizoso, alyssa.rosenzweig, airlied, daniel,
	liviu.dudau, lorenzo.pieralisi, lukasz.luba, patrick.bellasi,
	orjan.eide, rdunlap, mka

Prepare for deleting the static and dynamic power calculations by cleaning
up the trace function: it no longer reports these two values, which are
going to be removed in the next changes.

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> # for tracing code
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/thermal/devfreq_cooling.c |  3 +--
 include/trace/events/thermal.h    | 19 +++++++++----------
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index f7f32e98331b..52694d4bd819 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -286,8 +286,7 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 		*power = dyn_power + static_power;
 	}
 
-	trace_thermal_power_devfreq_get_power(cdev, status, freq, dyn_power,
-					      static_power, *power);
+	trace_thermal_power_devfreq_get_power(cdev, status, freq, *power);
 
 	return 0;
 fail:
diff --git a/include/trace/events/thermal.h b/include/trace/events/thermal.h
index 135e5421f003..8a5f04888abd 100644
--- a/include/trace/events/thermal.h
+++ b/include/trace/events/thermal.h
@@ -153,31 +153,30 @@ TRACE_EVENT(thermal_power_cpu_limit,
 TRACE_EVENT(thermal_power_devfreq_get_power,
 	TP_PROTO(struct thermal_cooling_device *cdev,
 		 struct devfreq_dev_status *status, unsigned long freq,
-		u32 dynamic_power, u32 static_power, u32 power),
+		u32 power),
 
-	TP_ARGS(cdev, status,  freq, dynamic_power, static_power, power),
+	TP_ARGS(cdev, status,  freq, power),
 
 	TP_STRUCT__entry(
 		__string(type,         cdev->type    )
 		__field(unsigned long, freq          )
-		__field(u32,           load          )
-		__field(u32,           dynamic_power )
-		__field(u32,           static_power  )
+		__field(u32,           busy_time)
+		__field(u32,           total_time)
 		__field(u32,           power)
 	),
 
 	TP_fast_assign(
 		__assign_str(type, cdev->type);
 		__entry->freq = freq;
-		__entry->load = (100 * status->busy_time) / status->total_time;
-		__entry->dynamic_power = dynamic_power;
-		__entry->static_power = static_power;
+		__entry->busy_time = status->busy_time;
+		__entry->total_time = status->total_time;
 		__entry->power = power;
 	),
 
-	TP_printk("type=%s freq=%lu load=%u dynamic_power=%u static_power=%u power=%u",
+	TP_printk("type=%s freq=%lu load=%u power=%u",
 		__get_str(type), __entry->freq,
-		__entry->load, __entry->dynamic_power, __entry->static_power,
+		__entry->total_time == 0 ? 0 :
+			(100 * __entry->busy_time) / __entry->total_time,
 		__entry->power)
 );
 
-- 
2.17.1
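
The load value printed by the reworked trace event is derived at print
time from busy_time and total_time, with a guard for total_time == 0. A
minimal userspace sketch of that computation follows. load_percent_sketch()
is a hypothetical name, and the 64-bit intermediate is a deviation from the
patch, chosen here so that 100 * busy_time cannot wrap for large u32
inputs.

```c
#include <stdint.h>

/* Load in percent; 0 when no time has been accounted yet. */
static uint32_t load_percent_sketch(uint32_t busy_time, uint32_t total_time)
{
	if (total_time == 0)
		return 0;

	return (uint32_t)((100ULL * busy_time) / total_time);
}
```

Storing the raw busy_time and total_time in the ring buffer, as the patch
does, keeps the fast path free of divisions; only the reader of the trace
pays for the percentage.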



* [PATCH v7 10/15] thermal: devfreq_cooling: get device load and frequency directly
  2020-05-11 11:18 [PATCH v7 00/15] Add support for devices in the Energy Model Lukasz Luba
                   ` (8 preceding siblings ...)
  2020-05-11 11:19 ` [PATCH v7 09/15] thermal: devfreq_cooling: change tracing function and arguments Lukasz Luba
@ 2020-05-11 11:19 ` Lukasz Luba
  2020-05-11 11:19 ` [PATCH v7 11/15] thermal: devfreq_cooling: work on a copy of device status Lukasz Luba
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 28+ messages in thread
From: Lukasz Luba @ 2020-05-11 11:19 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx
  Cc: Dietmar.Eggemann, cw00.choi, b.zolnierkie, rjw, sudeep.holla,
	viresh.kumar, nm, sboyd, rui.zhang, amit.kucheria,
	daniel.lezcano, mingo, peterz, juri.lelli, vincent.guittot,
	rostedt, qperret, bsegall, mgorman, shawnguo, s.hauer, festevam,
	kernel, khilman, agross, bjorn.andersson, robh, matthias.bgg,
	steven.price, tomeu.vizoso, alyssa.rosenzweig, airlied, daniel,
	liviu.dudau, lorenzo.pieralisi, lukasz.luba, patrick.bellasi,
	orjan.eide, rdunlap, mka

Devfreq cooling needs to know the correct status of the device in order
to operate. Do not rely on Devfreq last_status, which might be stale;
get more up-to-date values of load and frequency instead.

In addition, this patch adds a normalization function, which also makes
sure that whatever data comes from the device is in a sane range.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/thermal/devfreq_cooling.c | 38 +++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 5 deletions(-)

diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index 52694d4bd819..396f16bb6566 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -235,6 +235,24 @@ static inline unsigned long get_total_power(struct devfreq_cooling_device *dfc,
 							       voltage);
 }
 
+static void _normalize_load(struct devfreq_dev_status *status)
+{
+	/* Make some space if needed */
+	if (status->busy_time > 0xffff) {
+		status->busy_time >>= 10;
+		status->total_time >>= 10;
+	}
+
+	if (status->busy_time > status->total_time)
+		status->busy_time = status->total_time;
+
+	status->busy_time *= 100;
+	status->busy_time /= status->total_time ? : 1;
+
+	/* Avoid division by 0 */
+	status->busy_time = status->busy_time ? : 1;
+	status->total_time = 100;
+}
 
 static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cdev,
 					       struct thermal_zone_device *tz,
@@ -242,14 +260,22 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 {
 	struct devfreq_cooling_device *dfc = cdev->devdata;
 	struct devfreq *df = dfc->devfreq;
-	struct devfreq_dev_status *status = &df->last_status;
+	struct devfreq_dev_status status;
 	unsigned long state;
-	unsigned long freq = status->current_frequency;
+	unsigned long freq;
 	unsigned long voltage;
 	u32 dyn_power = 0;
 	u32 static_power = 0;
 	int res;
 
+	mutex_lock(&df->lock);
+	res = df->profile->get_dev_status(df->dev.parent, &status);
+	mutex_unlock(&df->lock);
+	if (res)
+		return res;
+
+	freq = status.current_frequency;
+
 	state = freq_get_state(dfc, freq);
 	if (state == THERMAL_CSTATE_INVALID) {
 		res = -EAGAIN;
@@ -277,16 +303,18 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 	} else {
 		dyn_power = dfc->power_table[state];
 
+		_normalize_load(&status);
+
 		/* Scale dynamic power for utilization */
-		dyn_power *= status->busy_time;
-		dyn_power /= status->total_time;
+		dyn_power *= status.busy_time;
+		dyn_power /= status.total_time;
 		/* Get static power */
 		static_power = get_static_power(dfc, freq);
 
 		*power = dyn_power + static_power;
 	}
 
-	trace_thermal_power_devfreq_get_power(cdev, status, freq, *power);
+	trace_thermal_power_devfreq_get_power(cdev, &status, freq, *power);
 
 	return 0;
 fail:
-- 
2.17.1
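
The effect of the normalization helper is easier to see with concrete
numbers. Below is a userspace transcription of the steps in
_normalize_load(), written in standard C (without the GNU '?:' shorthand).
The struct and function names are hypothetical; only the arithmetic
follows the patch.

```c
/* Hypothetical stand-in for the two fields of devfreq_dev_status used. */
struct load_status_sketch {
	unsigned long total_time;
	unsigned long busy_time;
};

static void normalize_load_sketch(struct load_status_sketch *s)
{
	/* Make some space so that busy_time * 100 does not overflow. */
	if (s->busy_time > 0xffff) {
		s->busy_time >>= 10;
		s->total_time >>= 10;
	}

	/* A device cannot be busy for longer than the sampling window. */
	if (s->busy_time > s->total_time)
		s->busy_time = s->total_time;

	/* Express busy_time as a percentage of total_time. */
	s->busy_time *= 100;
	s->busy_time /= s->total_time ? s->total_time : 1;

	/* Keep busy_time non-zero so later divisions by it are safe. */
	if (s->busy_time == 0)
		s->busy_time = 1;
	s->total_time = 100;
}
```

For example, busy_time = 50 with total_time = 200 becomes 25/100, an
inconsistent reading of 300/200 is clamped to 100/100, and an all-zero
status becomes 1/100.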



* [PATCH v7 11/15] thermal: devfreq_cooling: work on a copy of device status
  2020-05-11 11:18 [PATCH v7 00/15] Add support for devices in the Energy Model Lukasz Luba
                   ` (9 preceding siblings ...)
  2020-05-11 11:19 ` [PATCH v7 10/15] thermal: devfreq_cooling: get device load and frequency directly Lukasz Luba
@ 2020-05-11 11:19 ` Lukasz Luba
  2020-05-11 11:19 ` [PATCH v7 12/15] thermal: devfreq_cooling: add new registration functions with Energy Model Lukasz Luba
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 28+ messages in thread
From: Lukasz Luba @ 2020-05-11 11:19 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx
  Cc: Dietmar.Eggemann, cw00.choi, b.zolnierkie, rjw, sudeep.holla,
	viresh.kumar, nm, sboyd, rui.zhang, amit.kucheria,
	daniel.lezcano, mingo, peterz, juri.lelli, vincent.guittot,
	rostedt, qperret, bsegall, mgorman, shawnguo, s.hauer, festevam,
	kernel, khilman, agross, bjorn.andersson, robh, matthias.bgg,
	steven.price, tomeu.vizoso, alyssa.rosenzweig, airlied, daniel,
	liviu.dudau, lorenzo.pieralisi, lukasz.luba, patrick.bellasi,
	orjan.eide, rdunlap, mka

The devfreq framework can change the device status in the background. To
mitigate this, make a copy of the status structure under the devfreq lock
and use it for internal calculations.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/thermal/devfreq_cooling.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index 396f16bb6566..36ec6a48606c 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -348,14 +348,20 @@ static int devfreq_cooling_power2state(struct thermal_cooling_device *cdev,
 {
 	struct devfreq_cooling_device *dfc = cdev->devdata;
 	struct devfreq *df = dfc->devfreq;
-	struct devfreq_dev_status *status = &df->last_status;
-	unsigned long freq = status->current_frequency;
+	struct devfreq_dev_status status;
 	unsigned long busy_time;
+	unsigned long freq;
 	s32 dyn_power;
 	u32 static_power;
 	s32 est_power;
 	int i;
 
+	mutex_lock(&df->lock);
+	status = df->last_status;
+	mutex_unlock(&df->lock);
+
+	freq = status.current_frequency;
+
 	if (dfc->power_ops->get_real_power) {
 		/* Scale for resource utilization */
 		est_power = power * dfc->res_util;
@@ -367,8 +373,8 @@ static int devfreq_cooling_power2state(struct thermal_cooling_device *cdev,
 		dyn_power = dyn_power > 0 ? dyn_power : 0;
 
 		/* Scale dynamic power for utilization */
-		busy_time = status->busy_time ?: 1;
-		est_power = (dyn_power * status->total_time) / busy_time;
+		busy_time = status.busy_time ?: 1;
+		est_power = (dyn_power * status.total_time) / busy_time;
 	}
 
 	/*
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v7 12/15] thermal: devfreq_cooling: add new registration functions with Energy Model
  2020-05-11 11:18 [PATCH v7 00/15] Add support for devices in the Energy Model Lukasz Luba
                   ` (10 preceding siblings ...)
  2020-05-11 11:19 ` [PATCH v7 11/15] thermal: devfreq_cooling: work on a copy of device status Lukasz Luba
@ 2020-05-11 11:19 ` Lukasz Luba
  2020-05-11 11:19 ` [PATCH v7 13/15] thermal: devfreq_cooling: remove old power model and use EM Lukasz Luba
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 28+ messages in thread
From: Lukasz Luba @ 2020-05-11 11:19 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx
  Cc: Dietmar.Eggemann, cw00.choi, b.zolnierkie, rjw, sudeep.holla,
	viresh.kumar, nm, sboyd, rui.zhang, amit.kucheria,
	daniel.lezcano, mingo, peterz, juri.lelli, vincent.guittot,
	rostedt, qperret, bsegall, mgorman, shawnguo, s.hauer, festevam,
	kernel, khilman, agross, bjorn.andersson, robh, matthias.bgg,
	steven.price, tomeu.vizoso, alyssa.rosenzweig, airlied, daniel,
	liviu.dudau, lorenzo.pieralisi, lukasz.luba, patrick.bellasi,
	orjan.eide, rdunlap, mka

The Energy Model (EM) framework supports devices such as devfreq devices.
Create new registration functions which automatically register an EM for
the thermal devfreq_cooling devices. This patch prepares the code for the
coming changes, which replace the old power model with the new EM.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/thermal/devfreq_cooling.c | 99 ++++++++++++++++++++++++++++++-
 include/linux/devfreq_cooling.h   | 22 +++++++
 2 files changed, 120 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index 36ec6a48606c..c7ffed2d6ee0 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -58,6 +58,8 @@ static DEFINE_IDA(devfreq_ida);
  * @capped_state:	index to cooling state with in dynamic power budget
  * @req_max_freq:	PM QoS request for limiting the maximum frequency
  *			of the devfreq device.
+ * @em:		Energy Model which represents the associated Devfreq device
+ * @em_registered:	Devfreq cooling registered the EM and should free it.
  */
 struct devfreq_cooling_device {
 	int id;
@@ -71,6 +73,8 @@ struct devfreq_cooling_device {
 	u32 res_util;
 	int capped_state;
 	struct dev_pm_qos_request req_max_freq;
+	struct em_perf_domain *em;
+	bool em_registered;
 };
 
 static int devfreq_cooling_get_max_state(struct thermal_cooling_device *cdev,
@@ -594,22 +598,115 @@ struct thermal_cooling_device *devfreq_cooling_register(struct devfreq *df)
 }
 EXPORT_SYMBOL_GPL(devfreq_cooling_register);
 
+/**
+ * devfreq_cooling_em_register_power() - Register devfreq cooling device with
+ *		power information and attempt to register Energy Model (EM)
+ * @df:		Pointer to devfreq device.
+ * @dfc_power:	Pointer to devfreq_cooling_power.
+ * @em_cb:	Callback functions providing the data of the EM
+ *
+ * Register a devfreq cooling device and attempt to register Energy Model. The
+ * available OPPs must be registered for the device.
+ *
+ * If @dfc_power is provided, the cooling device is registered with the
+ * power extensions. If @em_cb is provided it will be called for each OPP to
+ * calculate power value and cost. If @em_cb is not provided then simple Energy
+ * Model is going to be used, which requires the "dynamic-power-coefficient"
+ * devicetree property.
+ */
+struct thermal_cooling_device *
+devfreq_cooling_em_register_power(struct devfreq *df,
+				  struct devfreq_cooling_power *dfc_power,
+				  struct em_data_callback *em_cb)
+{
+	struct thermal_cooling_device *cdev;
+	struct devfreq_cooling_device *dfc;
+	struct device_node *np = NULL;
+	struct device *dev;
+	int nr_opp, ret;
+
+	if (IS_ERR_OR_NULL(df))
+		return ERR_PTR(-EINVAL);
+
+	dev = df->dev.parent;
+
+	if (em_cb) {
+		nr_opp = dev_pm_opp_get_opp_count(dev);
+		if (nr_opp <= 0) {
+			dev_err(dev, "No valid OPPs found\n");
+			return ERR_PTR(-EINVAL);
+		}
+
+		ret = em_dev_register_perf_domain(dev, nr_opp, em_cb, NULL);
+	} else {
+		ret = dev_pm_opp_of_register_em(dev, NULL);
+	}
+
+	if (ret)
+		dev_warn(dev, "Unable to register EM for devfreq cooling device (%d)\n",
+			 ret);
+
+	if (dev->of_node)
+		np = of_node_get(dev->of_node);
+
+	cdev = of_devfreq_cooling_register_power(np, df, dfc_power);
+
+	if (np)
+		of_node_put(np);
+
+	if (IS_ERR_OR_NULL(cdev)) {
+		if (!ret)
+			em_dev_unregister_perf_domain(dev);
+	} else {
+		dfc = cdev->devdata;
+		dfc->em_registered = !ret;
+	}
+
+	return cdev;
+}
+EXPORT_SYMBOL_GPL(devfreq_cooling_em_register_power);
+
+/**
+ * devfreq_cooling_em_register() - Register devfreq cooling device together
+ *				with Energy Model.
+ * @df:		Pointer to devfreq device.
+ * @em_cb:	Callback functions providing the data of the Energy Model
+ *
+ * This function attempts to register Energy Model for devfreq device and then
+ * register the devfreq cooling device.
+ */
+struct thermal_cooling_device *
+devfreq_cooling_em_register(struct devfreq *df, struct em_data_callback *em_cb)
+{
+	return devfreq_cooling_em_register_power(df, NULL, em_cb);
+}
+EXPORT_SYMBOL_GPL(devfreq_cooling_em_register);
+
 /**
  * devfreq_cooling_unregister() - Unregister devfreq cooling device.
  * @cdev: Pointer to devfreq cooling device to unregister.
+ *
+ * Unregisters devfreq cooling device and related Energy Model if it was
+ * present.
  */
 void devfreq_cooling_unregister(struct thermal_cooling_device *cdev)
 {
 	struct devfreq_cooling_device *dfc;
+	struct device *dev;
 
-	if (!cdev)
+	if (IS_ERR_OR_NULL(cdev))
 		return;
 
 	dfc = cdev->devdata;
+	dev = dfc->devfreq->dev.parent;
 
 	thermal_cooling_device_unregister(dfc->cdev);
 	ida_simple_remove(&devfreq_ida, dfc->id);
 	dev_pm_qos_remove_request(&dfc->req_max_freq);
+
+	if (dfc->em_registered)
+		em_dev_unregister_perf_domain(dev);
+
 	kfree(dfc->power_table);
 	kfree(dfc->freq_table);
 
diff --git a/include/linux/devfreq_cooling.h b/include/linux/devfreq_cooling.h
index 79a6e37a1d6f..07df019fcb66 100644
--- a/include/linux/devfreq_cooling.h
+++ b/include/linux/devfreq_cooling.h
@@ -18,6 +18,7 @@
 #define __DEVFREQ_COOLING_H__
 
 #include <linux/devfreq.h>
+#include <linux/energy_model.h>
 #include <linux/thermal.h>
 
 
@@ -72,6 +73,13 @@ struct thermal_cooling_device *
 of_devfreq_cooling_register(struct device_node *np, struct devfreq *df);
 struct thermal_cooling_device *devfreq_cooling_register(struct devfreq *df);
 void devfreq_cooling_unregister(struct thermal_cooling_device *dfc);
+struct thermal_cooling_device *
+devfreq_cooling_em_register_power(struct devfreq *df,
+				  struct devfreq_cooling_power *dfc_power,
+				  struct em_data_callback *em_cb);
+struct thermal_cooling_device *
+devfreq_cooling_em_register(struct devfreq *df,
+			    struct em_data_callback *em_cb);
 
 #else /* !CONFIG_DEVFREQ_THERMAL */
 
@@ -94,6 +102,20 @@ devfreq_cooling_register(struct devfreq *df)
 	return ERR_PTR(-EINVAL);
 }
 
+static inline struct thermal_cooling_device *
+devfreq_cooling_em_register_power(struct devfreq *df,
+				  struct devfreq_cooling_power *dfc_power,
+				  struct em_data_callback *em_cb)
+{
+	return ERR_PTR(-EINVAL);
+}
+
+static inline struct thermal_cooling_device *
+devfreq_cooling_em_register(struct devfreq *df,	struct em_data_callback *em_cb)
+{
+	return ERR_PTR(-EINVAL);
+}
+
 static inline void
 devfreq_cooling_unregister(struct thermal_cooling_device *dfc)
 {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v7 13/15] thermal: devfreq_cooling: remove old power model and use EM
  2020-05-11 11:18 [PATCH v7 00/15] Add support for devices in the Energy Model Lukasz Luba
                   ` (11 preceding siblings ...)
  2020-05-11 11:19 ` [PATCH v7 12/15] thermal: devfreq_cooling: add new registration functions with Energy Model Lukasz Luba
@ 2020-05-11 11:19 ` Lukasz Luba
  2020-05-11 11:19 ` [PATCH v7 14/15] thermal: devfreq_cooling: update license to use SPDX Lukasz Luba
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 28+ messages in thread
From: Lukasz Luba @ 2020-05-11 11:19 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx
  Cc: Dietmar.Eggemann, cw00.choi, b.zolnierkie, rjw, sudeep.holla,
	viresh.kumar, nm, sboyd, rui.zhang, amit.kucheria,
	daniel.lezcano, mingo, peterz, juri.lelli, vincent.guittot,
	rostedt, qperret, bsegall, mgorman, shawnguo, s.hauer, festevam,
	kernel, khilman, agross, bjorn.andersson, robh, matthias.bgg,
	steven.price, tomeu.vizoso, alyssa.rosenzweig, airlied, daniel,
	liviu.dudau, lorenzo.pieralisi, lukasz.luba, patrick.bellasi,
	orjan.eide, rdunlap, mka

Remove the old power model and use the new Energy Model to calculate the
power budget. This drops the static + dynamic power calculations and the
power table in favour of the Energy Model performance domain data. This
model should be easy to use and could find more users. The needed
structures are also less complicated to set up.
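
As a rough userspace sketch of the index arithmetic this patch relies on:
the EM table is sorted by ascending power, while cooling state 0 means
"no throttling", so a cooling state maps to EM index max_state - state.
The type struct perf_state and the helper names below are hypothetical
and only mirror the loops in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one EM performance state (kHz, mW). */
struct perf_state {
	unsigned long frequency;
	unsigned long power;
};

/* Cooling state 0 is the fastest OPP, i.e. the last EM table entry. */
static size_t state_to_em_idx(size_t max_state, size_t state)
{
	assert(state <= max_state);
	return max_state - state;
}

/* Power at one EM state scaled by utilization (busy / total). */
static unsigned long requested_power(const struct perf_state *ps,
				     unsigned long busy, unsigned long total)
{
	assert(total > 0);
	return ps->power * busy / total;
}

/*
 * Walk the table from the highest-power entry down and stop at the first
 * entry that fits the budget; fall back to index 0 if none fits.
 */
static size_t power_to_state(const struct perf_state *table, size_t max_state,
			     unsigned long est_power)
{
	size_t i;

	for (i = max_state; i > 0; i--)
		if (est_power >= table[i].power)
			break;

	return max_state - i;
}
```

With a three-entry table of 100, 250 and 500 mW, a budget of 300 mW selects
EM index 1 and therefore cooling state 1, while a budget below 100 mW
falls through to the deepest cooling state.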

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/thermal/devfreq_cooling.c | 282 +++++++++---------------------
 include/linux/devfreq_cooling.h   |  17 --
 2 files changed, 86 insertions(+), 213 deletions(-)

diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index c7ffed2d6ee0..79dcef3dbeeb 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -41,20 +41,17 @@ static DEFINE_IDA(devfreq_ida);
  * @cdev:	Pointer to associated thermal cooling device.
  * @devfreq:	Pointer to associated devfreq device.
  * @cooling_state:	Current cooling state.
- * @power_table:	Pointer to table with maximum power draw for each
- *			cooling state. State is the index into the table, and
- *			the power is in mW.
  * @freq_table:	Pointer to a table with the frequencies sorted in descending
  *		order.  You can index the table by cooling device state
- * @freq_table_size:	Size of the @freq_table and @power_table
- * @power_ops:	Pointer to devfreq_cooling_power, used to generate the
- *		@power_table.
+ * @max_state:	It is the last index, that is, one less than the number of the
+ *		OPPs
+ * @power_ops:	Pointer to devfreq_cooling_power, a more precise model.
  * @res_util:	Resource utilization scaling factor for the power.
  *		It is multiplied by 100 to minimize the error. It is used
  *		for estimation of the power budget instead of using
- *		'utilization' (which is	'busy_time / 'total_time').
- *		The 'res_util' range is from 100 to (power_table[state] * 100)
- *		for the corresponding 'state'.
+ *		'utilization' (which is	'busy_time' / 'total_time').
+ *		The 'res_util' range is from 100 to power * 100	for the
+ *		corresponding 'state'.
  * @capped_state:	index to cooling state with in dynamic power budget
  * @req_max_freq:	PM QoS request for limiting the maximum frequency
  *			of the devfreq device.
@@ -66,9 +63,8 @@ struct devfreq_cooling_device {
 	struct thermal_cooling_device *cdev;
 	struct devfreq *devfreq;
 	unsigned long cooling_state;
-	u32 *power_table;
 	u32 *freq_table;
-	size_t freq_table_size;
+	size_t max_state;
 	struct devfreq_cooling_power *power_ops;
 	u32 res_util;
 	int capped_state;
@@ -82,7 +78,7 @@ static int devfreq_cooling_get_max_state(struct thermal_cooling_device *cdev,
 {
 	struct devfreq_cooling_device *dfc = cdev->devdata;
 
-	*state = dfc->freq_table_size - 1;
+	*state = dfc->max_state;
 
 	return 0;
 }
@@ -110,10 +106,16 @@ static int devfreq_cooling_set_cur_state(struct thermal_cooling_device *cdev,
 
 	dev_dbg(dev, "Setting cooling state %lu\n", state);
 
-	if (state >= dfc->freq_table_size)
+	if (state > dfc->max_state)
 		return -EINVAL;
 
-	freq = dfc->freq_table[state];
+	if (dfc->em) {
+		/* Energy Model frequencies are in kHz */
+		freq = dfc->em->table[dfc->max_state - state].frequency;
+		freq *= 1000;
+	} else {
+		freq = dfc->freq_table[state];
+	}
 
 	dev_pm_qos_update_request(&dfc->req_max_freq,
 				  DIV_ROUND_UP(freq, HZ_PER_KHZ));
@@ -124,11 +126,11 @@ static int devfreq_cooling_set_cur_state(struct thermal_cooling_device *cdev,
 }
 
 /**
- * freq_get_state() - get the cooling state corresponding to a frequency
+ * freq_get_state() - get the performance index corresponding to a frequency
  * @dfc:	Pointer to devfreq cooling device
- * @freq:	frequency in Hz
+ * @freq:	frequency in kHz
  *
- * Return: the cooling state associated with the @freq, or
+ * Return: the performance index associated with the @freq, or
  * THERMAL_CSTATE_INVALID if it wasn't found.
  */
 static unsigned long
@@ -136,8 +138,8 @@ freq_get_state(struct devfreq_cooling_device *dfc, unsigned long freq)
 {
 	int i;
 
-	for (i = 0; i < dfc->freq_table_size; i++) {
-		if (dfc->freq_table[i] == freq)
+	for (i = 0; i <= dfc->max_state; i++) {
+		if (dfc->em->table[i].frequency == freq)
 			return i;
 	}
 
@@ -172,71 +174,15 @@ static unsigned long get_voltage(struct devfreq *df, unsigned long freq)
 	return voltage;
 }
 
-/**
- * get_static_power() - calculate the static power
- * @dfc:	Pointer to devfreq cooling device
- * @freq:	Frequency in Hz
- *
- * Calculate the static power in milliwatts using the supplied
- * get_static_power().  The current voltage is calculated using the
- * OPP library.  If no get_static_power() was supplied, assume the
- * static power is negligible.
- */
-static unsigned long
-get_static_power(struct devfreq_cooling_device *dfc, unsigned long freq)
+static void dfc_em_get_requested_power(struct em_perf_domain *em,
+				       struct devfreq_dev_status *status,
+				       u32 *power, int em_perf_idx)
 {
-	struct devfreq *df = dfc->devfreq;
-	unsigned long voltage;
+	*power = em->table[em_perf_idx].power;
 
-	if (!dfc->power_ops->get_static_power)
-		return 0;
-
-	voltage = get_voltage(df, freq);
-
-	if (voltage == 0)
-		return 0;
-
-	return dfc->power_ops->get_static_power(df, voltage);
-}
-
-/**
- * get_dynamic_power - calculate the dynamic power
- * @dfc:	Pointer to devfreq cooling device
- * @freq:	Frequency in Hz
- * @voltage:	Voltage in millivolts
- *
- * Calculate the dynamic power in milliwatts consumed by the device at
- * frequency @freq and voltage @voltage.  If the get_dynamic_power()
- * was supplied as part of the devfreq_cooling_power struct, then that
- * function is used.  Otherwise, a simple power model (Pdyn = Coeff *
- * Voltage^2 * Frequency) is used.
- */
-static unsigned long
-get_dynamic_power(struct devfreq_cooling_device *dfc, unsigned long freq,
-		  unsigned long voltage)
-{
-	u64 power;
-	u32 freq_mhz;
-	struct devfreq_cooling_power *dfc_power = dfc->power_ops;
-
-	if (dfc_power->get_dynamic_power)
-		return dfc_power->get_dynamic_power(dfc->devfreq, freq,
-						    voltage);
-
-	freq_mhz = freq / 1000000;
-	power = (u64)dfc_power->dyn_power_coeff * freq_mhz * voltage * voltage;
-	do_div(power, 1000000000);
-
-	return power;
-}
-
-
-static inline unsigned long get_total_power(struct devfreq_cooling_device *dfc,
-					    unsigned long freq,
-					    unsigned long voltage)
-{
-	return get_static_power(dfc, freq) + get_dynamic_power(dfc, freq,
-							       voltage);
+	/* Scale power for utilization */
+	*power *= status->busy_time;
+	*power /= status->total_time;
 }
 
 static void _normalize_load(struct devfreq_dev_status *status)
@@ -268,8 +214,6 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 	unsigned long state;
 	unsigned long freq;
 	unsigned long voltage;
-	u32 dyn_power = 0;
-	u32 static_power = 0;
 	int res;
 
 	mutex_lock(&df->lock);
@@ -280,13 +224,14 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 
 	freq = status.current_frequency;
 
-	state = freq_get_state(dfc, freq);
+	/* Energy Model frequencies are in kHz */
+	state = freq_get_state(dfc, freq / 1000);
 	if (state == THERMAL_CSTATE_INVALID) {
 		res = -EAGAIN;
 		goto fail;
 	}
 
-	if (dfc->power_ops->get_real_power) {
+	if (dfc->power_ops && dfc->power_ops->get_real_power) {
 		voltage = get_voltage(df, freq);
 		if (voltage == 0) {
 			res = -EINVAL;
@@ -296,7 +241,7 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 		res = dfc->power_ops->get_real_power(df, power, freq, voltage);
 		if (!res) {
 			state = dfc->capped_state;
-			dfc->res_util = dfc->power_table[state];
+			dfc->res_util = dfc->em->table[state].power;
 			dfc->res_util *= SCALE_ERROR_MITIGATION;
 
 			if (*power > 1)
@@ -305,17 +250,8 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 			goto fail;
 		}
 	} else {
-		dyn_power = dfc->power_table[state];
-
 		_normalize_load(&status);
-
-		/* Scale dynamic power for utilization */
-		dyn_power *= status.busy_time;
-		dyn_power /= status.total_time;
-		/* Get static power */
-		static_power = get_static_power(dfc, freq);
-
-		*power = dyn_power + static_power;
+		dfc_em_get_requested_power(dfc->em, &status, power, state);
 	}
 
 	trace_thermal_power_devfreq_get_power(cdev, &status, freq, *power);
@@ -333,16 +269,14 @@ static int devfreq_cooling_state2power(struct thermal_cooling_device *cdev,
 				       u32 *power)
 {
 	struct devfreq_cooling_device *dfc = cdev->devdata;
-	unsigned long freq;
-	u32 static_power;
+	int idx;
 
-	if (state >= dfc->freq_table_size)
+	if (state > dfc->max_state)
 		return -EINVAL;
 
-	freq = dfc->freq_table[state];
-	static_power = get_static_power(dfc, freq);
+	idx = dfc->max_state - state;
+	*power = dfc->em->table[idx].power;
 
-	*power = dfc->power_table[state] + static_power;
 	return 0;
 }
 
@@ -353,11 +287,8 @@ static int devfreq_cooling_power2state(struct thermal_cooling_device *cdev,
 	struct devfreq_cooling_device *dfc = cdev->devdata;
 	struct devfreq *df = dfc->devfreq;
 	struct devfreq_dev_status status;
-	unsigned long busy_time;
+	u32 est_power = power;
 	unsigned long freq;
-	s32 dyn_power;
-	u32 static_power;
-	s32 est_power;
 	int i;
 
 	mutex_lock(&df->lock);
@@ -366,31 +297,26 @@ static int devfreq_cooling_power2state(struct thermal_cooling_device *cdev,
 
 	freq = status.current_frequency;
 
-	if (dfc->power_ops->get_real_power) {
+	if (dfc->power_ops && dfc->power_ops->get_real_power) {
 		/* Scale for resource utilization */
 		est_power = power * dfc->res_util;
 		est_power /= SCALE_ERROR_MITIGATION;
 	} else {
-		static_power = get_static_power(dfc, freq);
-
-		dyn_power = power - static_power;
-		dyn_power = dyn_power > 0 ? dyn_power : 0;
-
-		/* Scale dynamic power for utilization */
-		busy_time = status.busy_time ?: 1;
-		est_power = (dyn_power * status.total_time) / busy_time;
+		_normalize_load(&status);
+		est_power *= status.total_time;
+		est_power /= status.busy_time;
 	}
 
 	/*
 	 * Find the first cooling state that is within the power
-	 * budget for dynamic power.
+	 * budget. The EM power table is sorted ascending.
 	 */
-	for (i = 0; i < dfc->freq_table_size - 1; i++)
-		if (est_power >= dfc->power_table[i])
+	for (i = dfc->max_state; i > 0; i--)
+		if (est_power >= dfc->em->table[i].power)
 			break;
 
-	*state = i;
-	dfc->capped_state = i;
+	*state = dfc->max_state - i;
+	dfc->capped_state = *state;
 	trace_thermal_power_devfreq_limit(cdev, freq, *state, power);
 	return 0;
 }
@@ -402,91 +328,43 @@ static struct thermal_cooling_device_ops devfreq_cooling_ops = {
 };
 
 /**
- * devfreq_cooling_gen_tables() - Generate power and freq tables.
- * @dfc: Pointer to devfreq cooling device.
- *
- * Generate power and frequency tables: the power table hold the
- * device's maximum power usage at each cooling state (OPP).  The
- * static and dynamic power using the appropriate voltage and
- * frequency for the state, is acquired from the struct
- * devfreq_cooling_power, and summed to make the maximum power draw.
- *
- * The frequency table holds the frequencies in descending order.
- * That way its indexed by cooling device state.
+ * devfreq_cooling_gen_tables() - Generate frequency table.
+ * @dfc:	Pointer to devfreq cooling device.
+ * @num_opps:	Number of OPPs
  *
- * The tables are malloced, and pointers put in dfc.  They must be
- * freed when unregistering the devfreq cooling device.
+ * Generate frequency table which holds the frequencies in descending
+ * order. That way it's indexed by cooling device state. This is for
+ * compatibility with drivers which do not register Energy Model.
  *
  * Return: 0 on success, negative error code on failure.
  */
-static int devfreq_cooling_gen_tables(struct devfreq_cooling_device *dfc)
+static int devfreq_cooling_gen_tables(struct devfreq_cooling_device *dfc,
+				      int num_opps)
 {
 	struct devfreq *df = dfc->devfreq;
 	struct device *dev = df->dev.parent;
-	int ret, num_opps;
 	unsigned long freq;
-	u32 *power_table = NULL;
-	u32 *freq_table;
 	int i;
 
-	num_opps = dev_pm_opp_get_opp_count(dev);
-
-	if (dfc->power_ops) {
-		power_table = kcalloc(num_opps, sizeof(*power_table),
-				      GFP_KERNEL);
-		if (!power_table)
-			return -ENOMEM;
-	}
-
-	freq_table = kcalloc(num_opps, sizeof(*freq_table),
+	dfc->freq_table = kcalloc(num_opps, sizeof(*dfc->freq_table),
 			     GFP_KERNEL);
-	if (!freq_table) {
-		ret = -ENOMEM;
-		goto free_power_table;
-	}
+	if (!dfc->freq_table)
+		return -ENOMEM;
 
 	for (i = 0, freq = ULONG_MAX; i < num_opps; i++, freq--) {
-		unsigned long power, voltage;
 		struct dev_pm_opp *opp;
 
 		opp = dev_pm_opp_find_freq_floor(dev, &freq);
 		if (IS_ERR(opp)) {
-			ret = PTR_ERR(opp);
-			goto free_tables;
+			kfree(dfc->freq_table);
+			return PTR_ERR(opp);
 		}
 
-		voltage = dev_pm_opp_get_voltage(opp) / 1000; /* mV */
 		dev_pm_opp_put(opp);
-
-		if (dfc->power_ops) {
-			if (dfc->power_ops->get_real_power)
-				power = get_total_power(dfc, freq, voltage);
-			else
-				power = get_dynamic_power(dfc, freq, voltage);
-
-			dev_dbg(dev, "Power table: %lu MHz @ %lu mV: %lu = %lu mW\n",
-				freq / 1000000, voltage, power, power);
-
-			power_table[i] = power;
-		}
-
-		freq_table[i] = freq;
+		dfc->freq_table[i] = freq;
 	}
 
-	if (dfc->power_ops)
-		dfc->power_table = power_table;
-
-	dfc->freq_table = freq_table;
-	dfc->freq_table_size = num_opps;
-
 	return 0;
-
-free_tables:
-	kfree(freq_table);
-free_power_table:
-	kfree(power_table);
-
-	return ret;
 }
 
 /**
@@ -511,7 +389,7 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
 	struct thermal_cooling_device *cdev;
 	struct devfreq_cooling_device *dfc;
 	char dev_name[THERMAL_NAME_LENGTH];
-	int err;
+	int err, num_opps;
 
 	dfc = kzalloc(sizeof(*dfc), GFP_KERNEL);
 	if (!dfc)
@@ -519,28 +397,45 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
 
 	dfc->devfreq = df;
 
-	if (dfc_power) {
-		dfc->power_ops = dfc_power;
-
+	dfc->em = em_pd_get(df->dev.parent);
+	if (dfc->em) {
 		devfreq_cooling_ops.get_requested_power =
 			devfreq_cooling_get_requested_power;
 		devfreq_cooling_ops.state2power = devfreq_cooling_state2power;
 		devfreq_cooling_ops.power2state = devfreq_cooling_power2state;
+
+		dfc->power_ops = dfc_power;
+
+		num_opps = em_pd_nr_perf_states(dfc->em);
+	} else {
+		/* Backward compatibility for drivers which do not use IPA */
+		dev_dbg(df->dev.parent, "missing EM for cooling device\n");
+
+		num_opps = dev_pm_opp_get_opp_count(df->dev.parent);
+
+		err = devfreq_cooling_gen_tables(dfc, num_opps);
+		if (err)
+			goto free_dfc;
 	}
 
-	err = devfreq_cooling_gen_tables(dfc);
-	if (err)
+	if (num_opps <= 0) {
+		err = -EINVAL;
 		goto free_dfc;
+	}
+
+	/* max_state is an index, not a counter */
+	dfc->max_state = num_opps - 1;
 
 	err = dev_pm_qos_add_request(df->dev.parent, &dfc->req_max_freq,
 				     DEV_PM_QOS_MAX_FREQUENCY,
 				     PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE);
 	if (err < 0)
-		goto free_tables;
+		goto free_table;
 
 	err = ida_simple_get(&devfreq_ida, 0, 0, GFP_KERNEL);
 	if (err < 0)
 		goto remove_qos_req;
+
 	dfc->id = err;
 
 	snprintf(dev_name, sizeof(dev_name), "thermal-devfreq-%d", dfc->id);
@@ -561,12 +456,9 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
 
 release_ida:
 	ida_simple_remove(&devfreq_ida, dfc->id);
-
 remove_qos_req:
 	dev_pm_qos_remove_request(&dfc->req_max_freq);
-
-free_tables:
-	kfree(dfc->power_table);
+free_table:
 	kfree(dfc->freq_table);
 free_dfc:
 	kfree(dfc);
@@ -707,9 +599,7 @@ void devfreq_cooling_unregister(struct thermal_cooling_device *cdev)
 	if (dfc->em_registered)
 		em_dev_unregister_perf_domain(dev);
 
-	kfree(dfc->power_table);
 	kfree(dfc->freq_table);
-
 	kfree(dfc);
 }
 EXPORT_SYMBOL_GPL(devfreq_cooling_unregister);
diff --git a/include/linux/devfreq_cooling.h b/include/linux/devfreq_cooling.h
index 07df019fcb66..613678ce23df 100644
--- a/include/linux/devfreq_cooling.h
+++ b/include/linux/devfreq_cooling.h
@@ -24,17 +24,6 @@
 
 /**
  * struct devfreq_cooling_power - Devfreq cooling power ops
- * @get_static_power:	Take voltage, in mV, and return the static power
- *			in mW.  If NULL, the static power is assumed
- *			to be 0.
- * @get_dynamic_power:	Take voltage, in mV, and frequency, in HZ, and
- *			return the dynamic power draw in mW.  If NULL,
- *			a simple power model is used.
- * @dyn_power_coeff:	Coefficient for the simple dynamic power model in
- *			mW/(MHz mV mV).
- *			If get_dynamic_power() is NULL, then the
- *			dynamic power is calculated as
- *			@dyn_power_coeff * frequency * voltage^2
  * @get_real_power:	When this is set, the framework uses it to ask the
  *			device driver for the actual power.
  *			Some devices have more sophisticated methods
@@ -54,14 +43,8 @@
  *			max total (static + dynamic) power value for each OPP.
  */
 struct devfreq_cooling_power {
-	unsigned long (*get_static_power)(struct devfreq *devfreq,
-					  unsigned long voltage);
-	unsigned long (*get_dynamic_power)(struct devfreq *devfreq,
-					   unsigned long freq,
-					   unsigned long voltage);
 	int (*get_real_power)(struct devfreq *df, u32 *power,
 			      unsigned long freq, unsigned long voltage);
-	unsigned long dyn_power_coeff;
 };
 
 #ifdef CONFIG_DEVFREQ_THERMAL
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v7 14/15] thermal: devfreq_cooling: update license to use SPDX
  2020-05-11 11:18 [PATCH v7 00/15] Add support for devices in the Energy Model Lukasz Luba
                   ` (12 preceding siblings ...)
  2020-05-11 11:19 ` [PATCH v7 13/15] thermal: devfreq_cooling: remove old power model and use EM Lukasz Luba
@ 2020-05-11 11:19 ` Lukasz Luba
  2020-05-11 11:19 ` [PATCH v7 15/15] drm/panfrost: Register devfreq cooling and attempt to add Energy Model Lukasz Luba
  2020-05-22 10:43 ` [PATCH v7 00/15] Add support for devices in the " Daniel Lezcano
  15 siblings, 0 replies; 28+ messages in thread
From: Lukasz Luba @ 2020-05-11 11:19 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx
  Cc: Dietmar.Eggemann, cw00.choi, b.zolnierkie, rjw, sudeep.holla,
	viresh.kumar, nm, sboyd, rui.zhang, amit.kucheria,
	daniel.lezcano, mingo, peterz, juri.lelli, vincent.guittot,
	rostedt, qperret, bsegall, mgorman, shawnguo, s.hauer, festevam,
	kernel, khilman, agross, bjorn.andersson, robh, matthias.bgg,
	steven.price, tomeu.vizoso, alyssa.rosenzweig, airlied, daniel,
	liviu.dudau, lorenzo.pieralisi, lukasz.luba, patrick.bellasi,
	orjan.eide, rdunlap, mka

Change the license headers to use the SPDX standard.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/thermal/devfreq_cooling.c | 12 ++----------
 include/linux/devfreq_cooling.h   | 12 ++----------
 2 files changed, 4 insertions(+), 20 deletions(-)

diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index 79dcef3dbeeb..77d996fd82d2 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -1,17 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0
 /*
  * devfreq_cooling: Thermal cooling device implementation for devices using
  *                  devfreq
  *
- * Copyright (C) 2014-2015 ARM Limited
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed "as is" WITHOUT ANY WARRANTY of any
- * kind, whether express or implied; without even the implied warranty
- * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
+ * Copyright (C) 2014-2020 ARM Limited
  *
  * TODO:
  *    - If OPPs are added or removed after devfreq cooling has
diff --git a/include/linux/devfreq_cooling.h b/include/linux/devfreq_cooling.h
index 613678ce23df..1f5c07c78f0f 100644
--- a/include/linux/devfreq_cooling.h
+++ b/include/linux/devfreq_cooling.h
@@ -1,17 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
 /*
  * devfreq_cooling: Thermal cooling device implementation for devices using
  *                  devfreq
  *
- * Copyright (C) 2014-2015 ARM Limited
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed "as is" WITHOUT ANY WARRANTY of any
- * kind, whether express or implied; without even the implied warranty
- * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
+ * Copyright (C) 2014-2020 ARM Limited
  */
 
 #ifndef __DEVFREQ_COOLING_H__
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v7 15/15] drm/panfrost: Register devfreq cooling and attempt to add Energy Model
  2020-05-11 11:18 [PATCH v7 00/15] Add support for devices in the Energy Model Lukasz Luba
                   ` (13 preceding siblings ...)
  2020-05-11 11:19 ` [PATCH v7 14/15] thermal: devfreq_cooling: update license to use SPDX Lukasz Luba
@ 2020-05-11 11:19 ` Lukasz Luba
  2020-05-22 10:43 ` [PATCH v7 00/15] Add support for devices in the " Daniel Lezcano
  15 siblings, 0 replies; 28+ messages in thread
From: Lukasz Luba @ 2020-05-11 11:19 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx
  Cc: Dietmar.Eggemann, cw00.choi, b.zolnierkie, rjw, sudeep.holla,
	viresh.kumar, nm, sboyd, rui.zhang, amit.kucheria,
	daniel.lezcano, mingo, peterz, juri.lelli, vincent.guittot,
	rostedt, qperret, bsegall, mgorman, shawnguo, s.hauer, festevam,
	kernel, khilman, agross, bjorn.andersson, robh, matthias.bgg,
	steven.price, tomeu.vizoso, alyssa.rosenzweig, airlied, daniel,
	liviu.dudau, lorenzo.pieralisi, lukasz.luba, patrick.bellasi,
	orjan.eide, rdunlap, mka

Register the devfreq cooling device and attempt to register an Energy
Model. This adds the devfreq device to the Energy Model framework and
creates dedicated, unified data structures used, e.g., in the thermal
framework. The last NULL parameter indicates that the power model is
simplified and created based on the DT 'dynamic-power-coefficient',
voltage and frequency.

Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/gpu/drm/panfrost/panfrost_devfreq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
index 413987038fbf..8759a73db153 100644
--- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c
+++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
@@ -105,7 +105,7 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev)
 	}
 	pfdev->devfreq.devfreq = devfreq;
 
-	cooling = of_devfreq_cooling_register(dev->of_node, devfreq);
+	cooling = devfreq_cooling_em_register(devfreq, NULL);
 	if (IS_ERR(cooling))
 		DRM_DEV_INFO(dev, "Failed to register cooling device\n");
 	else
-- 
2.17.1
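
As a reading aid for the "simplified" power model mentioned in the commit
message above, here is a minimal userspace sketch, not the kernel
implementation. It assumes the common dynamic power approximation
P = C * V^2 * f, with the coefficient in uW/MHz/V^2 as the DT
'dynamic-power-coefficient' property is usually described. The function
name em_sketch_power_mw and its unit handling are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): estimate dynamic power in mW
 * from a coefficient in uW/MHz/V^2, a voltage in mV and a frequency in Hz,
 * using the approximation P = C * V^2 * f.
 */
static uint64_t em_sketch_power_mw(uint32_t coeff, uint32_t mv, uint64_t hz)
{
	uint64_t mhz = hz / 1000000;
	/* coeff * mV^2 * MHz yields uW scaled by 1e6 (mV^2 instead of V^2). */
	uint64_t tmp = (uint64_t)coeff * mv * mv * mhz;

	/* Divide by 1e6 for mV^2 -> V^2 and by 1e3 for uW -> mW. */
	return tmp / 1000000000ULL;
}
```

With a coefficient of 100, a voltage of 1000 mV and a frequency of 1 GHz
this gives 100 mW; very large inputs could overflow the 64-bit
intermediate, which this sketch does not guard against.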



* Re: [PATCH v7 01/15] PM / EM: change naming convention from 'capacity' to 'performance'
  2020-05-11 11:18 ` [PATCH v7 01/15] PM / EM: change naming convention from 'capacity' to 'performance' Lukasz Luba
@ 2020-05-11 11:46   ` Quentin Perret
  0 siblings, 0 replies; 28+ messages in thread
From: Quentin Perret @ 2020-05-11 11:46 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx, Dietmar.Eggemann,
	cw00.choi, b.zolnierkie, rjw, sudeep.holla, viresh.kumar, nm,
	sboyd, rui.zhang, amit.kucheria, daniel.lezcano, mingo, peterz,
	juri.lelli, vincent.guittot, rostedt, bsegall, mgorman, shawnguo,
	s.hauer, festevam, kernel, khilman, agross, bjorn.andersson,
	robh, matthias.bgg, steven.price, tomeu.vizoso,
	alyssa.rosenzweig, airlied, daniel, liviu.dudau,
	lorenzo.pieralisi, patrick.bellasi, orjan.eide, rdunlap, mka

On Monday 11 May 2020 at 12:18:58 (+0100), Lukasz Luba wrote:
> The Energy Model uses the concepts of performance domains and capacity
> states in order to calculate the power used by CPUs. Changing the naming
> convention from capacity to performance state enables wider usage in the
> future, e.g. the upcoming support for devices other than CPUs.
> 
> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>

Acked-by: Quentin Perret <qperret@google.com>


* Re: [PATCH v7 02/15] PM / EM: introduce em_dev_register_perf_domain function
  2020-05-11 11:18 ` [PATCH v7 02/15] PM / EM: introduce em_dev_register_perf_domain function Lukasz Luba
@ 2020-05-11 11:51   ` Quentin Perret
  0 siblings, 0 replies; 28+ messages in thread
From: Quentin Perret @ 2020-05-11 11:51 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx, Dietmar.Eggemann,
	cw00.choi, b.zolnierkie, rjw, sudeep.holla, viresh.kumar, nm,
	sboyd, rui.zhang, amit.kucheria, daniel.lezcano, mingo, peterz,
	juri.lelli, vincent.guittot, rostedt, bsegall, mgorman, shawnguo,
	s.hauer, festevam, kernel, khilman, agross, bjorn.andersson,
	robh, matthias.bgg, steven.price, tomeu.vizoso,
	alyssa.rosenzweig, airlied, daniel, liviu.dudau,
	lorenzo.pieralisi, patrick.bellasi, orjan.eide, rdunlap, mka

On Monday 11 May 2020 at 12:18:59 (+0100), Lukasz Luba wrote:
> Add a new function to the Energy Model framework which is going to support
> new devices. This function will help with the transition and make it
> smoother. For now it still checks if the cpumask is a valid pointer; that
> check will be removed later when the new structures and infrastructure
> are ready.
> 
> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>

Acked-by: Quentin Perret <qperret@google.com>


* Re: [PATCH v7 03/15] PM / EM: update callback structure and add device pointer
  2020-05-11 11:19 ` [PATCH v7 03/15] PM / EM: update callback structure and add device pointer Lukasz Luba
@ 2020-05-11 11:57   ` Quentin Perret
  2020-05-12 11:11     ` Lukasz Luba
  0 siblings, 1 reply; 28+ messages in thread
From: Quentin Perret @ 2020-05-11 11:57 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx, Dietmar.Eggemann,
	cw00.choi, b.zolnierkie, rjw, sudeep.holla, viresh.kumar, nm,
	sboyd, rui.zhang, amit.kucheria, daniel.lezcano, mingo, peterz,
	juri.lelli, vincent.guittot, rostedt, bsegall, mgorman, shawnguo,
	s.hauer, festevam, kernel, khilman, agross, bjorn.andersson,
	robh, matthias.bgg, steven.price, tomeu.vizoso,
	alyssa.rosenzweig, airlied, daniel, liviu.dudau,
	lorenzo.pieralisi, patrick.bellasi, orjan.eide, rdunlap, mka

On Monday 11 May 2020 at 12:19:00 (+0100), Lukasz Luba wrote:
> diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
> index 61623e2ff149..11ee24e06d12 100644
> --- a/drivers/cpufreq/scmi-cpufreq.c
> +++ b/drivers/cpufreq/scmi-cpufreq.c
> @@ -103,17 +103,12 @@ scmi_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask)
>  }
>  
>  static int __maybe_unused
> -scmi_get_cpu_power(unsigned long *power, unsigned long *KHz, int cpu)
> +scmi_get_cpu_power(unsigned long *power, unsigned long *KHz,
> +		   struct device *cpu_dev)
>  {
> -	struct device *cpu_dev = get_cpu_device(cpu);
>  	unsigned long Hz;
>  	int ret, domain;
>  
> -	if (!cpu_dev) {
> -		pr_err("failed to get cpu%d device\n", cpu);
> -		return -ENODEV;
> -	}
> -
>  	domain = handle->perf_ops->device_domain_id(cpu_dev);
>  	if (domain < 0)
>  		return domain;
> @@ -200,7 +195,7 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
>  
>  	policy->fast_switch_possible = true;
>  
> -	em_register_perf_domain(policy->cpus, nr_opp, &em_cb);

So this one has no users after this patch right? I suppose you could
squash patch 05 in this one. But no big deal.

Acked-by: Quentin Perret <qperret@google.com>

> +	em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus);
>  
>  	return 0;
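
The callback change quoted above (passing the device instead of a CPU
number) can be illustrated with a small userspace sketch. Everything here
is a stand-in: struct sketch_device, struct sketch_em_callback and both
functions are hypothetical names, not the kernel's types, and the returned
numbers are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct device; only a name is modelled. */
struct sketch_device {
	const char *name;
};

/* Callback shape after the change: the device is passed in directly. */
struct sketch_em_callback {
	int (*active_power)(unsigned long *power, unsigned long *khz,
			    struct sketch_device *dev);
};

/* Hypothetical driver callback reporting one fixed performance state. */
static int sketch_get_power(unsigned long *power, unsigned long *khz,
			    struct sketch_device *dev)
{
	if (!dev || !power || !khz)
		return -1;
	*khz = 500000;	/* made-up frequency in kHz */
	*power = 120;	/* made-up power in mW */
	return 0;
}

/* Hypothetical registration helper: queries one state via the callback. */
static int sketch_register(struct sketch_device *dev,
			   const struct sketch_em_callback *cb,
			   unsigned long *power, unsigned long *khz)
{
	if (!dev || !cb || !cb->active_power)
		return -1;
	return cb->active_power(power, khz, dev);
}
```

Because the device arrives as an argument, the callback no longer needs a
lookup from CPU number to device, nor the error path for a failed lookup.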


* Re: [PATCH v7 04/15] PM / EM: add support for other devices than CPUs in Energy Model
  2020-05-11 11:19 ` [PATCH v7 04/15] PM / EM: add support for other devices than CPUs in Energy Model Lukasz Luba
@ 2020-05-11 13:43   ` Quentin Perret
  2020-05-12 11:38     ` Lukasz Luba
  0 siblings, 1 reply; 28+ messages in thread
From: Quentin Perret @ 2020-05-11 13:43 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx, Dietmar.Eggemann,
	cw00.choi, b.zolnierkie, rjw, sudeep.holla, viresh.kumar, nm,
	sboyd, rui.zhang, amit.kucheria, daniel.lezcano, mingo, peterz,
	juri.lelli, vincent.guittot, rostedt, bsegall, mgorman, shawnguo,
	s.hauer, festevam, kernel, khilman, agross, bjorn.andersson,
	robh, matthias.bgg, steven.price, tomeu.vizoso,
	alyssa.rosenzweig, airlied, daniel, liviu.dudau,
	lorenzo.pieralisi, patrick.bellasi, orjan.eide, rdunlap, mka

Hey Lukasz,

On Monday 11 May 2020 at 12:19:01 (+0100), Lukasz Luba wrote:
<snip>
> @@ -27,12 +29,15 @@ struct em_perf_state {
>   * em_perf_domain - Performance domain
>   * @table:		List of performance states, in ascending order
>   * @nr_perf_states:	Number of performance states
> - * @cpus:		Cpumask covering the CPUs of the domain
> + * @cpus:		Cpumask covering the CPUs of the domain. It's here
> + *			for performance reasons to avoid potential cache
> + *			misses during energy calculations in the scheduler

And because that saves a pointer, and simplifies allocating/freeing that
memory region :)

<snip>
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index 5b8a1566526a..9cc7f2973600 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -2,8 +2,9 @@
>  /*
>   * Energy Model of CPUs

Should this comment change too?

<snip>
> -static void em_debug_create_pd(struct em_perf_domain *pd, int cpu)
> +static void em_debug_create_pd(struct device *dev)
>  {
>  	struct dentry *d;
> -	char name[8];
>  	int i;
>  
> -	snprintf(name, sizeof(name), "pd%d", cpu);
> -
>  	/* Create the directory of the performance domain */
> -	d = debugfs_create_dir(name, rootdir);
> +	d = debugfs_create_dir(dev_name(dev), rootdir);

So what will be the name for the perf domain of CPUs now? cpuX?

<snip>
> @@ -142,8 +149,8 @@ em_create_pd(struct device *dev, int nr_states, struct em_data_callback *cb,
>  		 */
>  		opp_eff = freq / power;
>  		if (opp_eff >= prev_opp_eff)
> -			pr_warn("pd%d: hertz/watts ratio non-monotonically decreasing: em_perf_state %d >= em_perf_state%d\n",
> -					cpu, i, i - 1);
> +			dev_dbg(dev, "EM: hertz/watts ratio non-monotonically decreasing: em_perf_state %d >= em_perf_state%d\n",
> +					i, i - 1);

It feels like changing from warn to debug doesn't really belong to this
patch no?

<snip>
> @@ -216,47 +274,50 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
>  	 */
>  	mutex_lock(&em_pd_mutex);
>  
> -	for_each_cpu(cpu, span) {
> -		/* Make sure we don't register again an existing domain. */
> -		if (READ_ONCE(per_cpu(em_data, cpu))) {
> -			ret = -EEXIST;
> -			goto unlock;
> -		}
> +	if (dev->em_pd) {
> +		ret = -EEXIST;
> +		goto unlock;
> +	}
>  
> -		/*
> -		 * All CPUs of a domain must have the same micro-architecture
> -		 * since they all share the same table.
> -		 */
> -		cap = arch_scale_cpu_capacity(cpu);
> -		if (prev_cap && prev_cap != cap) {
> -			pr_err("CPUs of %*pbl must have the same capacity\n",
> -							cpumask_pr_args(span));
> +	if (_is_cpu_device(dev)) {

Something like

	if (!_is_cpu_device(dev))
		goto device;

would limit the diff a bit, but that may just be personal taste.

But apart from these nits, the patch LGTM.

Thanks,
Quentin
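
For context on the hertz/watts message discussed above, here is a minimal
userspace sketch of the kind of check the quoted hunk performs: walk the
table in ascending frequency order and flag every state whose
frequency/power ratio is not strictly lower than the previous one. The
struct and function names are hypothetical, and the integer division
mirrors the quoted 'opp_eff = freq / power' line rather than any exact
kernel code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

struct sketch_perf_state {
	unsigned long frequency;	/* kHz, table sorted ascending */
	unsigned long power;		/* mW, must be non-zero */
};

/*
 * Count states whose frequency/power ratio is greater than or equal to
 * the ratio of the state just below them.
 */
static size_t sketch_count_ratio_violations(const struct sketch_perf_state *t,
					    size_t n)
{
	unsigned long prev_eff = ULONG_MAX;
	size_t i, bad = 0;

	for (i = 0; i < n; i++) {
		unsigned long eff;

		assert(t[i].power != 0);
		eff = t[i].frequency / t[i].power;
		if (eff >= prev_eff)
			bad++;
		prev_eff = eff;
	}
	return bad;
}
```

A non-zero result only says that some lower state is no more efficient
than a higher one; whether that deserves a warning or a debug message is
the question raised in the review.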


* Re: [PATCH v7 05/15] PM / EM: remove em_register_perf_domain
  2020-05-11 11:19 ` [PATCH v7 05/15] PM / EM: remove em_register_perf_domain Lukasz Luba
@ 2020-05-11 13:44   ` Quentin Perret
  0 siblings, 0 replies; 28+ messages in thread
From: Quentin Perret @ 2020-05-11 13:44 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx, Dietmar.Eggemann,
	cw00.choi, b.zolnierkie, rjw, sudeep.holla, viresh.kumar, nm,
	sboyd, rui.zhang, amit.kucheria, daniel.lezcano, mingo, peterz,
	juri.lelli, vincent.guittot, rostedt, bsegall, mgorman, shawnguo,
	s.hauer, festevam, kernel, khilman, agross, bjorn.andersson,
	robh, matthias.bgg, steven.price, tomeu.vizoso,
	alyssa.rosenzweig, airlied, daniel, liviu.dudau,
	lorenzo.pieralisi, patrick.bellasi, orjan.eide, rdunlap, mka

On Monday 11 May 2020 at 12:19:02 (+0100), Lukasz Luba wrote:
> Remove old function em_register_perf_domain which is no longer needed.
> There is em_dev_register_perf_domain that covers old use cases and new as
> well.
> 
> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>

So, as mentioned in patch 03, I think this one could be squashed. But in
any case:

Acked-by: Quentin Perret <qperret@google.com>


* Re: [PATCH v7 06/15] PM / EM: change name of em_pd_energy to em_cpu_energy
  2020-05-11 11:19 ` [PATCH v7 06/15] PM / EM: change name of em_pd_energy to em_cpu_energy Lukasz Luba
@ 2020-05-11 13:45   ` Quentin Perret
  0 siblings, 0 replies; 28+ messages in thread
From: Quentin Perret @ 2020-05-11 13:45 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx, Dietmar.Eggemann,
	cw00.choi, b.zolnierkie, rjw, sudeep.holla, viresh.kumar, nm,
	sboyd, rui.zhang, amit.kucheria, daniel.lezcano, mingo, peterz,
	juri.lelli, vincent.guittot, rostedt, bsegall, mgorman, shawnguo,
	s.hauer, festevam, kernel, khilman, agross, bjorn.andersson,
	robh, matthias.bgg, steven.price, tomeu.vizoso,
	alyssa.rosenzweig, airlied, daniel, liviu.dudau,
	lorenzo.pieralisi, patrick.bellasi, orjan.eide, rdunlap, mka

On Monday 11 May 2020 at 12:19:03 (+0100), Lukasz Luba wrote:
> The Energy Model framework now supports devices other than CPUs. Refactor
> some of the functions in order to prevent wrong usage. The old function
> em_pd_energy has too generic a name. It must not be used without a proper
> cpumask pointer, which is possible only for CPU devices. Thus, rename it
> and add a proper description to warn of potential wrong usage for other
> devices.
> 
> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>

Acked-by: Quentin Perret <qperret@google.com>


* Re: [PATCH v7 07/15] Documentation: power: update Energy Model description
  2020-05-11 11:19 ` [PATCH v7 07/15] Documentation: power: update Energy Model description Lukasz Luba
@ 2020-05-11 13:48   ` Quentin Perret
  0 siblings, 0 replies; 28+ messages in thread
From: Quentin Perret @ 2020-05-11 13:48 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx, Dietmar.Eggemann,
	cw00.choi, b.zolnierkie, rjw, sudeep.holla, viresh.kumar, nm,
	sboyd, rui.zhang, amit.kucheria, daniel.lezcano, mingo, peterz,
	juri.lelli, vincent.guittot, rostedt, bsegall, mgorman, shawnguo,
	s.hauer, festevam, kernel, khilman, agross, bjorn.andersson,
	robh, matthias.bgg, steven.price, tomeu.vizoso,
	alyssa.rosenzweig, airlied, daniel, liviu.dudau,
	lorenzo.pieralisi, patrick.bellasi, orjan.eide, rdunlap, mka

On Monday 11 May 2020 at 12:19:04 (+0100), Lukasz Luba wrote:
> The Energy Model framework also supports devices other than CPUs. Update
> related information and add a description of the new usage.
> 
> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>

Acked-by: Quentin Perret <qperret@google.com>


* Re: [PATCH v7 03/15] PM / EM: update callback structure and add device pointer
  2020-05-11 11:57   ` Quentin Perret
@ 2020-05-12 11:11     ` Lukasz Luba
  0 siblings, 0 replies; 28+ messages in thread
From: Lukasz Luba @ 2020-05-12 11:11 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx, Dietmar.Eggemann,
	cw00.choi, b.zolnierkie, rjw, sudeep.holla, viresh.kumar, nm,
	sboyd, rui.zhang, amit.kucheria, daniel.lezcano, mingo, peterz,
	juri.lelli, vincent.guittot, rostedt, bsegall, mgorman, shawnguo,
	s.hauer, festevam, kernel, khilman, agross, bjorn.andersson,
	robh, matthias.bgg, steven.price, tomeu.vizoso,
	alyssa.rosenzweig, airlied, daniel, liviu.dudau,
	lorenzo.pieralisi, patrick.bellasi, orjan.eide, rdunlap, mka

Hi Quentin,

On 5/11/20 12:57 PM, Quentin Perret wrote:
> On Monday 11 May 2020 at 12:19:00 (+0100), Lukasz Luba wrote:
>> diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
>> index 61623e2ff149..11ee24e06d12 100644
>> --- a/drivers/cpufreq/scmi-cpufreq.c
>> +++ b/drivers/cpufreq/scmi-cpufreq.c
>> @@ -103,17 +103,12 @@ scmi_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask)
>>   }
>>   
>>   static int __maybe_unused
>> -scmi_get_cpu_power(unsigned long *power, unsigned long *KHz, int cpu)
>> +scmi_get_cpu_power(unsigned long *power, unsigned long *KHz,
>> +		   struct device *cpu_dev)
>>   {
>> -	struct device *cpu_dev = get_cpu_device(cpu);
>>   	unsigned long Hz;
>>   	int ret, domain;
>>   
>> -	if (!cpu_dev) {
>> -		pr_err("failed to get cpu%d device\n", cpu);
>> -		return -ENODEV;
>> -	}
>> -
>>   	domain = handle->perf_ops->device_domain_id(cpu_dev);
>>   	if (domain < 0)
>>   		return domain;
>> @@ -200,7 +195,7 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
>>   
>>   	policy->fast_switch_possible = true;
>>   
>> -	em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
> 
> So this one has no users after this patch right? I suppose you could
> squash patch 05 in this one. But no big deal.

Yes, it was tricky for me to decide on the splits suggested by Daniel, and
this is an example. I had to introduce em_dev_register_perf_domain and
convert its clients before I could remove the old em_register_perf_domain
completely. I agree it could also go with patch 5, but it does no harm
here.

> 
> Acked-by: Quentin Perret <qperret@google.com>

Thank you for this ACK and the earlier ones.

Regards,
Lukasz

> 
>> +	em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus);
>>   
>>   	return 0;


* Re: [PATCH v7 04/15] PM / EM: add support for other devices than CPUs in Energy Model
  2020-05-11 13:43   ` Quentin Perret
@ 2020-05-12 11:38     ` Lukasz Luba
  0 siblings, 0 replies; 28+ messages in thread
From: Lukasz Luba @ 2020-05-12 11:38 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-kernel, linux-pm, linux-arm-kernel, dri-devel, linux-omap,
	linux-mediatek, linux-arm-msm, linux-imx, Dietmar.Eggemann,
	cw00.choi, b.zolnierkie, rjw, sudeep.holla, viresh.kumar, nm,
	sboyd, rui.zhang, amit.kucheria, daniel.lezcano, mingo, peterz,
	juri.lelli, vincent.guittot, rostedt, bsegall, mgorman, shawnguo,
	s.hauer, festevam, kernel, khilman, agross, bjorn.andersson,
	robh, matthias.bgg, steven.price, tomeu.vizoso,
	alyssa.rosenzweig, airlied, daniel, liviu.dudau,
	lorenzo.pieralisi, patrick.bellasi, orjan.eide, rdunlap, mka



On 5/11/20 2:43 PM, Quentin Perret wrote:
> Hey Lukasz,
> 
> On Monday 11 May 2020 at 12:19:01 (+0100), Lukasz Luba wrote:
> <snip>
>> @@ -27,12 +29,15 @@ struct em_perf_state {
>>    * em_perf_domain - Performance domain
>>    * @table:		List of performance states, in ascending order
>>    * @nr_perf_states:	Number of performance states
>> - * @cpus:		Cpumask covering the CPUs of the domain
>> + * @cpus:		Cpumask covering the CPUs of the domain. It's here
>> + *			for performance reasons to avoid potential cache
>> + *			misses during energy calculations in the scheduler
> 
> And because that saves a pointer, and simplifies allocating/freeing that
> memory region :)

True, I will add this also:
'and simplifies allocating/freeing that memory region'

> 
> <snip>
>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
>> index 5b8a1566526a..9cc7f2973600 100644
>> --- a/kernel/power/energy_model.c
>> +++ b/kernel/power/energy_model.c
>> @@ -2,8 +2,9 @@
>>   /*
>>    * Energy Model of CPUs
> 
> Should this comment change too?

Yes, indeed. I will adjust it.

> 
> <snip>
>> -static void em_debug_create_pd(struct em_perf_domain *pd, int cpu)
>> +static void em_debug_create_pd(struct device *dev)
>>   {
>>   	struct dentry *d;
>> -	char name[8];
>>   	int i;
>>   
>> -	snprintf(name, sizeof(name), "pd%d", cpu);
>> -
>>   	/* Create the directory of the performance domain */
>> -	d = debugfs_create_dir(name, rootdir);
>> +	d = debugfs_create_dir(dev_name(dev), rootdir);
> 
> So what will be the name for the perf domain of CPUs now? cpuX?

Yep, it will be 'cpu0', 'cpu4', etc.

> 
> <snip>
>> @@ -142,8 +149,8 @@ em_create_pd(struct device *dev, int nr_states, struct em_data_callback *cb,
>>   		 */
>>   		opp_eff = freq / power;
>>   		if (opp_eff >= prev_opp_eff)
>> -			pr_warn("pd%d: hertz/watts ratio non-monotonically decreasing: em_perf_state %d >= em_perf_state%d\n",
>> -					cpu, i, i - 1);
>> +			dev_dbg(dev, "EM: hertz/watts ratio non-monotonically decreasing: em_perf_state %d >= em_perf_state%d\n",
>> +					i, i - 1);
> 
> It feels like changing from warn to debug doesn't really belong to this
> patch no?

I thought that these prints were not worth introducing another patch.
This warning is a bit tricky, because we (SW engineers) are basically not
able to tweak OPPs; we can only remove them to silence this warning.

There are platforms with dozens of OPPs that see this. The warnings trigger
automated test scripts, which are sensitive to the dmesg log level, and
cause developers to spend time investigating the issue.

Then, what if these OPPs are needed because thermal was tested OK with
some OPPs which unfortunately also trigger this warning? They cannot
remove these OPPs, but the warning would stay. We might see this for GPUs
as well.

I decided to change it to dbg for the reasons above.

> 
> <snip>
>> @@ -216,47 +274,50 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
>>   	 */
>>   	mutex_lock(&em_pd_mutex);
>>   
>> -	for_each_cpu(cpu, span) {
>> -		/* Make sure we don't register again an existing domain. */
>> -		if (READ_ONCE(per_cpu(em_data, cpu))) {
>> -			ret = -EEXIST;
>> -			goto unlock;
>> -		}
>> +	if (dev->em_pd) {
>> +		ret = -EEXIST;
>> +		goto unlock;
>> +	}
>>   
>> -		/*
>> -		 * All CPUs of a domain must have the same micro-architecture
>> -		 * since they all share the same table.
>> -		 */
>> -		cap = arch_scale_cpu_capacity(cpu);
>> -		if (prev_cap && prev_cap != cap) {
>> -			pr_err("CPUs of %*pbl must have the same capacity\n",
>> -							cpumask_pr_args(span));
>> +	if (_is_cpu_device(dev)) {
> 
> Something like
> 
> 	if (!_is_cpu_device(dev))
> 		goto device;
> 
> would limit the diff a bit, but that may just be personal taste.

Possible

> 
> But apart from these nits, the patch LGTM.

Thank you for the review.

I will wait for comments from Daniel (because he suggested putting em_pd
inside the device struct), and if there are no other issues I will just
resend the patch with adjusted comment fields in response.

Regards,
Lukasz

> 
> Thanks,
> Quentin
> 
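
The registration flow discussed in this exchange (reject a second domain
for the same device, then run CPU-only validation, with the early 'goto
device' shape Quentin suggested) can be sketched in userspace as follows.
This is an assumption-laden illustration: the types, the
cpus_share_capacity flag and the function name are hypothetical, and the
mutex held by the real code is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_pd {
	int nr_states;
};

struct sketch_dev {
	bool is_cpu;
	struct sketch_pd *em_pd;	/* NULL until a domain is registered */
};

static int sketch_register_pd(struct sketch_dev *dev, struct sketch_pd *pd,
			      bool cpus_share_capacity)
{
	if (!dev || !pd)
		return -EINVAL;

	/* Refuse to register a second domain for the same device. */
	if (dev->em_pd)
		return -EEXIST;

	/* Non-CPU devices skip the CPU-only validation entirely. */
	if (!dev->is_cpu)
		goto device;

	/* CPU-only check: all CPUs of a domain must share one capacity. */
	if (!cpus_share_capacity)
		return -EINVAL;

device:
	dev->em_pd = pd;
	return 0;
}
```

The early jump keeps the CPU-specific block at its original indentation,
which is the diff-size argument made in the review; a nested 'if' gives the
same behaviour.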


* Re: [PATCH v7 00/15] Add support for devices in the Energy Model
  2020-05-11 11:18 [PATCH v7 00/15] Add support for devices in the Energy Model Lukasz Luba
                   ` (14 preceding siblings ...)
  2020-05-11 11:19 ` [PATCH v7 15/15] drm/panfrost: Register devfreq cooling and attempt to add Energy Model Lukasz Luba
@ 2020-05-22 10:43 ` Daniel Lezcano
  2020-05-22 12:58   ` Lukasz Luba
  15 siblings, 1 reply; 28+ messages in thread
From: Daniel Lezcano @ 2020-05-22 10:43 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm, linux-arm-kernel, dri-devel,
	linux-omap, linux-mediatek, linux-arm-msm, linux-imx
  Cc: Dietmar.Eggemann, cw00.choi, b.zolnierkie, rjw, sudeep.holla,
	viresh.kumar, nm, sboyd, rui.zhang, amit.kucheria, mingo, peterz,
	juri.lelli, vincent.guittot, rostedt, qperret, bsegall, mgorman,
	shawnguo, s.hauer, festevam, kernel, khilman, agross,
	bjorn.andersson, robh, matthias.bgg, steven.price, tomeu.vizoso,
	alyssa.rosenzweig, airlied, daniel, liviu.dudau,
	lorenzo.pieralisi, patrick.bellasi, orjan.eide, rdunlap, mka


Hi Lukasz,

On 11/05/2020 13:18, Lukasz Luba wrote:
> Hi all,
> 
> This patch set introduces support for devices in the Energy Model (EM)
> framework. It will unify the power model for thermal subsystem. It will
> make it simpler to add support for new devices willing to use more
> advanced features (like Intelligent Power Allocation). Now it should
> require less knowledge and effort for driver developer to add e.g.
> GPU driver with simple energy model. A more sophisticated energy model
> in the thermal framework is also possible, driver needs to provide
> a dedicated callback function. More information can be found in the
> updated documentation file.
> 
> First 7 patches are refactoring Energy Model framework to add support
> for devices other than CPUs. They change:
> - naming convention from 'capacity' to 'performance' state,
> - API arguments adding device pointer and not rely only on cpumask,
> - change naming when 'cpu' was used, now it's a 'device'
> - internal structure to maintain registered devices
> - update users to the new API
> Patch 8 updates OPP framework helper function to be more generic, not
> CPU specific.
> Patches 9-14 change devfreq cooling, dropping part of old power model and
> adding registration with Energy Model via exported GPL function.
> The last patch is a simple change for the Panfrost GPU driver.
> 
> The patch set is based on linux-next tag next-20200508.

Do you think it is possible to respin against linux-pm next ?

I wanted to try the series but I'm getting non trivial conflicts with
the devfreq_cooling changes


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


* Re: [PATCH v7 00/15] Add support for devices in the Energy Model
  2020-05-22 10:43 ` [PATCH v7 00/15] Add support for devices in the " Daniel Lezcano
@ 2020-05-22 12:58   ` Lukasz Luba
  2020-05-22 13:01     ` Daniel Lezcano
  0 siblings, 1 reply; 28+ messages in thread
From: Lukasz Luba @ 2020-05-22 12:58 UTC (permalink / raw)
  To: Daniel Lezcano, linux-kernel, linux-pm, linux-arm-kernel,
	dri-devel, linux-omap, linux-mediatek, linux-arm-msm, linux-imx
  Cc: Dietmar.Eggemann, cw00.choi, b.zolnierkie, rjw, sudeep.holla,
	viresh.kumar, nm, sboyd, rui.zhang, amit.kucheria, mingo, peterz,
	juri.lelli, vincent.guittot, rostedt, qperret, bsegall, mgorman,
	shawnguo, s.hauer, festevam, kernel, khilman, agross,
	bjorn.andersson, robh, matthias.bgg, steven.price, tomeu.vizoso,
	alyssa.rosenzweig, airlied, daniel, liviu.dudau,
	lorenzo.pieralisi, patrick.bellasi, orjan.eide, rdunlap, mka

Hi Daniel,

On 5/22/20 11:43 AM, Daniel Lezcano wrote:
> 
> Hi Lukasz,
> 
> On 11/05/2020 13:18, Lukasz Luba wrote:
>> Hi all,
>>
>> This patch set introduces support for devices in the Energy Model (EM)
>> framework. It will unify the power model for thermal subsystem. It will
>> make it simpler to add support for new devices willing to use more
>> advanced features (like Intelligent Power Allocation). Now it should
>> require less knowledge and effort for driver developer to add e.g.
>> GPU driver with simple energy model. A more sophisticated energy model
>> in the thermal framework is also possible, driver needs to provide
>> a dedicated callback function. More information can be found in the
>> updated documentation file.
>>
>> First 7 patches are refactoring Energy Model framework to add support
>> for devices other than CPUs. They change:
>> - naming convention from 'capacity' to 'performance' state,
>> - API arguments adding device pointer and not rely only on cpumask,
>> - change naming when 'cpu' was used, now it's a 'device'
>> - internal structure to maintain registered devices
>> - update users to the new API
>> Patch 8 updates OPP framework helper function to be more generic, not
>> CPU specific.
>> Patches 9-14 change devfreq cooling, dropping part of old power model and
>> adding registration with Energy Model via exported GPL function.
>> The last patch is a simple change for the Panfrost GPU driver.
>>
>> The patch set is based on linux-next tag next-20200508.
> 
> Do you think it is possible to respin against linux-pm next ?

Yes, I will do it and send the v8.

> 
> I wanted to try the series but I'm getting non trivial conflicts with
> the devfreq_cooling changes
> 
> 

Let me take care of this.

Regards,
Lukasz


* Re: [PATCH v7 00/15] Add support for devices in the Energy Model
  2020-05-22 12:58   ` Lukasz Luba
@ 2020-05-22 13:01     ` Daniel Lezcano
  0 siblings, 0 replies; 28+ messages in thread
From: Daniel Lezcano @ 2020-05-22 13:01 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm, linux-arm-kernel, dri-devel,
	linux-omap, linux-mediatek, linux-arm-msm, linux-imx
  Cc: Dietmar.Eggemann, cw00.choi, b.zolnierkie, rjw, sudeep.holla,
	viresh.kumar, nm, sboyd, rui.zhang, amit.kucheria, mingo, peterz,
	juri.lelli, vincent.guittot, rostedt, qperret, bsegall, mgorman,
	shawnguo, s.hauer, festevam, kernel, khilman, agross,
	bjorn.andersson, robh, matthias.bgg, steven.price, tomeu.vizoso,
	alyssa.rosenzweig, airlied, daniel, liviu.dudau,
	lorenzo.pieralisi, patrick.bellasi, orjan.eide, rdunlap, mka

On 22/05/2020 14:58, Lukasz Luba wrote:

[ ... ]

>>>
>>> The patch set is based on linux-next tag next-20200508.
>>
>> Do you think it is possible to respin against linux-pm next ?
> 
> Yes, I will do it and send the v8.
> 
>>
>> I wanted to try the series but I'm getting non trivial conflicts with
>> the devfreq_cooling changes
>>
>>
> 
> Let me take care of this.

Thanks Lukasz !
-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


Thread overview: 28+ messages
2020-05-11 11:18 [PATCH v7 00/15] Add support for devices in the Energy Model Lukasz Luba
2020-05-11 11:18 ` [PATCH v7 01/15] PM / EM: change naming convention from 'capacity' to 'performance' Lukasz Luba
2020-05-11 11:46   ` Quentin Perret
2020-05-11 11:18 ` [PATCH v7 02/15] PM / EM: introduce em_dev_register_perf_domain function Lukasz Luba
2020-05-11 11:51   ` Quentin Perret
2020-05-11 11:19 ` [PATCH v7 03/15] PM / EM: update callback structure and add device pointer Lukasz Luba
2020-05-11 11:57   ` Quentin Perret
2020-05-12 11:11     ` Lukasz Luba
2020-05-11 11:19 ` [PATCH v7 04/15] PM / EM: add support for other devices than CPUs in Energy Model Lukasz Luba
2020-05-11 13:43   ` Quentin Perret
2020-05-12 11:38     ` Lukasz Luba
2020-05-11 11:19 ` [PATCH v7 05/15] PM / EM: remove em_register_perf_domain Lukasz Luba
2020-05-11 13:44   ` Quentin Perret
2020-05-11 11:19 ` [PATCH v7 06/15] PM / EM: change name of em_pd_energy to em_cpu_energy Lukasz Luba
2020-05-11 13:45   ` Quentin Perret
2020-05-11 11:19 ` [PATCH v7 07/15] Documentation: power: update Energy Model description Lukasz Luba
2020-05-11 13:48   ` Quentin Perret
2020-05-11 11:19 ` [PATCH v7 08/15] OPP: refactor dev_pm_opp_of_register_em() and update related drivers Lukasz Luba
2020-05-11 11:19 ` [PATCH v7 09/15] thermal: devfreq_cooling: change tracing function and arguments Lukasz Luba
2020-05-11 11:19 ` [PATCH v7 10/15] thermal: devfreq_cooling: get device load and frequency directly Lukasz Luba
2020-05-11 11:19 ` [PATCH v7 11/15] thermal: devfreq_cooling: work on a copy of device status Lukasz Luba
2020-05-11 11:19 ` [PATCH v7 12/15] thermal: devfreq_cooling: add new registration functions with Energy Model Lukasz Luba
2020-05-11 11:19 ` [PATCH v7 13/15] thermal: devfreq_cooling: remove old power model and use EM Lukasz Luba
2020-05-11 11:19 ` [PATCH v7 14/15] thermal: devfreq_cooling: update license to use SPDX Lukasz Luba
2020-05-11 11:19 ` [PATCH v7 15/15] drm/panfrost: Register devfreq cooling and attempt to add Energy Model Lukasz Luba
2020-05-22 10:43 ` [PATCH v7 00/15] Add support for devices in the " Daniel Lezcano
2020-05-22 12:58   ` Lukasz Luba
2020-05-22 13:01     ` Daniel Lezcano
