All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 0/9] Inefficient OPPs
@ 2021-09-08 14:05 Vincent Donnefort
  2021-09-08 14:05 ` [PATCH v7 1/9] PM / EM: Fix inefficient states detection Vincent Donnefort
                   ` (9 more replies)
  0 siblings, 10 replies; 12+ messages in thread
From: Vincent Donnefort @ 2021-09-08 14:05 UTC (permalink / raw)
  To: rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann, mka,
	Vincent Donnefort

Hi all,

Here's the new version for the inefficient OPPs. This patch-set is based on the
following series from Viresh:

  [PATCH V3 0/9] Add callback to register with energy model
  https://lore.kernel.org/linux-arm-msm/cover.1628742634.git.viresh.kumar@linaro.org/

The main change in this version is the re-introduction of CPUFREQ_RELATION_E,
as a relation flag. When set, all relations will try to resolve a frequency
across efficient frequencies only.

A bit of context:

We (Power team in Arm) are working with an experimental kernel for the
Google's Pixel4 to evaluate and improve the current mainline performance
and energy consumption on a real life device with Android.

The SD855 SoC found in this phone has several OPPs that are inefficient.
I.e. despite a lower frequency, they have a greater cost. (That cost being
fmax * OPP power / OPP freq). This issue is twofold. First of course,
running a specific workload at an inefficient OPP is counterproductive
since it wastes wasting energy. But also, inefficient OPPs make a
performance domain less appealing for task placement than it really is.

We evaluated the change presented here by running 30 iterations of Android
PCMark "Work 2.0 Performance". While we did not see any statistically
significant performance impact, this change allowed to drastically improve
the idle time residency.


                           |   Running   |  WFI [1]  |    Idle   |
   ------------------------+-------------+-----------+-----------+
   Little cluster (4 CPUs) |    -0.35%   |   +0.35%  |   +0.79%  |
   ------------------------+-------------+-----------+-----------+
   Medium cluster (3 CPUs) |    -6.3%    |    -18%   |    +12%   |
   ------------------------+-------------+-----------+-----------+
   Big cluster    (1 CPU)  |    -6.4%    |    -6.5%  |    +2.8%  |
   ------------------------+-------------+-----------+-----------+

On the SD855, the inefficient OPPs are found on the little cluster. By
removing them from the Energy Model, we make the most efficient CPUs more
appealing for task placement, helping to reduce the running time for the
medium and big CPUs. Increasing idle time is crucial for this platform due
to the substantial energy cost differences among the clusters. Also,
despite not appearing in the statistics (the idle driver used here doesn't
report it), we can speculate that we also improve the cluster idle time.

[1] WFI: Wait for interrupt.

Changelog since v6:
  - Bring back CPUFREQ_RELATION_E as a relation flag.
  - Make the policy min/max hard limits.
  - Remove the "efficient" member from the freq_table that was pointing to the
    next efficient frequency.

Changelog since v5:
  - EM setup inefficient frequencies in CPUFreq, instead of CPUFreq settings its
    own inefficient frequencies from EM.

Changelog since v4:
  - Remove CPUFREQ_RELATION_E.
  - Skip inefficient OPPs for all governors with CPUFREQ_GOV_DYNAMIC_SWITCHING
  - Remove CPUFREQ_READ_ENERGY_MODEL in favor of the register_em callback.

Changelog since v3:
  - New freq-table relation CPUFREQ_RELATION_E.
  - New CPUFreq driver flag CPUFREQ_READ_ENERGY_MODEL.
  - EM flag to skip or not inefficiencies (driven by CPUFreq).
  - Fix infinite loop in set_freq_table_efficiencies().

Changelog since v2:
  - Add separated support for inefficiencies into CPUFreq.
  - Collect Reviewed-by for the first patch.

Changelog since v1:
  - Remove the Look-up table as the numbers weren't strong enough to


Vincent Donnefort (9):
  PM / EM: Fix inefficient states detection
  PM / EM: Mark inefficient states
  PM / EM: Extend em_perf_domain with a flag field
  PM / EM: Allow skipping inefficient states
  cpufreq: Make policy min/max hard requirements
  cpufreq: Add an interface to mark inefficient frequencies
  cpufreq: Introducing CPUFREQ_RELATION_E
  cpufreq: Use CPUFREQ_RELATION_E in DVFS governors
  PM / EM: Mark inefficiencies in CPUFreq

 drivers/cpufreq/acpi-cpufreq.c         |   3 +-
 drivers/cpufreq/amd_freq_sensitivity.c |   3 +-
 drivers/cpufreq/cpufreq.c              |  19 +++-
 drivers/cpufreq/cpufreq_conservative.c |   6 +-
 drivers/cpufreq/cpufreq_ondemand.c     |  16 ++--
 drivers/cpufreq/powernv-cpufreq.c      |   4 +-
 drivers/cpufreq/s5pv210-cpufreq.c      |   2 +-
 include/linux/cpufreq.h                | 169 ++++++++++++++++++++++++++-------
 include/linux/energy_model.h           |  68 +++++++++++--
 kernel/power/energy_model.c            |  86 +++++++++++++----
 10 files changed, 301 insertions(+), 75 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v7 1/9] PM / EM: Fix inefficient states detection
  2021-09-08 14:05 [PATCH v7 0/9] Inefficient OPPs Vincent Donnefort
@ 2021-09-08 14:05 ` Vincent Donnefort
  2021-09-08 14:05 ` [PATCH v7 2/9] PM / EM: Mark inefficient states Vincent Donnefort
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Vincent Donnefort @ 2021-09-08 14:05 UTC (permalink / raw)
  To: rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann, mka,
	Vincent Donnefort

Currently, a debug message is printed if an inefficient state is detected
in the Energy Model. Unfortunately, it won't detect if the first state is
inefficient or if two successive states are. Fix this behavior.

Fixes: 27871f7a8a34 (PM: Introduce an Energy Model management framework)
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index a332ccd829e2..97e62469a6b3 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -107,8 +107,7 @@ static void em_debug_remove_pd(struct device *dev) {}
 static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 				int nr_states, struct em_data_callback *cb)
 {
-	unsigned long opp_eff, prev_opp_eff = ULONG_MAX;
-	unsigned long power, freq, prev_freq = 0;
+	unsigned long power, freq, prev_freq = 0, prev_cost = ULONG_MAX;
 	struct em_perf_state *table;
 	int i, ret;
 	u64 fmax;
@@ -153,27 +152,21 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 
 		table[i].power = power;
 		table[i].frequency = prev_freq = freq;
-
-		/*
-		 * The hertz/watts efficiency ratio should decrease as the
-		 * frequency grows on sane platforms. But this isn't always
-		 * true in practice so warn the user if a higher OPP is more
-		 * power efficient than a lower one.
-		 */
-		opp_eff = freq / power;
-		if (opp_eff >= prev_opp_eff)
-			dev_dbg(dev, "EM: hertz/watts ratio non-monotonically decreasing: em_perf_state %d >= em_perf_state%d\n",
-					i, i - 1);
-		prev_opp_eff = opp_eff;
 	}
 
 	/* Compute the cost of each performance state. */
 	fmax = (u64) table[nr_states - 1].frequency;
-	for (i = 0; i < nr_states; i++) {
+	for (i = nr_states - 1; i >= 0; i--) {
 		unsigned long power_res = em_scale_power(table[i].power);
 
 		table[i].cost = div64_u64(fmax * power_res,
 					  table[i].frequency);
+		if (table[i].cost >= prev_cost) {
+			dev_dbg(dev, "EM: OPP:%lu is inefficient\n",
+				table[i].frequency);
+		} else {
+			prev_cost = table[i].cost;
+		}
 	}
 
 	pd->table = table;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 2/9] PM / EM: Mark inefficient states
  2021-09-08 14:05 [PATCH v7 0/9] Inefficient OPPs Vincent Donnefort
  2021-09-08 14:05 ` [PATCH v7 1/9] PM / EM: Fix inefficient states detection Vincent Donnefort
@ 2021-09-08 14:05 ` Vincent Donnefort
  2021-09-08 14:05 ` [PATCH v7 3/9] PM / EM: Extend em_perf_domain with a flag field Vincent Donnefort
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Vincent Donnefort @ 2021-09-08 14:05 UTC (permalink / raw)
  To: rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann, mka,
	Vincent Donnefort

Some SoCs, such as the sd855 have OPPs within the same performance domain,
whose cost is higher than others with a higher frequency. Even though
those OPPs are interesting from a cooling perspective, it makes no sense
to use them when the device can run at full capacity. Those OPPs handicap
the performance domain, when choosing the most energy-efficient CPU and
are wasting energy. They are inefficient.

Hence, add support for such OPPs to the Energy Model. The table can now
be read skipping inefficient performance states (and by extension,
inefficient OPPs).

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 1834752c5617..629f5f63a7d7 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -17,13 +17,25 @@
  *		device). It can be a total power: static and dynamic.
  * @cost:	The cost coefficient associated with this level, used during
  *		energy calculation. Equal to: power * max_frequency / frequency
+ * @flags:	see "em_perf_state flags" description below.
  */
 struct em_perf_state {
 	unsigned long frequency;
 	unsigned long power;
 	unsigned long cost;
+	unsigned long flags;
 };
 
+/*
+ * em_perf_state flags:
+ *
+ * EM_PERF_STATE_INEFFICIENT: The performance state is inefficient. There is
+ * in this em_perf_domain, another performance state with a higher frequency
+ * but a lower or equal power cost. Such inefficient states are ignored when
+ * using em_pd_get_efficient_*() functions.
+ */
+#define EM_PERF_STATE_INEFFICIENT BIT(0)
+
 /**
  * em_perf_domain - Performance domain
  * @table:		List of performance states, in ascending order
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 97e62469a6b3..6d8438347535 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -2,7 +2,7 @@
 /*
  * Energy Model of devices
  *
- * Copyright (c) 2018-2020, Arm ltd.
+ * Copyright (c) 2018-2021, Arm ltd.
  * Written by: Quentin Perret, Arm ltd.
  * Improvements provided by: Lukasz Luba, Arm ltd.
  */
@@ -42,6 +42,7 @@ static void em_debug_create_ps(struct em_perf_state *ps, struct dentry *pd)
 	debugfs_create_ulong("frequency", 0444, d, &ps->frequency);
 	debugfs_create_ulong("power", 0444, d, &ps->power);
 	debugfs_create_ulong("cost", 0444, d, &ps->cost);
+	debugfs_create_ulong("inefficient", 0444, d, &ps->flags);
 }
 
 static int em_debug_cpus_show(struct seq_file *s, void *unused)
@@ -162,6 +163,7 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 		table[i].cost = div64_u64(fmax * power_res,
 					  table[i].frequency);
 		if (table[i].cost >= prev_cost) {
+			table[i].flags = EM_PERF_STATE_INEFFICIENT;
 			dev_dbg(dev, "EM: OPP:%lu is inefficient\n",
 				table[i].frequency);
 		} else {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 3/9] PM / EM: Extend em_perf_domain with a flag field
  2021-09-08 14:05 [PATCH v7 0/9] Inefficient OPPs Vincent Donnefort
  2021-09-08 14:05 ` [PATCH v7 1/9] PM / EM: Fix inefficient states detection Vincent Donnefort
  2021-09-08 14:05 ` [PATCH v7 2/9] PM / EM: Mark inefficient states Vincent Donnefort
@ 2021-09-08 14:05 ` Vincent Donnefort
  2021-09-08 14:05 ` [PATCH v7 4/9] PM / EM: Allow skipping inefficient states Vincent Donnefort
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Vincent Donnefort @ 2021-09-08 14:05 UTC (permalink / raw)
  To: rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann, mka,
	Vincent Donnefort

Merge the current "milliwatts" option into a "flag" field. This intends to
prepare the extension of this structure for inefficient states support in
the Energy Model.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 629f5f63a7d7..ac2f7d0ab946 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -40,8 +40,7 @@ struct em_perf_state {
  * em_perf_domain - Performance domain
  * @table:		List of performance states, in ascending order
  * @nr_perf_states:	Number of performance states
- * @milliwatts:		Flag indicating the power values are in milli-Watts
- *			or some other scale.
+ * @flags:		See "em_perf_domain flags"
  * @cpus:		Cpumask covering the CPUs of the domain. It's here
  *			for performance reasons to avoid potential cache
  *			misses during energy calculations in the scheduler
@@ -56,10 +55,18 @@ struct em_perf_state {
 struct em_perf_domain {
 	struct em_perf_state *table;
 	int nr_perf_states;
-	int milliwatts;
+	unsigned long flags;
 	unsigned long cpus[];
 };
 
+/*
+ *  em_perf_domain flags:
+ *
+ *  EM_PERF_DOMAIN_MILLIWATTS: The power values are in milli-Watts or some
+ *  other scale.
+ */
+#define EM_PERF_DOMAIN_MILLIWATTS BIT(0)
+
 #define em_span_cpus(em) (to_cpumask((em)->cpus))
 
 #ifdef CONFIG_ENERGY_MODEL
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 6d8438347535..3a7d1573b214 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -56,7 +56,8 @@ DEFINE_SHOW_ATTRIBUTE(em_debug_cpus);
 static int em_debug_units_show(struct seq_file *s, void *unused)
 {
 	struct em_perf_domain *pd = s->private;
-	char *units = pd->milliwatts ? "milliWatts" : "bogoWatts";
+	char *units = (pd->flags & EM_PERF_DOMAIN_MILLIWATTS) ?
+		"milliWatts" : "bogoWatts";
 
 	seq_printf(s, "%s\n", units);
 
@@ -330,7 +331,8 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 	if (ret)
 		goto unlock;
 
-	dev->em_pd->milliwatts = milliwatts;
+	if (milliwatts)
+		dev->em_pd->flags |= EM_PERF_DOMAIN_MILLIWATTS;
 
 	em_debug_create_pd(dev);
 	dev_info(dev, "EM: created perf domain\n");
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 4/9] PM / EM: Allow skipping inefficient states
  2021-09-08 14:05 [PATCH v7 0/9] Inefficient OPPs Vincent Donnefort
                   ` (2 preceding siblings ...)
  2021-09-08 14:05 ` [PATCH v7 3/9] PM / EM: Extend em_perf_domain with a flag field Vincent Donnefort
@ 2021-09-08 14:05 ` Vincent Donnefort
  2021-09-08 14:05 ` [PATCH v7 5/9] cpufreq: Make policy min/max hard requirements Vincent Donnefort
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Vincent Donnefort @ 2021-09-08 14:05 UTC (permalink / raw)
  To: rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann, mka,
	Vincent Donnefort

The new performance domain flag EM_PERF_DOMAIN_SKIP_INEFFICIENCIES allows
to not take into account inefficient states when estimating energy
consumption. This intends to let the Energy Model know that CPUFreq itself
will skip inefficiencies and such states don't need to be part of the
estimation anymore.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index ac2f7d0ab946..15d41f09f009 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -64,8 +64,12 @@ struct em_perf_domain {
  *
  *  EM_PERF_DOMAIN_MILLIWATTS: The power values are in milli-Watts or some
  *  other scale.
+ *
+ *  EM_PERF_DOMAIN_SKIP_INEFFICIENCIES: Skip inefficient states when estimating
+ *  energy consumption.
  */
 #define EM_PERF_DOMAIN_MILLIWATTS BIT(0)
+#define EM_PERF_DOMAIN_SKIP_INEFFICIENCIES BIT(1)
 
 #define em_span_cpus(em) (to_cpumask((em)->cpus))
 
@@ -121,6 +125,37 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 void em_dev_unregister_perf_domain(struct device *dev);
 
 /**
+ * em_pd_get_efficient_state() - Get an efficient performance state from the EM
+ * @pd   : Performance domain for which we want an efficient frequency
+ * @freq : Frequency to map with the EM
+ *
+ * It is called from the scheduler code quite frequently and as a consequence
+ * doesn't implement any check.
+ *
+ * Return: An efficient performance state, high enough to meet @freq
+ * requirement.
+ */
+static inline
+struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
+						unsigned long freq)
+{
+	struct em_perf_state *ps;
+	int i;
+
+	for (i = 0; i < pd->nr_perf_states; i++) {
+		ps = &pd->table[i];
+		if (ps->frequency >= freq) {
+			if (pd->flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES &&
+			    ps->flags & EM_PERF_STATE_INEFFICIENT)
+				continue;
+			break;
+		}
+	}
+
+	return ps;
+}
+
+/**
  * em_cpu_energy() - Estimates the energy consumed by the CPUs of a
 		performance domain
  * @pd		: performance domain for which energy has to be estimated
@@ -142,7 +177,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 {
 	unsigned long freq, scale_cpu;
 	struct em_perf_state *ps;
-	int i, cpu;
+	int cpu;
 
 	if (!sum_util)
 		return 0;
@@ -167,11 +202,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 * Find the lowest performance state of the Energy Model above the
 	 * requested frequency.
 	 */
-	for (i = 0; i < pd->nr_perf_states; i++) {
-		ps = &pd->table[i];
-		if (ps->frequency >= freq)
-			break;
-	}
+	ps = em_pd_get_efficient_state(pd, freq);
 
 	/*
 	 * The capacity of a CPU in the domain at the performance state (ps)
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 3a7d1573b214..d353ef29e37f 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -65,6 +65,17 @@ static int em_debug_units_show(struct seq_file *s, void *unused)
 }
 DEFINE_SHOW_ATTRIBUTE(em_debug_units);
 
+static int em_debug_skip_inefficiencies_show(struct seq_file *s, void *unused)
+{
+	struct em_perf_domain *pd = s->private;
+	int enabled = (pd->flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES) ? 1 : 0;
+
+	seq_printf(s, "%d\n", enabled);
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(em_debug_skip_inefficiencies);
+
 static void em_debug_create_pd(struct device *dev)
 {
 	struct dentry *d;
@@ -78,6 +89,8 @@ static void em_debug_create_pd(struct device *dev)
 				    &em_debug_cpus_fops);
 
 	debugfs_create_file("units", 0444, d, dev->em_pd, &em_debug_units_fops);
+	debugfs_create_file("skip-inefficiencies", 0444, d, dev->em_pd,
+			    &em_debug_skip_inefficiencies_fops);
 
 	/* Create a sub-directory for each performance state */
 	for (i = 0; i < dev->em_pd->nr_perf_states; i++)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 5/9] cpufreq: Make policy min/max hard requirements
  2021-09-08 14:05 [PATCH v7 0/9] Inefficient OPPs Vincent Donnefort
                   ` (3 preceding siblings ...)
  2021-09-08 14:05 ` [PATCH v7 4/9] PM / EM: Allow skipping inefficient states Vincent Donnefort
@ 2021-09-08 14:05 ` Vincent Donnefort
  2021-09-08 14:05 ` [PATCH v7 6/9] cpufreq: Add an interface to mark inefficient frequencies Vincent Donnefort
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Vincent Donnefort @ 2021-09-08 14:05 UTC (permalink / raw)
  To: rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann, mka,
	Vincent Donnefort

When applying the policy min/max limits, the requested frequency is
simply clamped to not be out of range. It means, however, if one of the
boundaries isn't an available frequency, the frequency resolution can
return a value out of those limits, depending on the relation used.

e.g. freq{0,1,2} being available frequencies.

          freq0  policy->min  freq1  policy->max   freq2
            |        |          |        |           |
          17kHz     18kHz     19kHz     20kHz      21kHz

     __resolve_freq(21kHz, CPUFREQ_RELATION_L) -> 21kHz (out of bounds)
     __resolve_freq(17kHz, CPUFREQ_RELATION_H) -> 17kHz (out of bounds)

If, during the policy init, we resolve the requested min/max to existing
frequencies, we ensure that any CPUFREQ_RELATION_* would resolve to a
frequency which is inside the policy min/max range.

Making the policy limits rigid helps to introduce the inefficient
frequencies support. Resolving an inefficient frequency to an efficient
one should not transgress policy->max (which can be set for thermal
reason) and having a value we can trust simplify this comparison.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 7d5f170ecad1..95b0464249d6 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2527,8 +2527,15 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
 	if (ret)
 		return ret;
 
+	/*
+	 * Resolve policy min/max to available frequencies. It ensures
+	 * no frequency resolution will neither overshoot the requested maximum
+	 * nor undershoot the requested minimum.
+	 */
 	policy->min = new_data.min;
 	policy->max = new_data.max;
+	policy->min = __resolve_freq(policy, policy->min, CPUFREQ_RELATION_L);
+	policy->max = __resolve_freq(policy, policy->max, CPUFREQ_RELATION_H);
 	trace_cpu_frequency_limits(policy);
 
 	policy->cached_target_freq = UINT_MAX;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 6/9] cpufreq: Add an interface to mark inefficient frequencies
  2021-09-08 14:05 [PATCH v7 0/9] Inefficient OPPs Vincent Donnefort
                   ` (4 preceding siblings ...)
  2021-09-08 14:05 ` [PATCH v7 5/9] cpufreq: Make policy min/max hard requirements Vincent Donnefort
@ 2021-09-08 14:05 ` Vincent Donnefort
  2021-09-08 14:05 ` [PATCH v7 7/9] cpufreq: Introducing CPUFREQ_RELATION_E Vincent Donnefort
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Vincent Donnefort @ 2021-09-08 14:05 UTC (permalink / raw)
  To: rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann, mka,
	Vincent Donnefort

Some SoCs such as the sd855 have OPPs within the same policy whose cost is
higher than others with a higher frequency. Those OPPs are inefficients
and it might be interesting for a governor to not use them.

cpufreq_table_set_inefficient() allows the caller to identify a specified
frequency as being inefficient. Inefficient frequencies are only supported
on sorted tables.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>

diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index c65a1d7385f8..18b486a1c816 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -661,10 +661,11 @@ struct governor_attr {
  *********************************************************************/
 
 /* Special Values of .frequency field */
-#define CPUFREQ_ENTRY_INVALID	~0u
-#define CPUFREQ_TABLE_END	~1u
+#define CPUFREQ_ENTRY_INVALID		~0u
+#define CPUFREQ_TABLE_END		~1u
 /* Special Values of .flags field */
-#define CPUFREQ_BOOST_FREQ	(1 << 0)
+#define CPUFREQ_BOOST_FREQ		(1 << 0)
+#define CPUFREQ_INEFFICIENT_FREQ	(1 << 1)
 
 struct cpufreq_frequency_table {
 	unsigned int	flags;
@@ -1003,6 +1004,36 @@ static inline int cpufreq_table_count_valid_entries(const struct cpufreq_policy
 
 	return count;
 }
+
+/**
+ * cpufreq_table_set_inefficient() - Mark a frequency as inefficient
+ * @policy:	the &struct cpufreq_policy containing the inefficient frequency
+ * @frequency:	the inefficient frequency
+ *
+ * The &struct cpufreq_policy must use a sorted frequency table
+ *
+ * Return:	%0 on success or a negative errno code
+ */
+
+static inline int
+cpufreq_table_set_inefficient(struct cpufreq_policy *policy,
+			      unsigned int frequency)
+{
+	struct cpufreq_frequency_table *pos;
+
+	/* Not supported */
+	if (policy->freq_table_sorted == CPUFREQ_TABLE_UNSORTED)
+		return -EINVAL;
+
+	cpufreq_for_each_valid_entry(pos, policy->freq_table) {
+		if (pos->frequency == frequency) {
+			pos->flags |= CPUFREQ_INEFFICIENT_FREQ;
+			return 0;
+		}
+	}
+
+	return -EINVAL;
+}
 #else
 static inline int cpufreq_boost_trigger_state(int state)
 {
@@ -1022,6 +1053,13 @@ static inline bool policy_has_boost_freq(struct cpufreq_policy *policy)
 {
 	return false;
 }
+
+static inline int
+cpufreq_table_set_inefficient(struct cpufreq_policy *policy,
+			      unsigned int frequency)
+{
+	return -EINVAL;
+}
 #endif
 
 #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 7/9] cpufreq: Introducing CPUFREQ_RELATION_E
  2021-09-08 14:05 [PATCH v7 0/9] Inefficient OPPs Vincent Donnefort
                   ` (5 preceding siblings ...)
  2021-09-08 14:05 ` [PATCH v7 6/9] cpufreq: Add an interface to mark inefficient frequencies Vincent Donnefort
@ 2021-09-08 14:05 ` Vincent Donnefort
  2021-09-08 14:05 ` [PATCH v7 8/9] cpufreq: Use CPUFREQ_RELATION_E in DVFS governors Vincent Donnefort
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Vincent Donnefort @ 2021-09-08 14:05 UTC (permalink / raw)
  To: rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann, mka,
	Vincent Donnefort

This newly introduced flag can be applied by a governor to a CPUFreq
relation, when looking for a frequency within the policy table. The
resolution would then only walk through efficient frequencies.

Even with the flag set, the policy max limit will still be honoured. If no
efficient frequencies can be found within the limits of the policy, an
inefficient one would be returned.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>

diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index b49612895c78..e88b6b5dd913 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -470,7 +470,8 @@ static unsigned int acpi_cpufreq_fast_switch(struct cpufreq_policy *policy,
 	if (policy->cached_target_freq == target_freq)
 		index = policy->cached_resolved_idx;
 	else
-		index = cpufreq_table_find_index_dl(policy, target_freq);
+		index = cpufreq_table_find_index_dl(policy, target_freq,
+						    false);
 
 	entry = &policy->freq_table[index];
 	next_freq = entry->frequency;
diff --git a/drivers/cpufreq/amd_freq_sensitivity.c b/drivers/cpufreq/amd_freq_sensitivity.c
index d0b10baf039a..6448e03bcf48 100644
--- a/drivers/cpufreq/amd_freq_sensitivity.c
+++ b/drivers/cpufreq/amd_freq_sensitivity.c
@@ -91,7 +91,8 @@ static unsigned int amd_powersave_bias_target(struct cpufreq_policy *policy,
 			unsigned int index;
 
 			index = cpufreq_table_find_index_h(policy,
-							   policy->cur - 1);
+							   policy->cur - 1,
+							   relation & CPUFREQ_RELATION_E);
 			freq_next = policy->freq_table[index].frequency;
 		}
 
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 95b0464249d6..9edf4abc9fa0 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2264,8 +2264,16 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
 	    !(cpufreq_driver->flags & CPUFREQ_NEED_UPDATE_LIMITS))
 		return 0;
 
-	if (cpufreq_driver->target)
+	if (cpufreq_driver->target) {
+		/*
+		 * If the driver hasn't setup a single inefficient frequency,
+		 * it's unlikely it knows how to decode CPUFREQ_RELATION_E.
+		 */
+		if (!policy->efficiencies_available)
+			relation &= ~CPUFREQ_RELATION_E;
+
 		return cpufreq_driver->target(policy, target_freq, relation);
+	}
 
 	if (!cpufreq_driver->target_index)
 		return -EINVAL;
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index eb4320b619c9..f68cad9abd11 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -83,9 +83,11 @@ static unsigned int generic_powersave_bias_target(struct cpufreq_policy *policy,
 	freq_avg = freq_req - freq_reduc;
 
 	/* Find freq bounds for freq_avg in freq_table */
-	index = cpufreq_table_find_index_h(policy, freq_avg);
+	index = cpufreq_table_find_index_h(policy, freq_avg,
+					   relation & CPUFREQ_RELATION_E);
 	freq_lo = freq_table[index].frequency;
-	index = cpufreq_table_find_index_l(policy, freq_avg);
+	index = cpufreq_table_find_index_l(policy, freq_avg,
+					   relation & CPUFREQ_RELATION_E);
 	freq_hi = freq_table[index].frequency;
 
 	/* Find out how long we have to be in hi and lo freqs */
diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 23a06cba392c..ed5b317574ec 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -933,7 +933,7 @@ static void powernv_cpufreq_work_fn(struct work_struct *work)
 		policy = cpufreq_cpu_get(cpu);
 		if (!policy)
 			continue;
-		index = cpufreq_table_find_index_c(policy, policy->cur);
+		index = cpufreq_table_find_index_c(policy, policy->cur, false);
 		powernv_cpufreq_target_index(policy, index);
 		cpumask_andnot(&mask, &mask, policy->cpus);
 		cpufreq_cpu_put(policy);
@@ -1021,7 +1021,7 @@ static unsigned int powernv_fast_switch(struct cpufreq_policy *policy,
 	int index;
 	struct powernv_smp_call_data freq_data;
 
-	index = cpufreq_table_find_index_dl(policy, target_freq);
+	index = cpufreq_table_find_index_dl(policy, target_freq, false);
 	freq_data.pstate_id = powernv_freqs[index].driver_data;
 	freq_data.gpstate_id = powernv_freqs[index].driver_data;
 	set_pstate(&freq_data);
diff --git a/drivers/cpufreq/s5pv210-cpufreq.c b/drivers/cpufreq/s5pv210-cpufreq.c
index ad7d4f272ddc..76c888ed8d16 100644
--- a/drivers/cpufreq/s5pv210-cpufreq.c
+++ b/drivers/cpufreq/s5pv210-cpufreq.c
@@ -243,7 +243,7 @@ static int s5pv210_target(struct cpufreq_policy *policy, unsigned int index)
 	new_freq = s5pv210_freq_table[index].frequency;
 
 	/* Finding current running level index */
-	priv_index = cpufreq_table_find_index_h(policy, old_freq);
+	priv_index = cpufreq_table_find_index_h(policy, old_freq, false);
 
 	arm_volt = dvs_conf[index].arm_volt;
 	int_volt = dvs_conf[index].int_volt;
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 18b486a1c816..045565fe5b29 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -117,6 +117,13 @@ struct cpufreq_policy {
 	bool			strict_target;
 
 	/*
+	 * Set if inefficient frequencies were found in the frequency table.
+	 * This indicates if the relation flag CPUFREQ_RELATION_E can be
+	 * honored.
+	 */
+	bool			efficiencies_available;
+
+	/*
 	 * Preferred average time interval between consecutive invocations of
 	 * the driver to set the frequency for this policy.  To be set by the
 	 * scaling driver (0, which is the default, means no preference).
@@ -271,6 +278,8 @@ static inline void cpufreq_stats_record_transition(struct cpufreq_policy *policy
 #define CPUFREQ_RELATION_L 0  /* lowest frequency at or above target */
 #define CPUFREQ_RELATION_H 1  /* highest frequency below or at target */
 #define CPUFREQ_RELATION_C 2  /* closest frequency to target */
+/* relation flags */
+#define CPUFREQ_RELATION_E BIT(2) /* Get if possible an efficient frequency */
 
 struct freq_attr {
 	struct attribute attr;
@@ -742,6 +751,22 @@ static inline void dev_pm_opp_free_cpufreq_table(struct device *dev,
 			continue;					\
 		else
 
+/**
+ * cpufreq_for_each_efficient_entry_idx - iterate with index over a cpufreq
+ *	frequency_table excluding CPUFREQ_ENTRY_INVALID and
+ *	CPUFREQ_INEFFICIENT_FREQ frequencies.
+ * @pos: the &struct cpufreq_frequency_table to use as a loop cursor.
+ * @table: the &struct cpufreq_frequency_table to iterate over.
+ * @idx: the table entry currently being processed.
+ * @efficiencies: set to true to only iterate over efficient frequencies.
+ */
+
+#define cpufreq_for_each_efficient_entry_idx(pos, table, idx, efficiencies)	\
+	cpufreq_for_each_valid_entry_idx(pos, table, idx)			\
+		if (efficiencies && (pos->flags & CPUFREQ_INEFFICIENT_FREQ))	\
+			continue;						\
+		else
+
 
 int cpufreq_frequency_table_cpuinfo(struct cpufreq_policy *policy,
 				    struct cpufreq_frequency_table *table);
@@ -766,14 +791,15 @@ bool policy_has_boost_freq(struct cpufreq_policy *policy);
 
 /* Find lowest freq at or above target in a table in ascending order */
 static inline int cpufreq_table_find_index_al(struct cpufreq_policy *policy,
-					      unsigned int target_freq)
+					      unsigned int target_freq,
+					      bool efficiencies)
 {
 	struct cpufreq_frequency_table *table = policy->freq_table;
 	struct cpufreq_frequency_table *pos;
 	unsigned int freq;
 	int idx, best = -1;
 
-	cpufreq_for_each_valid_entry_idx(pos, table, idx) {
+	cpufreq_for_each_efficient_entry_idx(pos, table, idx, efficiencies) {
 		freq = pos->frequency;
 
 		if (freq >= target_freq)
@@ -787,14 +813,15 @@ static inline int cpufreq_table_find_index_al(struct cpufreq_policy *policy,
 
 /* Find lowest freq at or above target in a table in descending order */
 static inline int cpufreq_table_find_index_dl(struct cpufreq_policy *policy,
-					      unsigned int target_freq)
+					      unsigned int target_freq,
+					      bool efficiencies)
 {
 	struct cpufreq_frequency_table *table = policy->freq_table;
 	struct cpufreq_frequency_table *pos;
 	unsigned int freq;
 	int idx, best = -1;
 
-	cpufreq_for_each_valid_entry_idx(pos, table, idx) {
+	cpufreq_for_each_efficient_entry_idx(pos, table, idx, efficiencies) {
 		freq = pos->frequency;
 
 		if (freq == target_freq)
@@ -817,26 +844,30 @@ static inline int cpufreq_table_find_index_dl(struct cpufreq_policy *policy,
 
 /* Works only on sorted freq-tables */
 static inline int cpufreq_table_find_index_l(struct cpufreq_policy *policy,
-					     unsigned int target_freq)
+					     unsigned int target_freq,
+					     bool efficiencies)
 {
 	target_freq = clamp_val(target_freq, policy->min, policy->max);
 
 	if (policy->freq_table_sorted == CPUFREQ_TABLE_SORTED_ASCENDING)
-		return cpufreq_table_find_index_al(policy, target_freq);
+		return cpufreq_table_find_index_al(policy, target_freq,
+						   efficiencies);
 	else
-		return cpufreq_table_find_index_dl(policy, target_freq);
+		return cpufreq_table_find_index_dl(policy, target_freq,
+						   efficiencies);
 }
 
 /* Find highest freq at or below target in a table in ascending order */
 static inline int cpufreq_table_find_index_ah(struct cpufreq_policy *policy,
-					      unsigned int target_freq)
+					      unsigned int target_freq,
+					      bool efficiencies)
 {
 	struct cpufreq_frequency_table *table = policy->freq_table;
 	struct cpufreq_frequency_table *pos;
 	unsigned int freq;
 	int idx, best = -1;
 
-	cpufreq_for_each_valid_entry_idx(pos, table, idx) {
+	cpufreq_for_each_efficient_entry_idx(pos, table, idx, efficiencies) {
 		freq = pos->frequency;
 
 		if (freq == target_freq)
@@ -859,14 +890,15 @@ static inline int cpufreq_table_find_index_ah(struct cpufreq_policy *policy,
 
 /* Find highest freq at or below target in a table in descending order */
 static inline int cpufreq_table_find_index_dh(struct cpufreq_policy *policy,
-					      unsigned int target_freq)
+					      unsigned int target_freq,
+					      bool efficiencies)
 {
 	struct cpufreq_frequency_table *table = policy->freq_table;
 	struct cpufreq_frequency_table *pos;
 	unsigned int freq;
 	int idx, best = -1;
 
-	cpufreq_for_each_valid_entry_idx(pos, table, idx) {
+	cpufreq_for_each_efficient_entry_idx(pos, table, idx, efficiencies) {
 		freq = pos->frequency;
 
 		if (freq <= target_freq)
@@ -880,26 +912,30 @@ static inline int cpufreq_table_find_index_dh(struct cpufreq_policy *policy,
 
 /* Works only on sorted freq-tables */
 static inline int cpufreq_table_find_index_h(struct cpufreq_policy *policy,
-					     unsigned int target_freq)
+					     unsigned int target_freq,
+					     bool efficiencies)
 {
 	target_freq = clamp_val(target_freq, policy->min, policy->max);
 
 	if (policy->freq_table_sorted == CPUFREQ_TABLE_SORTED_ASCENDING)
-		return cpufreq_table_find_index_ah(policy, target_freq);
+		return cpufreq_table_find_index_ah(policy, target_freq,
+						   efficiencies);
 	else
-		return cpufreq_table_find_index_dh(policy, target_freq);
+		return cpufreq_table_find_index_dh(policy, target_freq,
+						   efficiencies);
 }
 
 /* Find closest freq to target in a table in ascending order */
 static inline int cpufreq_table_find_index_ac(struct cpufreq_policy *policy,
-					      unsigned int target_freq)
+					      unsigned int target_freq,
+					      bool efficiencies)
 {
 	struct cpufreq_frequency_table *table = policy->freq_table;
 	struct cpufreq_frequency_table *pos;
 	unsigned int freq;
 	int idx, best = -1;
 
-	cpufreq_for_each_valid_entry_idx(pos, table, idx) {
+	cpufreq_for_each_efficient_entry_idx(pos, table, idx, efficiencies) {
 		freq = pos->frequency;
 
 		if (freq == target_freq)
@@ -926,14 +962,15 @@ static inline int cpufreq_table_find_index_ac(struct cpufreq_policy *policy,
 
 /* Find closest freq to target in a table in descending order */
 static inline int cpufreq_table_find_index_dc(struct cpufreq_policy *policy,
-					      unsigned int target_freq)
+					      unsigned int target_freq,
+					      bool efficiencies)
 {
 	struct cpufreq_frequency_table *table = policy->freq_table;
 	struct cpufreq_frequency_table *pos;
 	unsigned int freq;
 	int idx, best = -1;
 
-	cpufreq_for_each_valid_entry_idx(pos, table, idx) {
+	cpufreq_for_each_efficient_entry_idx(pos, table, idx, efficiencies) {
 		freq = pos->frequency;
 
 		if (freq == target_freq)
@@ -960,35 +997,58 @@ static inline int cpufreq_table_find_index_dc(struct cpufreq_policy *policy,
 
 /* Works only on sorted freq-tables */
 static inline int cpufreq_table_find_index_c(struct cpufreq_policy *policy,
-					     unsigned int target_freq)
+					     unsigned int target_freq,
+					     bool efficiencies)
 {
 	target_freq = clamp_val(target_freq, policy->min, policy->max);
 
 	if (policy->freq_table_sorted == CPUFREQ_TABLE_SORTED_ASCENDING)
-		return cpufreq_table_find_index_ac(policy, target_freq);
+		return cpufreq_table_find_index_ac(policy, target_freq,
+						   efficiencies);
 	else
-		return cpufreq_table_find_index_dc(policy, target_freq);
+		return cpufreq_table_find_index_dc(policy, target_freq,
+						   efficiencies);
 }
 
 static inline int cpufreq_frequency_table_target(struct cpufreq_policy *policy,
 						 unsigned int target_freq,
 						 unsigned int relation)
 {
+	bool efficiencies = policy->efficiencies_available &&
+			    (relation & CPUFREQ_RELATION_E);
+	int idx;
+
+	/* cpufreq_table_index_unsorted() has no use for this flag anyway */
+	relation &= ~CPUFREQ_RELATION_E;
+
 	if (unlikely(policy->freq_table_sorted == CPUFREQ_TABLE_UNSORTED))
 		return cpufreq_table_index_unsorted(policy, target_freq,
 						    relation);
-
+retry:
 	switch (relation) {
 	case CPUFREQ_RELATION_L:
-		return cpufreq_table_find_index_l(policy, target_freq);
+		idx = cpufreq_table_find_index_l(policy, target_freq,
+						 efficiencies);
+		break;
 	case CPUFREQ_RELATION_H:
-		return cpufreq_table_find_index_h(policy, target_freq);
+		idx = cpufreq_table_find_index_h(policy, target_freq,
+						 efficiencies);
+		break;
 	case CPUFREQ_RELATION_C:
-		return cpufreq_table_find_index_c(policy, target_freq);
+		idx = cpufreq_table_find_index_c(policy, target_freq,
+						 efficiencies);
+		break;
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
 	}
+
+	if (idx < 0 && efficiencies) {
+		efficiencies = false;
+		goto retry;
+	}
+
+	return idx;
 }
 
 static inline int cpufreq_table_count_valid_entries(const struct cpufreq_policy *policy)
@@ -1028,6 +1088,7 @@ cpufreq_table_set_inefficient(struct cpufreq_policy *policy,
 	cpufreq_for_each_valid_entry(pos, policy->freq_table) {
 		if (pos->frequency == frequency) {
 			pos->flags |= CPUFREQ_INEFFICIENT_FREQ;
+			policy->efficiencies_available = true;
 			return 0;
 		}
 	}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 8/9] cpufreq: Use CPUFREQ_RELATION_E in DVFS governors
  2021-09-08 14:05 [PATCH v7 0/9] Inefficient OPPs Vincent Donnefort
                   ` (6 preceding siblings ...)
  2021-09-08 14:05 ` [PATCH v7 7/9] cpufreq: Introducing CPUFREQ_RELATION_E Vincent Donnefort
@ 2021-09-08 14:05 ` Vincent Donnefort
  2021-09-08 14:05 ` [PATCH v7 9/9] PM / EM: Mark inefficiencies in CPUFreq Vincent Donnefort
  2021-10-04  9:25 ` [PATCH v7 0/9] Inefficient OPPs Viresh Kumar
  9 siblings, 0 replies; 12+ messages in thread
From: Vincent Donnefort @ 2021-09-08 14:05 UTC (permalink / raw)
  To: rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann, mka,
	Vincent Donnefort

Let the governors schedutil, conservative and ondemand to work, if possible
on efficient frequencies only.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 9edf4abc9fa0..c6d82c28f768 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -554,7 +554,7 @@ static unsigned int __resolve_freq(struct cpufreq_policy *policy,
 unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
 					 unsigned int target_freq)
 {
-	return __resolve_freq(policy, target_freq, CPUFREQ_RELATION_L);
+	return __resolve_freq(policy, target_freq, CPUFREQ_RELATION_LE);
 }
 EXPORT_SYMBOL_GPL(cpufreq_driver_resolve_freq);
 
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index aa39ff31ec9f..0879ec3c170c 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -111,7 +111,8 @@ static unsigned int cs_dbs_update(struct cpufreq_policy *policy)
 		if (requested_freq > policy->max)
 			requested_freq = policy->max;
 
-		__cpufreq_driver_target(policy, requested_freq, CPUFREQ_RELATION_H);
+		__cpufreq_driver_target(policy, requested_freq,
+					CPUFREQ_RELATION_HE);
 		dbs_info->requested_freq = requested_freq;
 		goto out;
 	}
@@ -134,7 +135,8 @@ static unsigned int cs_dbs_update(struct cpufreq_policy *policy)
 		else
 			requested_freq = policy->min;
 
-		__cpufreq_driver_target(policy, requested_freq, CPUFREQ_RELATION_L);
+		__cpufreq_driver_target(policy, requested_freq,
+					CPUFREQ_RELATION_LE);
 		dbs_info->requested_freq = requested_freq;
 	}
 
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index f68cad9abd11..3b8f924771b4 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -120,12 +120,12 @@ static void dbs_freq_increase(struct cpufreq_policy *policy, unsigned int freq)
 
 	if (od_tuners->powersave_bias)
 		freq = od_ops.powersave_bias_target(policy, freq,
-				CPUFREQ_RELATION_H);
+						    CPUFREQ_RELATION_HE);
 	else if (policy->cur == policy->max)
 		return;
 
 	__cpufreq_driver_target(policy, freq, od_tuners->powersave_bias ?
-			CPUFREQ_RELATION_L : CPUFREQ_RELATION_H);
+			CPUFREQ_RELATION_LE : CPUFREQ_RELATION_HE);
 }
 
 /*
@@ -163,9 +163,9 @@ static void od_update(struct cpufreq_policy *policy)
 		if (od_tuners->powersave_bias)
 			freq_next = od_ops.powersave_bias_target(policy,
 								 freq_next,
-								 CPUFREQ_RELATION_L);
+								 CPUFREQ_RELATION_LE);
 
-		__cpufreq_driver_target(policy, freq_next, CPUFREQ_RELATION_C);
+		__cpufreq_driver_target(policy, freq_next, CPUFREQ_RELATION_CE);
 	}
 }
 
@@ -184,7 +184,7 @@ static unsigned int od_dbs_update(struct cpufreq_policy *policy)
 	 */
 	if (sample_type == OD_SUB_SAMPLE && policy_dbs->sample_delay_ns > 0) {
 		__cpufreq_driver_target(policy, dbs_info->freq_lo,
-					CPUFREQ_RELATION_H);
+					CPUFREQ_RELATION_HE);
 		return dbs_info->freq_lo_delay_us;
 	}
 
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 045565fe5b29..a4a79a67868f 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -281,6 +281,10 @@ static inline void cpufreq_stats_record_transition(struct cpufreq_policy *policy
 /* relation flags */
 #define CPUFREQ_RELATION_E BIT(2) /* Get if possible an efficient frequency */
 
+#define CPUFREQ_RELATION_LE (CPUFREQ_RELATION_L | CPUFREQ_RELATION_E)
+#define CPUFREQ_RELATION_HE (CPUFREQ_RELATION_H | CPUFREQ_RELATION_E)
+#define CPUFREQ_RELATION_CE (CPUFREQ_RELATION_C | CPUFREQ_RELATION_E)
+
 struct freq_attr {
 	struct attribute attr;
 	ssize_t (*show)(struct cpufreq_policy *, char *);
@@ -637,9 +641,11 @@ struct cpufreq_governor *cpufreq_fallback_governor(void);
 static inline void cpufreq_policy_apply_limits(struct cpufreq_policy *policy)
 {
 	if (policy->max < policy->cur)
-		__cpufreq_driver_target(policy, policy->max, CPUFREQ_RELATION_H);
+		__cpufreq_driver_target(policy, policy->max,
+					CPUFREQ_RELATION_HE);
 	else if (policy->min > policy->cur)
-		__cpufreq_driver_target(policy, policy->min, CPUFREQ_RELATION_L);
+		__cpufreq_driver_target(policy, policy->min,
+					CPUFREQ_RELATION_LE);
 }
 
 /* Governor attribute set */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 9/9] PM / EM: Mark inefficiencies in CPUFreq
  2021-09-08 14:05 [PATCH v7 0/9] Inefficient OPPs Vincent Donnefort
                   ` (7 preceding siblings ...)
  2021-09-08 14:05 ` [PATCH v7 8/9] cpufreq: Use CPUFREQ_RELATION_E in DVFS governors Vincent Donnefort
@ 2021-09-08 14:05 ` Vincent Donnefort
  2021-10-04  9:25 ` [PATCH v7 0/9] Inefficient OPPs Viresh Kumar
  9 siblings, 0 replies; 12+ messages in thread
From: Vincent Donnefort @ 2021-09-08 14:05 UTC (permalink / raw)
  To: rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann, mka,
	Vincent Donnefort

The Energy Model has a 1:1 mapping between OPPs and performance states
(em_perf_state). If a CPUFreq driver registers an Energy Model,
inefficiencies found by the latter can be applied to CPUFreq.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index d353ef29e37f..0153b0ca7b23 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -10,6 +10,7 @@
 #define pr_fmt(fmt) "energy_model: " fmt
 
 #include <linux/cpu.h>
+#include <linux/cpufreq.h>
 #include <linux/cpumask.h>
 #include <linux/debugfs.h>
 #include <linux/energy_model.h>
@@ -231,6 +232,43 @@ static int em_create_pd(struct device *dev, int nr_states,
 	return 0;
 }
 
+static void em_cpufreq_update_efficiencies(struct device *dev)
+{
+	struct em_perf_domain *pd = dev->em_pd;
+	struct em_perf_state *table;
+	struct cpufreq_policy *policy;
+	int found = 0;
+	int i;
+
+	if (!_is_cpu_device(dev) || !pd)
+		return;
+
+	policy = cpufreq_cpu_get(cpumask_first(em_span_cpus(pd)));
+	if (!policy) {
+		dev_warn(dev, "EM: Access to CPUFreq policy failed");
+		return;
+	}
+
+	table = pd->table;
+
+	for (i = 0; i < pd->nr_perf_states; i++) {
+		if (!(table[i].flags & EM_PERF_STATE_INEFFICIENT))
+			continue;
+
+		if (!cpufreq_table_set_inefficient(policy, table[i].frequency))
+			found++;
+	}
+
+	if (!found)
+		return;
+
+	/*
+	 * Efficiencies have been installed in CPUFreq, inefficient frequencies
+	 * will be skipped. The EM can do the same.
+	 */
+	pd->flags |= EM_PERF_DOMAIN_SKIP_INEFFICIENCIES;
+}
+
 /**
  * em_pd_get() - Return the performance domain for a device
  * @dev : Device to find the performance domain for
@@ -347,6 +385,8 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 	if (milliwatts)
 		dev->em_pd->flags |= EM_PERF_DOMAIN_MILLIWATTS;
 
+	em_cpufreq_update_efficiencies(dev);
+
 	em_debug_create_pd(dev);
 	dev_info(dev, "EM: created perf domain\n");
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v7 0/9] Inefficient OPPs
  2021-09-08 14:05 [PATCH v7 0/9] Inefficient OPPs Vincent Donnefort
                   ` (8 preceding siblings ...)
  2021-09-08 14:05 ` [PATCH v7 9/9] PM / EM: Mark inefficiencies in CPUFreq Vincent Donnefort
@ 2021-10-04  9:25 ` Viresh Kumar
  2021-10-05 14:34   ` Rafael J. Wysocki
  9 siblings, 1 reply; 12+ messages in thread
From: Viresh Kumar @ 2021-10-04  9:25 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: rjw, vincent.guittot, qperret, linux-pm, ionela.voinescu,
	lukasz.luba, dietmar.eggemann, mka

On 08-09-21, 15:05, Vincent Donnefort wrote:
> Hi all,
> 
> Here's the new version for the inefficient OPPs. This patch-set is based on the
> following series from Viresh:
> 
>   [PATCH V3 0/9] Add callback to register with energy model
>   https://lore.kernel.org/linux-arm-msm/cover.1628742634.git.viresh.kumar@linaro.org/
> 
> The main change in this version is the re-introduction of CPUFREQ_RELATION_E,
> as a relation flag. When set, all relations will try to resolve a frequency
> across efficient frequencies only.
> 
> A bit of context:
> 
> We (Power team in Arm) are working with an experimental kernel for the
> Google's Pixel4 to evaluate and improve the current mainline performance
> and energy consumption on a real life device with Android.
> 
> The SD855 SoC found in this phone has several OPPs that are inefficient.
> I.e. despite a lower frequency, they have a greater cost. (That cost being
> fmax * OPP power / OPP freq). This issue is twofold. First of course,
> running a specific workload at an inefficient OPP is counterproductive
> since it wastes wasting energy. But also, inefficient OPPs make a
> performance domain less appealing for task placement than it really is.
> 
> We evaluated the change presented here by running 30 iterations of Android
> PCMark "Work 2.0 Performance". While we did not see any statistically
> significant performance impact, this change allowed to drastically improve
> the idle time residency.
> 
> 
>                            |   Running   |  WFI [1]  |    Idle   |
>    ------------------------+-------------+-----------+-----------+
>    Little cluster (4 CPUs) |    -0.35%   |   +0.35%  |   +0.79%  |
>    ------------------------+-------------+-----------+-----------+
>    Medium cluster (3 CPUs) |    -6.3%    |    -18%   |    +12%   |
>    ------------------------+-------------+-----------+-----------+
>    Big cluster    (1 CPU)  |    -6.4%    |    -6.5%  |    +2.8%  |
>    ------------------------+-------------+-----------+-----------+
> 
> On the SD855, the inefficient OPPs are found on the little cluster. By
> removing them from the Energy Model, we make the most efficient CPUs more
> appealing for task placement, helping to reduce the running time for the
> medium and big CPUs. Increasing idle time is crucial for this platform due
> to the substantial energy cost differences among the clusters. Also,
> despite not appearing in the statistics (the idle driver used here doesn't
> report it), we can speculate that we also improve the cluster idle time.
> 
> [1] WFI: Wait for interrupt.
> 
> Changelog since v6:
>   - Bring back CPUFREQ_RELATION_E as a relation flag.
>   - Make the policy min/max hard limits.
>   - Remove the "efficient" member from the freq_table that was pointing to the
>     next efficient frequency.

Had a quick look, LGTM.

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7 0/9] Inefficient OPPs
  2021-10-04  9:25 ` [PATCH v7 0/9] Inefficient OPPs Viresh Kumar
@ 2021-10-05 14:34   ` Rafael J. Wysocki
  0 siblings, 0 replies; 12+ messages in thread
From: Rafael J. Wysocki @ 2021-10-05 14:34 UTC (permalink / raw)
  To: Viresh Kumar, Vincent Donnefort
  Cc: Rafael J. Wysocki, Vincent Guittot, Quentin Perret, Linux PM,
	Ionela Voinescu, Lukasz Luba, Dietmar Eggemann,
	Matthias Kaehlcke

On Mon, Oct 4, 2021 at 11:25 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 08-09-21, 15:05, Vincent Donnefort wrote:
> > Hi all,
> >
> > Here's the new version for the inefficient OPPs. This patch-set is based on the
> > following series from Viresh:
> >
> >   [PATCH V3 0/9] Add callback to register with energy model
> >   https://lore.kernel.org/linux-arm-msm/cover.1628742634.git.viresh.kumar@linaro.org/
> >
> > The main change in this version is the re-introduction of CPUFREQ_RELATION_E,
> > as a relation flag. When set, all relations will try to resolve a frequency
> > across efficient frequencies only.
> >
> > A bit of context:
> >
> > We (Power team in Arm) are working with an experimental kernel for the
> > Google's Pixel4 to evaluate and improve the current mainline performance
> > and energy consumption on a real life device with Android.
> >
> > The SD855 SoC found in this phone has several OPPs that are inefficient.
> > I.e. despite a lower frequency, they have a greater cost. (That cost being
> > fmax * OPP power / OPP freq). This issue is twofold. First of course,
> > running a specific workload at an inefficient OPP is counterproductive
> > since it wastes wasting energy. But also, inefficient OPPs make a
> > performance domain less appealing for task placement than it really is.
> >
> > We evaluated the change presented here by running 30 iterations of Android
> > PCMark "Work 2.0 Performance". While we did not see any statistically
> > significant performance impact, this change allowed to drastically improve
> > the idle time residency.
> >
> >
> >                            |   Running   |  WFI [1]  |    Idle   |
> >    ------------------------+-------------+-----------+-----------+
> >    Little cluster (4 CPUs) |    -0.35%   |   +0.35%  |   +0.79%  |
> >    ------------------------+-------------+-----------+-----------+
> >    Medium cluster (3 CPUs) |    -6.3%    |    -18%   |    +12%   |
> >    ------------------------+-------------+-----------+-----------+
> >    Big cluster    (1 CPU)  |    -6.4%    |    -6.5%  |    +2.8%  |
> >    ------------------------+-------------+-----------+-----------+
> >
> > On the SD855, the inefficient OPPs are found on the little cluster. By
> > removing them from the Energy Model, we make the most efficient CPUs more
> > appealing for task placement, helping to reduce the running time for the
> > medium and big CPUs. Increasing idle time is crucial for this platform due
> > to the substantial energy cost differences among the clusters. Also,
> > despite not appearing in the statistics (the idle driver used here doesn't
> > report it), we can speculate that we also improve the cluster idle time.
> >
> > [1] WFI: Wait for interrupt.
> >
> > Changelog since v6:
> >   - Bring back CPUFREQ_RELATION_E as a relation flag.
> >   - Make the policy min/max hard limits.
> >   - Remove the "efficient" member from the freq_table that was pointing to the
> >     next efficient frequency.
>
> Had a quick look, LGTM.
>
> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

All patches in the series applied as 5.16 material, thanks!

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-10-05 14:34 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-08 14:05 [PATCH v7 0/9] Inefficient OPPs Vincent Donnefort
2021-09-08 14:05 ` [PATCH v7 1/9] PM / EM: Fix inefficient states detection Vincent Donnefort
2021-09-08 14:05 ` [PATCH v7 2/9] PM / EM: Mark inefficient states Vincent Donnefort
2021-09-08 14:05 ` [PATCH v7 3/9] PM / EM: Extend em_perf_domain with a flag field Vincent Donnefort
2021-09-08 14:05 ` [PATCH v7 4/9] PM / EM: Allow skipping inefficient states Vincent Donnefort
2021-09-08 14:05 ` [PATCH v7 5/9] cpufreq: Make policy min/max hard requirements Vincent Donnefort
2021-09-08 14:05 ` [PATCH v7 6/9] cpufreq: Add an interface to mark inefficient frequencies Vincent Donnefort
2021-09-08 14:05 ` [PATCH v7 7/9] cpufreq: Introducing CPUFREQ_RELATION_E Vincent Donnefort
2021-09-08 14:05 ` [PATCH v7 8/9] cpufreq: Use CPUFREQ_RELATION_E in DVFS governors Vincent Donnefort
2021-09-08 14:05 ` [PATCH v7 9/9] PM / EM: Mark inefficiencies in CPUFreq Vincent Donnefort
2021-10-04  9:25 ` [PATCH v7 0/9] Inefficient OPPs Viresh Kumar
2021-10-05 14:34   ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.