All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 0/7] Inefficient OPPs
@ 2021-08-31 10:24 Vincent Donnefort
  2021-08-31 10:24 ` [PATCH v6 1/7] PM / EM: Fix inefficient states detection Vincent Donnefort
                   ` (6 more replies)
  0 siblings, 7 replies; 24+ messages in thread
From: Vincent Donnefort @ 2021-08-31 10:24 UTC (permalink / raw)
  To: peterz, rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann, mka,
	Vincent Donnefort

Hi all,

Here's the new version for the inefficient OPPs. This patch-set is based on the
following series from Viresh:

  [PATCH V3 0/9] Add callback to register with energy model
  https://lore.kernel.org/linux-arm-msm/cover.1628742634.git.viresh.kumar@linaro.org/

A bit of context:

We (Power team in Arm) are working with an experimental kernel for the
Google's Pixel4 to evaluate and improve the current mainline performance
and energy consumption on a real life device with Android.

The SD855 SoC found in this phone has several OPPs that are inefficient.
I.e. despite a lower frequency, they have a greater cost. (That cost being
fmax * OPP power / OPP freq). This issue is twofold. First of course,
running a specific workload at an inefficient OPP is counterproductive
since it wastes wasting energy. But also, inefficient OPPs make a
performance domain less appealing for task placement than it really is.

We evaluated the change presented here by running 30 iterations of Android
PCMark "Work 2.0 Performance". While we did not see any statistically
significant performance impact, this change allowed to drastically improve
the idle time residency.


                           |   Running   |  WFI [1]  |    Idle   |
   ------------------------+-------------+-----------+-----------+
   Little cluster (4 CPUs) |    -0.35%   |   +0.35%  |   +0.79%  |
   ------------------------+-------------+-----------+-----------+
   Medium cluster (3 CPUs) |    -6.3%    |    -18%   |    +12%   |
   ------------------------+-------------+-----------+-----------+
   Big cluster    (1 CPU)  |    -6.4%    |    -6.5%  |    +2.8%  |
   ------------------------+-------------+-----------+-----------+

On the SD855, the inefficient OPPs are found on the little cluster. By
removing them from the Energy Model, we make the most efficient CPUs more
appealing for task placement, helping to reduce the running time for the
medium and big CPUs. Increasing idle time is crucial for this platform due
to the substantial energy cost differences among the clusters. Also,
despite not appearing in the statistics (the idle driver used here doesn't
report it), we can speculate that we also improve the cluster idle time.

[1] WFI: Wait for interrupt.

Changelog since v5:
  - EM setup inefficient frequencies in CPUFreq, instead of CPUFreq settings its
    own inefficient frequencies from EM.

Changelog since v4:
  - Remove CPUFREQ_RELATION_E.
  - Skip inefficient OPPs for all governors with CPUFREQ_GOV_DYNAMIC_SWITCHING
  - Remove CPUFREQ_READ_ENERGY_MODEL in favor of the register_em callback.

Changelog since v3:
  - New freq-table relation CPUFREQ_RELATION_E.
  - New CPUFreq driver flag CPUFREQ_READ_ENERGY_MODEL.
  - EM flag to skip or not inefficiencies (driven by CPUFreq).
  - Fix infinite loop in set_freq_table_efficiencies().

Changelog since v2:
  - Add separated support for inefficiencies into CPUFreq.
  - Collect Reviewed-by for the first patch.

Changelog since v1:
  - Remove the Look-up table as the numbers weren't strong enough to

Vincent Donnefort (7):
  PM / EM: Fix inefficient states detection
  PM / EM: Mark inefficient states
  PM / EM: Extend em_perf_domain with a flag field
  PM / EM: Allow skipping inefficient states
  cpufreq: Add an interface to mark inefficient frequencies
  cpufreq: Skip inefficient frequencies
  PM / EM: Mark inefficiencies in CPUFreq

 drivers/cpufreq/cpufreq.c    | 39 +++++++++++++++++++
 drivers/cpufreq/freq_table.c | 53 ++++++++++++++++++++++++++
 include/linux/cpufreq.h      | 75 ++++++++++++++++++++++++++++++++++---
 include/linux/energy_model.h | 68 ++++++++++++++++++++++++++++-----
 kernel/power/energy_model.c  | 89 +++++++++++++++++++++++++++++++++++---------
 5 files changed, 291 insertions(+), 33 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v6 1/7] PM / EM: Fix inefficient states detection
  2021-08-31 10:24 [PATCH v6 0/7] Inefficient OPPs Vincent Donnefort
@ 2021-08-31 10:24 ` Vincent Donnefort
  2021-08-31 10:24 ` [PATCH v6 2/7] PM / EM: Mark inefficient states Vincent Donnefort
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 24+ messages in thread
From: Vincent Donnefort @ 2021-08-31 10:24 UTC (permalink / raw)
  To: peterz, rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann, mka,
	Vincent Donnefort

Currently, a debug message is printed if an inefficient state is detected
in the Energy Model. Unfortunately, it won't detect if the first state is
inefficient or if two successive states are. Fix this behavior.

Fixes: 27871f7a8a34 (PM: Introduce an Energy Model management framework)
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index a332ccd829e2..97e62469a6b3 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -107,8 +107,7 @@ static void em_debug_remove_pd(struct device *dev) {}
 static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 				int nr_states, struct em_data_callback *cb)
 {
-	unsigned long opp_eff, prev_opp_eff = ULONG_MAX;
-	unsigned long power, freq, prev_freq = 0;
+	unsigned long power, freq, prev_freq = 0, prev_cost = ULONG_MAX;
 	struct em_perf_state *table;
 	int i, ret;
 	u64 fmax;
@@ -153,27 +152,21 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 
 		table[i].power = power;
 		table[i].frequency = prev_freq = freq;
-
-		/*
-		 * The hertz/watts efficiency ratio should decrease as the
-		 * frequency grows on sane platforms. But this isn't always
-		 * true in practice so warn the user if a higher OPP is more
-		 * power efficient than a lower one.
-		 */
-		opp_eff = freq / power;
-		if (opp_eff >= prev_opp_eff)
-			dev_dbg(dev, "EM: hertz/watts ratio non-monotonically decreasing: em_perf_state %d >= em_perf_state%d\n",
-					i, i - 1);
-		prev_opp_eff = opp_eff;
 	}
 
 	/* Compute the cost of each performance state. */
 	fmax = (u64) table[nr_states - 1].frequency;
-	for (i = 0; i < nr_states; i++) {
+	for (i = nr_states - 1; i >= 0; i--) {
 		unsigned long power_res = em_scale_power(table[i].power);
 
 		table[i].cost = div64_u64(fmax * power_res,
 					  table[i].frequency);
+		if (table[i].cost >= prev_cost) {
+			dev_dbg(dev, "EM: OPP:%lu is inefficient\n",
+				table[i].frequency);
+		} else {
+			prev_cost = table[i].cost;
+		}
 	}
 
 	pd->table = table;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 2/7] PM / EM: Mark inefficient states
  2021-08-31 10:24 [PATCH v6 0/7] Inefficient OPPs Vincent Donnefort
  2021-08-31 10:24 ` [PATCH v6 1/7] PM / EM: Fix inefficient states detection Vincent Donnefort
@ 2021-08-31 10:24 ` Vincent Donnefort
  2021-08-31 10:24 ` [PATCH v6 3/7] PM / EM: Extend em_perf_domain with a flag field Vincent Donnefort
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 24+ messages in thread
From: Vincent Donnefort @ 2021-08-31 10:24 UTC (permalink / raw)
  To: peterz, rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann, mka,
	Vincent Donnefort

Some SoCs, such as the sd855 have OPPs within the same performance domain,
whose cost is higher than others with a higher frequency. Even though
those OPPs are interesting from a cooling perspective, it makes no sense
to use them when the device can run at full capacity. Those OPPs handicap
the performance domain, when choosing the most energy-efficient CPU and
are wasting energy. They are inefficient.

Hence, add support for such OPPs to the Energy Model. The table can now
be read skipping inefficient performance states (and by extension,
inefficient OPPs).

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 1834752c5617..629f5f63a7d7 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -17,13 +17,25 @@
  *		device). It can be a total power: static and dynamic.
  * @cost:	The cost coefficient associated with this level, used during
  *		energy calculation. Equal to: power * max_frequency / frequency
+ * @flags:	see "em_perf_state flags" description below.
  */
 struct em_perf_state {
 	unsigned long frequency;
 	unsigned long power;
 	unsigned long cost;
+	unsigned long flags;
 };
 
+/*
+ * em_perf_state flags:
+ *
+ * EM_PERF_STATE_INEFFICIENT: The performance state is inefficient. There is
+ * in this em_perf_domain, another performance state with a higher frequency
+ * but a lower or equal power cost. Such inefficient states are ignored when
+ * using em_pd_get_efficient_*() functions.
+ */
+#define EM_PERF_STATE_INEFFICIENT BIT(0)
+
 /**
  * em_perf_domain - Performance domain
  * @table:		List of performance states, in ascending order
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 97e62469a6b3..6d8438347535 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -2,7 +2,7 @@
 /*
  * Energy Model of devices
  *
- * Copyright (c) 2018-2020, Arm ltd.
+ * Copyright (c) 2018-2021, Arm ltd.
  * Written by: Quentin Perret, Arm ltd.
  * Improvements provided by: Lukasz Luba, Arm ltd.
  */
@@ -42,6 +42,7 @@ static void em_debug_create_ps(struct em_perf_state *ps, struct dentry *pd)
 	debugfs_create_ulong("frequency", 0444, d, &ps->frequency);
 	debugfs_create_ulong("power", 0444, d, &ps->power);
 	debugfs_create_ulong("cost", 0444, d, &ps->cost);
+	debugfs_create_ulong("inefficient", 0444, d, &ps->flags);
 }
 
 static int em_debug_cpus_show(struct seq_file *s, void *unused)
@@ -162,6 +163,7 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 		table[i].cost = div64_u64(fmax * power_res,
 					  table[i].frequency);
 		if (table[i].cost >= prev_cost) {
+			table[i].flags = EM_PERF_STATE_INEFFICIENT;
 			dev_dbg(dev, "EM: OPP:%lu is inefficient\n",
 				table[i].frequency);
 		} else {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 3/7] PM / EM: Extend em_perf_domain with a flag field
  2021-08-31 10:24 [PATCH v6 0/7] Inefficient OPPs Vincent Donnefort
  2021-08-31 10:24 ` [PATCH v6 1/7] PM / EM: Fix inefficient states detection Vincent Donnefort
  2021-08-31 10:24 ` [PATCH v6 2/7] PM / EM: Mark inefficient states Vincent Donnefort
@ 2021-08-31 10:24 ` Vincent Donnefort
  2021-08-31 10:24 ` [PATCH v6 4/7] PM / EM: Allow skipping inefficient states Vincent Donnefort
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 24+ messages in thread
From: Vincent Donnefort @ 2021-08-31 10:24 UTC (permalink / raw)
  To: peterz, rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann, mka,
	Vincent Donnefort

Merge the current "milliwatts" option into a "flag" field. This intends to
prepare the extension of this structure for inefficient states support in
the Energy Model.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 629f5f63a7d7..ac2f7d0ab946 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -40,8 +40,7 @@ struct em_perf_state {
  * em_perf_domain - Performance domain
  * @table:		List of performance states, in ascending order
  * @nr_perf_states:	Number of performance states
- * @milliwatts:		Flag indicating the power values are in milli-Watts
- *			or some other scale.
+ * @flags:		See "em_perf_domain flags"
  * @cpus:		Cpumask covering the CPUs of the domain. It's here
  *			for performance reasons to avoid potential cache
  *			misses during energy calculations in the scheduler
@@ -56,10 +55,18 @@ struct em_perf_state {
 struct em_perf_domain {
 	struct em_perf_state *table;
 	int nr_perf_states;
-	int milliwatts;
+	unsigned long flags;
 	unsigned long cpus[];
 };
 
+/*
+ *  em_perf_domain flags:
+ *
+ *  EM_PERF_DOMAIN_MILLIWATTS: The power values are in milli-Watts or some
+ *  other scale.
+ */
+#define EM_PERF_DOMAIN_MILLIWATTS BIT(0)
+
 #define em_span_cpus(em) (to_cpumask((em)->cpus))
 
 #ifdef CONFIG_ENERGY_MODEL
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 6d8438347535..3a7d1573b214 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -56,7 +56,8 @@ DEFINE_SHOW_ATTRIBUTE(em_debug_cpus);
 static int em_debug_units_show(struct seq_file *s, void *unused)
 {
 	struct em_perf_domain *pd = s->private;
-	char *units = pd->milliwatts ? "milliWatts" : "bogoWatts";
+	char *units = (pd->flags & EM_PERF_DOMAIN_MILLIWATTS) ?
+		"milliWatts" : "bogoWatts";
 
 	seq_printf(s, "%s\n", units);
 
@@ -330,7 +331,8 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 	if (ret)
 		goto unlock;
 
-	dev->em_pd->milliwatts = milliwatts;
+	if (milliwatts)
+		dev->em_pd->flags |= EM_PERF_DOMAIN_MILLIWATTS;
 
 	em_debug_create_pd(dev);
 	dev_info(dev, "EM: created perf domain\n");
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 4/7] PM / EM: Allow skipping inefficient states
  2021-08-31 10:24 [PATCH v6 0/7] Inefficient OPPs Vincent Donnefort
                   ` (2 preceding siblings ...)
  2021-08-31 10:24 ` [PATCH v6 3/7] PM / EM: Extend em_perf_domain with a flag field Vincent Donnefort
@ 2021-08-31 10:24 ` Vincent Donnefort
  2021-08-31 10:24 ` [PATCH v6 5/7] cpufreq: Add an interface to mark inefficient frequencies Vincent Donnefort
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 24+ messages in thread
From: Vincent Donnefort @ 2021-08-31 10:24 UTC (permalink / raw)
  To: peterz, rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann, mka,
	Vincent Donnefort

The new performance domain flag EM_PERF_DOMAIN_SKIP_INEFFICIENCIES allows
to not take into account inefficient states when estimating energy
consumption. This intends to let the Energy Model know that CPUFreq itself
will skip inefficiencies and such states don't need to be part of the
estimation anymore.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index ac2f7d0ab946..15d41f09f009 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -64,8 +64,12 @@ struct em_perf_domain {
  *
  *  EM_PERF_DOMAIN_MILLIWATTS: The power values are in milli-Watts or some
  *  other scale.
+ *
+ *  EM_PERF_DOMAIN_SKIP_INEFFICIENCIES: Skip inefficient states when estimating
+ *  energy consumption.
  */
 #define EM_PERF_DOMAIN_MILLIWATTS BIT(0)
+#define EM_PERF_DOMAIN_SKIP_INEFFICIENCIES BIT(1)
 
 #define em_span_cpus(em) (to_cpumask((em)->cpus))
 
@@ -121,6 +125,37 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 void em_dev_unregister_perf_domain(struct device *dev);
 
 /**
+ * em_pd_get_efficient_state() - Get an efficient performance state from the EM
+ * @pd   : Performance domain for which we want an efficient frequency
+ * @freq : Frequency to map with the EM
+ *
+ * It is called from the scheduler code quite frequently and as a consequence
+ * doesn't implement any check.
+ *
+ * Return: An efficient performance state, high enough to meet @freq
+ * requirement.
+ */
+static inline
+struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
+						unsigned long freq)
+{
+	struct em_perf_state *ps;
+	int i;
+
+	for (i = 0; i < pd->nr_perf_states; i++) {
+		ps = &pd->table[i];
+		if (ps->frequency >= freq) {
+			if (pd->flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES &&
+			    ps->flags & EM_PERF_STATE_INEFFICIENT)
+				continue;
+			break;
+		}
+	}
+
+	return ps;
+}
+
+/**
  * em_cpu_energy() - Estimates the energy consumed by the CPUs of a
 		performance domain
  * @pd		: performance domain for which energy has to be estimated
@@ -142,7 +177,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 {
 	unsigned long freq, scale_cpu;
 	struct em_perf_state *ps;
-	int i, cpu;
+	int cpu;
 
 	if (!sum_util)
 		return 0;
@@ -167,11 +202,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 * Find the lowest performance state of the Energy Model above the
 	 * requested frequency.
 	 */
-	for (i = 0; i < pd->nr_perf_states; i++) {
-		ps = &pd->table[i];
-		if (ps->frequency >= freq)
-			break;
-	}
+	ps = em_pd_get_efficient_state(pd, freq);
 
 	/*
 	 * The capacity of a CPU in the domain at the performance state (ps)
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 3a7d1573b214..d353ef29e37f 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -65,6 +65,17 @@ static int em_debug_units_show(struct seq_file *s, void *unused)
 }
 DEFINE_SHOW_ATTRIBUTE(em_debug_units);
 
+static int em_debug_skip_inefficiencies_show(struct seq_file *s, void *unused)
+{
+	struct em_perf_domain *pd = s->private;
+	int enabled = (pd->flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES) ? 1 : 0;
+
+	seq_printf(s, "%d\n", enabled);
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(em_debug_skip_inefficiencies);
+
 static void em_debug_create_pd(struct device *dev)
 {
 	struct dentry *d;
@@ -78,6 +89,8 @@ static void em_debug_create_pd(struct device *dev)
 				    &em_debug_cpus_fops);
 
 	debugfs_create_file("units", 0444, d, dev->em_pd, &em_debug_units_fops);
+	debugfs_create_file("skip-inefficiencies", 0444, d, dev->em_pd,
+			    &em_debug_skip_inefficiencies_fops);
 
 	/* Create a sub-directory for each performance state */
 	for (i = 0; i < dev->em_pd->nr_perf_states; i++)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 5/7] cpufreq: Add an interface to mark inefficient frequencies
  2021-08-31 10:24 [PATCH v6 0/7] Inefficient OPPs Vincent Donnefort
                   ` (3 preceding siblings ...)
  2021-08-31 10:24 ` [PATCH v6 4/7] PM / EM: Allow skipping inefficient states Vincent Donnefort
@ 2021-08-31 10:24 ` Vincent Donnefort
  2021-09-01  4:55   ` Viresh Kumar
  2021-09-01 17:25   ` Rafael J. Wysocki
  2021-08-31 10:24 ` [PATCH v6 6/7] cpufreq: Skip " Vincent Donnefort
  2021-08-31 10:24 ` [PATCH v6 7/7] PM / EM: Mark inefficiencies in CPUFreq Vincent Donnefort
  6 siblings, 2 replies; 24+ messages in thread
From: Vincent Donnefort @ 2021-08-31 10:24 UTC (permalink / raw)
  To: peterz, rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann, mka,
	Vincent Donnefort

Some SoCs such as the sd855 have OPPs within the same policy whose cost is
higher than others with a higher frequency. Those OPPs are inefficients
and it might be interesting for a governor to not use them.

The inefficient interface is composed of two calls:

 1. cpufreq_table_set_inefficient() marks a frequency as inefficient.

 2. cpufreq_table_update_efficiencies() use the inefficiences marked by the
    previous function to generate a mapping inefficient->efficient.

Resolving an inefficient frequency to an efficient on can then be done
by accessing the cpufreq_frequency_table member "efficient". The
resolution doesn't guarantee the policy maximum.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>

diff --git a/drivers/cpufreq/freq_table.c b/drivers/cpufreq/freq_table.c
index 67e56cf638ef..d3fa38af2aa6 100644
--- a/drivers/cpufreq/freq_table.c
+++ b/drivers/cpufreq/freq_table.c
@@ -365,6 +365,59 @@ int cpufreq_table_validate_and_sort(struct cpufreq_policy *policy)
 	return set_freq_table_sorted(policy);
 }
 
+/**
+ * cpufreq_table_update_efficiencies() - Update efficiency resolution
+ *
+ * @policy:	the &struct cpufreq_policy to update
+ *
+ * Allow quick resolution from inefficient frequencies to efficient ones.
+ * Inefficient frequencies must have been previously marked with
+ * cpufreq_table_set_inefficient().
+ *
+ * Return: %0 on success or a negative errno code
+ */
+int cpufreq_table_update_efficiencies(struct cpufreq_policy *policy)
+{
+	struct cpufreq_frequency_table *pos, *table = policy->freq_table;
+	enum cpufreq_table_sorting sort = policy->freq_table_sorted;
+	int efficient, idx;
+
+	/* Not supported */
+	if (sort == CPUFREQ_TABLE_UNSORTED)
+		return -EINVAL;
+
+	/* The highest frequency is always efficient */
+	cpufreq_for_each_valid_entry_idx(pos, table, idx) {
+		efficient = idx;
+		if (sort == CPUFREQ_TABLE_SORTED_DESCENDING)
+			break;
+	}
+
+	for (;;) {
+		pos = &table[idx];
+
+		if (pos->frequency != CPUFREQ_ENTRY_INVALID) {
+			if (pos->flags & CPUFREQ_INEFFICIENT_FREQ) {
+				pos->efficient = efficient;
+			} else {
+				pos->efficient = idx;
+				efficient = idx;
+			}
+		}
+
+		if (sort == CPUFREQ_TABLE_SORTED_ASCENDING) {
+			if (--idx < 0)
+				break;
+		} else {
+			if (table[++idx].frequency == CPUFREQ_TABLE_END)
+				break;
+		}
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(cpufreq_table_update_efficiencies);
+
 MODULE_AUTHOR("Dominik Brodowski <linux@brodo.de>");
 MODULE_DESCRIPTION("CPUfreq frequency table helpers");
 MODULE_LICENSE("GPL");
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index c65a1d7385f8..4e901ebd104d 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -664,13 +664,15 @@ struct governor_attr {
 #define CPUFREQ_ENTRY_INVALID	~0u
 #define CPUFREQ_TABLE_END	~1u
 /* Special Values of .flags field */
-#define CPUFREQ_BOOST_FREQ	(1 << 0)
+#define CPUFREQ_BOOST_FREQ	 (1 << 0)
+#define CPUFREQ_INEFFICIENT_FREQ (1 << 1)
 
 struct cpufreq_frequency_table {
 	unsigned int	flags;
 	unsigned int	driver_data; /* driver specific data, not used by core */
 	unsigned int	frequency; /* kHz - doesn't need to be in ascending
 				    * order */
+	unsigned int	efficient; /* idx of an efficient frequency */
 };
 
 #if defined(CONFIG_CPU_FREQ) && defined(CONFIG_PM_OPP)
@@ -762,6 +764,7 @@ int cpufreq_boost_trigger_state(int state);
 int cpufreq_boost_enabled(void);
 int cpufreq_enable_boost_support(void);
 bool policy_has_boost_freq(struct cpufreq_policy *policy);
+int cpufreq_table_update_efficiencies(struct cpufreq_policy *policy);
 
 /* Find lowest freq at or above target in a table in ascending order */
 static inline int cpufreq_table_find_index_al(struct cpufreq_policy *policy,
@@ -1003,6 +1006,29 @@ static inline int cpufreq_table_count_valid_entries(const struct cpufreq_policy
 
 	return count;
 }
+
+/**
+ * cpufreq_table_set_inefficient() - Mark a frequency as inefficient
+ *
+ * @policy:	the &struct cpufreq_policy containing the inefficient frequency
+ * @frequency:	the inefficient frequency
+ *
+ * Once inefficiencies marked, the efficient resolution must be updated with the
+ * function cpufreq_table_update_efficiencies().
+ */
+static inline void
+cpufreq_table_set_inefficient(const struct cpufreq_policy *policy,
+			      unsigned int frequency)
+{
+	struct cpufreq_frequency_table *pos;
+
+	cpufreq_for_each_valid_entry(pos, policy->freq_table) {
+		if (pos->frequency == frequency) {
+			pos->flags |= CPUFREQ_INEFFICIENT_FREQ;
+			break;
+		}
+	}
+}
 #else
 static inline int cpufreq_boost_trigger_state(int state)
 {
@@ -1022,6 +1048,16 @@ static inline bool policy_has_boost_freq(struct cpufreq_policy *policy)
 {
 	return false;
 }
+
+static inline void
+cpufreq_table_set_inefficient(const struct cpufreq_policy *policy,
+			      unsigned int frequency) {}
+
+static inline int
+cpufreq_table_update_efficiencies(struct cpufreq_policy *policy)
+{
+	return -EINVAL;
+}
 #endif
 
 #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 6/7] cpufreq: Skip inefficient frequencies
  2021-08-31 10:24 [PATCH v6 0/7] Inefficient OPPs Vincent Donnefort
                   ` (4 preceding siblings ...)
  2021-08-31 10:24 ` [PATCH v6 5/7] cpufreq: Add an interface to mark inefficient frequencies Vincent Donnefort
@ 2021-08-31 10:24 ` Vincent Donnefort
  2021-09-01  5:30   ` Viresh Kumar
  2021-09-01 18:13   ` Rafael J. Wysocki
  2021-08-31 10:24 ` [PATCH v6 7/7] PM / EM: Mark inefficiencies in CPUFreq Vincent Donnefort
  6 siblings, 2 replies; 24+ messages in thread
From: Vincent Donnefort @ 2021-08-31 10:24 UTC (permalink / raw)
  To: peterz, rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann, mka,
	Vincent Donnefort

CPUFreq governors that do DVFS (i.e. CPUFREQ_GOV_DYNAMIC_SWITCHING flag)
can skip frequencies marked as inefficient, as long as the efficient
frequency found meet the policy maximum requirement.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 7d5f170ecad1..b46fe2d7baf1 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2295,6 +2295,44 @@ __weak struct cpufreq_governor *cpufreq_fallback_governor(void)
 	return NULL;
 }
 
+static inline bool
+cpufreq_can_skip_inefficiencies(struct cpufreq_policy *policy)
+{
+	struct cpufreq_frequency_table *pos;
+	bool valid = false;
+	int idx;
+
+	if (!(policy->governor->flags & CPUFREQ_GOV_DYNAMIC_SWITCHING))
+		return false;
+
+	if (policy->freq_table_sorted == CPUFREQ_TABLE_UNSORTED)
+		return false;
+
+	/* Is there at least one inefficiency ? */
+	cpufreq_for_each_valid_entry(pos, policy->freq_table) {
+		if (pos->flags & CPUFREQ_INEFFICIENT_FREQ) {
+			valid = true;
+			break;
+		}
+	}
+
+	if (!valid)
+		return false;
+
+	/*
+	 * Has cpufreq_table_update_efficiencies been called? i.e. is the
+	 * highest frequency efficient.
+	 */
+	cpufreq_for_each_valid_entry_idx(pos, policy->freq_table, idx) {
+		valid = !!(idx == pos->efficient);
+		if (policy->freq_table_sorted ==
+					CPUFREQ_TABLE_SORTED_DESCENDING)
+			break;
+	}
+
+	return valid;
+}
+
 static int cpufreq_init_governor(struct cpufreq_policy *policy)
 {
 	int ret;
@@ -2337,6 +2375,7 @@ static int cpufreq_init_governor(struct cpufreq_policy *policy)
 	}
 
 	policy->strict_target = !!(policy->governor->flags & CPUFREQ_GOV_STRICT_TARGET);
+	policy->skip_inefficiencies = cpufreq_can_skip_inefficiencies(policy);
 
 	return 0;
 }
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 4e901ebd104d..cb09afbf01e2 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -117,6 +117,13 @@ struct cpufreq_policy {
 	bool			strict_target;
 
 	/*
+	 * Set if the CPUFREQ_GOV_DYNAMIC_SWITCHING flag is set for the current
+	 * governor and if inefficient frequencies were found in the frequency
+	 * table.
+	 */
+	bool			skip_inefficiencies;
+
+	/*
 	 * Preferred average time interval between consecutive invocations of
 	 * the driver to set the frequency for this policy.  To be set by the
 	 * scaling driver (0, which is the default, means no preference).
@@ -972,25 +979,46 @@ static inline int cpufreq_table_find_index_c(struct cpufreq_policy *policy,
 		return cpufreq_table_find_index_dc(policy, target_freq);
 }
 
+static inline unsigned int
+cpufreq_frequency_find_efficient(struct cpufreq_policy *policy,
+				 unsigned int idx)
+{
+	struct cpufreq_frequency_table *table = policy->freq_table;
+	unsigned int efficient_idx = table[idx].efficient;
+
+	return table[efficient_idx].frequency <= policy->max ? efficient_idx :
+		idx;
+}
+
 static inline int cpufreq_frequency_table_target(struct cpufreq_policy *policy,
 						 unsigned int target_freq,
 						 unsigned int relation)
 {
+	int idx;
+
 	if (unlikely(policy->freq_table_sorted == CPUFREQ_TABLE_UNSORTED))
 		return cpufreq_table_index_unsorted(policy, target_freq,
 						    relation);
 
 	switch (relation) {
 	case CPUFREQ_RELATION_L:
-		return cpufreq_table_find_index_l(policy, target_freq);
+		idx = cpufreq_table_find_index_l(policy, target_freq);
+		break;
 	case CPUFREQ_RELATION_H:
-		return cpufreq_table_find_index_h(policy, target_freq);
+		idx = cpufreq_table_find_index_h(policy, target_freq);
+		break;
 	case CPUFREQ_RELATION_C:
-		return cpufreq_table_find_index_c(policy, target_freq);
+		idx = cpufreq_table_find_index_c(policy, target_freq);
+		break;
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
 	}
+
+	if (policy->skip_inefficiencies)
+		idx = cpufreq_frequency_find_efficient(policy, idx);
+
+	return idx;
 }
 
 static inline int cpufreq_table_count_valid_entries(const struct cpufreq_policy *policy)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 7/7] PM / EM: Mark inefficiencies in CPUFreq
  2021-08-31 10:24 [PATCH v6 0/7] Inefficient OPPs Vincent Donnefort
                   ` (5 preceding siblings ...)
  2021-08-31 10:24 ` [PATCH v6 6/7] cpufreq: Skip " Vincent Donnefort
@ 2021-08-31 10:24 ` Vincent Donnefort
  2021-09-01  5:31   ` Viresh Kumar
  6 siblings, 1 reply; 24+ messages in thread
From: Vincent Donnefort @ 2021-08-31 10:24 UTC (permalink / raw)
  To: peterz, rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann, mka,
	Vincent Donnefort

The Energy Model has a 1:1 mapping between OPPs and performance states
(em_perf_state). If a CPUFreq driver registers an Energy Model,
inefficiencies found by the latter can be applied to CPUFreq.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>

diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index cb09afbf01e2..153ddc7b0506 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -1121,7 +1121,6 @@ void cpufreq_generic_init(struct cpufreq_policy *policy,
 
 static inline void cpufreq_register_em_with_opp(struct cpufreq_policy *policy)
 {
-	dev_pm_opp_of_register_em(get_cpu_device(policy->cpu),
-				  policy->related_cpus);
+	dev_pm_opp_of_register_em(get_cpu_device(policy->cpu), policy->related_cpus);
 }
 #endif /* _LINUX_CPUFREQ_H */
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index d353ef29e37f..dfcbb2deb794 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -10,6 +10,7 @@
 #define pr_fmt(fmt) "energy_model: " fmt
 
 #include <linux/cpu.h>
+#include <linux/cpufreq.h>
 #include <linux/cpumask.h>
 #include <linux/debugfs.h>
 #include <linux/energy_model.h>
@@ -231,6 +232,46 @@ static int em_create_pd(struct device *dev, int nr_states,
 	return 0;
 }
 
+static void em_cpufreq_update_efficiencies(struct device *dev)
+{
+	struct em_perf_domain *pd = dev->em_pd;
+	struct em_perf_state *table;
+	struct cpufreq_policy *policy;
+	bool found = false;
+	int i;
+
+	if (!_is_cpu_device(dev) || !pd)
+		return;
+
+	policy = cpufreq_cpu_get(cpumask_first(em_span_cpus(pd)));
+	if (!policy) {
+		dev_warn(dev, "EM: Access to CPUFreq policy failed");
+		return;
+	}
+
+	table = pd->table;
+
+	for (i = 0; i < pd->nr_perf_states; i++) {
+		if (!(table[i].flags & EM_PERF_STATE_INEFFICIENT))
+			continue;
+
+		cpufreq_table_set_inefficient(policy, table[i].frequency);
+		found = true;
+	}
+
+	if (!found)
+		return;
+
+	if (cpufreq_table_update_efficiencies(policy))
+		return;
+
+	/*
+	 * Efficiencies have been installed in CPUFreq, inefficient frequencies
+	 * will be skipped. The EM can do the same.
+	 */
+	pd->flags |= EM_PERF_DOMAIN_SKIP_INEFFICIENCIES;
+}
+
 /**
  * em_pd_get() - Return the performance domain for a device
  * @dev : Device to find the performance domain for
@@ -347,6 +388,8 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 	if (milliwatts)
 		dev->em_pd->flags |= EM_PERF_DOMAIN_MILLIWATTS;
 
+	em_cpufreq_update_efficiencies(dev);
+
 	em_debug_create_pd(dev);
 	dev_info(dev, "EM: created perf domain\n");
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 5/7] cpufreq: Add an interface to mark inefficient frequencies
  2021-08-31 10:24 ` [PATCH v6 5/7] cpufreq: Add an interface to mark inefficient frequencies Vincent Donnefort
@ 2021-09-01  4:55   ` Viresh Kumar
  2021-09-01 17:25   ` Rafael J. Wysocki
  1 sibling, 0 replies; 24+ messages in thread
From: Viresh Kumar @ 2021-09-01  4:55 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: peterz, rjw, vincent.guittot, qperret, linux-pm, ionela.voinescu,
	lukasz.luba, dietmar.eggemann, mka

On 31-08-21, 11:24, Vincent Donnefort wrote:
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index c65a1d7385f8..4e901ebd104d 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -664,13 +664,15 @@ struct governor_attr {
>  #define CPUFREQ_ENTRY_INVALID	~0u
>  #define CPUFREQ_TABLE_END	~1u
>  /* Special Values of .flags field */
> -#define CPUFREQ_BOOST_FREQ	(1 << 0)
> +#define CPUFREQ_BOOST_FREQ	 (1 << 0)

You are mixing tabs and spaces here, I am sure some tool complains
about it, but perhaps if space is before tab.

Either keep the new entry unaligned with the above ones or just add a
tab instead to keep things aligned. Add a tab to CPUFREQ_TABLE_END and
CPUFREQ_ENTRY_INVALID as well in case then.

-- 
viresh

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 6/7] cpufreq: Skip inefficient frequencies
  2021-08-31 10:24 ` [PATCH v6 6/7] cpufreq: Skip " Vincent Donnefort
@ 2021-09-01  5:30   ` Viresh Kumar
  2021-09-01 18:13   ` Rafael J. Wysocki
  1 sibling, 0 replies; 24+ messages in thread
From: Viresh Kumar @ 2021-09-01  5:30 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: peterz, rjw, vincent.guittot, qperret, linux-pm, ionela.voinescu,
	lukasz.luba, dietmar.eggemann, mka

On 31-08-21, 11:24, Vincent Donnefort wrote:
> +static inline bool

I would drop the inline part here.

> +cpufreq_can_skip_inefficiencies(struct cpufreq_policy *policy)
> +{
> +	struct cpufreq_frequency_table *pos;
> +	bool valid = false;
> +	int idx;
> +
> +	if (!(policy->governor->flags & CPUFREQ_GOV_DYNAMIC_SWITCHING))
> +		return false;
> +
> +	if (policy->freq_table_sorted == CPUFREQ_TABLE_UNSORTED)
> +		return false;
> +
> +	/* Is there at least one inefficiency ? */
> +	cpufreq_for_each_valid_entry(pos, policy->freq_table) {
> +		if (pos->flags & CPUFREQ_INEFFICIENT_FREQ) {
> +			valid = true;
> +			break;
> +		}
> +	}
> +
> +	if (!valid)
> +		return false;
> +
> +	/*
> +	 * Has cpufreq_table_update_efficiencies been called? i.e. is the
> +	 * highest frequency efficient.
> +	 */
> +	cpufreq_for_each_valid_entry_idx(pos, policy->freq_table, idx) {
> +		valid = !!(idx == pos->efficient);

I don't think !! is required here. == already returns a bool.

> +		if (policy->freq_table_sorted ==
> +					CPUFREQ_TABLE_SORTED_DESCENDING)
> +			break;
> +	}

-- 
viresh

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 7/7] PM / EM: Mark inefficiencies in CPUFreq
  2021-08-31 10:24 ` [PATCH v6 7/7] PM / EM: Mark inefficiencies in CPUFreq Vincent Donnefort
@ 2021-09-01  5:31   ` Viresh Kumar
  0 siblings, 0 replies; 24+ messages in thread
From: Viresh Kumar @ 2021-09-01  5:31 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: peterz, rjw, vincent.guittot, qperret, linux-pm, ionela.voinescu,
	lukasz.luba, dietmar.eggemann, mka

On 31-08-21, 11:24, Vincent Donnefort wrote:
> The Energy Model has a 1:1 mapping between OPPs and performance states
> (em_perf_state). If a CPUFreq driver registers an Energy Model,
> inefficiencies found by the latter can be applied to CPUFreq.
> 
> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> 
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index cb09afbf01e2..153ddc7b0506 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -1121,7 +1121,6 @@ void cpufreq_generic_init(struct cpufreq_policy *policy,
>  
>  static inline void cpufreq_register_em_with_opp(struct cpufreq_policy *policy)
>  {
> -	dev_pm_opp_of_register_em(get_cpu_device(policy->cpu),
> -				  policy->related_cpus);
> +	dev_pm_opp_of_register_em(get_cpu_device(policy->cpu), policy->related_cpus);
>  }
>  #endif /* _LINUX_CPUFREQ_H */
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index d353ef29e37f..dfcbb2deb794 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -10,6 +10,7 @@
>  #define pr_fmt(fmt) "energy_model: " fmt
>  
>  #include <linux/cpu.h>
> +#include <linux/cpufreq.h>
>  #include <linux/cpumask.h>
>  #include <linux/debugfs.h>
>  #include <linux/energy_model.h>
> @@ -231,6 +232,46 @@ static int em_create_pd(struct device *dev, int nr_states,
>  	return 0;
>  }
>  
> +static void em_cpufreq_update_efficiencies(struct device *dev)
> +{
> +	struct em_perf_domain *pd = dev->em_pd;
> +	struct em_perf_state *table;
> +	struct cpufreq_policy *policy;
> +	bool found = false;
> +	int i;
> +
> +	if (!_is_cpu_device(dev) || !pd)
> +		return;
> +
> +	policy = cpufreq_cpu_get(cpumask_first(em_span_cpus(pd)));
> +	if (!policy) {
> +		dev_warn(dev, "EM: Access to CPUFreq policy failed");
> +		return;
> +	}
> +
> +	table = pd->table;
> +
> +	for (i = 0; i < pd->nr_perf_states; i++) {
> +		if (!(table[i].flags & EM_PERF_STATE_INEFFICIENT))
> +			continue;
> +
> +		cpufreq_table_set_inefficient(policy, table[i].frequency);
> +		found = true;
> +	}
> +
> +	if (!found)
> +		return;
> +
> +	if (cpufreq_table_update_efficiencies(policy))
> +		return;
> +
> +	/*
> +	 * Efficiencies have been installed in CPUFreq, inefficient frequencies
> +	 * will be skipped. The EM can do the same.
> +	 */
> +	pd->flags |= EM_PERF_DOMAIN_SKIP_INEFFICIENCIES;
> +}
> +
>  /**
>   * em_pd_get() - Return the performance domain for a device
>   * @dev : Device to find the performance domain for
> @@ -347,6 +388,8 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
>  	if (milliwatts)
>  		dev->em_pd->flags |= EM_PERF_DOMAIN_MILLIWATTS;
>  
> +	em_cpufreq_update_efficiencies(dev);

This is how I always wanted this to be :)

-- 
viresh

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 5/7] cpufreq: Add an interface to mark inefficient frequencies
  2021-08-31 10:24 ` [PATCH v6 5/7] cpufreq: Add an interface to mark inefficient frequencies Vincent Donnefort
  2021-09-01  4:55   ` Viresh Kumar
@ 2021-09-01 17:25   ` Rafael J. Wysocki
  1 sibling, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2021-09-01 17:25 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Peter Zijlstra, Rafael J. Wysocki, Viresh Kumar, Vincent Guittot,
	Quentin Perret, Linux PM, Ionela Voinescu, Lukasz Luba,
	Dietmar Eggemann, Matthias Kaehlcke

On Tue, Aug 31, 2021 at 12:24 PM Vincent Donnefort
<vincent.donnefort@arm.com> wrote:
>
> Some SoCs such as the sd855 have OPPs within the same policy whose cost is
> higher than others with a higher frequency. Those OPPs are inefficients
> and it might be interesting for a governor to not use them.
>
> The inefficient interface is composed of two calls:
>
>  1. cpufreq_table_set_inefficient() marks a frequency as inefficient.
>
>  2. cpufreq_table_update_efficiencies() use the inefficiences marked by the
>     previous function to generate a mapping inefficient->efficient.
>
> Resolving an inefficient frequency to an efficient on can then be done
> by accessing the cpufreq_frequency_table member "efficient". The
> resolution doesn't guarantee the policy maximum.
>
> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
>
> diff --git a/drivers/cpufreq/freq_table.c b/drivers/cpufreq/freq_table.c
> index 67e56cf638ef..d3fa38af2aa6 100644
> --- a/drivers/cpufreq/freq_table.c
> +++ b/drivers/cpufreq/freq_table.c
> @@ -365,6 +365,59 @@ int cpufreq_table_validate_and_sort(struct cpufreq_policy *policy)
>         return set_freq_table_sorted(policy);
>  }
>
> +/**
> + * cpufreq_table_update_efficiencies() - Update efficiency resolution
> + *

I'm not sure what the extra empty line here is for.

> + * @policy:    the &struct cpufreq_policy to update
> + *
> + * Allow quick resolution from inefficient frequencies to efficient ones.
> + * Inefficient frequencies must have been previously marked with
> + * cpufreq_table_set_inefficient().
> + *
> + * Return: %0 on success or a negative errno code
> + */
> +int cpufreq_table_update_efficiencies(struct cpufreq_policy *policy)
> +{
> +       struct cpufreq_frequency_table *pos, *table = policy->freq_table;
> +       enum cpufreq_table_sorting sort = policy->freq_table_sorted;
> +       int efficient, idx;
> +
> +       /* Not supported */
> +       if (sort == CPUFREQ_TABLE_UNSORTED)
> +               return -EINVAL;
> +
> +       /* The highest frequency is always efficient */
> +       cpufreq_for_each_valid_entry_idx(pos, table, idx) {
> +               efficient = idx;
> +               if (sort == CPUFREQ_TABLE_SORTED_DESCENDING)
> +                       break;
> +       }
> +
> +       for (;;) {
> +               pos = &table[idx];
> +
> +               if (pos->frequency != CPUFREQ_ENTRY_INVALID) {
> +                       if (pos->flags & CPUFREQ_INEFFICIENT_FREQ) {
> +                               pos->efficient = efficient;
> +                       } else {
> +                               pos->efficient = idx;
> +                               efficient = idx;
> +                       }
> +               }
> +
> +               if (sort == CPUFREQ_TABLE_SORTED_ASCENDING) {
> +                       if (--idx < 0)
> +                               break;
> +               } else {
> +                       if (table[++idx].frequency == CPUFREQ_TABLE_END)
> +                               break;
> +               }
> +       }
> +
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(cpufreq_table_update_efficiencies);
> +
>  MODULE_AUTHOR("Dominik Brodowski <linux@brodo.de>");
>  MODULE_DESCRIPTION("CPUfreq frequency table helpers");
>  MODULE_LICENSE("GPL");
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index c65a1d7385f8..4e901ebd104d 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -664,13 +664,15 @@ struct governor_attr {
>  #define CPUFREQ_ENTRY_INVALID  ~0u
>  #define CPUFREQ_TABLE_END      ~1u
>  /* Special Values of .flags field */
> -#define CPUFREQ_BOOST_FREQ     (1 << 0)
> +#define CPUFREQ_BOOST_FREQ      (1 << 0)
> +#define CPUFREQ_INEFFICIENT_FREQ (1 << 1)
>
>  struct cpufreq_frequency_table {
>         unsigned int    flags;
>         unsigned int    driver_data; /* driver specific data, not used by core */
>         unsigned int    frequency; /* kHz - doesn't need to be in ascending
>                                     * order */
> +       unsigned int    efficient; /* idx of an efficient frequency */

It looks to me like having just one efficient frequency index here may
not work in general (I'll explain this in a reply to the next patch).

Also, it's a bit weird to kind of point it to self for the efficient ones.

>  };
>
>  #if defined(CONFIG_CPU_FREQ) && defined(CONFIG_PM_OPP)
> @@ -762,6 +764,7 @@ int cpufreq_boost_trigger_state(int state);
>  int cpufreq_boost_enabled(void);
>  int cpufreq_enable_boost_support(void);
>  bool policy_has_boost_freq(struct cpufreq_policy *policy);
> +int cpufreq_table_update_efficiencies(struct cpufreq_policy *policy);
>
>  /* Find lowest freq at or above target in a table in ascending order */
>  static inline int cpufreq_table_find_index_al(struct cpufreq_policy *policy,
> @@ -1003,6 +1006,29 @@ static inline int cpufreq_table_count_valid_entries(const struct cpufreq_policy
>
>         return count;
>  }
> +
> +/**
> + * cpufreq_table_set_inefficient() - Mark a frequency as inefficient
> + *
> + * @policy:    the &struct cpufreq_policy containing the inefficient frequency
> + * @frequency: the inefficient frequency
> + *
> + * Once inefficiencies marked, the efficient resolution must be updated with the
> + * function cpufreq_table_update_efficiencies().
> + */
> +static inline void
> +cpufreq_table_set_inefficient(const struct cpufreq_policy *policy,
> +                             unsigned int frequency)
> +{
> +       struct cpufreq_frequency_table *pos;
> +
> +       cpufreq_for_each_valid_entry(pos, policy->freq_table) {
> +               if (pos->frequency == frequency) {
> +                       pos->flags |= CPUFREQ_INEFFICIENT_FREQ;
> +                       break;
> +               }
> +       }
> +}
>  #else
>  static inline int cpufreq_boost_trigger_state(int state)
>  {
> @@ -1022,6 +1048,16 @@ static inline bool policy_has_boost_freq(struct cpufreq_policy *policy)
>  {
>         return false;
>  }
> +
> +static inline void
> +cpufreq_table_set_inefficient(const struct cpufreq_policy *policy,
> +                             unsigned int frequency) {}
> +
> +static inline int
> +cpufreq_table_update_efficiencies(struct cpufreq_policy *policy)
> +{
> +       return -EINVAL;
> +}
>  #endif
>
>  #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 6/7] cpufreq: Skip inefficient frequencies
  2021-08-31 10:24 ` [PATCH v6 6/7] cpufreq: Skip " Vincent Donnefort
  2021-09-01  5:30   ` Viresh Kumar
@ 2021-09-01 18:13   ` Rafael J. Wysocki
  2021-09-02 10:50     ` Vincent Donnefort
  1 sibling, 1 reply; 24+ messages in thread
From: Rafael J. Wysocki @ 2021-09-01 18:13 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Peter Zijlstra, Rafael J. Wysocki, Viresh Kumar, Vincent Guittot,
	Quentin Perret, Linux PM, Ionela Voinescu, Lukasz Luba,
	Dietmar Eggemann, Matthias Kaehlcke

On Tue, Aug 31, 2021 at 12:24 PM Vincent Donnefort
<vincent.donnefort@arm.com> wrote:
>
> CPUFreq governors that do DVFS (i.e. CPUFREQ_GOV_DYNAMIC_SWITCHING flag)
> can skip frequencies marked as inefficient, as long as the efficient
> frequency found meet the policy maximum requirement.
>
> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 7d5f170ecad1..b46fe2d7baf1 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -2295,6 +2295,44 @@ __weak struct cpufreq_governor *cpufreq_fallback_governor(void)
>         return NULL;
>  }
>
> +static inline bool
> +cpufreq_can_skip_inefficiencies(struct cpufreq_policy *policy)

This is not just about the ability to skp the inefficient frequencies,
but about whether or not they should be skipped.  Moreover, the
inefficient frequencies are not really skipped, but mapped to specific
efficient ones.

I would call this function cpufreq_use_efficient_frequencies() or similar.

> +{
> +       struct cpufreq_frequency_table *pos;
> +       bool valid = false;
> +       int idx;
> +
> +       if (!(policy->governor->flags & CPUFREQ_GOV_DYNAMIC_SWITCHING))
> +               return false;
> +
> +       if (policy->freq_table_sorted == CPUFREQ_TABLE_UNSORTED)
> +               return false;
> +
> +       /* Is there at least one inefficiency ? */
> +       cpufreq_for_each_valid_entry(pos, policy->freq_table) {
> +               if (pos->flags & CPUFREQ_INEFFICIENT_FREQ) {
> +                       valid = true;
> +                       break;
> +               }
> +       }
> +
> +       if (!valid)
> +               return false;
> +
> +       /*
> +        * Has cpufreq_table_update_efficiencies been called? i.e. is the
> +        * highest frequency efficient.
> +        */
> +       cpufreq_for_each_valid_entry_idx(pos, policy->freq_table, idx) {
> +               valid = !!(idx == pos->efficient);
> +               if (policy->freq_table_sorted ==
> +                                       CPUFREQ_TABLE_SORTED_DESCENDING)
> +                       break;
> +       }
> +
> +       return valid;
> +}
> +
>  static int cpufreq_init_governor(struct cpufreq_policy *policy)
>  {
>         int ret;
> @@ -2337,6 +2375,7 @@ static int cpufreq_init_governor(struct cpufreq_policy *policy)
>         }
>
>         policy->strict_target = !!(policy->governor->flags & CPUFREQ_GOV_STRICT_TARGET);
> +       policy->skip_inefficiencies = cpufreq_can_skip_inefficiencies(policy);
>
>         return 0;
>  }
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 4e901ebd104d..cb09afbf01e2 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -117,6 +117,13 @@ struct cpufreq_policy {
>         bool                    strict_target;
>
>         /*
> +        * Set if the CPUFREQ_GOV_DYNAMIC_SWITCHING flag is set for the current
> +        * governor and if inefficient frequencies were found in the frequency
> +        * table.
> +        */
> +       bool                    skip_inefficiencies;
> +
> +       /*
>          * Preferred average time interval between consecutive invocations of
>          * the driver to set the frequency for this policy.  To be set by the
>          * scaling driver (0, which is the default, means no preference).
> @@ -972,25 +979,46 @@ static inline int cpufreq_table_find_index_c(struct cpufreq_policy *policy,
>                 return cpufreq_table_find_index_dc(policy, target_freq);
>  }
>
> +static inline unsigned int
> +cpufreq_frequency_find_efficient(struct cpufreq_policy *policy,
> +                                unsigned int idx)
> +{
> +       struct cpufreq_frequency_table *table = policy->freq_table;
> +       unsigned int efficient_idx = table[idx].efficient;
> +
> +       return table[efficient_idx].frequency <= policy->max ? efficient_idx :
> +               idx;

I'm not sure about this.

In the _RELATION_L case table[idx].frequency can be above the policy
max, so you may end up running at an inefficient frequency above the
policy max, but you won't use an efficient one above it.  Isn't this
slightly confusing?

> +}
> +
>  static inline int cpufreq_frequency_table_target(struct cpufreq_policy *policy,
>                                                  unsigned int target_freq,
>                                                  unsigned int relation)
>  {
> +       int idx;
> +
>         if (unlikely(policy->freq_table_sorted == CPUFREQ_TABLE_UNSORTED))
>                 return cpufreq_table_index_unsorted(policy, target_freq,
>                                                     relation);

Don't you want to take this case into account?

>
>         switch (relation) {
>         case CPUFREQ_RELATION_L:
> -               return cpufreq_table_find_index_l(policy, target_freq);
> +               idx = cpufreq_table_find_index_l(policy, target_freq);
> +               break;
>         case CPUFREQ_RELATION_H:
> -               return cpufreq_table_find_index_h(policy, target_freq);
> +               idx = cpufreq_table_find_index_h(policy, target_freq);
> +               break;
>         case CPUFREQ_RELATION_C:
> -               return cpufreq_table_find_index_c(policy, target_freq);
> +               idx = cpufreq_table_find_index_c(policy, target_freq);
> +               break;
>         default:
>                 WARN_ON_ONCE(1);
>                 return 0;
>         }
> +
> +       if (policy->skip_inefficiencies)
> +               idx = cpufreq_frequency_find_efficient(policy, idx);

Here, it matters which _RELATION_ is used.  For instance, in the
RELATION_H case, the frequency used cannot be above the target, which
is not guaranteed now AFAICS.

I would just really skip the inefficient frequencies as though they
were not there in the table, and then refine the search to take the
inefficient ones into account when the policy max limit is in effect
(which also depends on the relation type AFAICS).

In addition to the above, please note that __cpufreq_driver_target()
is not only called by governors and at least in the suspend frequency
case I don't see why it should be an efficient one even if
skip_inefficiencies is set.

> +
> +       return idx;
>  }
>
>  static inline int cpufreq_table_count_valid_entries(const struct cpufreq_policy *policy)
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 6/7] cpufreq: Skip inefficient frequencies
  2021-09-01 18:13   ` Rafael J. Wysocki
@ 2021-09-02 10:50     ` Vincent Donnefort
  2021-09-02 13:09       ` Rafael J. Wysocki
  0 siblings, 1 reply; 24+ messages in thread
From: Vincent Donnefort @ 2021-09-02 10:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Rafael J. Wysocki, Viresh Kumar, Vincent Guittot,
	Quentin Perret, Linux PM, Ionela Voinescu, Lukasz Luba,
	Dietmar Eggemann, Matthias Kaehlcke

On Wed, Sep 01, 2021 at 08:13:24PM +0200, Rafael J. Wysocki wrote:
> On Tue, Aug 31, 2021 at 12:24 PM Vincent Donnefort
> <vincent.donnefort@arm.com> wrote:
> >
> > CPUFreq governors that do DVFS (i.e. CPUFREQ_GOV_DYNAMIC_SWITCHING flag)
> > can skip frequencies marked as inefficient, as long as the efficient
> > frequency found meet the policy maximum requirement.
> >
> > Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> >
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 7d5f170ecad1..b46fe2d7baf1 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -2295,6 +2295,44 @@ __weak struct cpufreq_governor *cpufreq_fallback_governor(void)
> >         return NULL;
> >  }
> >
> > +static inline bool
> > +cpufreq_can_skip_inefficiencies(struct cpufreq_policy *policy)
> 
> This is not just about the ability to skp the inefficient frequencies,
> but about whether or not they should be skipped.  Moreover, the
> inefficient frequencies are not really skipped, but mapped to specific
> efficient ones.
> 
> I would call this function cpufreq_use_efficient_frequencies() or similar.

Ack.

> 
> > +{
> > +       struct cpufreq_frequency_table *pos;
> > +       bool valid = false;
> > +       int idx;
> > +
> > +       if (!(policy->governor->flags & CPUFREQ_GOV_DYNAMIC_SWITCHING))
> > +               return false;
> > +
> > +       if (policy->freq_table_sorted == CPUFREQ_TABLE_UNSORTED)
> > +               return false;
> > +
> > +       /* Is there at least one inefficiency ? */
> > +       cpufreq_for_each_valid_entry(pos, policy->freq_table) {
> > +               if (pos->flags & CPUFREQ_INEFFICIENT_FREQ) {
> > +                       valid = true;
> > +                       break;
> > +               }
> > +       }
> > +
> > +       if (!valid)
> > +               return false;
> > +
> > +       /*
> > +        * Has cpufreq_table_update_efficiencies been called? i.e. is the
> > +        * highest frequency efficient.
> > +        */
> > +       cpufreq_for_each_valid_entry_idx(pos, policy->freq_table, idx) {
> > +               valid = !!(idx == pos->efficient);
> > +               if (policy->freq_table_sorted ==
> > +                                       CPUFREQ_TABLE_SORTED_DESCENDING)
> > +                       break;
> > +       }
> > +
> > +       return valid;
> > +}
> > +
> >  static int cpufreq_init_governor(struct cpufreq_policy *policy)
> >  {
> >         int ret;
> > @@ -2337,6 +2375,7 @@ static int cpufreq_init_governor(struct cpufreq_policy *policy)
> >         }
> >
> >         policy->strict_target = !!(policy->governor->flags & CPUFREQ_GOV_STRICT_TARGET);
> > +       policy->skip_inefficiencies = cpufreq_can_skip_inefficiencies(policy);
> >
> >         return 0;
> >  }
> > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> > index 4e901ebd104d..cb09afbf01e2 100644
> > --- a/include/linux/cpufreq.h
> > +++ b/include/linux/cpufreq.h
> > @@ -117,6 +117,13 @@ struct cpufreq_policy {
> >         bool                    strict_target;
> >
> >         /*
> > +        * Set if the CPUFREQ_GOV_DYNAMIC_SWITCHING flag is set for the current
> > +        * governor and if inefficient frequencies were found in the frequency
> > +        * table.
> > +        */
> > +       bool                    skip_inefficiencies;
> > +
> > +       /*
> >          * Preferred average time interval between consecutive invocations of
> >          * the driver to set the frequency for this policy.  To be set by the
> >          * scaling driver (0, which is the default, means no preference).
> > @@ -972,25 +979,46 @@ static inline int cpufreq_table_find_index_c(struct cpufreq_policy *policy,
> >                 return cpufreq_table_find_index_dc(policy, target_freq);
> >  }
> >
> > +static inline unsigned int
> > +cpufreq_frequency_find_efficient(struct cpufreq_policy *policy,
> > +                                unsigned int idx)
> > +{
> > +       struct cpufreq_frequency_table *table = policy->freq_table;
> > +       unsigned int efficient_idx = table[idx].efficient;
> > +
> > +       return table[efficient_idx].frequency <= policy->max ? efficient_idx :
> > +               idx;
> 
> I'm not sure about this.
> 
> In the _RELATION_L case table[idx].frequency can be above the policy
> max, so you may end up running at an inefficient frequency above the
> policy max, but you won't use an efficient one above it.  Isn't this
> slightly confusing?

This can indeed happen when policy->max isn't equal to an available frequency.
But nontheless, we can't let the efficient resolution violate the policy->max,
which is used for thermal capping. The fact that we can overshoot a max
limit is confusing as well.

So I could add a policy->max sanity, to make sure this value is an actual
frequency and that RELATION_L will never overshoot that value. Or we can have a
flag somewhere to indicate thermal capping is happening and we shouldn't skip
inefficient frequencies.

> 
> > +}
> > +
> >  static inline int cpufreq_frequency_table_target(struct cpufreq_policy *policy,
> >                                                  unsigned int target_freq,
> >                                                  unsigned int relation)
> >  {
> > +       int idx;
> > +
> >         if (unlikely(policy->freq_table_sorted == CPUFREQ_TABLE_UNSORTED))
> >                 return cpufreq_table_index_unsorted(policy, target_freq,
> >                                                     relation);
> 
> Don't you want to take this case into account?

As the inefficient frequencies are only configured by the EM so far, and this
latter needs sorted frequencies to work, I didn't add support for unsorted
table.

> 
> >
> >         switch (relation) {
> >         case CPUFREQ_RELATION_L:
> > -               return cpufreq_table_find_index_l(policy, target_freq);
> > +               idx = cpufreq_table_find_index_l(policy, target_freq);
> > +               break;
> >         case CPUFREQ_RELATION_H:
> > -               return cpufreq_table_find_index_h(policy, target_freq);
> > +               idx = cpufreq_table_find_index_h(policy, target_freq);
> > +               break;
> >         case CPUFREQ_RELATION_C:
> > -               return cpufreq_table_find_index_c(policy, target_freq);
> > +               idx = cpufreq_table_find_index_c(policy, target_freq);
> > +               break;
> >         default:
> >                 WARN_ON_ONCE(1);
> >                 return 0;
> >         }
> > +
> > +       if (policy->skip_inefficiencies)
> > +               idx = cpufreq_frequency_find_efficient(policy, idx);
> 
> Here, it matters which _RELATION_ is used.  For instance, in the
> RELATION_H case, the frequency used cannot be above the target, which
> is not guaranteed now AFAICS.

RELATION_H is used in ondemand, when powersave_bias_target is set. In that
case, it doesn't seem to be an issue to overshoot the target freq in that case.

> 
> I would just really skip the inefficient frequencies as though they
> were not there in the table, and then refine the search to take the
> inefficient ones into account when the policy max limit is in effect
> (which also depends on the relation type AFAICS).
> 
> In addition to the above, please note that __cpufreq_driver_target()
> is not only called by governors and at least in the suspend frequency
> case I don't see why it should be an efficient one even if
> skip_inefficiencies is set.

Then we need to separate the __cpufreq_driver_target() issued from DVFS
governors and the others. I can bring back the previous RELATION_E
and even have RELATION_LE and RELATION_HE to not leave behind
ondemands's powerbias?

> 
> > +
> > +       return idx;
> >  }
> >
> >  static inline int cpufreq_table_count_valid_entries(const struct cpufreq_policy *policy)
> > --
> > 2.7.4
> >

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 6/7] cpufreq: Skip inefficient frequencies
  2021-09-02 10:50     ` Vincent Donnefort
@ 2021-09-02 13:09       ` Rafael J. Wysocki
  2021-09-02 13:49         ` Vincent Donnefort
  0 siblings, 1 reply; 24+ messages in thread
From: Rafael J. Wysocki @ 2021-09-02 13:09 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Rafael J. Wysocki, Peter Zijlstra, Rafael J. Wysocki,
	Viresh Kumar, Vincent Guittot, Quentin Perret, Linux PM,
	Ionela Voinescu, Lukasz Luba, Dietmar Eggemann,
	Matthias Kaehlcke

On Thu, Sep 2, 2021 at 12:50 PM Vincent Donnefort
<vincent.donnefort@arm.com> wrote:
>
> On Wed, Sep 01, 2021 at 08:13:24PM +0200, Rafael J. Wysocki wrote:
> > On Tue, Aug 31, 2021 at 12:24 PM Vincent Donnefort
> > <vincent.donnefort@arm.com> wrote:
> > >
> > > CPUFreq governors that do DVFS (i.e. CPUFREQ_GOV_DYNAMIC_SWITCHING flag)
> > > can skip frequencies marked as inefficient, as long as the efficient
> > > frequency found meet the policy maximum requirement.
> > >
> > > Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> > >
> > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > > index 7d5f170ecad1..b46fe2d7baf1 100644
> > > --- a/drivers/cpufreq/cpufreq.c
> > > +++ b/drivers/cpufreq/cpufreq.c
> > > @@ -2295,6 +2295,44 @@ __weak struct cpufreq_governor *cpufreq_fallback_governor(void)
> > >         return NULL;
> > >  }
> > >
> > > +static inline bool
> > > +cpufreq_can_skip_inefficiencies(struct cpufreq_policy *policy)
> >
> > This is not just about the ability to skp the inefficient frequencies,
> > but about whether or not they should be skipped.  Moreover, the
> > inefficient frequencies are not really skipped, but mapped to specific
> > efficient ones.
> >
> > I would call this function cpufreq_use_efficient_frequencies() or similar.
>
> Ack.
>
> >
> > > +{
> > > +       struct cpufreq_frequency_table *pos;
> > > +       bool valid = false;
> > > +       int idx;
> > > +
> > > +       if (!(policy->governor->flags & CPUFREQ_GOV_DYNAMIC_SWITCHING))
> > > +               return false;
> > > +
> > > +       if (policy->freq_table_sorted == CPUFREQ_TABLE_UNSORTED)
> > > +               return false;
> > > +
> > > +       /* Is there at least one inefficiency ? */
> > > +       cpufreq_for_each_valid_entry(pos, policy->freq_table) {
> > > +               if (pos->flags & CPUFREQ_INEFFICIENT_FREQ) {
> > > +                       valid = true;
> > > +                       break;
> > > +               }
> > > +       }
> > > +
> > > +       if (!valid)
> > > +               return false;
> > > +
> > > +       /*
> > > +        * Has cpufreq_table_update_efficiencies been called? i.e. is the
> > > +        * highest frequency efficient.
> > > +        */
> > > +       cpufreq_for_each_valid_entry_idx(pos, policy->freq_table, idx) {
> > > +               valid = !!(idx == pos->efficient);
> > > +               if (policy->freq_table_sorted ==
> > > +                                       CPUFREQ_TABLE_SORTED_DESCENDING)
> > > +                       break;
> > > +       }
> > > +
> > > +       return valid;
> > > +}
> > > +
> > >  static int cpufreq_init_governor(struct cpufreq_policy *policy)
> > >  {
> > >         int ret;
> > > @@ -2337,6 +2375,7 @@ static int cpufreq_init_governor(struct cpufreq_policy *policy)
> > >         }
> > >
> > >         policy->strict_target = !!(policy->governor->flags & CPUFREQ_GOV_STRICT_TARGET);
> > > +       policy->skip_inefficiencies = cpufreq_can_skip_inefficiencies(policy);
> > >
> > >         return 0;
> > >  }
> > > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> > > index 4e901ebd104d..cb09afbf01e2 100644
> > > --- a/include/linux/cpufreq.h
> > > +++ b/include/linux/cpufreq.h
> > > @@ -117,6 +117,13 @@ struct cpufreq_policy {
> > >         bool                    strict_target;
> > >
> > >         /*
> > > +        * Set if the CPUFREQ_GOV_DYNAMIC_SWITCHING flag is set for the current
> > > +        * governor and if inefficient frequencies were found in the frequency
> > > +        * table.
> > > +        */
> > > +       bool                    skip_inefficiencies;
> > > +
> > > +       /*
> > >          * Preferred average time interval between consecutive invocations of
> > >          * the driver to set the frequency for this policy.  To be set by the
> > >          * scaling driver (0, which is the default, means no preference).
> > > @@ -972,25 +979,46 @@ static inline int cpufreq_table_find_index_c(struct cpufreq_policy *policy,
> > >                 return cpufreq_table_find_index_dc(policy, target_freq);
> > >  }
> > >
> > > +static inline unsigned int
> > > +cpufreq_frequency_find_efficient(struct cpufreq_policy *policy,
> > > +                                unsigned int idx)
> > > +{
> > > +       struct cpufreq_frequency_table *table = policy->freq_table;
> > > +       unsigned int efficient_idx = table[idx].efficient;
> > > +
> > > +       return table[efficient_idx].frequency <= policy->max ? efficient_idx :
> > > +               idx;
> >
> > I'm not sure about this.
> >
> > In the _RELATION_L case table[idx].frequency can be above the policy
> > max, so you may end up running at an inefficient frequency above the
> > policy max, but you won't use an efficient one above it.  Isn't this
> > slightly confusing?
>
> This can indeed happen when policy->max isn't equal to an available frequency.
> But nontheless, we can't let the efficient resolution violate the policy->max,
> which is used for thermal capping. The fact that we can overshoot a max
> limit is confusing as well.
>
> So I could add a policy->max sanity, to make sure this value is an actual
> frequency and that RELATION_L will never overshoot that value. Or we can have a
> flag somewhere to indicate thermal capping is happening and we shouldn't skip
> inefficient frequencies.

I would prefer the first option, because user space may be applying
the limit for power capping or thermal reasons too and the kernel
can't really tell why it is doing that.

This needs to be added to cpufreq_set_policy(), I think after calling
the driver's ->verify().

And if this is done to the policy max, IMO it needs to be done to the
policy min too, for consistency.  So if a frequency table is used, the
policy max and the policy min could be only the frequencies present in
the table.

Moreover, if only efficient frequencies are to be used, RELATION_L
needs to return min(policy->max, the closest efficient frequency equal
to or above the target).

> >
> > > +}
> > > +
> > >  static inline int cpufreq_frequency_table_target(struct cpufreq_policy *policy,
> > >                                                  unsigned int target_freq,
> > >                                                  unsigned int relation)
> > >  {
> > > +       int idx;
> > > +
> > >         if (unlikely(policy->freq_table_sorted == CPUFREQ_TABLE_UNSORTED))
> > >                 return cpufreq_table_index_unsorted(policy, target_freq,
> > >                                                     relation);
> >
> > Don't you want to take this case into account?
>
> As the inefficient frequencies are only configured by the EM so far, and this
> latter needs sorted frequencies to work, I didn't add support for unsorted
> table.

So the assumption is that if inefficient OPPs are present in the
frequency table, it must be sorted.

It would be prudent to add a comment documenting this assumption, then.

> >
> > >
> > >         switch (relation) {
> > >         case CPUFREQ_RELATION_L:
> > > -               return cpufreq_table_find_index_l(policy, target_freq);
> > > +               idx = cpufreq_table_find_index_l(policy, target_freq);
> > > +               break;
> > >         case CPUFREQ_RELATION_H:
> > > -               return cpufreq_table_find_index_h(policy, target_freq);
> > > +               idx = cpufreq_table_find_index_h(policy, target_freq);
> > > +               break;
> > >         case CPUFREQ_RELATION_C:
> > > -               return cpufreq_table_find_index_c(policy, target_freq);
> > > +               idx = cpufreq_table_find_index_c(policy, target_freq);
> > > +               break;
> > >         default:
> > >                 WARN_ON_ONCE(1);
> > >                 return 0;
> > >         }
> > > +
> > > +       if (policy->skip_inefficiencies)
> > > +               idx = cpufreq_frequency_find_efficient(policy, idx);
> >
> > Here, it matters which _RELATION_ is used.  For instance, in the
> > RELATION_H case, the frequency used cannot be above the target, which
> > is not guaranteed now AFAICS.
>
> RELATION_H is used in ondemand, when powersave_bias_target is set. In that
> case, it doesn't seem to be an issue to overshoot the target freq in that case.

It actually is used in multiple places, including the performance
governor, so changing the behavior of it is generally risky.

> >
> > I would just really skip the inefficient frequencies as though they
> > were not there in the table, and then refine the search to take the
> > inefficient ones into account when the policy max limit is in effect
> > (which also depends on the relation type AFAICS).
> >
> > In addition to the above, please note that __cpufreq_driver_target()
> > is not only called by governors and at least in the suspend frequency
> > case I don't see why it should be an efficient one even if
> > skip_inefficiencies is set.
>
> Then we need to separate the __cpufreq_driver_target() issued from DVFS
> governors and the others.

I think so.

> I can bring back the previous RELATION_E
> and even have RELATION_LE and RELATION_HE to not leave behind
> ondemands's powerbias?

Or a flag to be ORed with the relation to indicate that the request
doesn't come from a DVFS governor (which basically boils down to the
same thing).

> >
> > > +
> > > +       return idx;
> > >  }
> > >
> > >  static inline int cpufreq_table_count_valid_entries(const struct cpufreq_policy *policy)
> > > --
> > > 2.7.4
> > >

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 6/7] cpufreq: Skip inefficient frequencies
  2021-09-02 13:09       ` Rafael J. Wysocki
@ 2021-09-02 13:49         ` Vincent Donnefort
  2021-09-03 18:14           ` Rafael J. Wysocki
  0 siblings, 1 reply; 24+ messages in thread
From: Vincent Donnefort @ 2021-09-02 13:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Rafael J. Wysocki, Viresh Kumar, Vincent Guittot,
	Quentin Perret, Linux PM, Ionela Voinescu, Lukasz Luba,
	Dietmar Eggemann, Matthias Kaehlcke

[...]

> > > >
> > > > +static inline unsigned int
> > > > +cpufreq_frequency_find_efficient(struct cpufreq_policy *policy,
> > > > +                                unsigned int idx)
> > > > +{
> > > > +       struct cpufreq_frequency_table *table = policy->freq_table;
> > > > +       unsigned int efficient_idx = table[idx].efficient;
> > > > +
> > > > +       return table[efficient_idx].frequency <= policy->max ? efficient_idx :
> > > > +               idx;
> > >
> > > I'm not sure about this.
> > >
> > > In the _RELATION_L case table[idx].frequency can be above the policy
> > > max, so you may end up running at an inefficient frequency above the
> > > policy max, but you won't use an efficient one above it.  Isn't this
> > > slightly confusing?
> >
> > This can indeed happen when policy->max isn't equal to an available frequency.
> > But nontheless, we can't let the efficient resolution violate the policy->max,
> > which is used for thermal capping. The fact that we can overshoot a max
> > limit is confusing as well.
> >
> > So I could add a policy->max sanity, to make sure this value is an actual
> > frequency and that RELATION_L will never overshoot that value. Or we can have a
> > flag somewhere to indicate thermal capping is happening and we shouldn't skip
> > inefficient frequencies.
> 
> I would prefer the first option, because user space may be applying
> the limit for power capping or thermal reasons too and the kernel
> can't really tell why it is doing that.
> 
> This needs to be added to cpufreq_set_policy(), I think after calling
> the driver's ->verify().
> 
> And if this is done to the policy max, IMO it needs to be done to the
> policy min too, for consistency.  So if a frequency table is used, the
> policy max and the policy min could be only the frequencies present in
> the table.

Ack. I'll consolidate policy->max and ->min set behavior in v7 so we won't have
any problem having to know who set policy->max and if yes or no we can break it.

What relation do we want for min/max setting? I guess RELATION_H for policy->max
and RELATION_L for policy->min (i.e. The highest frequency existing right below
the maximum and the lowest frequency existing just above the minimum)

> 
> Moreover, if only efficient frequencies are to be used, RELATION_L
> needs to return min(policy->max, the closest efficient frequency equal
> to or above the target).

You mean, never returning an inefficient frequency, unless there are no
efficient ones in the range policy->min policy->max ? 

The problem is a frequency is inefficient compared to a higher one. If we have
F, inefficient, while F+1 and F-1 are efficient and policy->max restricting to F.
If we return F-1, because it is the highest _efficient_ frequency allowed, we'll
give a frequency that can do less than originally requested. It would downgrade
the performance, compared to what the platform is really allowed to.

> 
> > >
> > > > +}
> > > > +
> > > >  static inline int cpufreq_frequency_table_target(struct cpufreq_policy *policy,
> > > >                                                  unsigned int target_freq,
> > > >                                                  unsigned int relation)
> > > >  {
> > > > +       int idx;
> > > > +
> > > >         if (unlikely(policy->freq_table_sorted == CPUFREQ_TABLE_UNSORTED))
> > > >                 return cpufreq_table_index_unsorted(policy, target_freq,
> > > >                                                     relation);
> > >
> > > Don't you want to take this case into account?
> >
> > As the inefficient frequencies are only configured by the EM so far, and this
> > latter needs sorted frequencies to work, I didn't add support for unsorted
> > table.
> 
> So the assumption is that if inefficient OPPs are present in the
> frequency table, it must be sorted.
> 
> It would be prudent to add a comment documenting this assumption, then.

Ack. I'll add that to the documentation.

> 
> > >
> > > >
> > > >         switch (relation) {
> > > >         case CPUFREQ_RELATION_L:
> > > > -               return cpufreq_table_find_index_l(policy, target_freq);
> > > > +               idx = cpufreq_table_find_index_l(policy, target_freq);
> > > > +               break;
> > > >         case CPUFREQ_RELATION_H:
> > > > -               return cpufreq_table_find_index_h(policy, target_freq);
> > > > +               idx = cpufreq_table_find_index_h(policy, target_freq);
> > > > +               break;
> > > >         case CPUFREQ_RELATION_C:
> > > > -               return cpufreq_table_find_index_c(policy, target_freq);
> > > > +               idx = cpufreq_table_find_index_c(policy, target_freq);
> > > > +               break;
> > > >         default:
> > > >                 WARN_ON_ONCE(1);
> > > >                 return 0;
> > > >         }
> > > > +
> > > > +       if (policy->skip_inefficiencies)
> > > > +               idx = cpufreq_frequency_find_efficient(policy, idx);
> > >
> > > Here, it matters which _RELATION_ is used.  For instance, in the
> > > RELATION_H case, the frequency used cannot be above the target, which
> > > is not guaranteed now AFAICS.
> >
> > RELATION_H is used in ondemand, when powersave_bias_target is set. In that
> > case, it doesn't seem to be an issue to overshoot the target freq in that case.
> 
> It actually is used in multiple places, including the performance
> governor, so changing the behavior of it is generally risky.
> 
> > >
> > > I would just really skip the inefficient frequencies as though they
> > > were not there in the table, and then refine the search to take the
> > > inefficient ones into account when the policy max limit is in effect
> > > (which also depends on the relation type AFAICS).
> > >
> > > In addition to the above, please note that __cpufreq_driver_target()
> > > is not only called by governors and at least in the suspend frequency
> > > case I don't see why it should be an efficient one even if
> > > skip_inefficiencies is set.
> >
> > Then we need to separate the __cpufreq_driver_target() issued from DVFS
> > governors and the others.
> 
> I think so.
> 
> > I can bring back the previous RELATION_E
> > and even have RELATION_LE and RELATION_HE to not leave behind
> > ondemands's powerbias?
> 
> Or a flag to be ORed with the relation to indicate that the request
> doesn't come from a DVFS governor (which basically boils down to the
> same thing).

Ok. A flag is probably better, so we don't need to redefine each relation with
an "efficient" version.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 6/7] cpufreq: Skip inefficient frequencies
  2021-09-02 13:49         ` Vincent Donnefort
@ 2021-09-03 18:14           ` Rafael J. Wysocki
  2021-09-06  8:17             ` Vincent Donnefort
  0 siblings, 1 reply; 24+ messages in thread
From: Rafael J. Wysocki @ 2021-09-03 18:14 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Rafael J. Wysocki, Peter Zijlstra, Rafael J. Wysocki,
	Viresh Kumar, Vincent Guittot, Quentin Perret, Linux PM,
	Ionela Voinescu, Lukasz Luba, Dietmar Eggemann,
	Matthias Kaehlcke

On Thu, Sep 2, 2021 at 3:49 PM Vincent Donnefort
<vincent.donnefort@arm.com> wrote:
>
> [...]
>
> > > > >
> > > > > +static inline unsigned int
> > > > > +cpufreq_frequency_find_efficient(struct cpufreq_policy *policy,
> > > > > +                                unsigned int idx)
> > > > > +{
> > > > > +       struct cpufreq_frequency_table *table = policy->freq_table;
> > > > > +       unsigned int efficient_idx = table[idx].efficient;
> > > > > +
> > > > > +       return table[efficient_idx].frequency <= policy->max ? efficient_idx :
> > > > > +               idx;
> > > >
> > > > I'm not sure about this.
> > > >
> > > > In the _RELATION_L case table[idx].frequency can be above the policy
> > > > max, so you may end up running at an inefficient frequency above the
> > > > policy max, but you won't use an efficient one above it.  Isn't this
> > > > slightly confusing?
> > >
> > > This can indeed happen when policy->max isn't equal to an available frequency.
> > > But nontheless, we can't let the efficient resolution violate the policy->max,
> > > which is used for thermal capping. The fact that we can overshoot a max
> > > limit is confusing as well.
> > >
> > > So I could add a policy->max sanity, to make sure this value is an actual
> > > frequency and that RELATION_L will never overshoot that value. Or we can have a
> > > flag somewhere to indicate thermal capping is happening and we shouldn't skip
> > > inefficient frequencies.
> >
> > I would prefer the first option, because user space may be applying
> > the limit for power capping or thermal reasons too and the kernel
> > can't really tell why it is doing that.
> >
> > This needs to be added to cpufreq_set_policy(), I think after calling
> > the driver's ->verify().
> >
> > And if this is done to the policy max, IMO it needs to be done to the
> > policy min too, for consistency.  So if a frequency table is used, the
> > policy max and the policy min could be only the frequencies present in
> > the table.
>
> Ack. I'll consolidate policy->max and ->min set behavior in v7 so we won't have
> any problem having to know who set policy->max and if yes or no we can break it.
>
> What relation do we want for min/max setting? I guess RELATION_H for policy->max
> and RELATION_L for policy->min (i.e. The highest frequency existing right below
> the maximum and the lowest frequency existing just above the minimum)

Yes.

> >
> > Moreover, if only efficient frequencies are to be used, RELATION_L
> > needs to return min(policy->max, the closest efficient frequency equal
> > to or above the target).
>
> You mean, never returning an inefficient frequency, unless there are no
> efficient ones in the range policy->min policy->max ?

No, that's not what I mean.

First note that the target here is clamped between the policy min and
max.  Also bear in mind that each of them is a frequency from the
table, either efficient or inefficient.

In the first step, search through the efficient frequencies only.
That will return you something at or above the target.  If it is at
the target, you're done.  If it is above the target, it may be either
within or above the policy max.  If it is within the policy max,
you're done.  If it is above the policy max, you need to search
through the inefficient frequencies between the target and the policy
max (and you know that there is at least one - the policy max itself).

So what I said previously wasn't particularly precise, sorry about that.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 6/7] cpufreq: Skip inefficient frequencies
  2021-09-03 18:14           ` Rafael J. Wysocki
@ 2021-09-06  8:17             ` Vincent Donnefort
  2021-09-06 12:08               ` Rafael J. Wysocki
  0 siblings, 1 reply; 24+ messages in thread
From: Vincent Donnefort @ 2021-09-06  8:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Rafael J. Wysocki, Viresh Kumar, Vincent Guittot,
	Quentin Perret, Linux PM, Ionela Voinescu, Lukasz Luba,
	Dietmar Eggemann, Matthias Kaehlcke

[...]

> > >
> > > Moreover, if only efficient frequencies are to be used, RELATION_L
> > > needs to return min(policy->max, the closest efficient frequency equal
> > > to or above the target).
> >
> > You mean, never returning an inefficient frequency, unless there are no
> > efficient ones in the range policy->min policy->max ?
> 
> No, that's not what I mean.
> 
> First note that the target here is clamped between the policy min and
> max.  Also bear in mind that each of them is a frequency from the
> table, either efficient or inefficient.
> 
> In the first step, search through the efficient frequencies only.
> That will return you something at or above the target.  If it is at
> the target, you're done.  If it is above the target, it may be either
> within or above the policy max.  If it is within the policy max,
> you're done.  If it is above the policy max, you need to search
> through the inefficient frequencies between the target and the policy
> max (and you know that there is at least one - the policy max itself).
> 
> So what I said previously wasn't particularly precise, sorry about that.

I might have missed something but it seems equivalent to what's currently done:

Find the appropriate frequency, if inefficient go to the efficient one, if
above policy->max return the original inefficient frequency.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 6/7] cpufreq: Skip inefficient frequencies
  2021-09-06  8:17             ` Vincent Donnefort
@ 2021-09-06 12:08               ` Rafael J. Wysocki
  2021-09-06 12:53                 ` Vincent Donnefort
  0 siblings, 1 reply; 24+ messages in thread
From: Rafael J. Wysocki @ 2021-09-06 12:08 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Rafael J. Wysocki, Peter Zijlstra, Rafael J. Wysocki,
	Viresh Kumar, Vincent Guittot, Quentin Perret, Linux PM,
	Ionela Voinescu, Lukasz Luba, Dietmar Eggemann,
	Matthias Kaehlcke

On Mon, Sep 6, 2021 at 10:17 AM Vincent Donnefort
<vincent.donnefort@arm.com> wrote:
>
> [...]
>
> > > >
> > > > Moreover, if only efficient frequencies are to be used, RELATION_L
> > > > needs to return min(policy->max, the closest efficient frequency equal
> > > > to or above the target).
> > >
> > > You mean, never returning an inefficient frequency, unless there are no
> > > efficient ones in the range policy->min policy->max ?
> >
> > No, that's not what I mean.
> >
> > First note that the target here is clamped between the policy min and
> > max.  Also bear in mind that each of them is a frequency from the
> > table, either efficient or inefficient.
> >
> > In the first step, search through the efficient frequencies only.
> > That will return you something at or above the target.  If it is at
> > the target, you're done.  If it is above the target, it may be either
> > within or above the policy max.  If it is within the policy max,
> > you're done.  If it is above the policy max, you need to search
> > through the inefficient frequencies between the target and the policy
> > max (and you know that there is at least one - the policy max itself).
> >
> > So what I said previously wasn't particularly precise, sorry about that.
>
> I might have missed something but it seems equivalent to what's currently done:
>
> Find the appropriate frequency, if inefficient go to the efficient one, if
> above policy->max return the original inefficient frequency.

It may or may not be equivalent depending on what the efficient one is.

And what is there now doesn't work for RELATION_H if I'm not mistaken.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 6/7] cpufreq: Skip inefficient frequencies
  2021-09-06 12:08               ` Rafael J. Wysocki
@ 2021-09-06 12:53                 ` Vincent Donnefort
  2021-09-06 15:16                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 24+ messages in thread
From: Vincent Donnefort @ 2021-09-06 12:53 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Rafael J. Wysocki, Viresh Kumar, Vincent Guittot,
	Quentin Perret, Linux PM, Ionela Voinescu, Lukasz Luba,
	Dietmar Eggemann, Matthias Kaehlcke

On Mon, Sep 06, 2021 at 02:08:36PM +0200, Rafael J. Wysocki wrote:
> On Mon, Sep 6, 2021 at 10:17 AM Vincent Donnefort
> <vincent.donnefort@arm.com> wrote:
> >
> > [...]
> >
> > > > >
> > > > > Moreover, if only efficient frequencies are to be used, RELATION_L
> > > > > needs to return min(policy->max, the closest efficient frequency equal
> > > > > to or above the target).
> > > >
> > > > You mean, never returning an inefficient frequency, unless there are no
> > > > efficient ones in the range policy->min policy->max ?
> > >
> > > No, that's not what I mean.
> > >
> > > First note that the target here is clamped between the policy min and
> > > max.  Also bear in mind that each of them is a frequency from the
> > > table, either efficient or inefficient.
> > >
> > > In the first step, search through the efficient frequencies only.
> > > That will return you something at or above the target.  If it is at
> > > the target, you're done.  If it is above the target, it may be either
> > > within or above the policy max.  If it is within the policy max,
> > > you're done.  If it is above the policy max, you need to search
> > > through the inefficient frequencies between the target and the policy
> > > max (and you know that there is at least one - the policy max itself).
> > >
> > > So what I said previously wasn't particularly precise, sorry about that.
> >
> > I might have missed something but it seems equivalent to what's currently done:
> >
> > Find the appropriate frequency, if inefficient go to the efficient one, if
> > above policy->max return the original inefficient frequency.
> 
> It may or may not be equivalent depending on what the efficient one is.
> 
> And what is there now doesn't work for RELATION_H if I'm not mistaken.

With the current approach, RELATION_H would find the best frequency and then
resolve it to an efficient one if possible. The efficient one might overshoot
the target_freq, but isn't it a good thing ?
 
 - If we consider only the efficient freqs in a first place, there's a risk we
   would return a frequency way smaller than the actual request.

 - What are the gain in capping RELATION_H to the target_freq if we can find a
   frequency higher than the request that wouldn't use more energy than the
   target?

RELATION_H is used in two DVFS governors: conservative and ondemand:

  - Ondemand is using RELATION_H in the context of powerbias. It seems a usecase
    where it is not a problem to overshoot RELATION_H.

  - Conservative is using RELATION_H when the load is increasing. Same as
    ondemand, I don't think we would have any gain by not overshooting the
    RELATION_H target.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 6/7] cpufreq: Skip inefficient frequencies
  2021-09-06 12:53                 ` Vincent Donnefort
@ 2021-09-06 15:16                   ` Rafael J. Wysocki
  2021-09-06 15:32                     ` Rafael J. Wysocki
  2021-09-06 15:38                     ` Vincent Donnefort
  0 siblings, 2 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2021-09-06 15:16 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Rafael J. Wysocki, Peter Zijlstra, Rafael J. Wysocki,
	Viresh Kumar, Vincent Guittot, Quentin Perret, Linux PM,
	Ionela Voinescu, Lukasz Luba, Dietmar Eggemann,
	Matthias Kaehlcke

On Mon, Sep 6, 2021 at 2:53 PM Vincent Donnefort
<vincent.donnefort@arm.com> wrote:
>
> On Mon, Sep 06, 2021 at 02:08:36PM +0200, Rafael J. Wysocki wrote:
> > On Mon, Sep 6, 2021 at 10:17 AM Vincent Donnefort
> > <vincent.donnefort@arm.com> wrote:
> > >
> > > [...]
> > >
> > > > > >
> > > > > > Moreover, if only efficient frequencies are to be used, RELATION_L
> > > > > > needs to return min(policy->max, the closest efficient frequency equal
> > > > > > to or above the target).
> > > > >
> > > > > You mean, never returning an inefficient frequency, unless there are no
> > > > > efficient ones in the range policy->min policy->max ?
> > > >
> > > > No, that's not what I mean.
> > > >
> > > > First note that the target here is clamped between the policy min and
> > > > max.  Also bear in mind that each of them is a frequency from the
> > > > table, either efficient or inefficient.
> > > >
> > > > In the first step, search through the efficient frequencies only.
> > > > That will return you something at or above the target.  If it is at
> > > > the target, you're done.  If it is above the target, it may be either
> > > > within or above the policy max.  If it is within the policy max,
> > > > you're done.  If it is above the policy max, you need to search
> > > > through the inefficient frequencies between the target and the policy
> > > > max (and you know that there is at least one - the policy max itself).
> > > >
> > > > So what I said previously wasn't particularly precise, sorry about that.
> > >
> > > I might have missed something but it seems equivalent to what's currently done:
> > >
> > > Find the appropriate frequency, if inefficient go to the efficient one, if
> > > above policy->max return the original inefficient frequency.
> >
> > It may or may not be equivalent depending on what the efficient one is.
> >
> > And what is there now doesn't work for RELATION_H if I'm not mistaken.
>
> With the current approach, RELATION_H would find the best frequency and then
> resolve it to an efficient one if possible. The efficient one might overshoot
> the target_freq, but isn't it a good thing ?

No, it cannot.

>  - If we consider only the efficient freqs in a first place, there's a risk we
>    would return a frequency way smaller than the actual request.
>
>  - What are the gain in capping RELATION_H to the target_freq if we can find a
>    frequency higher than the request that wouldn't use more energy than the
>    target?

There may be a power constraint, so RELATION_H should never go above the target.

> RELATION_H is used in two DVFS governors: conservative and ondemand:
>
>   - Ondemand is using RELATION_H in the context of powerbias. It seems a usecase
>     where it is not a problem to overshoot RELATION_H.
>
>   - Conservative is using RELATION_H when the load is increasing. Same as
>     ondemand, I don't think we would have any gain by not overshooting the
>     RELATION_H target.

Well, let me repeat: the performance governor uses RELATION_H, so if
you allow it to go above the policy limit, it will stay there forever.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 6/7] cpufreq: Skip inefficient frequencies
  2021-09-06 15:16                   ` Rafael J. Wysocki
@ 2021-09-06 15:32                     ` Rafael J. Wysocki
  2021-09-06 15:38                     ` Vincent Donnefort
  1 sibling, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2021-09-06 15:32 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Peter Zijlstra, Rafael J. Wysocki, Viresh Kumar, Vincent Guittot,
	Quentin Perret, Linux PM, Ionela Voinescu, Lukasz Luba,
	Dietmar Eggemann, Matthias Kaehlcke

On Mon, Sep 6, 2021 at 5:16 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Mon, Sep 6, 2021 at 2:53 PM Vincent Donnefort
> <vincent.donnefort@arm.com> wrote:
> >
> > On Mon, Sep 06, 2021 at 02:08:36PM +0200, Rafael J. Wysocki wrote:
> > > On Mon, Sep 6, 2021 at 10:17 AM Vincent Donnefort
> > > <vincent.donnefort@arm.com> wrote:
> > > >
> > > > [...]
> > > >
> > > > > > >
> > > > > > > Moreover, if only efficient frequencies are to be used, RELATION_L
> > > > > > > needs to return min(policy->max, the closest efficient frequency equal
> > > > > > > to or above the target).
> > > > > >
> > > > > > You mean, never returning an inefficient frequency, unless there are no
> > > > > > efficient ones in the range policy->min policy->max ?
> > > > >
> > > > > No, that's not what I mean.
> > > > >
> > > > > First note that the target here is clamped between the policy min and
> > > > > max.  Also bear in mind that each of them is a frequency from the
> > > > > table, either efficient or inefficient.
> > > > >
> > > > > In the first step, search through the efficient frequencies only.
> > > > > That will return you something at or above the target.  If it is at
> > > > > the target, you're done.  If it is above the target, it may be either
> > > > > within or above the policy max.  If it is within the policy max,
> > > > > you're done.  If it is above the policy max, you need to search
> > > > > through the inefficient frequencies between the target and the policy
> > > > > max (and you know that there is at least one - the policy max itself).
> > > > >
> > > > > So what I said previously wasn't particularly precise, sorry about that.
> > > >
> > > > I might have missed something but it seems equivalent to what's currently done:
> > > >
> > > > Find the appropriate frequency, if inefficient go to the efficient one, if
> > > > above policy->max return the original inefficient frequency.
> > >
> > > It may or may not be equivalent depending on what the efficient one is.
> > >
> > > And what is there now doesn't work for RELATION_H if I'm not mistaken.
> >
> > With the current approach, RELATION_H would find the best frequency and then
> > resolve it to an efficient one if possible. The efficient one might overshoot
> > the target_freq, but isn't it a good thing ?
>
> No, it cannot.
>
> >  - If we consider only the efficient freqs in a first place, there's a risk we
> >    would return a frequency way smaller than the actual request.
> >
> >  - What are the gain in capping RELATION_H to the target_freq if we can find a
> >    frequency higher than the request that wouldn't use more energy than the
> >    target?
>
> There may be a power constraint, so RELATION_H should never go above the target.
>
> > RELATION_H is used in two DVFS governors: conservative and ondemand:
> >
> >   - Ondemand is using RELATION_H in the context of powerbias. It seems a usecase
> >     where it is not a problem to overshoot RELATION_H.
> >
> >   - Conservative is using RELATION_H when the load is increasing. Same as
> >     ondemand, I don't think we would have any gain by not overshooting the
> >     RELATION_H target.
>
> Well, let me repeat: the performance governor uses RELATION_H, so if
> you allow it to go above the policy limit, it will stay there forever.

You'll probably say that the we'll always use the inefficient
frequencies with the performance governor, so this doesn't matter,
which is fair enough, but let me say that I'm not particularly keen on
storing the efficient frequency intex in every row of the table for
every CPU in the system especially on systems where all frequencies
are efficient (and these are the majority AFAICS), because this is
just plain confusing and it causes you to wonder why it may be a good
idea to let the target to be overshot in the RELATION_H case which is
even more confusing.

Can we just perhaps avoid this possibly premature optimization and get
back to it when it has been demonstrated to be necessary?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 6/7] cpufreq: Skip inefficient frequencies
  2021-09-06 15:16                   ` Rafael J. Wysocki
  2021-09-06 15:32                     ` Rafael J. Wysocki
@ 2021-09-06 15:38                     ` Vincent Donnefort
  2021-09-06 16:04                       ` Rafael J. Wysocki
  1 sibling, 1 reply; 24+ messages in thread
From: Vincent Donnefort @ 2021-09-06 15:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Rafael J. Wysocki, Viresh Kumar, Vincent Guittot,
	Quentin Perret, Linux PM, Ionela Voinescu, Lukasz Luba,
	Dietmar Eggemann, Matthias Kaehlcke

On Mon, Sep 06, 2021 at 05:16:32PM +0200, Rafael J. Wysocki wrote:
> On Mon, Sep 6, 2021 at 2:53 PM Vincent Donnefort
> <vincent.donnefort@arm.com> wrote:
> >
> > On Mon, Sep 06, 2021 at 02:08:36PM +0200, Rafael J. Wysocki wrote:
> > > On Mon, Sep 6, 2021 at 10:17 AM Vincent Donnefort
> > > <vincent.donnefort@arm.com> wrote:
> > > >
> > > > [...]
> > > >
> > > > > > >
> > > > > > > Moreover, if only efficient frequencies are to be used, RELATION_L
> > > > > > > needs to return min(policy->max, the closest efficient frequency equal
> > > > > > > to or above the target).
> > > > > >
> > > > > > You mean, never returning an inefficient frequency, unless there are no
> > > > > > efficient ones in the range policy->min policy->max ?
> > > > >
> > > > > No, that's not what I mean.
> > > > >
> > > > > First note that the target here is clamped between the policy min and
> > > > > max.  Also bear in mind that each of them is a frequency from the
> > > > > table, either efficient or inefficient.
> > > > >
> > > > > In the first step, search through the efficient frequencies only.
> > > > > That will return you something at or above the target.  If it is at
> > > > > the target, you're done.  If it is above the target, it may be either
> > > > > within or above the policy max.  If it is within the policy max,
> > > > > you're done.  If it is above the policy max, you need to search
> > > > > through the inefficient frequencies between the target and the policy
> > > > > max (and you know that there is at least one - the policy max itself).
> > > > >
> > > > > So what I said previously wasn't particularly precise, sorry about that.
> > > >
> > > > I might have missed something but it seems equivalent to what's currently done:
> > > >
> > > > Find the appropriate frequency, if inefficient go to the efficient one, if
> > > > above policy->max return the original inefficient frequency.
> > >
> > > It may or may not be equivalent depending on what the efficient one is.
> > >
> > > And what is there now doesn't work for RELATION_H if I'm not mistaken.
> >
> > With the current approach, RELATION_H would find the best frequency and then
> > resolve it to an efficient one if possible. The efficient one might overshoot
> > the target_freq, but isn't it a good thing ?
> 
> No, it cannot.
> 
> >  - If we consider only the efficient freqs in a first place, there's a risk we
> >    would return a frequency way smaller than the actual request.
> >
> >  - What are the gain in capping RELATION_H to the target_freq if we can find a
> >    frequency higher than the request that wouldn't use more energy than the
> >    target?
> 
> There may be a power constraint, so RELATION_H should never go above the target.

Fair enough, let's not overshoot RELATION_H's target and only return (if
possible) an efficient freq <= target_freq

> 
> > RELATION_H is used in two DVFS governors: conservative and ondemand:
> >
> >   - Ondemand is using RELATION_H in the context of powerbias. It seems a usecase
> >     where it is not a problem to overshoot RELATION_H.
> >
> >   - Conservative is using RELATION_H when the load is increasing. Same as
> >     ondemand, I don't think we would have any gain by not overshooting the
> >     RELATION_H target.
> 
> Well, let me repeat: the performance governor uses RELATION_H, so if
> you allow it to go above the policy limit, it will stay there forever.

The performance governor isn't concerned by this change, I was only talking
about conservative and ondemand. Also overshooting policy->max wouldn't have
ever happened.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 6/7] cpufreq: Skip inefficient frequencies
  2021-09-06 15:38                     ` Vincent Donnefort
@ 2021-09-06 16:04                       ` Rafael J. Wysocki
  0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2021-09-06 16:04 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Rafael J. Wysocki, Peter Zijlstra, Rafael J. Wysocki,
	Viresh Kumar, Vincent Guittot, Quentin Perret, Linux PM,
	Ionela Voinescu, Lukasz Luba, Dietmar Eggemann,
	Matthias Kaehlcke

On Mon, Sep 6, 2021 at 5:38 PM Vincent Donnefort
<vincent.donnefort@arm.com> wrote:
>
> On Mon, Sep 06, 2021 at 05:16:32PM +0200, Rafael J. Wysocki wrote:
> > On Mon, Sep 6, 2021 at 2:53 PM Vincent Donnefort
> > <vincent.donnefort@arm.com> wrote:
> > >
> > > On Mon, Sep 06, 2021 at 02:08:36PM +0200, Rafael J. Wysocki wrote:
> > > > On Mon, Sep 6, 2021 at 10:17 AM Vincent Donnefort
> > > > <vincent.donnefort@arm.com> wrote:
> > > > >
> > > > > [...]
> > > > >
> > > > > > > >
> > > > > > > > Moreover, if only efficient frequencies are to be used, RELATION_L
> > > > > > > > needs to return min(policy->max, the closest efficient frequency equal
> > > > > > > > to or above the target).
> > > > > > >
> > > > > > > You mean, never returning an inefficient frequency, unless there are no
> > > > > > > efficient ones in the range policy->min policy->max ?
> > > > > >
> > > > > > No, that's not what I mean.
> > > > > >
> > > > > > First note that the target here is clamped between the policy min and
> > > > > > max.  Also bear in mind that each of them is a frequency from the
> > > > > > table, either efficient or inefficient.
> > > > > >
> > > > > > In the first step, search through the efficient frequencies only.
> > > > > > That will return you something at or above the target.  If it is at
> > > > > > the target, you're done.  If it is above the target, it may be either
> > > > > > within or above the policy max.  If it is within the policy max,
> > > > > > you're done.  If it is above the policy max, you need to search
> > > > > > through the inefficient frequencies between the target and the policy
> > > > > > max (and you know that there is at least one - the policy max itself).
> > > > > >
> > > > > > So what I said previously wasn't particularly precise, sorry about that.
> > > > >
> > > > > I might have missed something but it seems equivalent to what's currently done:
> > > > >
> > > > > Find the appropriate frequency, if inefficient go to the efficient one, if
> > > > > above policy->max return the original inefficient frequency.
> > > >
> > > > It may or may not be equivalent depending on what the efficient one is.
> > > >
> > > > And what is there now doesn't work for RELATION_H if I'm not mistaken.
> > >
> > > With the current approach, RELATION_H would find the best frequency and then
> > > resolve it to an efficient one if possible. The efficient one might overshoot
> > > the target_freq, but isn't it a good thing ?
> >
> > No, it cannot.
> >
> > >  - If we consider only the efficient freqs in a first place, there's a risk we
> > >    would return a frequency way smaller than the actual request.
> > >
> > >  - What are the gain in capping RELATION_H to the target_freq if we can find a
> > >    frequency higher than the request that wouldn't use more energy than the
> > >    target?
> >
> > There may be a power constraint, so RELATION_H should never go above the target.
>
> Fair enough, let's not overshoot RELATION_H's target and only return (if
> possible) an efficient freq <= target_freq

Thanks!

Please also note that there is a similar problem in the current patch
set for RELATION_C.

Namely, say that there are four frequencies, f_a < f_b < f_c < f_d,
such that f_b - f_a == f_d - f_c == (f_c - f_b) / 2 and f_a, f_d are
efficient.  If the efficient frequencies are preferred and the target
falls between f_b and f_c, the current series will always return f_d
(if I understand it correctly), whereas if the search is limited to
the efficient frequencies, f_a will often be closer to the target than
f_d (50% of the time under random distribution of target values).

> >
> > > RELATION_H is used in two DVFS governors: conservative and ondemand:
> > >
> > >   - Ondemand is using RELATION_H in the context of powerbias. It seems a usecase
> > >     where it is not a problem to overshoot RELATION_H.
> > >
> > >   - Conservative is using RELATION_H when the load is increasing. Same as
> > >     ondemand, I don't think we would have any gain by not overshooting the
> > >     RELATION_H target.
> >
> > Well, let me repeat: the performance governor uses RELATION_H, so if
> > you allow it to go above the policy limit, it will stay there forever.
>
> The performance governor isn't concerned by this change, I was only talking
> about conservative and ondemand. Also overshooting policy->max wouldn't have
> ever happened.

Right.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2021-09-06 16:04 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-31 10:24 [PATCH v6 0/7] Inefficient OPPs Vincent Donnefort
2021-08-31 10:24 ` [PATCH v6 1/7] PM / EM: Fix inefficient states detection Vincent Donnefort
2021-08-31 10:24 ` [PATCH v6 2/7] PM / EM: Mark inefficient states Vincent Donnefort
2021-08-31 10:24 ` [PATCH v6 3/7] PM / EM: Extend em_perf_domain with a flag field Vincent Donnefort
2021-08-31 10:24 ` [PATCH v6 4/7] PM / EM: Allow skipping inefficient states Vincent Donnefort
2021-08-31 10:24 ` [PATCH v6 5/7] cpufreq: Add an interface to mark inefficient frequencies Vincent Donnefort
2021-09-01  4:55   ` Viresh Kumar
2021-09-01 17:25   ` Rafael J. Wysocki
2021-08-31 10:24 ` [PATCH v6 6/7] cpufreq: Skip " Vincent Donnefort
2021-09-01  5:30   ` Viresh Kumar
2021-09-01 18:13   ` Rafael J. Wysocki
2021-09-02 10:50     ` Vincent Donnefort
2021-09-02 13:09       ` Rafael J. Wysocki
2021-09-02 13:49         ` Vincent Donnefort
2021-09-03 18:14           ` Rafael J. Wysocki
2021-09-06  8:17             ` Vincent Donnefort
2021-09-06 12:08               ` Rafael J. Wysocki
2021-09-06 12:53                 ` Vincent Donnefort
2021-09-06 15:16                   ` Rafael J. Wysocki
2021-09-06 15:32                     ` Rafael J. Wysocki
2021-09-06 15:38                     ` Vincent Donnefort
2021-09-06 16:04                       ` Rafael J. Wysocki
2021-08-31 10:24 ` [PATCH v6 7/7] PM / EM: Mark inefficiencies in CPUFreq Vincent Donnefort
2021-09-01  5:31   ` Viresh Kumar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.