All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/6] EM / PM: Inefficient OPPs
@ 2021-06-04 11:05 Vincent Donnefort
  2021-06-04 11:05 ` [PATCH v3 1/6] PM / EM: Fix inefficient states detection Vincent Donnefort
                   ` (5 more replies)
  0 siblings, 6 replies; 45+ messages in thread
From: Vincent Donnefort @ 2021-06-04 11:05 UTC (permalink / raw)
  To: peterz, rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann,
	Vincent Donnefort

We (Power team in Arm) are working with an experimental kernel for the
Google's Pixel4 to evaluate and improve the current mainline performance
and energy consumption on a real life device with Android.

The SD855 SoC found in this phone has several OPPs that are inefficient.
I.e. despite a lower frequency, they have a greater cost. (That cost being 
fmax * OPP power / OPP freq). This issue is twofold. First of course,
running a specific workload at an inefficient OPP is counterproductive
since it wastes wasting energy. But also, inefficient OPPs make a
performance domain less appealing for task placement than it really is.

We evaluated the change presented here by running 30 iterations of Android 
PCMark "Work 2.0 Performance". While we did not see any statistically
significant performance impact, this change allowed to drastically improve 
the idle time residency.   
 

                           |   Running   |  WFI [1]  |    Idle   |
   ------------------------+-------------+-----------+-----------+
   Little cluster (4 CPUs) |    -0.35%   |   +0.35%  |   +0.79%  |
   ------------------------+-------------+-----------+-----------+
   Medium cluster (3 CPUs) |    -6.3%    |    -18%   |    +12%   |
   ------------------------+-------------+-----------+-----------+
   Big cluster    (1 CPU)  |    -6.4%    |    -6.5%  |    +2.8%  |
   ------------------------+-------------+-----------+-----------+

On the SD855, the inefficient OPPs are found on the little cluster. By
removing them from the Energy Model, we make the most efficient CPUs more
appealing for task placement, helping to reduce the running time for the
medium and big CPUs. Increasing idle time is crucial for this platform due 
to the substantial energy cost differences among the clusters. Also,
despite not appearing in the statistics (the idle driver used here doesn't 
report it), we can speculate that we also improve the cluster idle time.

[1] WFI: Wait for interrupt.


Changelog since v2:
  - Add separated support for inefficiencies into CPUFreq.
  - Collect Reviewed-by for the first patch.

Changelog since v1:
  - Remove the Look-up table as the numbers weren't strong enough to justify the
    implementation.
  - Split the patch.

Vincent Donnefort (6):
  PM / EM: Fix inefficient states detection
  PM / EM: Mark inefficient states
  cpufreq: Add an interface to mark inefficient frequencies
  cpufreq: Skip inefficient frequencies in cpufreq_driver_resolve_freq()
  cpufreq: Mark inefficient frequencies using the Energy Model
  PM / EM: Skip inefficient states

 drivers/cpufreq/cpufreq.c    | 36 +++++++++++++++++++++++++++++
 drivers/cpufreq/freq_table.c | 55 +++++++++++++++++++++++++++++++++++++++++++-
 include/linux/cpufreq.h      | 15 +++++++++++-
 include/linux/energy_model.h | 50 +++++++++++++++++++++++++++++++++++-----
 kernel/power/energy_model.c  | 27 +++++++++-------------
 5 files changed, 159 insertions(+), 24 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v3 1/6] PM / EM: Fix inefficient states detection
  2021-06-04 11:05 [PATCH v3 0/6] EM / PM: Inefficient OPPs Vincent Donnefort
@ 2021-06-04 11:05 ` Vincent Donnefort
  2021-06-04 18:09   ` Matthias Kaehlcke
  2021-06-04 11:05 ` [PATCH v3 2/6] PM / EM: Mark inefficient states Vincent Donnefort
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 45+ messages in thread
From: Vincent Donnefort @ 2021-06-04 11:05 UTC (permalink / raw)
  To: peterz, rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann,
	Vincent Donnefort

Currently, a debug message is printed if an inefficient state is detected
in the Energy Model. Unfortunately, it won't detect if the first state is
inefficient or if two successive states are. Fix this behavior.

Fixes: 27871f7a (PM: Introduce an Energy Model management framework)
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 0c620eb..c4871a8 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -107,8 +107,7 @@ static void em_debug_remove_pd(struct device *dev) {}
 static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 				int nr_states, struct em_data_callback *cb)
 {
-	unsigned long opp_eff, prev_opp_eff = ULONG_MAX;
-	unsigned long power, freq, prev_freq = 0;
+	unsigned long power, freq, prev_freq = 0, prev_cost = ULONG_MAX;
 	struct em_perf_state *table;
 	int i, ret;
 	u64 fmax;
@@ -153,25 +152,19 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 
 		table[i].power = power;
 		table[i].frequency = prev_freq = freq;
-
-		/*
-		 * The hertz/watts efficiency ratio should decrease as the
-		 * frequency grows on sane platforms. But this isn't always
-		 * true in practice so warn the user if a higher OPP is more
-		 * power efficient than a lower one.
-		 */
-		opp_eff = freq / power;
-		if (opp_eff >= prev_opp_eff)
-			dev_dbg(dev, "EM: hertz/watts ratio non-monotonically decreasing: em_perf_state %d >= em_perf_state%d\n",
-					i, i - 1);
-		prev_opp_eff = opp_eff;
 	}
 
 	/* Compute the cost of each performance state. */
 	fmax = (u64) table[nr_states - 1].frequency;
-	for (i = 0; i < nr_states; i++) {
+	for (i = nr_states - 1; i >= 0; i--) {
 		table[i].cost = div64_u64(fmax * table[i].power,
 					  table[i].frequency);
+		if (table[i].cost >= prev_cost) {
+			dev_dbg(dev, "EM: OPP:%lu is inefficient\n",
+				table[i].frequency);
+		} else {
+			prev_cost = table[i].cost;
+		}
 	}
 
 	pd->table = table;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 2/6] PM / EM: Mark inefficient states
  2021-06-04 11:05 [PATCH v3 0/6] EM / PM: Inefficient OPPs Vincent Donnefort
  2021-06-04 11:05 ` [PATCH v3 1/6] PM / EM: Fix inefficient states detection Vincent Donnefort
@ 2021-06-04 11:05 ` Vincent Donnefort
  2021-06-04 18:12   ` Matthias Kaehlcke
  2021-06-04 11:05 ` [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies Vincent Donnefort
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 45+ messages in thread
From: Vincent Donnefort @ 2021-06-04 11:05 UTC (permalink / raw)
  To: peterz, rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann,
	Vincent Donnefort

Some SoCs, such as the sd855 have OPPs within the same performance domain,
whose cost is higher than others with a higher frequency. Even though
those OPPs are interesting from a cooling perspective, it makes no sense
to use them when the device can run at full capacity. Those OPPs handicap
the performance domain, when choosing the most energy-efficient CPU and
are wasting energy. They are inefficient.

Hence, add support for such OPPs to the Energy Model. The table can now
be read skipping inefficient performance states (and by extension,
inefficient OPPs).

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 757fc60..2531325 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -17,13 +17,25 @@
  *		device). It can be a total power: static and dynamic.
  * @cost:	The cost coefficient associated with this level, used during
  *		energy calculation. Equal to: power * max_frequency / frequency
+ * @flags:	see "em_perf_state flags" description below.
  */
 struct em_perf_state {
 	unsigned long frequency;
 	unsigned long power;
 	unsigned long cost;
+	unsigned long flags;
 };
 
+/*
+ * em_perf_state flags:
+ *
+ * EM_PERF_STATE_INEFFICIENT: The performance state is inefficient. There is
+ * in this em_perf_domain, another performance state with a higher frequency
+ * but a lower or equal power cost. Such inefficient states are ignored when
+ * using em_pd_get_efficient_*() functions.
+ */
+#define EM_PERF_STATE_INEFFICIENT BIT(0)
+
 /**
  * em_perf_domain - Performance domain
  * @table:		List of performance states, in ascending order
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index c4871a8..30ab73a 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -2,7 +2,7 @@
 /*
  * Energy Model of devices
  *
- * Copyright (c) 2018-2020, Arm ltd.
+ * Copyright (c) 2018-2021, Arm ltd.
  * Written by: Quentin Perret, Arm ltd.
  * Improvements provided by: Lukasz Luba, Arm ltd.
  */
@@ -42,6 +42,7 @@ static void em_debug_create_ps(struct em_perf_state *ps, struct dentry *pd)
 	debugfs_create_ulong("frequency", 0444, d, &ps->frequency);
 	debugfs_create_ulong("power", 0444, d, &ps->power);
 	debugfs_create_ulong("cost", 0444, d, &ps->cost);
+	debugfs_create_ulong("inefficient", 0444, d, &ps->flags);
 }
 
 static int em_debug_cpus_show(struct seq_file *s, void *unused)
@@ -160,6 +161,7 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 		table[i].cost = div64_u64(fmax * table[i].power,
 					  table[i].frequency);
 		if (table[i].cost >= prev_cost) {
+			table[i].flags = EM_PERF_STATE_INEFFICIENT;
 			dev_dbg(dev, "EM: OPP:%lu is inefficient\n",
 				table[i].frequency);
 		} else {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-04 11:05 [PATCH v3 0/6] EM / PM: Inefficient OPPs Vincent Donnefort
  2021-06-04 11:05 ` [PATCH v3 1/6] PM / EM: Fix inefficient states detection Vincent Donnefort
  2021-06-04 11:05 ` [PATCH v3 2/6] PM / EM: Mark inefficient states Vincent Donnefort
@ 2021-06-04 11:05 ` Vincent Donnefort
  2021-06-04 18:19   ` Matthias Kaehlcke
                     ` (2 more replies)
  2021-06-04 11:05 ` [PATCH v3 4/6] cpufreq: Skip inefficient frequencies in cpufreq_driver_resolve_freq() Vincent Donnefort
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 45+ messages in thread
From: Vincent Donnefort @ 2021-06-04 11:05 UTC (permalink / raw)
  To: peterz, rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann,
	Vincent Donnefort

Some SoCs such as the sd855 have OPPs within the same policy whose cost is
higher than others with a higher frequency. Those OPPs are inefficients and
it might be interesting for a governor to not use them.

Adding a flag, CPUFREQ_INEFFICIENT_FREQ, to mark such OPPs into the
frequency table, as well as a new cpufreq_frequency_table member
"efficient". This new member will allow a governor to quickly resolve an
inefficient frequency to an efficient one.

Efficient OPPs point to themselves. Governors must also ensure that the
efficiency resolution does not break the policy maximum.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>

diff --git a/drivers/cpufreq/freq_table.c b/drivers/cpufreq/freq_table.c
index 67e56cf..0756d7d6 100644
--- a/drivers/cpufreq/freq_table.c
+++ b/drivers/cpufreq/freq_table.c
@@ -351,6 +351,53 @@ static int set_freq_table_sorted(struct cpufreq_policy *policy)
 	return 0;
 }
 
+static void set_freq_table_efficiencies(struct cpufreq_policy *policy)
+{
+	struct cpufreq_frequency_table *pos, *table = policy->freq_table;
+	enum cpufreq_table_sorting sort = policy->freq_table_sorted;
+	int efficient, idx;
+
+	/* Not supported */
+	if (sort == CPUFREQ_TABLE_UNSORTED) {
+		cpufreq_for_each_entry_idx(pos, table, idx)
+			pos->efficient = idx;
+		return;
+	}
+
+	/* The highest frequency is always efficient */
+	cpufreq_for_each_entry_idx(pos, table, idx) {
+		if (pos->frequency == CPUFREQ_ENTRY_INVALID)
+			continue;
+
+		efficient = idx;
+
+		if (sort == CPUFREQ_TABLE_SORTED_DESCENDING)
+			break;
+	}
+
+	for (;;) {
+		pos = &table[idx];
+
+		if (pos->frequency == CPUFREQ_ENTRY_INVALID)
+			continue;
+
+		if (pos->flags & CPUFREQ_INEFFICIENT_FREQ) {
+			pos->efficient = efficient;
+		} else {
+			pos->efficient = idx;
+			efficient = idx;
+		}
+
+		if (sort == CPUFREQ_TABLE_SORTED_ASCENDING) {
+			if (--idx < 0)
+				break;
+		} else {
+			if (table[++idx].frequency == CPUFREQ_TABLE_END)
+				break;
+		}
+	}
+}
+
 int cpufreq_table_validate_and_sort(struct cpufreq_policy *policy)
 {
 	int ret;
@@ -362,7 +409,13 @@ int cpufreq_table_validate_and_sort(struct cpufreq_policy *policy)
 	if (ret)
 		return ret;
 
-	return set_freq_table_sorted(policy);
+	ret = set_freq_table_sorted(policy);
+	if (ret)
+		return ret;
+
+	set_freq_table_efficiencies(policy);
+
+	return ret;
 }
 
 MODULE_AUTHOR("Dominik Brodowski <linux@brodo.de>");
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 353969c..d10784c 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -666,13 +666,15 @@ struct governor_attr {
 #define CPUFREQ_ENTRY_INVALID	~0u
 #define CPUFREQ_TABLE_END	~1u
 /* Special Values of .flags field */
-#define CPUFREQ_BOOST_FREQ	(1 << 0)
+#define CPUFREQ_BOOST_FREQ	 (1 << 0)
+#define CPUFREQ_INEFFICIENT_FREQ (1 << 1)
 
 struct cpufreq_frequency_table {
 	unsigned int	flags;
 	unsigned int	driver_data; /* driver specific data, not used by core */
 	unsigned int	frequency; /* kHz - doesn't need to be in ascending
 				    * order */
+	unsigned int	efficient; /* idx of an efficient frequency */
 };
 
 #if defined(CONFIG_CPU_FREQ) && defined(CONFIG_PM_OPP)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 4/6] cpufreq: Skip inefficient frequencies in cpufreq_driver_resolve_freq()
  2021-06-04 11:05 [PATCH v3 0/6] EM / PM: Inefficient OPPs Vincent Donnefort
                   ` (2 preceding siblings ...)
  2021-06-04 11:05 ` [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies Vincent Donnefort
@ 2021-06-04 11:05 ` Vincent Donnefort
  2021-06-04 18:25   ` Matthias Kaehlcke
  2021-06-04 11:06 ` [PATCH v3 5/6] cpufreq: Mark inefficient frequencies using the Energy Model Vincent Donnefort
  2021-06-04 11:06 ` [PATCH v3 6/6] PM / EM: Skip inefficient states Vincent Donnefort
  5 siblings, 1 reply; 45+ messages in thread
From: Vincent Donnefort @ 2021-06-04 11:05 UTC (permalink / raw)
  To: peterz, rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann,
	Vincent Donnefort

Avoid using frequencies marked as inefficient. This change affects
schedutil, which is the only in-tree governor using the function
cpufreq_driver_resolve_freq().

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 802abc9..7431f40a 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -546,6 +546,10 @@ unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
 
 		idx = cpufreq_frequency_table_target(policy, target_freq,
 						     CPUFREQ_RELATION_L);
+
+		/* Replace the target with an efficient one */
+		idx = cpufreq_frequency_find_efficient(policy, idx);
+
 		policy->cached_resolved_idx = idx;
 		return policy->freq_table[idx].frequency;
 	}
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index d10784c..0ca4c9a 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -994,6 +994,17 @@ static inline int cpufreq_frequency_table_target(struct cpufreq_policy *policy,
 	}
 }
 
+static inline unsigned int
+cpufreq_frequency_find_efficient(struct cpufreq_policy *policy,
+				 unsigned int idx)
+{
+	struct cpufreq_frequency_table *table = policy->freq_table;
+	unsigned int efficient_idx = table[idx].efficient;
+
+	return table[efficient_idx].frequency <= policy->max ? efficient_idx :
+							       idx;
+}
+
 static inline int cpufreq_table_count_valid_entries(const struct cpufreq_policy *policy)
 {
 	struct cpufreq_frequency_table *pos;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 5/6] cpufreq: Mark inefficient frequencies using the Energy Model
  2021-06-04 11:05 [PATCH v3 0/6] EM / PM: Inefficient OPPs Vincent Donnefort
                   ` (3 preceding siblings ...)
  2021-06-04 11:05 ` [PATCH v3 4/6] cpufreq: Skip inefficient frequencies in cpufreq_driver_resolve_freq() Vincent Donnefort
@ 2021-06-04 11:06 ` Vincent Donnefort
  2021-06-04 18:35   ` Matthias Kaehlcke
  2021-06-04 11:06 ` [PATCH v3 6/6] PM / EM: Skip inefficient states Vincent Donnefort
  5 siblings, 1 reply; 45+ messages in thread
From: Vincent Donnefort @ 2021-06-04 11:06 UTC (permalink / raw)
  To: peterz, rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann,
	Vincent Donnefort

The Energy Model has a 1:1 mapping between OPPs and performance states
(em_perf_state). We can then read which states are inefficient and use this
information to mark the cpufreq table.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 7431f40a..34d344d 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -19,6 +19,7 @@
 #include <linux/cpu_cooling.h>
 #include <linux/delay.h>
 #include <linux/device.h>
+#include <linux/energy_model.h>
 #include <linux/init.h>
 #include <linux/kernel_stat.h>
 #include <linux/module.h>
@@ -1317,6 +1318,31 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
 	kfree(policy);
 }
 
+static void cpufreq_inefficiencies_from_em(struct cpufreq_policy *policy,
+					   struct em_perf_domain *em_pd)
+{
+	struct cpufreq_frequency_table *pos, *table = policy->freq_table;
+	struct em_perf_state *em_table;
+	int i;
+
+	if (!em_pd)
+		return;
+
+	em_table = em_pd->table;
+
+	for (i = 0; i < em_pd->nr_perf_states; i++) {
+		if (!(em_table[i].flags & EM_PERF_STATE_INEFFICIENT))
+			continue;
+
+		cpufreq_for_each_valid_entry(pos, table) {
+			if (pos->frequency == em_table[i].frequency) {
+				pos->flags |= CPUFREQ_INEFFICIENT_FREQ;
+				break;
+			}
+		}
+	}
+}
+
 static int cpufreq_online(unsigned int cpu)
 {
 	struct cpufreq_policy *policy;
@@ -1371,6 +1397,12 @@ static int cpufreq_online(unsigned int cpu)
 			goto out_free_policy;
 		}
 
+		/*
+		 * Sync potential inefficiencies with an Energy Model that the
+		 * driver might have registered.
+		 */
+		cpufreq_inefficiencies_from_em(policy, em_cpu_get(cpu));
+
 		ret = cpufreq_table_validate_and_sort(policy);
 		if (ret)
 			goto out_exit_policy;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 6/6] PM / EM: Skip inefficient states
  2021-06-04 11:05 [PATCH v3 0/6] EM / PM: Inefficient OPPs Vincent Donnefort
                   ` (4 preceding siblings ...)
  2021-06-04 11:06 ` [PATCH v3 5/6] cpufreq: Mark inefficient frequencies using the Energy Model Vincent Donnefort
@ 2021-06-04 11:06 ` Vincent Donnefort
  2021-06-04 18:49   ` Matthias Kaehlcke
  5 siblings, 1 reply; 45+ messages in thread
From: Vincent Donnefort @ 2021-06-04 11:06 UTC (permalink / raw)
  To: peterz, rjw, viresh.kumar, vincent.guittot, qperret
  Cc: linux-pm, ionela.voinescu, lukasz.luba, dietmar.eggemann,
	Vincent Donnefort

Now that Schedutil can leverage the inefficiency identification from the
Energy Model, we can skip inefficient states when estimating the energy
consumption.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 2531325..d20f6afe 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -98,6 +98,36 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 void em_dev_unregister_perf_domain(struct device *dev);
 
 /**
+ * em_pd_get_efficient_state() - Get an efficient performance state from the EM
+ * @pd   : Performance domain for which we want an efficient frequency
+ * @freq : Frequency to map with the EM
+ *
+ * It is called from the scheduler code quite frequently and as a consequence
+ * doesn't implement any check.
+ *
+ * Return: An efficient performance state, high enough to meet @freq
+ * requirement.
+ */
+static inline
+struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
+						unsigned long freq)
+{
+	struct em_perf_state *ps;
+	int i;
+
+	for (i = 0; i < pd->nr_perf_states; i++) {
+		ps = &pd->table[i];
+		if (ps->frequency >= freq) {
+			if (ps->flags & EM_PERF_STATE_INEFFICIENT)
+				continue;
+			break;
+		}
+	}
+
+	return ps;
+}
+
+/**
  * em_cpu_energy() - Estimates the energy consumed by the CPUs of a
 		performance domain
  * @pd		: performance domain for which energy has to be estimated
@@ -116,7 +146,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 {
 	unsigned long freq, scale_cpu;
 	struct em_perf_state *ps;
-	int i, cpu;
+	int cpu;
 
 	if (!sum_util)
 		return 0;
@@ -135,11 +165,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 * Find the lowest performance state of the Energy Model above the
 	 * requested frequency.
 	 */
-	for (i = 0; i < pd->nr_perf_states; i++) {
-		ps = &pd->table[i];
-		if (ps->frequency >= freq)
-			break;
-	}
+	ps = em_pd_get_efficient_state(pd, freq);
 
 	/*
 	 * The capacity of a CPU in the domain at the performance state (ps)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 1/6] PM / EM: Fix inefficient states detection
  2021-06-04 11:05 ` [PATCH v3 1/6] PM / EM: Fix inefficient states detection Vincent Donnefort
@ 2021-06-04 18:09   ` Matthias Kaehlcke
  0 siblings, 0 replies; 45+ messages in thread
From: Matthias Kaehlcke @ 2021-06-04 18:09 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: peterz, rjw, viresh.kumar, vincent.guittot, qperret, linux-pm,
	ionela.voinescu, lukasz.luba, dietmar.eggemann

On Fri, Jun 04, 2021 at 12:05:56PM +0100, Vincent Donnefort wrote:
> Currently, a debug message is printed if an inefficient state is detected
> in the Energy Model. Unfortunately, it won't detect if the first state is
> inefficient or if two successive states are. Fix this behavior.
> 
> Fixes: 27871f7a (PM: Introduce an Energy Model management framework)
> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> Reviewed-by: Quentin Perret <qperret@google.com>
> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
> 
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index 0c620eb..c4871a8 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -107,8 +107,7 @@ static void em_debug_remove_pd(struct device *dev) {}
>  static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
>  				int nr_states, struct em_data_callback *cb)
>  {
> -	unsigned long opp_eff, prev_opp_eff = ULONG_MAX;
> -	unsigned long power, freq, prev_freq = 0;
> +	unsigned long power, freq, prev_freq = 0, prev_cost = ULONG_MAX;
>  	struct em_perf_state *table;
>  	int i, ret;
>  	u64 fmax;
> @@ -153,25 +152,19 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
>  
>  		table[i].power = power;
>  		table[i].frequency = prev_freq = freq;
> -
> -		/*
> -		 * The hertz/watts efficiency ratio should decrease as the
> -		 * frequency grows on sane platforms. But this isn't always
> -		 * true in practice so warn the user if a higher OPP is more
> -		 * power efficient than a lower one.
> -		 */
> -		opp_eff = freq / power;
> -		if (opp_eff >= prev_opp_eff)
> -			dev_dbg(dev, "EM: hertz/watts ratio non-monotonically decreasing: em_perf_state %d >= em_perf_state%d\n",
> -					i, i - 1);
> -		prev_opp_eff = opp_eff;
>  	}
>  
>  	/* Compute the cost of each performance state. */
>  	fmax = (u64) table[nr_states - 1].frequency;
> -	for (i = 0; i < nr_states; i++) {
> +	for (i = nr_states - 1; i >= 0; i--) {
>  		table[i].cost = div64_u64(fmax * table[i].power,
>  					  table[i].frequency);
> +		if (table[i].cost >= prev_cost) {
> +			dev_dbg(dev, "EM: OPP:%lu is inefficient\n",
> +				table[i].frequency);
> +		} else {
> +			prev_cost = table[i].cost;
> +		}

nit: curly braces aren't needed, especially if you change the 'dev_dbg'
statement to be a single line.

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 2/6] PM / EM: Mark inefficient states
  2021-06-04 11:05 ` [PATCH v3 2/6] PM / EM: Mark inefficient states Vincent Donnefort
@ 2021-06-04 18:12   ` Matthias Kaehlcke
  0 siblings, 0 replies; 45+ messages in thread
From: Matthias Kaehlcke @ 2021-06-04 18:12 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: peterz, rjw, viresh.kumar, vincent.guittot, qperret, linux-pm,
	ionela.voinescu, lukasz.luba, dietmar.eggemann

On Fri, Jun 04, 2021 at 12:05:57PM +0100, Vincent Donnefort wrote:
> Some SoCs, such as the sd855 have OPPs within the same performance domain,
> whose cost is higher than others with a higher frequency. Even though
> those OPPs are interesting from a cooling perspective, it makes no sense
> to use them when the device can run at full capacity. Those OPPs handicap
> the performance domain, when choosing the most energy-efficient CPU and
> are wasting energy. They are inefficient.
> 
> Hence, add support for such OPPs to the Energy Model. The table can now
> be read skipping inefficient performance states (and by extension,
> inefficient OPPs).
> 
> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> 
> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index 757fc60..2531325 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -17,13 +17,25 @@
>   *		device). It can be a total power: static and dynamic.
>   * @cost:	The cost coefficient associated with this level, used during
>   *		energy calculation. Equal to: power * max_frequency / frequency
> + * @flags:	see "em_perf_state flags" description below.
>   */
>  struct em_perf_state {
>  	unsigned long frequency;
>  	unsigned long power;
>  	unsigned long cost;
> +	unsigned long flags;
>  };
>  
> +/*
> + * em_perf_state flags:
> + *
> + * EM_PERF_STATE_INEFFICIENT: The performance state is inefficient. There is
> + * in this em_perf_domain, another performance state with a higher frequency
> + * but a lower or equal power cost. Such inefficient states are ignored when
> + * using em_pd_get_efficient_*() functions.
> + */
> +#define EM_PERF_STATE_INEFFICIENT BIT(0)
> +
>  /**
>   * em_perf_domain - Performance domain
>   * @table:		List of performance states, in ascending order
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index c4871a8..30ab73a 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -2,7 +2,7 @@
>  /*
>   * Energy Model of devices
>   *
> - * Copyright (c) 2018-2020, Arm ltd.
> + * Copyright (c) 2018-2021, Arm ltd.
>   * Written by: Quentin Perret, Arm ltd.
>   * Improvements provided by: Lukasz Luba, Arm ltd.
>   */
> @@ -42,6 +42,7 @@ static void em_debug_create_ps(struct em_perf_state *ps, struct dentry *pd)
>  	debugfs_create_ulong("frequency", 0444, d, &ps->frequency);
>  	debugfs_create_ulong("power", 0444, d, &ps->power);
>  	debugfs_create_ulong("cost", 0444, d, &ps->cost);
> +	debugfs_create_ulong("inefficient", 0444, d, &ps->flags);
>  }
>  
>  static int em_debug_cpus_show(struct seq_file *s, void *unused)
> @@ -160,6 +161,7 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
>  		table[i].cost = div64_u64(fmax * table[i].power,
>  					  table[i].frequency);
>  		if (table[i].cost >= prev_cost) {
> +			table[i].flags = EM_PERF_STATE_INEFFICIENT;
>  			dev_dbg(dev, "EM: OPP:%lu is inefficient\n",
>  				table[i].frequency);
>  		} else {

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>

Please ignore my nit about the curly braces on patch [1/6].

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-04 11:05 ` [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies Vincent Donnefort
@ 2021-06-04 18:19   ` Matthias Kaehlcke
  2021-06-14 13:40     ` Vincent Donnefort
  2021-06-07  5:02   ` Viresh Kumar
  2021-06-14  7:28   ` Viresh Kumar
  2 siblings, 1 reply; 45+ messages in thread
From: Matthias Kaehlcke @ 2021-06-04 18:19 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: peterz, rjw, viresh.kumar, vincent.guittot, qperret, linux-pm,
	ionela.voinescu, lukasz.luba, dietmar.eggemann

On Fri, Jun 04, 2021 at 12:05:58PM +0100, Vincent Donnefort wrote:
> Some SoCs such as the sd855 have OPPs within the same policy whose cost is
> higher than others with a higher frequency. Those OPPs are inefficients and
> it might be interesting for a governor to not use them.
> 
> Adding a flag, CPUFREQ_INEFFICIENT_FREQ, to mark such OPPs into the
> frequency table, as well as a new cpufreq_frequency_table member
> "efficient". This new member will allow a governor to quickly resolve an
> inefficient frequency to an efficient one.
> 
> Efficient OPPs point to themselves. Governors must also ensure that the
> efficiency resolution does not break the policy maximum.
> 
> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> 
> diff --git a/drivers/cpufreq/freq_table.c b/drivers/cpufreq/freq_table.c
> index 67e56cf..0756d7d6 100644
> --- a/drivers/cpufreq/freq_table.c
> +++ b/drivers/cpufreq/freq_table.c
> @@ -351,6 +351,53 @@ static int set_freq_table_sorted(struct cpufreq_policy *policy)
>  	return 0;
>  }
>  
> +static void set_freq_table_efficiencies(struct cpufreq_policy *policy)
> +{
> +	struct cpufreq_frequency_table *pos, *table = policy->freq_table;
> +	enum cpufreq_table_sorting sort = policy->freq_table_sorted;
> +	int efficient, idx;
> +
> +	/* Not supported */
> +	if (sort == CPUFREQ_TABLE_UNSORTED) {
> +		cpufreq_for_each_entry_idx(pos, table, idx)
> +			pos->efficient = idx;
> +		return;
> +	}
> +
> +	/* The highest frequency is always efficient */
> +	cpufreq_for_each_entry_idx(pos, table, idx) {
> +		if (pos->frequency == CPUFREQ_ENTRY_INVALID)
> +			continue;
> +
> +		efficient = idx;
> +
> +		if (sort == CPUFREQ_TABLE_SORTED_DESCENDING)
> +			break;
> +	}
> +
> +	for (;;) {
> +		pos = &table[idx];
> +
> +		if (pos->frequency == CPUFREQ_ENTRY_INVALID)
> +			continue;

That would result in an infinite loop, you still want to execute the code
at the bottom of the loop which increments/decrements 'idx' and exits the
loop when the end of the table is reached.

> +
> +		if (pos->flags & CPUFREQ_INEFFICIENT_FREQ) {
> +			pos->efficient = efficient;
> +		} else {
> +			pos->efficient = idx;
> +			efficient = idx;
> +		}
> +
> +		if (sort == CPUFREQ_TABLE_SORTED_ASCENDING) {
> +			if (--idx < 0)
> +				break;
> +		} else {
> +			if (table[++idx].frequency == CPUFREQ_TABLE_END)
> +				break;
> +		}
> +	}
> +}
> +
>  int cpufreq_table_validate_and_sort(struct cpufreq_policy *policy)
>  {
>  	int ret;
> @@ -362,7 +409,13 @@ int cpufreq_table_validate_and_sort(struct cpufreq_policy *policy)
>  	if (ret)
>  		return ret;
>  
> -	return set_freq_table_sorted(policy);
> +	ret = set_freq_table_sorted(policy);
> +	if (ret)
> +		return ret;
> +
> +	set_freq_table_efficiencies(policy);
> +
> +	return ret;
>  }
>  
>  MODULE_AUTHOR("Dominik Brodowski <linux@brodo.de>");
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 353969c..d10784c 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -666,13 +666,15 @@ struct governor_attr {
>  #define CPUFREQ_ENTRY_INVALID	~0u
>  #define CPUFREQ_TABLE_END	~1u
>  /* Special Values of .flags field */
> -#define CPUFREQ_BOOST_FREQ	(1 << 0)
> +#define CPUFREQ_BOOST_FREQ	 (1 << 0)
> +#define CPUFREQ_INEFFICIENT_FREQ (1 << 1)
>  
>  struct cpufreq_frequency_table {
>  	unsigned int	flags;
>  	unsigned int	driver_data; /* driver specific data, not used by core */
>  	unsigned int	frequency; /* kHz - doesn't need to be in ascending
>  				    * order */
> +	unsigned int	efficient; /* idx of an efficient frequency */
>  };
>  
>  #if defined(CONFIG_CPU_FREQ) && defined(CONFIG_PM_OPP)
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 4/6] cpufreq: Skip inefficient frequencies in cpufreq_driver_resolve_freq()
  2021-06-04 11:05 ` [PATCH v3 4/6] cpufreq: Skip inefficient frequencies in cpufreq_driver_resolve_freq() Vincent Donnefort
@ 2021-06-04 18:25   ` Matthias Kaehlcke
  0 siblings, 0 replies; 45+ messages in thread
From: Matthias Kaehlcke @ 2021-06-04 18:25 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: peterz, rjw, viresh.kumar, vincent.guittot, qperret, linux-pm,
	ionela.voinescu, lukasz.luba, dietmar.eggemann

On Fri, Jun 04, 2021 at 12:05:59PM +0100, Vincent Donnefort wrote:
> Avoid using frequencies marked as inefficient. This change affects
> schedutil, which is the only in-tree governor using the function
> cpufreq_driver_resolve_freq().
> 
> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 802abc9..7431f40a 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -546,6 +546,10 @@ unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
>  
>  		idx = cpufreq_frequency_table_target(policy, target_freq,
>  						     CPUFREQ_RELATION_L);
> +
> +		/* Replace the target with an efficient one */
> +		idx = cpufreq_frequency_find_efficient(policy, idx);
> +
>  		policy->cached_resolved_idx = idx;
>  		return policy->freq_table[idx].frequency;
>  	}
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index d10784c..0ca4c9a 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -994,6 +994,17 @@ static inline int cpufreq_frequency_table_target(struct cpufreq_policy *policy,
>  	}
>  }
>  
> +static inline unsigned int
> +cpufreq_frequency_find_efficient(struct cpufreq_policy *policy,
> +				 unsigned int idx)
> +{
> +	struct cpufreq_frequency_table *table = policy->freq_table;
> +	unsigned int efficient_idx = table[idx].efficient;
> +
> +	return table[efficient_idx].frequency <= policy->max ? efficient_idx :
> +							       idx;
> +}
> +
>  static inline int cpufreq_table_count_valid_entries(const struct cpufreq_policy *policy)
>  {
>  	struct cpufreq_frequency_table *pos;

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 5/6] cpufreq: Mark inefficient frequencies using the Energy Model
  2021-06-04 11:06 ` [PATCH v3 5/6] cpufreq: Mark inefficient frequencies using the Energy Model Vincent Donnefort
@ 2021-06-04 18:35   ` Matthias Kaehlcke
  0 siblings, 0 replies; 45+ messages in thread
From: Matthias Kaehlcke @ 2021-06-04 18:35 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: peterz, rjw, viresh.kumar, vincent.guittot, qperret, linux-pm,
	ionela.voinescu, lukasz.luba, dietmar.eggemann

On Fri, Jun 04, 2021 at 12:06:00PM +0100, Vincent Donnefort wrote:
> The Energy Model has a 1:1 mapping between OPPs and performance states
> (em_perf_state). We can then read which states are inefficient and use this
> information to mark the cpufreq table.
> 
> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 7431f40a..34d344d 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -19,6 +19,7 @@
>  #include <linux/cpu_cooling.h>
>  #include <linux/delay.h>
>  #include <linux/device.h>
> +#include <linux/energy_model.h>
>  #include <linux/init.h>
>  #include <linux/kernel_stat.h>
>  #include <linux/module.h>
> @@ -1317,6 +1318,31 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
>  	kfree(policy);
>  }
>  
> +static void cpufreq_inefficiencies_from_em(struct cpufreq_policy *policy,
> +					   struct em_perf_domain *em_pd)
> +{
> +	struct cpufreq_frequency_table *pos, *table = policy->freq_table;
> +	struct em_perf_state *em_table;
> +	int i;
> +
> +	if (!em_pd)
> +		return;
> +
> +	em_table = em_pd->table;
> +
> +	for (i = 0; i < em_pd->nr_perf_states; i++) {
> +		if (!(em_table[i].flags & EM_PERF_STATE_INEFFICIENT))
> +			continue;
> +
> +		cpufreq_for_each_valid_entry(pos, table) {
> +			if (pos->frequency == em_table[i].frequency) {
> +				pos->flags |= CPUFREQ_INEFFICIENT_FREQ;
> +				break;
> +			}
> +		}
> +	}
> +}
> +
>  static int cpufreq_online(unsigned int cpu)
>  {
>  	struct cpufreq_policy *policy;
> @@ -1371,6 +1397,12 @@ static int cpufreq_online(unsigned int cpu)
>  			goto out_free_policy;
>  		}
>  
> +		/*
> +		 * Sync potential inefficiencies with an Energy Model that the
> +		 * driver might have registered.
> +		 */
> +		cpufreq_inefficiencies_from_em(policy, em_cpu_get(cpu));
> +
>  		ret = cpufreq_table_validate_and_sort(policy);
>  		if (ret)
>  			goto out_exit_policy;

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 6/6] PM / EM: Skip inefficient states
  2021-06-04 11:06 ` [PATCH v3 6/6] PM / EM: Skip inefficient states Vincent Donnefort
@ 2021-06-04 18:49   ` Matthias Kaehlcke
  0 siblings, 0 replies; 45+ messages in thread
From: Matthias Kaehlcke @ 2021-06-04 18:49 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: peterz, rjw, viresh.kumar, vincent.guittot, qperret, linux-pm,
	ionela.voinescu, lukasz.luba, dietmar.eggemann

On Fri, Jun 04, 2021 at 12:06:01PM +0100, Vincent Donnefort wrote:
> Now that Schedutil can leverage the inefficiency identification from the
> Energy Model, we can skip inefficient states when estimating the energy
> consumption.
> 
> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> 
> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index 2531325..d20f6afe 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -98,6 +98,36 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
>  void em_dev_unregister_perf_domain(struct device *dev);
>  
>  /**
> + * em_pd_get_efficient_state() - Get an efficient performance state from the EM
> + * @pd   : Performance domain for which we want an efficient frequency
> + * @freq : Frequency to map with the EM
> + *
> + * It is called from the scheduler code quite frequently and as a consequence
> + * doesn't implement any check.
> + *
> + * Return: An efficient performance state, high enough to meet @freq
> + * requirement.
> + */
> +static inline
> +struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
> +						unsigned long freq)
> +{
> +	struct em_perf_state *ps;
> +	int i;
> +
> +	for (i = 0; i < pd->nr_perf_states; i++) {
> +		ps = &pd->table[i];
> +		if (ps->frequency >= freq) {
> +			if (ps->flags & EM_PERF_STATE_INEFFICIENT)
> +				continue;
> +			break;
> +		}
> +	}
> +
> +	return ps;
> +}
> +
> +/**
>   * em_cpu_energy() - Estimates the energy consumed by the CPUs of a
>  		performance domain
>   * @pd		: performance domain for which energy has to be estimated
> @@ -116,7 +146,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>  {
>  	unsigned long freq, scale_cpu;
>  	struct em_perf_state *ps;
> -	int i, cpu;
> +	int cpu;
>  
>  	if (!sum_util)
>  		return 0;
> @@ -135,11 +165,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>  	 * Find the lowest performance state of the Energy Model above the
>  	 * requested frequency.
>  	 */
> -	for (i = 0; i < pd->nr_perf_states; i++) {
> -		ps = &pd->table[i];
> -		if (ps->frequency >= freq)
> -			break;
> -	}
> +	ps = em_pd_get_efficient_state(pd, freq);
>  
>  	/*
>  	 * The capacity of a CPU in the domain at the performance state (ps)

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-04 11:05 ` [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies Vincent Donnefort
  2021-06-04 18:19   ` Matthias Kaehlcke
@ 2021-06-07  5:02   ` Viresh Kumar
  2021-06-07 10:14     ` Lukasz Luba
  2021-06-14  7:28   ` Viresh Kumar
  2 siblings, 1 reply; 45+ messages in thread
From: Viresh Kumar @ 2021-06-07  5:02 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: peterz, rjw, vincent.guittot, qperret, linux-pm, ionela.voinescu,
	lukasz.luba, dietmar.eggemann

On 04-06-21, 12:05, Vincent Donnefort wrote:
> Some SoCs such as the sd855 have OPPs within the same policy whose cost is
> higher than others with a higher frequency. Those OPPs are inefficients and
> it might be interesting for a governor to not use them.
> 
> Adding a flag, CPUFREQ_INEFFICIENT_FREQ, to mark such OPPs into the
> frequency table, as well as a new cpufreq_frequency_table member
> "efficient". This new member will allow a governor to quickly resolve an
> inefficient frequency to an efficient one.
> 
> Efficient OPPs point to themselves. Governors must also ensure that the
> efficiency resolution does not break the policy maximum.
> 
> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>

I was thinking about a different approach when I suggested it.

- The cpufreq table instead of an index, will have "efficient" as bool.

- EM will set an OPP as efficient or inefficient, based on the OPP
  table, there will be a flag for this in the OPP table.

- The cpufreq table is then created from OPP table and this
  information is automatically retrieved.

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-07  5:02   ` Viresh Kumar
@ 2021-06-07 10:14     ` Lukasz Luba
  0 siblings, 0 replies; 45+ messages in thread
From: Lukasz Luba @ 2021-06-07 10:14 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Vincent Donnefort, peterz, rjw, vincent.guittot, qperret,
	linux-pm, ionela.voinescu, dietmar.eggemann



On 6/7/21 6:02 AM, Viresh Kumar wrote:
> On 04-06-21, 12:05, Vincent Donnefort wrote:
>> Some SoCs such as the sd855 have OPPs within the same policy whose cost is
>> higher than others with a higher frequency. Those OPPs are inefficients and
>> it might be interesting for a governor to not use them.
>>
>> Adding a flag, CPUFREQ_INEFFICIENT_FREQ, to mark such OPPs into the
>> frequency table, as well as a new cpufreq_frequency_table member
>> "efficient". This new member will allow a governor to quickly resolve an
>> inefficient frequency to an efficient one.
>>
>> Efficient OPPs point to themselves. Governors must also ensure that the
>> efficiency resolution does not break the policy maximum.
>>
>> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> 
> I was thinking about a different approach when I suggested it.
> 
> - The cpufreq table instead of an index, will have "efficient" as bool.
> 
> - EM will set an OPP as efficient or inefficient, based on the OPP
>    table, there will be a flag for this in the OPP table.
> 
> - The cpufreq table is then created from OPP table and this
>    information is automatically retrieved.
> 

There are some issues with your proposed approach:
The EM doesn't depend on OPP framework (even no header from opp.h)
and we don't want to add this dependency in EM.

The Vincent's proposed implementation doesn't introduce this
dependency.

Regards,
Lukasz

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-04 11:05 ` [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies Vincent Donnefort
  2021-06-04 18:19   ` Matthias Kaehlcke
  2021-06-07  5:02   ` Viresh Kumar
@ 2021-06-14  7:28   ` Viresh Kumar
  2021-06-14 13:35     ` Vincent Donnefort
  2 siblings, 1 reply; 45+ messages in thread
From: Viresh Kumar @ 2021-06-14  7:28 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: peterz, rjw, vincent.guittot, qperret, linux-pm, ionela.voinescu,
	lukasz.luba, dietmar.eggemann

On 04-06-21, 12:05, Vincent Donnefort wrote:
>  int cpufreq_table_validate_and_sort(struct cpufreq_policy *policy)
>  {
>  	int ret;
> @@ -362,7 +409,13 @@ int cpufreq_table_validate_and_sort(struct cpufreq_policy *policy)
>  	if (ret)
>  		return ret;
>  
> -	return set_freq_table_sorted(policy);
> +	ret = set_freq_table_sorted(policy);
> +	if (ret)
> +		return ret;
> +
> +	set_freq_table_efficiencies(policy);
> +
> +	return ret;
>  }

Lets provide a single callback from the cpufreq core to do all
settings related to efficient frequencies and call it from
em_dev_register_perf_domain() ?

So we only do them for the cpufreq drivers that register themselves
with EM.

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-14  7:28   ` Viresh Kumar
@ 2021-06-14 13:35     ` Vincent Donnefort
  2021-06-15  5:02       ` Viresh Kumar
  0 siblings, 1 reply; 45+ messages in thread
From: Vincent Donnefort @ 2021-06-14 13:35 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: peterz, rjw, vincent.guittot, qperret, linux-pm, ionela.voinescu,
	lukasz.luba, dietmar.eggemann

On Mon, Jun 14, 2021 at 12:58:35PM +0530, Viresh Kumar wrote:
> On 04-06-21, 12:05, Vincent Donnefort wrote:
> >  int cpufreq_table_validate_and_sort(struct cpufreq_policy *policy)
> >  {
> >  	int ret;
> > @@ -362,7 +409,13 @@ int cpufreq_table_validate_and_sort(struct cpufreq_policy *policy)
> >  	if (ret)
> >  		return ret;
> >  
> > -	return set_freq_table_sorted(policy);
> > +	ret = set_freq_table_sorted(policy);
> > +	if (ret)
> > +		return ret;
> > +
> > +	set_freq_table_efficiencies(policy);
> > +
> > +	return ret;
> >  }
> 
> Lets provide a single callback from the cpufreq core to do all
> settings related to efficient frequencies and call it from
> em_dev_register_perf_domain() ?

Sadly we cannot do it in em_dev_register_perf_domain(). This function is called
from the cpufreq driver init, when the table isn't available yet.

> 
> So we only do them for the cpufreq drivers that register themselves
> with EM.

Currently, only the EM would set those inefficient OPPs. But it also gives the
possibility for individual drivers to set them, even if they do not use the EM.

> 
> -- 
> viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-04 18:19   ` Matthias Kaehlcke
@ 2021-06-14 13:40     ` Vincent Donnefort
  0 siblings, 0 replies; 45+ messages in thread
From: Vincent Donnefort @ 2021-06-14 13:40 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: peterz, rjw, viresh.kumar, vincent.guittot, qperret, linux-pm,
	ionela.voinescu, lukasz.luba, dietmar.eggemann

> > +	for (;;) {
> > +		pos = &table[idx];
> > +
> > +		if (pos->frequency == CPUFREQ_ENTRY_INVALID)
> > +			continue;
> 
> That would result in an infinite loop, you still want to execute the code
> at the bottom of the loop which increments/decrements 'idx' and exits the
> loop when the end of the table is reached.

Arg, indeed, sorry for that ugly thing.

I'll wait for more input from Viresh before submitting a fixed version.

> 
> > +
> > +		if (pos->flags & CPUFREQ_INEFFICIENT_FREQ) {
> > +			pos->efficient = efficient;
> > +		} else {
> > +			pos->efficient = idx;
> > +			efficient = idx;
> > +		}
> > +
> > +		if (sort == CPUFREQ_TABLE_SORTED_ASCENDING) {
> > +			if (--idx < 0)
> > +				break;
> > +		} else {
> > +			if (table[++idx].frequency == CPUFREQ_TABLE_END)
> > +				break;
> > +		}
> > +	}
> > +}
> > +

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-14 13:35     ` Vincent Donnefort
@ 2021-06-15  5:02       ` Viresh Kumar
  2021-06-15  8:44         ` Vincent Donnefort
  0 siblings, 1 reply; 45+ messages in thread
From: Viresh Kumar @ 2021-06-15  5:02 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: peterz, rjw, vincent.guittot, qperret, linux-pm, ionela.voinescu,
	lukasz.luba, dietmar.eggemann

On 14-06-21, 14:35, Vincent Donnefort wrote:
> On Mon, Jun 14, 2021 at 12:58:35PM +0530, Viresh Kumar wrote:
> > On 04-06-21, 12:05, Vincent Donnefort wrote:
> > >  int cpufreq_table_validate_and_sort(struct cpufreq_policy *policy)
> > >  {
> > >  	int ret;
> > > @@ -362,7 +409,13 @@ int cpufreq_table_validate_and_sort(struct cpufreq_policy *policy)
> > >  	if (ret)
> > >  		return ret;
> > >  
> > > -	return set_freq_table_sorted(policy);
> > > +	ret = set_freq_table_sorted(policy);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	set_freq_table_efficiencies(policy);
> > > +
> > > +	return ret;
> > >  }
> > 
> > Lets provide a single callback from the cpufreq core to do all
> > settings related to efficient frequencies and call it from
> > em_dev_register_perf_domain() ?
> 
> Sadly we cannot do it in em_dev_register_perf_domain(). This function is called
> from the cpufreq driver init, when the table isn't available yet.

I looked at all the users of em_dev_register_perf_domain() and each
one of them have the freq table ready at that point of time.

> > 
> > So we only do them for the cpufreq drivers that register themselves
> > with EM.
> 
> Currently, only the EM would set those inefficient OPPs. But it also gives the
> possibility for individual drivers to set them, even if they do not use the EM.

This is exactly why I want those parts of the kernel to call a
specific API to get this done. This shouldn't be done automatically by
the cpufreq core.

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-15  5:02       ` Viresh Kumar
@ 2021-06-15  8:44         ` Vincent Donnefort
  2021-06-15 10:17           ` Viresh Kumar
  0 siblings, 1 reply; 45+ messages in thread
From: Vincent Donnefort @ 2021-06-15  8:44 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: peterz, rjw, vincent.guittot, qperret, linux-pm, ionela.voinescu,
	lukasz.luba, dietmar.eggemann

On Tue, Jun 15, 2021 at 10:32:11AM +0530, Viresh Kumar wrote:
> On 14-06-21, 14:35, Vincent Donnefort wrote:
> > On Mon, Jun 14, 2021 at 12:58:35PM +0530, Viresh Kumar wrote:
> > > On 04-06-21, 12:05, Vincent Donnefort wrote:
> > > >  int cpufreq_table_validate_and_sort(struct cpufreq_policy *policy)
> > > >  {
> > > >  	int ret;
> > > > @@ -362,7 +409,13 @@ int cpufreq_table_validate_and_sort(struct cpufreq_policy *policy)
> > > >  	if (ret)
> > > >  		return ret;
> > > >  
> > > > -	return set_freq_table_sorted(policy);
> > > > +	ret = set_freq_table_sorted(policy);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	set_freq_table_efficiencies(policy);
> > > > +
> > > > +	return ret;
> > > >  }
> > > 
> > > Lets provide a single callback from the cpufreq core to do all
> > > settings related to efficient frequencies and call it from
> > > em_dev_register_perf_domain() ?
> > 
> > Sadly we cannot do it in em_dev_register_perf_domain(). This function is called
> > from the cpufreq driver init, when the table isn't available yet.
> 
> I looked at all the users of em_dev_register_perf_domain() and each
> one of them have the freq table ready at that point of time.

The cpufreq_policy is accessed through percpu data. I originally tried to get it
with cpufreq_cpu_get_raw(). But when we init the cpufreq driver (and by
extension when we call em_dev_register_perf_domain()), the percpu data isn't
updated yet.

I guess the only way at that moment would be to propagate the cpufreq_policy
pointer through the functions parameters, which is a bit cumbersome, especially
as the EM can be used with devfreq as well and that em_dev_register_perf_domain()
can be called from dev_pm_opp_of_register_em()

Would we be ok with that?

I could use em_data_callback as well ... but that doesn't change the fact some
registration is going through dev_pm_opp_of_register_em() which has no knowledge
about the cpufreq_policy. Doesn't look a better approach.

> 
> > > 
> > > So we only do them for the cpufreq drivers that register themselves
> > > with EM.
> > 
> > Currently, only the EM would set those inefficient OPPs. But it also gives the
> > possibility for individual drivers to set them, even if they do not use the EM.
> 
> This is exactly why I want those parts of the kernel to call a
> specific API to get this done. This shouldn't be done automatically by
> the cpufreq core.
> 
> -- 
> viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-15  8:44         ` Vincent Donnefort
@ 2021-06-15 10:17           ` Viresh Kumar
  2021-06-15 17:15             ` Vincent Donnefort
  2021-06-22  9:01             ` Quentin Perret
  0 siblings, 2 replies; 45+ messages in thread
From: Viresh Kumar @ 2021-06-15 10:17 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: peterz, rjw, vincent.guittot, qperret, linux-pm, ionela.voinescu,
	lukasz.luba, dietmar.eggemann

On 15-06-21, 09:44, Vincent Donnefort wrote:
> The cpufreq_policy is accessed through percpu data. I originally tried to get it
> with cpufreq_cpu_get_raw(). But when we init the cpufreq driver (and by
> extension when we call em_dev_register_perf_domain()), the percpu data isn't
> updated yet.

Right.

> I guess the only way at that moment would be to propagate the cpufreq_policy
> pointer through the functions parameters, which is a bit cumbersome, especially
> as the EM can be used with devfreq as well and that em_dev_register_perf_domain()
> can be called from dev_pm_opp_of_register_em()
> 
> Would we be ok with that?

No.

You only talked about freq_table earlier and that's all I checked :)

> I could use em_data_callback as well ... but that doesn't change the fact some
> registration is going through dev_pm_opp_of_register_em() which has no knowledge
> about the cpufreq_policy. Doesn't look a better approach.

The point is that I don't want cpufreq to carry this for users, we
have EM today, tomorrow we may want to mark a frequency as inefficient
from somewhere else. The call need to initiate from EM core.

And this isn't a cpufreq only thing, but is going to be generic along
with other device types.

This is exactly why I asked you earlier to play with OPP core for
this. That is the central place for data for all such users.

If this information is present at the OPP table (somehow), then we can
just fix dev_pm_opp_init_cpufreq_table() to set this for cpufreq core
as well.

This is the sequence that is followed in cpufreq drivers today:

dev_pm_opp_of_cpumask_add_table();
dev_pm_opp_init_cpufreq_table();
dev_pm_opp_of_register_em();

What about changing this to:

dev_pm_opp_of_cpumask_add_table();

/* Mark OPPs are inefficient here */
dev_pm_opp_of_register_em();

/* This should automatically pick the right set */
dev_pm_opp_init_cpufreq_table();

Will this break anything ?

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-15 10:17           ` Viresh Kumar
@ 2021-06-15 17:15             ` Vincent Donnefort
  2021-06-16  7:35               ` Viresh Kumar
  2021-06-22  9:01             ` Quentin Perret
  1 sibling, 1 reply; 45+ messages in thread
From: Vincent Donnefort @ 2021-06-15 17:15 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: peterz, rjw, vincent.guittot, qperret, linux-pm, ionela.voinescu,
	lukasz.luba, dietmar.eggemann

On Tue, Jun 15, 2021 at 03:47:06PM +0530, Viresh Kumar wrote:
> On 15-06-21, 09:44, Vincent Donnefort wrote:
> > The cpufreq_policy is accessed through percpu data. I originally tried to get it
> > with cpufreq_cpu_get_raw(). But when we init the cpufreq driver (and by
> > extension when we call em_dev_register_perf_domain()), the percpu data isn't
> > updated yet.
> 
> Right.
> 
> > I guess the only way at that moment would be to propagate the cpufreq_policy
> > pointer through the functions parameters, which is a bit cumbersome, especially
> > as the EM can be used with devfreq as well and that em_dev_register_perf_domain()
> > can be called from dev_pm_opp_of_register_em()
> > 
> > Would we be ok with that?
> 
> No.
> 
> You only talked about freq_table earlier and that's all I checked :)
> 
> > I could use em_data_callback as well ... but that doesn't change the fact some
> > registration is going through dev_pm_opp_of_register_em() which has no knowledge
> > about the cpufreq_policy. Doesn't look a better approach.
> 
> The point is that I don't want cpufreq to carry this for users, we
> have EM today, tomorrow we may want to mark a frequency as inefficient
> from somewhere else. The call need to initiate from EM core.

In the current version of this patchset, any driver can mark inefficiencies
without relying on the EM, just by adding the flag CPUFREQ_INEFFICIENT_FREQ in
cpufreq_frequency_table.

Populating cpufreq_frequency_table from the EM in cpufreq was just an attempt to
a less intrusive set of changes.

> 
> And this isn't a cpufreq only thing, but is going to be generic along
> with other device types.
> 
> This is exactly why I asked you earlier to play with OPP core for
> this. That is the central place for data for all such users.
> 
> If this information is present at the OPP table (somehow), then we can
> just fix dev_pm_opp_init_cpufreq_table() to set this for cpufreq core
> as well.
> 
> This is the sequence that is followed in cpufreq drivers today:
> 
> dev_pm_opp_of_cpumask_add_table();
> dev_pm_opp_init_cpufreq_table();
> dev_pm_opp_of_register_em();
> 
> What about changing this to:
> 
> dev_pm_opp_of_cpumask_add_table();
> 
> /* Mark OPPs are inefficient here */
> dev_pm_opp_of_register_em();
> 
> /* This should automatically pick the right set */
> dev_pm_opp_init_cpufreq_table();
> 
> Will this break anything ?

Probably not, but with this approach I'll have to modify all the cpufreq drivers
that are registering the EM, which I tried to avoid as much as possible so far.

But if we sum-up:

1. em_dev_register_perf_domain() find inefficiencies

2. dev_pm_opp_of_register_em() apply EM inefficiencies into the OPP structures

   Note: scmi-cpufreq would need special treatment, as it doesn't rely
   dev_pm_opp_of_register_em().

3. dev_pm_opp_init_cpufreq_table() marks the cpufreq table with the OPP
   inefficiencies 

Guess it would ease the adoption by other OPP clients. However this patchset
will clearly get bigger.

Would you agree with that?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-15 17:15             ` Vincent Donnefort
@ 2021-06-16  7:35               ` Viresh Kumar
  2021-06-16  9:03                 ` Lukasz Luba
  0 siblings, 1 reply; 45+ messages in thread
From: Viresh Kumar @ 2021-06-16  7:35 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: peterz, rjw, vincent.guittot, qperret, linux-pm, ionela.voinescu,
	lukasz.luba, dietmar.eggemann

On 15-06-21, 18:15, Vincent Donnefort wrote:
> On Tue, Jun 15, 2021 at 03:47:06PM +0530, Viresh Kumar wrote:
> > The point is that I don't want cpufreq to carry this for users, we
> > have EM today, tomorrow we may want to mark a frequency as inefficient
> > from somewhere else. The call need to initiate from EM core.
> 
> In the current version of this patchset, any driver can mark inefficiencies
> without relying on the EM, just by adding the flag CPUFREQ_INEFFICIENT_FREQ in
> cpufreq_frequency_table.

Yeah, I wasn't really talking about cpufreq drivers but external
entities, like EM.

> Populating cpufreq_frequency_table from the EM in cpufreq was just an attempt to
> a less intrusive set of changes.
 
> > And this isn't a cpufreq only thing, but is going to be generic along
> > with other device types.
> > 
> > This is exactly why I asked you earlier to play with OPP core for
> > this. That is the central place for data for all such users.
> > 
> > If this information is present at the OPP table (somehow), then we can
> > just fix dev_pm_opp_init_cpufreq_table() to set this for cpufreq core
> > as well.
> > 
> > This is the sequence that is followed in cpufreq drivers today:
> > 
> > dev_pm_opp_of_cpumask_add_table();
> > dev_pm_opp_init_cpufreq_table();
> > dev_pm_opp_of_register_em();
> > 
> > What about changing this to:
> > 
> > dev_pm_opp_of_cpumask_add_table();
> > 
> > /* Mark OPPs are inefficient here */
> > dev_pm_opp_of_register_em();
> > 
> > /* This should automatically pick the right set */
> > dev_pm_opp_init_cpufreq_table();
> > 
> > Will this break anything ?
> 
> Probably not, but with this approach I'll have to modify all the cpufreq drivers
> that are registering the EM, which I tried to avoid as much as possible so far.

Hmm. You are right as well, but I just want to get the right API in
place which lives a longer life :)

> But if we sum-up:
> 
> 1. em_dev_register_perf_domain() find inefficiencies
> 
> 2. dev_pm_opp_of_register_em() apply EM inefficiencies into the OPP structures

I was looking to add a new API to the OPP core
(dev_pm_opp_mark_inefficient()) to mark an OPP inefficient. And then
get it called from em_create_perf_table().

But I now see that EM core rather has callbacks to call into and with
that I think you should rather add another callback
(.mark_inefficient()) in struct em_data_callback, to set inefficient
frequencies.

>    Note: scmi-cpufreq would need special treatment, as it doesn't rely
>    dev_pm_opp_of_register_em().

For both dev_pm_opp_of_register_em() and scmi case, you can then set
this callback to dev_pm_opp_mark_inefficient().

> 3. dev_pm_opp_init_cpufreq_table() marks the cpufreq table with the OPP
>    inefficiencies 

Yes.

> Guess it would ease the adoption by other OPP clients. However this patchset
> will clearly get bigger.
> 
> Would you agree with that?

Yes, it is fine.

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-16  7:35               ` Viresh Kumar
@ 2021-06-16  9:03                 ` Lukasz Luba
  2021-06-16  9:31                   ` Viresh Kumar
  0 siblings, 1 reply; 45+ messages in thread
From: Lukasz Luba @ 2021-06-16  9:03 UTC (permalink / raw)
  To: Viresh Kumar, Vincent Donnefort
  Cc: peterz, rjw, vincent.guittot, qperret, linux-pm, ionela.voinescu,
	dietmar.eggemann



On 6/16/21 8:35 AM, Viresh Kumar wrote:
> On 15-06-21, 18:15, Vincent Donnefort wrote:
>> On Tue, Jun 15, 2021 at 03:47:06PM +0530, Viresh Kumar wrote:
>>> The point is that I don't want cpufreq to carry this for users, we
>>> have EM today, tomorrow we may want to mark a frequency as inefficient
>>> from somewhere else. The call need to initiate from EM core.
>>
>> In the current version of this patchset, any driver can mark inefficiencies
>> without relying on the EM, just by adding the flag CPUFREQ_INEFFICIENT_FREQ in
>> cpufreq_frequency_table.
> 
> Yeah, I wasn't really talking about cpufreq drivers but external
> entities, like EM.
> 
>> Populating cpufreq_frequency_table from the EM in cpufreq was just an attempt to
>> a less intrusive set of changes.
>   
>>> And this isn't a cpufreq only thing, but is going to be generic along
>>> with other device types.
>>>
>>> This is exactly why I asked you earlier to play with OPP core for
>>> this. That is the central place for data for all such users.
>>>
>>> If this information is present at the OPP table (somehow), then we can
>>> just fix dev_pm_opp_init_cpufreq_table() to set this for cpufreq core
>>> as well.
>>>
>>> This is the sequence that is followed in cpufreq drivers today:
>>>
>>> dev_pm_opp_of_cpumask_add_table();
>>> dev_pm_opp_init_cpufreq_table();
>>> dev_pm_opp_of_register_em();
>>>
>>> What about changing this to:
>>>
>>> dev_pm_opp_of_cpumask_add_table();
>>>
>>> /* Mark OPPs are inefficient here */
>>> dev_pm_opp_of_register_em();
>>>
>>> /* This should automatically pick the right set */
>>> dev_pm_opp_init_cpufreq_table();
>>>
>>> Will this break anything ?
>>
>> Probably not, but with this approach I'll have to modify all the cpufreq drivers
>> that are registering the EM, which I tried to avoid as much as possible so far.
> 
> Hmm. You are right as well, but I just want to get the right API in
> place which lives a longer life :)
> 
>> But if we sum-up:
>>
>> 1. em_dev_register_perf_domain() find inefficiencies
>>
>> 2. dev_pm_opp_of_register_em() apply EM inefficiencies into the OPP structures
> 
> I was looking to add a new API to the OPP core
> (dev_pm_opp_mark_inefficient()) to mark an OPP inefficient. And then
> get it called from em_create_perf_table().
> 
> But I now see that EM core rather has callbacks to call into and with

Exactly, that's what I was trying to stress.

> that I think you should rather add another callback
> (.mark_inefficient()) in struct em_data_callback, to set inefficient
> frequencies.

I disagree. That's why I prefer Vincent's approach in this patch set.
This proposal would cause more mess.

Vincent proposed a small and clean modification. This modification
can be done easily in this cpufreq place because we have EM in
device cpu struct.

Let's don't over-engineering. The inefficient information is only valid
for schedutil, thus IMHO it can live like this patch set made - in the
cpufreq table.

Compare the v1 (I still don't understand why it was blocked), this v3
and your proposal.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-16  9:03                 ` Lukasz Luba
@ 2021-06-16  9:31                   ` Viresh Kumar
  2021-06-16 10:33                     ` Lukasz Luba
  0 siblings, 1 reply; 45+ messages in thread
From: Viresh Kumar @ 2021-06-16  9:31 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Vincent Donnefort, peterz, rjw, vincent.guittot, qperret,
	linux-pm, ionela.voinescu, dietmar.eggemann

On 16-06-21, 10:03, Lukasz Luba wrote:
> On 6/16/21 8:35 AM, Viresh Kumar wrote:
> > On 15-06-21, 18:15, Vincent Donnefort wrote:
> > > But if we sum-up:
> > > 
> > > 1. em_dev_register_perf_domain() find inefficiencies
> > > 
> > > 2. dev_pm_opp_of_register_em() apply EM inefficiencies into the OPP structures
> > 
> > I was looking to add a new API to the OPP core
> > (dev_pm_opp_mark_inefficient()) to mark an OPP inefficient. And then
> > get it called from em_create_perf_table().
> > 
> > But I now see that EM core rather has callbacks to call into and with
> 
> Exactly, that's what I was trying to stress.
> 
> > that I think you should rather add another callback
> > (.mark_inefficient()) in struct em_data_callback, to set inefficient
> > frequencies.
> 
> I disagree. That's why I prefer Vincent's approach in this patch set.
> This proposal would cause more mess.
> 
> Vincent proposed a small and clean modification. This modification
> can be done easily in this cpufreq place because we have EM in
> device cpu struct.

This may look clean to you, but not to me, sorry about that.

Clean is not lesser number of lines for me, but rather having the
right ownership of such things.

For example this patch:

https://lore.kernel.org/linux-pm/1622804761-126737-6-git-send-email-vincent.donnefort@arm.com/

tries to add EM stuff in cpufreq core. Cpufreq core doesn't care about
EM and it shouldn't. And this piece of code doesn't belong here.

Would you guys like to add a cpufreq specific call into the EM core? I
won't, that's not a place for cpufreq stuff. It is the EM core. I was
fine with not including OPP core into this, and I gave up that
argument earlier, but then we realized that the cpufreq core isn't
ready at the time we register with EM core.

Honestly, OPP core looks to be a better place holder for such stuff.
This is exactly the purpose of the OPP core. Moreover, we can apply
the same logic to devfreq or other devices later, with or without EM
core. Again, OPP core fits better.

The cpufreq core already has the relevant APIs in place to the OPP
core and this won't require a new API there.

> Let's don't over-engineering. The inefficient information is only valid
> for schedutil, thus IMHO it can live like this patch set made - in the
> cpufreq table.

For now, yes. There is no guarantee though that we won't have more in
future.

> Compare the v1 (I still don't understand why it was blocked),

IIRC, it required more work to be done in the hotpath, i.e. traversing
the freq list twice.

> this v3 and your proposal.

IMHO, adding such callbacks to the EM core, like .mark_efficient(),
will only make this easier to handle for all different frameworks, and
not otherwise. The code will look much cleaner everywhere..

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-16  9:31                   ` Viresh Kumar
@ 2021-06-16 10:33                     ` Lukasz Luba
  2021-06-16 10:53                       ` Viresh Kumar
  0 siblings, 1 reply; 45+ messages in thread
From: Lukasz Luba @ 2021-06-16 10:33 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Vincent Donnefort, peterz, rjw, vincent.guittot, qperret,
	linux-pm, ionela.voinescu, dietmar.eggemann



On 6/16/21 10:31 AM, Viresh Kumar wrote:
> On 16-06-21, 10:03, Lukasz Luba wrote:
>> On 6/16/21 8:35 AM, Viresh Kumar wrote:
>>> On 15-06-21, 18:15, Vincent Donnefort wrote:
>>>> But if we sum-up:
>>>>
>>>> 1. em_dev_register_perf_domain() find inefficiencies
>>>>
>>>> 2. dev_pm_opp_of_register_em() apply EM inefficiencies into the OPP structures
>>>
>>> I was looking to add a new API to the OPP core
>>> (dev_pm_opp_mark_inefficient()) to mark an OPP inefficient. And then
>>> get it called from em_create_perf_table().
>>>
>>> But I now see that EM core rather has callbacks to call into and with
>>
>> Exactly, that's what I was trying to stress.
>>
>>> that I think you should rather add another callback
>>> (.mark_inefficient()) in struct em_data_callback, to set inefficient
>>> frequencies.
>>
>> I disagree. That's why I prefer Vincent's approach in this patch set.
>> This proposal would cause more mess.
>>
>> Vincent proposed a small and clean modification. This modification
>> can be done easily in this cpufreq place because we have EM in
>> device cpu struct.
> 
> This may look clean to you, but not to me, sorry about that.
> 
> Clean is not lesser number of lines for me, but rather having the
> right ownership of such things.

Some developers do like patches which removes more lines then adds ;)

> 
> For example this patch:
> 
> https://lore.kernel.org/linux-pm/1622804761-126737-6-git-send-email-vincent.donnefort@arm.com/
> 
> tries to add EM stuff in cpufreq core. Cpufreq core doesn't care about
> EM and it shouldn't. And this piece of code doesn't belong here.
> 
> Would you guys like to add a cpufreq specific call into the EM core? I
> won't, that's not a place for cpufreq stuff. It is the EM core. I was
> fine with not including OPP core into this, and I gave up that
> argument earlier, but then we realized that the cpufreq core isn't
> ready at the time we register with EM core.
> 
> Honestly, OPP core looks to be a better place holder for such stuff.
> This is exactly the purpose of the OPP core. Moreover, we can apply
> the same logic to devfreq or other devices later, with or without EM
> core. Again, OPP core fits better.
> 
> The cpufreq core already has the relevant APIs in place to the OPP
> core and this won't require a new API there.

I don't see an API function in the OPP framework or a field in the
OPP struct which gives information that this freq is inefficient.
Thus, it will require new API changes: cpufreq --> OPP.

> 
>> Let's don't over-engineering. The inefficient information is only valid
>> for schedutil, thus IMHO it can live like this patch set made - in the
>> cpufreq table.
> 
> For now, yes. There is no guarantee though that we won't have more in
> future.

And there won't be in near future. We don't build massive interfaces
because there *might* be potential *oneday*.
Even for this idea, it was a massive work to do the research and prove
it that this is worth to put mainline so all vendors will get it.

The GPUs are slightly different beasts and they have different
workloads (not util + SchedUtil driven AFAIK).

> 
>> Compare the v1 (I still don't understand why it was blocked),
> 
> IIRC, it required more work to be done in the hotpath, i.e. traversing
> the freq list twice.

In v1 there was LUT. IMHO we have too easily gave and said:
'Remove the Look-up table as the numbers weren't strong enough to 
justify the implementation.'
But it had other benefits, which are now pointed.

There was different issue, which we could fix now.
With this patch set [1] EAS could have the freq_max limit, which
SchedUtil has in the hotpath.

What could be the modified v1 [2]:
- LUT which holds two IDs: efficient, inefficient, take one
   according to the clamp f_max
- add new argument 'policy->max' to em_pd_get_efficient_freq()

freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq, policy->max);

The problem was that EAS couldn't know the clamp freq_max,
which shouldn't be the blocker now.

> 
>> this v3 and your proposal.
> 
> IMHO, adding such callbacks to the EM core, like .mark_efficient(),
> will only make this easier to handle for all different frameworks, and
> not otherwise. The code will look much cleaner everywhere..
> 

What about coming back to the slightly modified v1 idea?
That was really self-contained modification for this
inefficient opps heuristic.


[1] https://lore.kernel.org/lkml/20210614185815.15136-1-lukasz.luba@arm.com/
[2] 
https://lore.kernel.org/lkml/1617901829-381963-2-git-send-email-vincent.donnefort@arm.com/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-16 10:33                     ` Lukasz Luba
@ 2021-06-16 10:53                       ` Viresh Kumar
  2021-06-16 12:45                         ` Lukasz Luba
  0 siblings, 1 reply; 45+ messages in thread
From: Viresh Kumar @ 2021-06-16 10:53 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Vincent Donnefort, peterz, rjw, vincent.guittot, qperret,
	linux-pm, ionela.voinescu, dietmar.eggemann

On 16-06-21, 11:33, Lukasz Luba wrote:
> On 6/16/21 10:31 AM, Viresh Kumar wrote:
> > On 16-06-21, 10:03, Lukasz Luba wrote:
> > Clean is not lesser number of lines for me, but rather having the
> > right ownership of such things.
> 
> Some developers do like patches which removes more lines then adds ;)

:)

> > 
> > For example this patch:
> > 
> > https://lore.kernel.org/linux-pm/1622804761-126737-6-git-send-email-vincent.donnefort@arm.com/
> > 
> > tries to add EM stuff in cpufreq core. Cpufreq core doesn't care about
> > EM and it shouldn't. And this piece of code doesn't belong here.
> > 
> > Would you guys like to add a cpufreq specific call into the EM core? I
> > won't, that's not a place for cpufreq stuff. It is the EM core. I was
> > fine with not including OPP core into this, and I gave up that
> > argument earlier, but then we realized that the cpufreq core isn't
> > ready at the time we register with EM core.
> > 
> > Honestly, OPP core looks to be a better place holder for such stuff.
> > This is exactly the purpose of the OPP core. Moreover, we can apply
> > the same logic to devfreq or other devices later, with or without EM
> > core. Again, OPP core fits better.
> > 
> > The cpufreq core already has the relevant APIs in place to the OPP
> > core and this won't require a new API there.
> 
> I don't see an API function in the OPP framework or a field in the
> OPP struct which gives information that this freq is inefficient.
> Thus, it will require new API changes: cpufreq --> OPP.

dev_pm_opp_init_cpufreq_table() is all we need here, we just need to
change it to update one more field in the cpufreq table's entries.

> > 
> > > Let's don't over-engineering. The inefficient information is only valid
> > > for schedutil, thus IMHO it can live like this patch set made - in the
> > > cpufreq table.
> > 
> > For now, yes. There is no guarantee though that we won't have more in
> > future.
> 
> And there won't be in near future. We don't build massive interfaces
> because there *might* be potential *oneday*.

Yes, true.

> Even for this idea, it was a massive work to do the research and prove
> it that this is worth to put mainline so all vendors will get it.
> 
> The GPUs are slightly different beasts and they have different
> workloads (not util + SchedUtil driven AFAIK).

Right, but even if there is a single user for this, I think getting
this through the right layers is a more cleaner solution.

> In v1 there was LUT.

Oops, yes, I started looking from V2 and not V1.

> IMHO we have too easily gave and said:
> 'Remove the Look-up table as the numbers weren't strong enough to justify
> the implementation.'
> But it had other benefits, which are now pointed.
> 
> There was different issue, which we could fix now.
> With this patch set [1] EAS could have the freq_max limit, which
> SchedUtil has in the hotpath.
> 
> What could be the modified v1 [2]:
> - LUT which holds two IDs: efficient, inefficient, take one
>   according to the clamp f_max
> - add new argument 'policy->max' to em_pd_get_efficient_freq()
> 
> freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq, policy->max);
> 
> The problem was that EAS couldn't know the clamp freq_max,
> which shouldn't be the blocker now.

If you can do that without adding any EM specific stuff in the cpufreq
core, I will mostly be fine.

But honestly speaking, creating more data structures to keep related
information doesn't scale well.

We already have so many tables for keeping freq/voltage pairs, OPP,
cpufreq, EM. You tried to add one more in EM I think V1, not sure.

It is always better to consolidate and we almost reached to a point
where that could have been done very easily. I understand that you
didn't want to touch so many different parts, but anyway..
 
> > > this v3 and your proposal.
> > 
> > IMHO, adding such callbacks to the EM core, like .mark_efficient(),
> > will only make this easier to handle for all different frameworks, and
> > not otherwise. The code will look much cleaner everywhere..
> > 
> 
> What about coming back to the slightly modified v1 idea?
> That was really self-contained modification for this
> inefficient opps heuristic.

I am not sure if I really understand what that would be, but again
adding another table is going to create more problems then it should.

Anyway, that's my view, which can be wrong as well.

Rafael: You have any suggestions here ?

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-16 10:53                       ` Viresh Kumar
@ 2021-06-16 12:45                         ` Lukasz Luba
  2021-07-02 14:21                           ` Lukasz Luba
  0 siblings, 1 reply; 45+ messages in thread
From: Lukasz Luba @ 2021-06-16 12:45 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Vincent Donnefort, peterz, rjw, vincent.guittot, qperret,
	linux-pm, ionela.voinescu, dietmar.eggemann



On 6/16/21 11:53 AM, Viresh Kumar wrote:
> On 16-06-21, 11:33, Lukasz Luba wrote:
>> On 6/16/21 10:31 AM, Viresh Kumar wrote:
>>> On 16-06-21, 10:03, Lukasz Luba wrote:
>>> Clean is not lesser number of lines for me, but rather having the
>>> right ownership of such things.
>>
>> Some developers do like patches which removes more lines then adds ;)
> 
> :)
> 

[snip]

>>
>> What could be the modified v1 [2]:
>> - LUT which holds two IDs: efficient, inefficient, take one
>>    according to the clamp f_max
>> - add new argument 'policy->max' to em_pd_get_efficient_freq()
>>
>> freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq, policy->max);
>>
>> The problem was that EAS couldn't know the clamp freq_max,
>> which shouldn't be the blocker now.
> 
> If you can do that without adding any EM specific stuff in the cpufreq
> core, I will mostly be fine.
> 
> But honestly speaking, creating more data structures to keep related
> information doesn't scale well.
> 
> We already have so many tables for keeping freq/voltage pairs, OPP,
> cpufreq, EM. You tried to add one more in EM I think V1, not sure.
> 
> It is always better to consolidate and we almost reached to a point
> where that could have been done very easily. I understand that you
> didn't want to touch so many different parts, but anyway..

Yes, I don't want to touch some many generic parts because they
are intended to be generic. This heuristic is only for EM platforms,
which are arm, arm64 battery powered (not servers or other).
Thus, I wanted to keep it locally. The cost of EM extra structures
(the LUT) will only be for platforms for which EM discovers that
they have inefficient performance levels.
The code would even not be compiled in for x86, ppc, etc, in hot path.

>   
>>>> this v3 and your proposal.
>>>
>>> IMHO, adding such callbacks to the EM core, like .mark_efficient(),
>>> will only make this easier to handle for all different frameworks, and
>>> not otherwise. The code will look much cleaner everywhere..
>>>
>>
>> What about coming back to the slightly modified v1 idea?
>> That was really self-contained modification for this
>> inefficient opps heuristic.
> 
> I am not sure if I really understand what that would be, but again
> adding another table is going to create more problems then it should.

IMHO your proposals are more invasive for the generic code, while
not always being even used. Plenty of architectures don't even set EM,
even arm64 for servers doesn't use it. You and Rafael would have to
maintain these modifications in generic code. It might be hard to remove
it. While I recommend to keep this heuristic feature inside the EM and
we will maintain it. If we decide after a few years that new arm64
platforms use some smarter FW for performance level, we might just
disable this heuristic.

> 
> Anyway, that's my view, which can be wrong as well.
> 
> Rafael: You have any suggestions here ?
> 

I would also like to hear Rafael's opinion :)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-15 10:17           ` Viresh Kumar
  2021-06-15 17:15             ` Vincent Donnefort
@ 2021-06-22  9:01             ` Quentin Perret
  2021-06-22  9:25               ` Viresh Kumar
  1 sibling, 1 reply; 45+ messages in thread
From: Quentin Perret @ 2021-06-22  9:01 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Vincent Donnefort, peterz, rjw, vincent.guittot, linux-pm,
	ionela.voinescu, lukasz.luba, dietmar.eggemann

On Tuesday 15 Jun 2021 at 15:47:06 (+0530), Viresh Kumar wrote:
> On 15-06-21, 09:44, Vincent Donnefort wrote:
> > The cpufreq_policy is accessed through percpu data. I originally tried to get it
> > with cpufreq_cpu_get_raw(). But when we init the cpufreq driver (and by
> > extension when we call em_dev_register_perf_domain()), the percpu data isn't
> > updated yet.
> 
> Right.
> 
> > I guess the only way at that moment would be to propagate the cpufreq_policy
> > pointer through the functions parameters, which is a bit cumbersome, especially
> > as the EM can be used with devfreq as well and that em_dev_register_perf_domain()
> > can be called from dev_pm_opp_of_register_em()
> > 
> > Would we be ok with that?
> 
> No.
> 
> You only talked about freq_table earlier and that's all I checked :)
> 
> > I could use em_data_callback as well ... but that doesn't change the fact some
> > registration is going through dev_pm_opp_of_register_em() which has no knowledge
> > about the cpufreq_policy. Doesn't look a better approach.
> 
> The point is that I don't want cpufreq to carry this for users, we
> have EM today, tomorrow we may want to mark a frequency as inefficient
> from somewhere else. The call need to initiate from EM core.
> 
> And this isn't a cpufreq only thing, but is going to be generic along
> with other device types.
>
> This is exactly why I asked you earlier to play with OPP core for
> this. That is the central place for data for all such users.

The thing is, the very reason for the existence of PM_EM in the first
place is to NOT depend on PM_OPP which was deemed too Arm-specific. So
let's not create those dependencies now please.

> If this information is present at the OPP table (somehow), then we can
> just fix dev_pm_opp_init_cpufreq_table() to set this for cpufreq core
> as well.

Honnestly, I don't think PM_OPP should have anything to do with this,
for the reason mentionned above.

I don't really see the problem with cpufreq core querying the EM data
directly, we already have core subsystems doing that (the scheduler) so
why would that be a problem? If the only concern is for non-CPU devices,
all we'd need is a matching feature in devfreq core and everything
should be good no?

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-22  9:01             ` Quentin Perret
@ 2021-06-22  9:25               ` Viresh Kumar
  0 siblings, 0 replies; 45+ messages in thread
From: Viresh Kumar @ 2021-06-22  9:25 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Vincent Donnefort, peterz, rjw, vincent.guittot, linux-pm,
	ionela.voinescu, lukasz.luba, dietmar.eggemann

On 22-06-21, 09:01, Quentin Perret wrote:
> On Tuesday 15 Jun 2021 at 15:47:06 (+0530), Viresh Kumar wrote:
> > The point is that I don't want cpufreq to carry this for users, we
> > have EM today, tomorrow we may want to mark a frequency as inefficient
> > from somewhere else. The call need to initiate from EM core.
> > 
> > And this isn't a cpufreq only thing, but is going to be generic along
> > with other device types.
> >
> > This is exactly why I asked you earlier to play with OPP core for
> > this. That is the central place for data for all such users.
> 
> The thing is, the very reason for the existence of PM_EM in the first
> place is to NOT depend on PM_OPP which was deemed too Arm-specific. So
> let's not create those dependencies now please.

I am not asking to create one here.

What I am saying is that users of EM, which eventually call
em_dev_register_perf_domain() have the necessary details about about
the performance states (like voltage/freq pairs, or power) and the EM
core should talk to those users directly.

The struct em_data_callback is already there for this reason so we can
keep EM core independent of its users. I suggested to add another
callback there to mark a frequency inefficient, that's all.

In the case of ARM platforms, that would automatically mean OPP core,
and for others it can be anything that has this knowledge.

> > If this information is present at the OPP table (somehow), then we can
> > just fix dev_pm_opp_init_cpufreq_table() to set this for cpufreq core
> > as well.
> 
> Honnestly, I don't think PM_OPP should have anything to do with this,
> for the reason mentionned above.
> 
> I don't really see the problem with cpufreq core querying the EM data
> directly, we already have core subsystems doing that (the scheduler) so
> why would that be a problem?

If EM core is going to be the only entity ever which can provide this
information, then maybe we can get add some core support for this. But
if it is just another driver/layer which can add this information,
then it is always better to provide APIs which such users can call.
And so I was looking for cpufreq to provide an API, which can be
called from such users.

The problem here happened because we registered with the EM core from
policy's ->init() and we couldn't get back to cpufreq core from there
as the policy is only half initialized.

> If the only concern is for non-CPU devices,
> all we'd need is a matching feature in devfreq core and everything
> should be good no?

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-06-16 12:45                         ` Lukasz Luba
@ 2021-07-02 14:21                           ` Lukasz Luba
  2021-07-02 15:46                             ` Rafael J. Wysocki
  0 siblings, 1 reply; 45+ messages in thread
From: Lukasz Luba @ 2021-07-02 14:21 UTC (permalink / raw)
  To: rjw
  Cc: Viresh Kumar, Vincent Donnefort, peterz, vincent.guittot,
	qperret, linux-pm, ionela.voinescu, dietmar.eggemann

Hi Rafael,

This is a gentle ping.
You have probably not seen this discussion thread.

On 6/16/21 1:45 PM, Lukasz Luba wrote:
> 
> 
> On 6/16/21 11:53 AM, Viresh Kumar wrote:
>> On 16-06-21, 11:33, Lukasz Luba wrote:
>>> On 6/16/21 10:31 AM, Viresh Kumar wrote:
>>>> On 16-06-21, 10:03, Lukasz Luba wrote:
>>>> Clean is not lesser number of lines for me, but rather having the
>>>> right ownership of such things.
>>>
>>> Some developers do like patches which removes more lines then adds ;)
>>
>> :)
>>
> 
> [snip]
> 
>>>
>>> What could be the modified v1 [2]:
>>> - LUT which holds two IDs: efficient, inefficient, take one
>>>    according to the clamp f_max
>>> - add new argument 'policy->max' to em_pd_get_efficient_freq()
>>>
>>> freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq, 
>>> policy->max);
>>>
>>> The problem was that EAS couldn't know the clamp freq_max,
>>> which shouldn't be the blocker now.
>>
>> If you can do that without adding any EM specific stuff in the cpufreq
>> core, I will mostly be fine.
>>
>> But honestly speaking, creating more data structures to keep related
>> information doesn't scale well.
>>
>> We already have so many tables for keeping freq/voltage pairs, OPP,
>> cpufreq, EM. You tried to add one more in EM I think V1, not sure.
>>
>> It is always better to consolidate and we almost reached to a point
>> where that could have been done very easily. I understand that you
>> didn't want to touch so many different parts, but anyway..
> 
> Yes, I don't want to touch some many generic parts because they
> are intended to be generic. This heuristic is only for EM platforms,
> which are arm, arm64 battery powered (not servers or other).
> Thus, I wanted to keep it locally. The cost of EM extra structures
> (the LUT) will only be for platforms for which EM discovers that
> they have inefficient performance levels.
> The code would even not be compiled in for x86, ppc, etc, in hot path.
> 
>>>>> this v3 and your proposal.
>>>>
>>>> IMHO, adding such callbacks to the EM core, like .mark_efficient(),
>>>> will only make this easier to handle for all different frameworks, and
>>>> not otherwise. The code will look much cleaner everywhere..
>>>>
>>>
>>> What about coming back to the slightly modified v1 idea?
>>> That was really self-contained modification for this
>>> inefficient opps heuristic.
>>
>> I am not sure if I really understand what that would be, but again
>> adding another table is going to create more problems then it should.
> 
> IMHO your proposals are more invasive for the generic code, while
> not always being even used. Plenty of architectures don't even set EM,
> even arm64 for servers doesn't use it. You and Rafael would have to
> maintain these modifications in generic code. It might be hard to remove
> it. While I recommend to keep this heuristic feature inside the EM and
> we will maintain it. If we decide after a few years that new arm64
> platforms use some smarter FW for performance level, we might just
> disable this heuristic.
> 
>>
>> Anyway, that's my view, which can be wrong as well.
>>
>> Rafael: You have any suggestions here ?
>>
> 
> I would also like to hear Rafael's opinion :)

It would be great if you could have a look.
I will be grateful for your time spend on it and opinion.

Regards,
Lukasz

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-07-02 14:21                           ` Lukasz Luba
@ 2021-07-02 15:46                             ` Rafael J. Wysocki
  2021-07-02 16:04                               ` Rafael J. Wysocki
  2021-07-02 16:13                               ` Vincent Donnefort
  0 siblings, 2 replies; 45+ messages in thread
From: Rafael J. Wysocki @ 2021-07-02 15:46 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Rafael J. Wysocki, Viresh Kumar, Vincent Donnefort,
	Peter Zijlstra, Vincent Guittot, Quentin Perret, Linux PM,
	Ionela Voinescu, Dietmar Eggemann

On Fri, Jul 2, 2021 at 4:21 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
> Hi Rafael,
>
> This is a gentle ping.
> You have probably not seen this discussion thread.

I have looked at it briefly for a few times, but not too much into detail.

> On 6/16/21 1:45 PM, Lukasz Luba wrote:
> >
> >
> > On 6/16/21 11:53 AM, Viresh Kumar wrote:
> >> On 16-06-21, 11:33, Lukasz Luba wrote:
> >>> On 6/16/21 10:31 AM, Viresh Kumar wrote:
> >>>> On 16-06-21, 10:03, Lukasz Luba wrote:
> >>>> Clean is not lesser number of lines for me, but rather having the
> >>>> right ownership of such things.
> >>>
> >>> Some developers do like patches which removes more lines then adds ;)
> >>
> >> :)
> >>
> >
> > [snip]
> >
> >>>
> >>> What could be the modified v1 [2]:
> >>> - LUT which holds two IDs: efficient, inefficient, take one
> >>>    according to the clamp f_max
> >>> - add new argument 'policy->max' to em_pd_get_efficient_freq()
> >>>
> >>> freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq,
> >>> policy->max);
> >>>
> >>> The problem was that EAS couldn't know the clamp freq_max,
> >>> which shouldn't be the blocker now.
> >>
> >> If you can do that without adding any EM specific stuff in the cpufreq
> >> core, I will mostly be fine.
> >>
> >> But honestly speaking, creating more data structures to keep related
> >> information doesn't scale well.
> >>
> >> We already have so many tables for keeping freq/voltage pairs, OPP,
> >> cpufreq, EM. You tried to add one more in EM I think V1, not sure.
> >>
> >> It is always better to consolidate and we almost reached to a point
> >> where that could have been done very easily. I understand that you
> >> didn't want to touch so many different parts, but anyway..
> >
> > Yes, I don't want to touch some many generic parts because they
> > are intended to be generic. This heuristic is only for EM platforms,
> > which are arm, arm64 battery powered (not servers or other).
> > Thus, I wanted to keep it locally. The cost of EM extra structures
> > (the LUT) will only be for platforms for which EM discovers that
> > they have inefficient performance levels.
> > The code would even not be compiled in for x86, ppc, etc, in hot path.
> >
> >>>>> this v3 and your proposal.
> >>>>
> >>>> IMHO, adding such callbacks to the EM core, like .mark_efficient(),
> >>>> will only make this easier to handle for all different frameworks, and
> >>>> not otherwise. The code will look much cleaner everywhere..
> >>>>
> >>>
> >>> What about coming back to the slightly modified v1 idea?
> >>> That was really self-contained modification for this
> >>> inefficient opps heuristic.
> >>
> >> I am not sure if I really understand what that would be, but again
> >> adding another table is going to create more problems then it should.
> >
> > IMHO your proposals are more invasive for the generic code, while
> > not always being even used. Plenty of architectures don't even set EM,
> > even arm64 for servers doesn't use it. You and Rafael would have to
> > maintain these modifications in generic code. It might be hard to remove
> > it. While I recommend to keep this heuristic feature inside the EM and
> > we will maintain it. If we decide after a few years that new arm64
> > platforms use some smarter FW for performance level, we might just
> > disable this heuristic.
> >
> >>
> >> Anyway, that's my view, which can be wrong as well.
> >>
> >> Rafael: You have any suggestions here ?
> >>
> >
> > I would also like to hear Rafael's opinion :)
>
> It would be great if you could have a look.
> I will be grateful for your time spend on it and opinion.

First of all, IMO checking whether or not a given frequency is
"efficient" doesn't belong to cpufreq governors.  The governor knows
what the min and max supported frequencies are and selects one from
that range and note that it doesn't even check whether or not the
selected frequency is present in the frequency table.  That part
belongs to the driver or the general frequency table handling in the
cpufreq core.

So the governor doesn't care and it shouldn't care IMO.

Besides, in the cpufreq_driver_fast_switch() case the driver's
->fast_switch() callback is entirely responsible for applying the
selected frequency, so I'm not sure how this "efficient" thing is
going to work then?

Anyway, since we are talking about frequency tables, that would be the
__cpufreq_driver_target() case only and specifically in the
__target_index() case only (note how far away this is from the
governor).

Now, you may think about modifying cpufreq_frequency_table_target() to
skip "inefficient" frequencies, but then the question is why those
frequencies need to be there in the frequency table in the first
place?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-07-02 15:46                             ` Rafael J. Wysocki
@ 2021-07-02 16:04                               ` Rafael J. Wysocki
  2021-07-02 16:08                                 ` Lukasz Luba
  2021-07-02 16:13                               ` Vincent Donnefort
  1 sibling, 1 reply; 45+ messages in thread
From: Rafael J. Wysocki @ 2021-07-02 16:04 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Rafael J. Wysocki, Viresh Kumar, Vincent Donnefort,
	Peter Zijlstra, Vincent Guittot, Quentin Perret, Linux PM,
	Ionela Voinescu, Dietmar Eggemann

On Fri, Jul 2, 2021 at 5:46 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, Jul 2, 2021 at 4:21 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
> >
> > Hi Rafael,
> >
> > This is a gentle ping.
> > You have probably not seen this discussion thread.
>
> I have looked at it briefly for a few times, but not too much into detail.
>
> > On 6/16/21 1:45 PM, Lukasz Luba wrote:
> > >
> > >
> > > On 6/16/21 11:53 AM, Viresh Kumar wrote:
> > >> On 16-06-21, 11:33, Lukasz Luba wrote:
> > >>> On 6/16/21 10:31 AM, Viresh Kumar wrote:
> > >>>> On 16-06-21, 10:03, Lukasz Luba wrote:
> > >>>> Clean is not lesser number of lines for me, but rather having the
> > >>>> right ownership of such things.
> > >>>
> > >>> Some developers do like patches which removes more lines then adds ;)
> > >>
> > >> :)
> > >>
> > >
> > > [snip]
> > >
> > >>>
> > >>> What could be the modified v1 [2]:
> > >>> - LUT which holds two IDs: efficient, inefficient, take one
> > >>>    according to the clamp f_max
> > >>> - add new argument 'policy->max' to em_pd_get_efficient_freq()
> > >>>
> > >>> freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq,
> > >>> policy->max);
> > >>>
> > >>> The problem was that EAS couldn't know the clamp freq_max,
> > >>> which shouldn't be the blocker now.
> > >>
> > >> If you can do that without adding any EM specific stuff in the cpufreq
> > >> core, I will mostly be fine.
> > >>
> > >> But honestly speaking, creating more data structures to keep related
> > >> information doesn't scale well.
> > >>
> > >> We already have so many tables for keeping freq/voltage pairs, OPP,
> > >> cpufreq, EM. You tried to add one more in EM I think V1, not sure.
> > >>
> > >> It is always better to consolidate and we almost reached to a point
> > >> where that could have been done very easily. I understand that you
> > >> didn't want to touch so many different parts, but anyway..
> > >
> > > Yes, I don't want to touch some many generic parts because they
> > > are intended to be generic. This heuristic is only for EM platforms,
> > > which are arm, arm64 battery powered (not servers or other).
> > > Thus, I wanted to keep it locally. The cost of EM extra structures
> > > (the LUT) will only be for platforms for which EM discovers that
> > > they have inefficient performance levels.
> > > The code would even not be compiled in for x86, ppc, etc, in hot path.
> > >
> > >>>>> this v3 and your proposal.
> > >>>>
> > >>>> IMHO, adding such callbacks to the EM core, like .mark_efficient(),
> > >>>> will only make this easier to handle for all different frameworks, and
> > >>>> not otherwise. The code will look much cleaner everywhere..
> > >>>>
> > >>>
> > >>> What about coming back to the slightly modified v1 idea?
> > >>> That was really self-contained modification for this
> > >>> inefficient opps heuristic.
> > >>
> > >> I am not sure if I really understand what that would be, but again
> > >> adding another table is going to create more problems then it should.
> > >
> > > IMHO your proposals are more invasive for the generic code, while
> > > not always being even used. Plenty of architectures don't even set EM,
> > > even arm64 for servers doesn't use it. You and Rafael would have to
> > > maintain these modifications in generic code. It might be hard to remove
> > > it. While I recommend to keep this heuristic feature inside the EM and
> > > we will maintain it. If we decide after a few years that new arm64
> > > platforms use some smarter FW for performance level, we might just
> > > disable this heuristic.
> > >
> > >>
> > >> Anyway, that's my view, which can be wrong as well.
> > >>
> > >> Rafael: You have any suggestions here ?
> > >>
> > >
> > > I would also like to hear Rafael's opinion :)
> >
> > It would be great if you could have a look.
> > I will be grateful for your time spend on it and opinion.
>
> First of all, IMO checking whether or not a given frequency is
> "efficient" doesn't belong to cpufreq governors.  The governor knows
> what the min and max supported frequencies are and selects one from
> that range and note that it doesn't even check whether or not the
> selected frequency is present in the frequency table.  That part
> belongs to the driver or the general frequency table handling in the
> cpufreq core.
>
> So the governor doesn't care and it shouldn't care IMO.
>
> Besides, in the cpufreq_driver_fast_switch() case the driver's
> ->fast_switch() callback is entirely responsible for applying the
> selected frequency, so I'm not sure how this "efficient" thing is
> going to work then?
>
> Anyway, since we are talking about frequency tables, that would be the
> __cpufreq_driver_target() case only and specifically in the
> __target_index() case only (note how far away this is from the
> governor).
>
> Now, you may think about modifying cpufreq_frequency_table_target() to
> skip "inefficient" frequencies, but then the question is why those
> frequencies need to be there in the frequency table in the first
> place?

I'm guessing that the problem is that cpufreq_cooling works by using
freq_qos_update_request() to update the max frequency limit and if
that is in effect you'd rather use the inefficient frequencies,
whereas when the governor selects an inefficient frequency  below the
policy limit, you'd rather use a higher-but-efficient frequency
instead (within the policy limit).

Am I guessing correctly?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-07-02 16:04                               ` Rafael J. Wysocki
@ 2021-07-02 16:08                                 ` Lukasz Luba
  2021-07-02 17:53                                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 45+ messages in thread
From: Lukasz Luba @ 2021-07-02 16:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Viresh Kumar, Vincent Donnefort,
	Peter Zijlstra, Vincent Guittot, Quentin Perret, Linux PM,
	Ionela Voinescu, Dietmar Eggemann



On 7/2/21 5:04 PM, Rafael J. Wysocki wrote:
> On Fri, Jul 2, 2021 at 5:46 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>>
>> On Fri, Jul 2, 2021 at 4:21 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>
>>> Hi Rafael,
>>>
>>> This is a gentle ping.
>>> You have probably not seen this discussion thread.
>>
>> I have looked at it briefly for a few times, but not too much into detail.
>>
>>> On 6/16/21 1:45 PM, Lukasz Luba wrote:
>>>>
>>>>
>>>> On 6/16/21 11:53 AM, Viresh Kumar wrote:
>>>>> On 16-06-21, 11:33, Lukasz Luba wrote:
>>>>>> On 6/16/21 10:31 AM, Viresh Kumar wrote:
>>>>>>> On 16-06-21, 10:03, Lukasz Luba wrote:
>>>>>>> Clean is not lesser number of lines for me, but rather having the
>>>>>>> right ownership of such things.
>>>>>>
>>>>>> Some developers do like patches which removes more lines then adds ;)
>>>>>
>>>>> :)
>>>>>
>>>>
>>>> [snip]
>>>>
>>>>>>
>>>>>> What could be the modified v1 [2]:
>>>>>> - LUT which holds two IDs: efficient, inefficient, take one
>>>>>>     according to the clamp f_max
>>>>>> - add new argument 'policy->max' to em_pd_get_efficient_freq()
>>>>>>
>>>>>> freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq,
>>>>>> policy->max);
>>>>>>
>>>>>> The problem was that EAS couldn't know the clamp freq_max,
>>>>>> which shouldn't be the blocker now.
>>>>>
>>>>> If you can do that without adding any EM specific stuff in the cpufreq
>>>>> core, I will mostly be fine.
>>>>>
>>>>> But honestly speaking, creating more data structures to keep related
>>>>> information doesn't scale well.
>>>>>
>>>>> We already have so many tables for keeping freq/voltage pairs, OPP,
>>>>> cpufreq, EM. You tried to add one more in EM I think V1, not sure.
>>>>>
>>>>> It is always better to consolidate and we almost reached to a point
>>>>> where that could have been done very easily. I understand that you
>>>>> didn't want to touch so many different parts, but anyway..
>>>>
>>>> Yes, I don't want to touch some many generic parts because they
>>>> are intended to be generic. This heuristic is only for EM platforms,
>>>> which are arm, arm64 battery powered (not servers or other).
>>>> Thus, I wanted to keep it locally. The cost of EM extra structures
>>>> (the LUT) will only be for platforms for which EM discovers that
>>>> they have inefficient performance levels.
>>>> The code would even not be compiled in for x86, ppc, etc, in hot path.
>>>>
>>>>>>>> this v3 and your proposal.
>>>>>>>
>>>>>>> IMHO, adding such callbacks to the EM core, like .mark_efficient(),
>>>>>>> will only make this easier to handle for all different frameworks, and
>>>>>>> not otherwise. The code will look much cleaner everywhere..
>>>>>>>
>>>>>>
>>>>>> What about coming back to the slightly modified v1 idea?
>>>>>> That was really self-contained modification for this
>>>>>> inefficient opps heuristic.
>>>>>
>>>>> I am not sure if I really understand what that would be, but again
>>>>> adding another table is going to create more problems then it should.
>>>>
>>>> IMHO your proposals are more invasive for the generic code, while
>>>> not always being even used. Plenty of architectures don't even set EM,
>>>> even arm64 for servers doesn't use it. You and Rafael would have to
>>>> maintain these modifications in generic code. It might be hard to remove
>>>> it. While I recommend to keep this heuristic feature inside the EM and
>>>> we will maintain it. If we decide after a few years that new arm64
>>>> platforms use some smarter FW for performance level, we might just
>>>> disable this heuristic.
>>>>
>>>>>
>>>>> Anyway, that's my view, which can be wrong as well.
>>>>>
>>>>> Rafael: You have any suggestions here ?
>>>>>
>>>>
>>>> I would also like to hear Rafael's opinion :)
>>>
>>> It would be great if you could have a look.
>>> I will be grateful for your time spend on it and opinion.
>>
>> First of all, IMO checking whether or not a given frequency is
>> "efficient" doesn't belong to cpufreq governors.  The governor knows
>> what the min and max supported frequencies are and selects one from
>> that range and note that it doesn't even check whether or not the
>> selected frequency is present in the frequency table.  That part
>> belongs to the driver or the general frequency table handling in the
>> cpufreq core.
>>
>> So the governor doesn't care and it shouldn't care IMO.
>>
>> Besides, in the cpufreq_driver_fast_switch() case the driver's
>> ->fast_switch() callback is entirely responsible for applying the
>> selected frequency, so I'm not sure how this "efficient" thing is
>> going to work then?
>>
>> Anyway, since we are talking about frequency tables, that would be the
>> __cpufreq_driver_target() case only and specifically in the
>> __target_index() case only (note how far away this is from the
>> governor).
>>
>> Now, you may think about modifying cpufreq_frequency_table_target() to
>> skip "inefficient" frequencies, but then the question is why those
>> frequencies need to be there in the frequency table in the first
>> place?
> 
> I'm guessing that the problem is that cpufreq_cooling works by using
> freq_qos_update_request() to update the max frequency limit and if
> that is in effect you'd rather use the inefficient frequencies,
> whereas when the governor selects an inefficient frequency  below the
> policy limit, you'd rather use a higher-but-efficient frequency
> instead (within the policy limit).
> 
> Am I guessing correctly?
> 

Yes, correct. Thermal would use all (efficient + inefficient), but
we in cpufreq governor would like to pick if possible the efficient
one (below the thermal limit).

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-07-02 15:46                             ` Rafael J. Wysocki
  2021-07-02 16:04                               ` Rafael J. Wysocki
@ 2021-07-02 16:13                               ` Vincent Donnefort
  2021-07-02 17:38                                 ` Rafael J. Wysocki
  1 sibling, 1 reply; 45+ messages in thread
From: Vincent Donnefort @ 2021-07-02 16:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Lukasz Luba, Rafael J. Wysocki, Viresh Kumar, Peter Zijlstra,
	Vincent Guittot, Quentin Perret, Linux PM, Ionela Voinescu,
	Dietmar Eggemann

[...]

Hi Rafael,

> >
> > It would be great if you could have a look.
> > I will be grateful for your time spend on it and opinion.
> 
> First of all, IMO checking whether or not a given frequency is
> "efficient" doesn't belong to cpufreq governors.  The governor knows
> what the min and max supported frequencies are and selects one from
> that range and note that it doesn't even check whether or not the
> selected frequency is present in the frequency table.  That part
> belongs to the driver or the general frequency table handling in the
> cpufreq core.
> 
> So the governor doesn't care and it shouldn't care IMO.

A governor such as userspace should probably be able select any frequency
with no regard to efficiency.

> 
> Besides, in the cpufreq_driver_fast_switch() case the driver's
> ->fast_switch() callback is entirely responsible for applying the
> selected frequency, so I'm not sure how this "efficient" thing is
> going to work then?

This shouldn't be a problem if a governor that leverages inefficient OPPs
resolves an inefficient one to an efficient one. The target_freq would simply
point to a frequency known as efficient.

> Anyway, since we are talking about frequency tables, that would be the
> __cpufreq_driver_target() case only and specifically in the
> __target_index() case only (note how far away this is from the
> governor).
> 
> Now, you may think about modifying cpufreq_frequency_table_target() to
> skip "inefficient" frequencies, but then the question is why those
> frequencies need to be there in the frequency table in the first
> place?

Cpufreq_cooling needs those frequencies, even inefficients, also there's
probably no reason to hide them from the userspace.

-- 
Vincent

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-07-02 16:13                               ` Vincent Donnefort
@ 2021-07-02 17:38                                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 45+ messages in thread
From: Rafael J. Wysocki @ 2021-07-02 17:38 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Rafael J. Wysocki, Lukasz Luba, Rafael J. Wysocki, Viresh Kumar,
	Peter Zijlstra, Vincent Guittot, Quentin Perret, Linux PM,
	Ionela Voinescu, Dietmar Eggemann

On Fri, Jul 2, 2021 at 6:14 PM Vincent Donnefort
<vincent.donnefort@arm.com> wrote:
>
> [...]
>
> Hi Rafael,
>
> > >
> > > It would be great if you could have a look.
> > > I will be grateful for your time spend on it and opinion.
> >
> > First of all, IMO checking whether or not a given frequency is
> > "efficient" doesn't belong to cpufreq governors.  The governor knows
> > what the min and max supported frequencies are and selects one from
> > that range and note that it doesn't even check whether or not the
> > selected frequency is present in the frequency table.  That part
> > belongs to the driver or the general frequency table handling in the
> > cpufreq core.
> >
> > So the governor doesn't care and it shouldn't care IMO.
>
> A governor such as userspace should probably be able select any frequency
> with no regard to efficiency.

You seem to be arguing that the decision whether or not to use
inefficient frequencies needs to be made at the governor level,
because the driver must use whatever frequency is selected by the
userspace governor, since the user process driving it may be assuming
that to be the case.

That's why you want to hack schedutil to filter out the inefficient
frequencies, so you don't need to worry about them below that level.

That would be putting driver knowledge into the governor which cpufreq
was specifically designed to avoid.

There is a way to do that, though, which is to put a governor into the
driver and use ->setoplicy(), but the good news is that you don't even
need to do that.

So there are those CPUFREQ_RELATION_* symbols used by governors to
tell drivers what to do next (I wonder what they would be good for
after your changes for that matter) and you can add more of them.  For
example, you can define a CPUFREQ_RELATION_EFFICIENT bit to tell the
driver to avoid inefficient frequencies and pass that from schedutil.

Besides, I'm not sure why you still want to use inefficient
frequencies if they are selected by the ondemand or conservative
governors.

> >
> > Besides, in the cpufreq_driver_fast_switch() case the driver's
> > ->fast_switch() callback is entirely responsible for applying the
> > selected frequency, so I'm not sure how this "efficient" thing is
> > going to work then?
>
> This shouldn't be a problem if a governor that leverages inefficient OPPs
> resolves an inefficient one to an efficient one. The target_freq would simply
> point to a frequency known as efficient.

No, that doesn't help.  Moreover, it is harmful, because it changes
the interface between this particular governor and drivers in a subtle
and potentially confusing way.

Namely, drivers now can expect that the frequency passed to
->fast_switch() is the one corresponding to the current utilization
demand as seen by the governor, subject to the policy limits, not
the-closest-efficient-one-present-in-the-frequency-table.

> > Anyway, since we are talking about frequency tables, that would be the
> > __cpufreq_driver_target() case only and specifically in the
> > __target_index() case only (note how far away this is from the
> > governor).
> >
> > Now, you may think about modifying cpufreq_frequency_table_target() to
> > skip "inefficient" frequencies, but then the question is why those
> > frequencies need to be there in the frequency table in the first
> > place?
>
> Cpufreq_cooling needs those frequencies, even inefficients,

I've already figured that out.

> also there's probably no reason to hide them from the userspace.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-07-02 16:08                                 ` Lukasz Luba
@ 2021-07-02 17:53                                   ` Rafael J. Wysocki
  2021-07-02 19:04                                     ` Lukasz Luba
  2021-07-02 19:17                                     ` Vincent Donnefort
  0 siblings, 2 replies; 45+ messages in thread
From: Rafael J. Wysocki @ 2021-07-02 17:53 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Viresh Kumar,
	Vincent Donnefort, Peter Zijlstra, Vincent Guittot,
	Quentin Perret, Linux PM, Ionela Voinescu, Dietmar Eggemann

On Fri, Jul 2, 2021 at 6:08 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 7/2/21 5:04 PM, Rafael J. Wysocki wrote:
> > On Fri, Jul 2, 2021 at 5:46 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >>
> >> On Fri, Jul 2, 2021 at 4:21 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>>
> >>> Hi Rafael,
> >>>
> >>> This is a gentle ping.
> >>> You have probably not seen this discussion thread.
> >>
> >> I have looked at it briefly for a few times, but not too much into detail.
> >>
> >>> On 6/16/21 1:45 PM, Lukasz Luba wrote:
> >>>>
> >>>>
> >>>> On 6/16/21 11:53 AM, Viresh Kumar wrote:
> >>>>> On 16-06-21, 11:33, Lukasz Luba wrote:
> >>>>>> On 6/16/21 10:31 AM, Viresh Kumar wrote:
> >>>>>>> On 16-06-21, 10:03, Lukasz Luba wrote:
> >>>>>>> Clean is not lesser number of lines for me, but rather having the
> >>>>>>> right ownership of such things.
> >>>>>>
> >>>>>> Some developers do like patches which removes more lines then adds ;)
> >>>>>
> >>>>> :)
> >>>>>
> >>>>
> >>>> [snip]
> >>>>
> >>>>>>
> >>>>>> What could be the modified v1 [2]:
> >>>>>> - LUT which holds two IDs: efficient, inefficient, take one
> >>>>>>     according to the clamp f_max
> >>>>>> - add new argument 'policy->max' to em_pd_get_efficient_freq()
> >>>>>>
> >>>>>> freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq,
> >>>>>> policy->max);
> >>>>>>
> >>>>>> The problem was that EAS couldn't know the clamp freq_max,
> >>>>>> which shouldn't be the blocker now.
> >>>>>
> >>>>> If you can do that without adding any EM specific stuff in the cpufreq
> >>>>> core, I will mostly be fine.
> >>>>>
> >>>>> But honestly speaking, creating more data structures to keep related
> >>>>> information doesn't scale well.
> >>>>>
> >>>>> We already have so many tables for keeping freq/voltage pairs, OPP,
> >>>>> cpufreq, EM. You tried to add one more in EM I think V1, not sure.
> >>>>>
> >>>>> It is always better to consolidate and we almost reached to a point
> >>>>> where that could have been done very easily. I understand that you
> >>>>> didn't want to touch so many different parts, but anyway..
> >>>>
> >>>> Yes, I don't want to touch some many generic parts because they
> >>>> are intended to be generic. This heuristic is only for EM platforms,
> >>>> which are arm, arm64 battery powered (not servers or other).
> >>>> Thus, I wanted to keep it locally. The cost of EM extra structures
> >>>> (the LUT) will only be for platforms for which EM discovers that
> >>>> they have inefficient performance levels.
> >>>> The code would even not be compiled in for x86, ppc, etc, in hot path.
> >>>>
> >>>>>>>> this v3 and your proposal.
> >>>>>>>
> >>>>>>> IMHO, adding such callbacks to the EM core, like .mark_efficient(),
> >>>>>>> will only make this easier to handle for all different frameworks, and
> >>>>>>> not otherwise. The code will look much cleaner everywhere..
> >>>>>>>
> >>>>>>
> >>>>>> What about coming back to the slightly modified v1 idea?
> >>>>>> That was really self-contained modification for this
> >>>>>> inefficient opps heuristic.
> >>>>>
> >>>>> I am not sure if I really understand what that would be, but again
> >>>>> adding another table is going to create more problems then it should.
> >>>>
> >>>> IMHO your proposals are more invasive for the generic code, while
> >>>> not always being even used. Plenty of architectures don't even set EM,
> >>>> even arm64 for servers doesn't use it. You and Rafael would have to
> >>>> maintain these modifications in generic code. It might be hard to remove
> >>>> it. While I recommend to keep this heuristic feature inside the EM and
> >>>> we will maintain it. If we decide after a few years that new arm64
> >>>> platforms use some smarter FW for performance level, we might just
> >>>> disable this heuristic.
> >>>>
> >>>>>
> >>>>> Anyway, that's my view, which can be wrong as well.
> >>>>>
> >>>>> Rafael: You have any suggestions here ?
> >>>>>
> >>>>
> >>>> I would also like to hear Rafael's opinion :)
> >>>
> >>> It would be great if you could have a look.
> >>> I will be grateful for your time spend on it and opinion.
> >>
> >> First of all, IMO checking whether or not a given frequency is
> >> "efficient" doesn't belong to cpufreq governors.  The governor knows
> >> what the min and max supported frequencies are and selects one from
> >> that range and note that it doesn't even check whether or not the
> >> selected frequency is present in the frequency table.  That part
> >> belongs to the driver or the general frequency table handling in the
> >> cpufreq core.
> >>
> >> So the governor doesn't care and it shouldn't care IMO.
> >>
> >> Besides, in the cpufreq_driver_fast_switch() case the driver's
> >> ->fast_switch() callback is entirely responsible for applying the
> >> selected frequency, so I'm not sure how this "efficient" thing is
> >> going to work then?
> >>
> >> Anyway, since we are talking about frequency tables, that would be the
> >> __cpufreq_driver_target() case only and specifically in the
> >> __target_index() case only (note how far away this is from the
> >> governor).
> >>
> >> Now, you may think about modifying cpufreq_frequency_table_target() to
> >> skip "inefficient" frequencies, but then the question is why those
> >> frequencies need to be there in the frequency table in the first
> >> place?
> >
> > I'm guessing that the problem is that cpufreq_cooling works by using
> > freq_qos_update_request() to update the max frequency limit and if
> > that is in effect you'd rather use the inefficient frequencies,
> > whereas when the governor selects an inefficient frequency  below the
> > policy limit, you'd rather use a higher-but-efficient frequency
> > instead (within the policy limit).
> >
> > Am I guessing correctly?
> >
>
> Yes, correct. Thermal would use all (efficient + inefficient), but
> we in cpufreq governor would like to pick if possible the efficient
> one (below the thermal limit).

To address that, you need to pass more information from schedutil to
__cpufreq_driver_target() that down the road can be used by
cpufreq_frequency_table_target() to decide whether or not to skip the
inefficient frequencies.

For example, you can define CPUFREQ_RELATION_EFFICIENT and pass it
from schedutil to __cpufreq_driver_target() in the "relation"
argument, and clear it if the target frequency is above the max policy
limit, or if ->target() is to be called.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-07-02 17:53                                   ` Rafael J. Wysocki
@ 2021-07-02 19:04                                     ` Lukasz Luba
  2021-07-02 19:17                                     ` Vincent Donnefort
  1 sibling, 0 replies; 45+ messages in thread
From: Lukasz Luba @ 2021-07-02 19:04 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Viresh Kumar, Vincent Donnefort,
	Peter Zijlstra, Vincent Guittot, Quentin Perret, Linux PM,
	Ionela Voinescu, Dietmar Eggemann



On 7/2/21 6:53 PM, Rafael J. Wysocki wrote:
> On Fri, Jul 2, 2021 at 6:08 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>>
>>
>> On 7/2/21 5:04 PM, Rafael J. Wysocki wrote:
>>> On Fri, Jul 2, 2021 at 5:46 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>>>>
>>>> On Fri, Jul 2, 2021 at 4:21 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>>
>>>>> Hi Rafael,
>>>>>
>>>>> This is a gentle ping.
>>>>> You have probably not seen this discussion thread.
>>>>
>>>> I have looked at it briefly for a few times, but not too much into detail.
>>>>
>>>>> On 6/16/21 1:45 PM, Lukasz Luba wrote:
>>>>>>
>>>>>>
>>>>>> On 6/16/21 11:53 AM, Viresh Kumar wrote:
>>>>>>> On 16-06-21, 11:33, Lukasz Luba wrote:
>>>>>>>> On 6/16/21 10:31 AM, Viresh Kumar wrote:
>>>>>>>>> On 16-06-21, 10:03, Lukasz Luba wrote:
>>>>>>>>> Clean is not lesser number of lines for me, but rather having the
>>>>>>>>> right ownership of such things.
>>>>>>>>
>>>>>>>> Some developers do like patches which removes more lines then adds ;)
>>>>>>>
>>>>>>> :)
>>>>>>>
>>>>>>
>>>>>> [snip]
>>>>>>
>>>>>>>>
>>>>>>>> What could be the modified v1 [2]:
>>>>>>>> - LUT which holds two IDs: efficient, inefficient, take one
>>>>>>>>      according to the clamp f_max
>>>>>>>> - add new argument 'policy->max' to em_pd_get_efficient_freq()
>>>>>>>>
>>>>>>>> freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq,
>>>>>>>> policy->max);
>>>>>>>>
>>>>>>>> The problem was that EAS couldn't know the clamp freq_max,
>>>>>>>> which shouldn't be the blocker now.
>>>>>>>
>>>>>>> If you can do that without adding any EM specific stuff in the cpufreq
>>>>>>> core, I will mostly be fine.
>>>>>>>
>>>>>>> But honestly speaking, creating more data structures to keep related
>>>>>>> information doesn't scale well.
>>>>>>>
>>>>>>> We already have so many tables for keeping freq/voltage pairs, OPP,
>>>>>>> cpufreq, EM. You tried to add one more in EM I think V1, not sure.
>>>>>>>
>>>>>>> It is always better to consolidate and we almost reached to a point
>>>>>>> where that could have been done very easily. I understand that you
>>>>>>> didn't want to touch so many different parts, but anyway..
>>>>>>
>>>>>> Yes, I don't want to touch some many generic parts because they
>>>>>> are intended to be generic. This heuristic is only for EM platforms,
>>>>>> which are arm, arm64 battery powered (not servers or other).
>>>>>> Thus, I wanted to keep it locally. The cost of EM extra structures
>>>>>> (the LUT) will only be for platforms for which EM discovers that
>>>>>> they have inefficient performance levels.
>>>>>> The code would even not be compiled in for x86, ppc, etc, in hot path.
>>>>>>
>>>>>>>>>> this v3 and your proposal.
>>>>>>>>>
>>>>>>>>> IMHO, adding such callbacks to the EM core, like .mark_efficient(),
>>>>>>>>> will only make this easier to handle for all different frameworks, and
>>>>>>>>> not otherwise. The code will look much cleaner everywhere..
>>>>>>>>>
>>>>>>>>
>>>>>>>> What about coming back to the slightly modified v1 idea?
>>>>>>>> That was really self-contained modification for this
>>>>>>>> inefficient opps heuristic.
>>>>>>>
>>>>>>> I am not sure if I really understand what that would be, but again
>>>>>>> adding another table is going to create more problems then it should.
>>>>>>
>>>>>> IMHO your proposals are more invasive for the generic code, while
>>>>>> not always being even used. Plenty of architectures don't even set EM,
>>>>>> even arm64 for servers doesn't use it. You and Rafael would have to
>>>>>> maintain these modifications in generic code. It might be hard to remove
>>>>>> it. While I recommend to keep this heuristic feature inside the EM and
>>>>>> we will maintain it. If we decide after a few years that new arm64
>>>>>> platforms use some smarter FW for performance level, we might just
>>>>>> disable this heuristic.
>>>>>>
>>>>>>>
>>>>>>> Anyway, that's my view, which can be wrong as well.
>>>>>>>
>>>>>>> Rafael: You have any suggestions here ?
>>>>>>>
>>>>>>
>>>>>> I would also like to hear Rafael's opinion :)
>>>>>
>>>>> It would be great if you could have a look.
>>>>> I will be grateful for your time spend on it and opinion.
>>>>
>>>> First of all, IMO checking whether or not a given frequency is
>>>> "efficient" doesn't belong to cpufreq governors.  The governor knows
>>>> what the min and max supported frequencies are and selects one from
>>>> that range and note that it doesn't even check whether or not the
>>>> selected frequency is present in the frequency table.  That part
>>>> belongs to the driver or the general frequency table handling in the
>>>> cpufreq core.
>>>>
>>>> So the governor doesn't care and it shouldn't care IMO.
>>>>
>>>> Besides, in the cpufreq_driver_fast_switch() case the driver's
>>>> ->fast_switch() callback is entirely responsible for applying the
>>>> selected frequency, so I'm not sure how this "efficient" thing is
>>>> going to work then?
>>>>
>>>> Anyway, since we are talking about frequency tables, that would be the
>>>> __cpufreq_driver_target() case only and specifically in the
>>>> __target_index() case only (note how far away this is from the
>>>> governor).
>>>>
>>>> Now, you may think about modifying cpufreq_frequency_table_target() to
>>>> skip "inefficient" frequencies, but then the question is why those
>>>> frequencies need to be there in the frequency table in the first
>>>> place?
>>>
>>> I'm guessing that the problem is that cpufreq_cooling works by using
>>> freq_qos_update_request() to update the max frequency limit and if
>>> that is in effect you'd rather use the inefficient frequencies,
>>> whereas when the governor selects an inefficient frequency  below the
>>> policy limit, you'd rather use a higher-but-efficient frequency
>>> instead (within the policy limit).
>>>
>>> Am I guessing correctly?
>>>
>>
>> Yes, correct. Thermal would use all (efficient + inefficient), but
>> we in cpufreq governor would like to pick if possible the efficient
>> one (below the thermal limit).
> 
> To address that, you need to pass more information from schedutil to
> __cpufreq_driver_target() that down the road can be used by
> cpufreq_frequency_table_target() to decide whether or not to skip the
> inefficient frequencies.
> 
> For example, you can define CPUFREQ_RELATION_EFFICIENT and pass it
> from schedutil to __cpufreq_driver_target() in the "relation"
> argument, and clear it if the target frequency is above the max policy
> limit, or if ->target() is to be called.
> 

Thank you Rafael for valuable comments. We will have to experiment
with that option and come back with implementation based on it.

Regards,
Lukasz

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-07-02 17:53                                   ` Rafael J. Wysocki
  2021-07-02 19:04                                     ` Lukasz Luba
@ 2021-07-02 19:17                                     ` Vincent Donnefort
  2021-07-05 14:09                                       ` Rafael J. Wysocki
  1 sibling, 1 reply; 45+ messages in thread
From: Vincent Donnefort @ 2021-07-02 19:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Lukasz Luba, Rafael J. Wysocki, Viresh Kumar, Peter Zijlstra,
	Vincent Guittot, Quentin Perret, Linux PM, Ionela Voinescu,
	Dietmar Eggemann

[...]

> > >
> > > I'm guessing that the problem is that cpufreq_cooling works by using
> > > freq_qos_update_request() to update the max frequency limit and if
> > > that is in effect you'd rather use the inefficient frequencies,
> > > whereas when the governor selects an inefficient frequency  below the
> > > policy limit, you'd rather use a higher-but-efficient frequency
> > > instead (within the policy limit).
> > >
> > > Am I guessing correctly?
> > >
> >
> > Yes, correct. Thermal would use all (efficient + inefficient), but
> > we in cpufreq governor would like to pick if possible the efficient
> > one (below the thermal limit).
> 
> To address that, you need to pass more information from schedutil to
> __cpufreq_driver_target() that down the road can be used by
> cpufreq_frequency_table_target() to decide whether or not to skip the
> inefficient frequencies.
> 
> For example, you can define CPUFREQ_RELATION_EFFICIENT and pass it
> from schedutil to __cpufreq_driver_target() in the "relation"
> argument, and clear it if the target frequency is above the max policy
> limit, or if ->target() is to be called.

What about a cpufreq_policy option that if sets would make
cpufreq_frequency_table_target() skip inefficient OPPs while staying within
the limit of max policy? Each governor could decide to set it or not, but
it would hide the efficiency resolution to the governor and allow drivers
that implements ->target() to also implements support for inefficient OPPs.

That flag could be set according to a new cpufreq_governor flag
CPUFREQ_GOV_SKIP_INEFFICIENCIES?

That could though modify behaviors like powersave_bias from ondemand. But if
a frequency is inefficient, there's probably no power saving anyway.

-- 
Vincent

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-07-02 19:17                                     ` Vincent Donnefort
@ 2021-07-05 14:09                                       ` Rafael J. Wysocki
  2021-07-06  8:12                                         ` Vincent Donnefort
  0 siblings, 1 reply; 45+ messages in thread
From: Rafael J. Wysocki @ 2021-07-05 14:09 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Rafael J. Wysocki, Lukasz Luba, Rafael J. Wysocki, Viresh Kumar,
	Peter Zijlstra, Vincent Guittot, Quentin Perret, Linux PM,
	Ionela Voinescu, Dietmar Eggemann

On Fri, Jul 2, 2021 at 9:17 PM Vincent Donnefort
<vincent.donnefort@arm.com> wrote:
>
> [...]
>
> > > >
> > > > I'm guessing that the problem is that cpufreq_cooling works by using
> > > > freq_qos_update_request() to update the max frequency limit and if
> > > > that is in effect you'd rather use the inefficient frequencies,
> > > > whereas when the governor selects an inefficient frequency  below the
> > > > policy limit, you'd rather use a higher-but-efficient frequency
> > > > instead (within the policy limit).
> > > >
> > > > Am I guessing correctly?
> > > >
> > >
> > > Yes, correct. Thermal would use all (efficient + inefficient), but
> > > we in cpufreq governor would like to pick if possible the efficient
> > > one (below the thermal limit).
> >
> > To address that, you need to pass more information from schedutil to
> > __cpufreq_driver_target() that down the road can be used by
> > cpufreq_frequency_table_target() to decide whether or not to skip the
> > inefficient frequencies.
> >
> > For example, you can define CPUFREQ_RELATION_EFFICIENT and pass it
> > from schedutil to __cpufreq_driver_target() in the "relation"
> > argument, and clear it if the target frequency is above the max policy
> > limit, or if ->target() is to be called.
>
> What about a cpufreq_policy option that if sets would make
> cpufreq_frequency_table_target() skip inefficient OPPs while staying within
> the limit of max policy?

That would work too, ->

> Each governor could decide to set it or not, but
> it would hide the efficiency resolution to the governor and allow drivers
> that implements ->target() to also implements support for inefficient OPPs.

-> but alternatively there could be an additional cpufreq driver flag
to be set by the drivers implementing ->target() and wanting to deal
with CPUFREQ_RELATION_EFFICIENT themselves (an opt-in of sorts).

So the governors that want it may pass CPUFREQ_RELATION_EFFICIENT to
__cpufreq_driver_target() and then it will be passed to ->target()
depending on whether or not the new driver flag is set.

> That flag could be set according to a new cpufreq_governor flag
> CPUFREQ_GOV_SKIP_INEFFICIENCIES?
>
> That could though modify behaviors like powersave_bias from ondemand. But if
> a frequency is inefficient, there's probably no power saving anyway.

AFAICS, the userspace governor aside, using inefficient frequencies
only works with the powersave governor.  In the other cases,
RELATION_L (say) can be interpreted as "the closest efficient
frequency equal to or above the target" with the max policy limit
possibly causing inefficient frequencies to be used if they are closer
to the limit than the next efficient one.

As a rule, the governors don't assume that there are any inefficient
frequencies in the table.  In fact, they don't make any assumptions
regarding the contents of the frequency table at all.  They don't even
assume that the driver uses a frequency table in the first place.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-07-05 14:09                                       ` Rafael J. Wysocki
@ 2021-07-06  8:12                                         ` Vincent Donnefort
  2021-07-06  8:37                                           ` Viresh Kumar
  2021-07-06 12:11                                           ` Rafael J. Wysocki
  0 siblings, 2 replies; 45+ messages in thread
From: Vincent Donnefort @ 2021-07-06  8:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Lukasz Luba, Rafael J. Wysocki, Viresh Kumar, Peter Zijlstra,
	Vincent Guittot, Quentin Perret, Linux PM, Ionela Voinescu,
	Dietmar Eggemann

[...]

> >
> > What about a cpufreq_policy option that if sets would make
> > cpufreq_frequency_table_target() skip inefficient OPPs while staying within
> > the limit of max policy?
> 
> That would work too, ->
> 
> > Each governor could decide to set it or not, but
> > it would hide the efficiency resolution to the governor and allow drivers
> > that implements ->target() to also implements support for inefficient OPPs.
> 
> -> but alternatively there could be an additional cpufreq driver flag
> to be set by the drivers implementing ->target() and wanting to deal
> with CPUFREQ_RELATION_EFFICIENT themselves (an opt-in of sorts).
> 
> So the governors that want it may pass CPUFREQ_RELATION_EFFICIENT to
> __cpufreq_driver_target() and then it will be passed to ->target()
> depending on whether or not the new driver flag is set.

Of course, I can implement this instead of a cpufreq_policy flag in v4.
I suppose then right fallback for CPUFREQ_RELATION_EFFICIENT in case the
driver doesn't opt-in is CPUFREQ_RELATION_L.

> 
> > That flag could be set according to a new cpufreq_governor flag
> > CPUFREQ_GOV_SKIP_INEFFICIENCIES?
> >
> > That could though modify behaviors like powersave_bias from ondemand. But if
> > a frequency is inefficient, there's probably no power saving anyway.
> 
> AFAICS, the userspace governor aside, using inefficient frequencies
> only works with the powersave governor.  In the other cases,
> RELATION_L (say) can be interpreted as "the closest efficient
> frequency equal to or above the target" with the max policy limit
> possibly causing inefficient frequencies to be used if they are closer
> to the limit than the next efficient one.
> 
> As a rule, the governors don't assume that there are any inefficient
> frequencies in the table.  In fact, they don't make any assumptions
> regarding the contents of the frequency table at all.  They don't even
> assume that the driver uses a frequency table in the first place.

So all the governors, beside powersave and userspace would replace their
RELATION_L with RELATION_EFFICIENT. I'll add the changes in v4.

So if I sum-up: new RELATION_EFFICIENT that resolves RELATION_L to an higher
efficient frequency (if necessary) within the limits of policy->max.
CPUfreq drivers can opt-in by setting an appropriate flag. If they do not,
RELATION_EFFICIENT will be rewritten in RELATION_L. All governors but userspace
and powersave would use RELATION_EFFICIENT instead of RELATION_L.

If that works for you, I'll implement this in a v4, as well as some
improvements for the CPUfreq/EM registration following the discussion with
Viresh.

-- 
Vincent

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-07-06  8:12                                         ` Vincent Donnefort
@ 2021-07-06  8:37                                           ` Viresh Kumar
  2021-07-06  8:43                                             ` Vincent Donnefort
  2021-07-06 12:11                                           ` Rafael J. Wysocki
  1 sibling, 1 reply; 45+ messages in thread
From: Viresh Kumar @ 2021-07-06  8:37 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Rafael J. Wysocki, Lukasz Luba, Rafael J. Wysocki,
	Peter Zijlstra, Vincent Guittot, Quentin Perret, Linux PM,
	Ionela Voinescu, Dietmar Eggemann

On 06-07-21, 09:12, Vincent Donnefort wrote:
> Of course, I can implement this instead of a cpufreq_policy flag in v4.

Why should this ever be a policy flag and not governor's flag ?

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-07-06  8:37                                           ` Viresh Kumar
@ 2021-07-06  8:43                                             ` Vincent Donnefort
  2021-07-06  8:50                                               ` Viresh Kumar
  0 siblings, 1 reply; 45+ messages in thread
From: Vincent Donnefort @ 2021-07-06  8:43 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Lukasz Luba, Rafael J. Wysocki,
	Peter Zijlstra, Vincent Guittot, Quentin Perret, Linux PM,
	Ionela Voinescu, Dietmar Eggemann

On Tue, Jul 06, 2021 at 02:07:41PM +0530, Viresh Kumar wrote:
> On 06-07-21, 09:12, Vincent Donnefort wrote:
> > Of course, I can implement this instead of a cpufreq_policy flag in v4.
> 
> Why should this ever be a policy flag and not governor's flag ?

I was referring to how letting know a driver which registers ->target() that we
want or not inefficient frequencies. My proposal was to rely on a
cpufreq_policy's flag that the driver can read. 

> 
> -- 
> viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-07-06  8:43                                             ` Vincent Donnefort
@ 2021-07-06  8:50                                               ` Viresh Kumar
  0 siblings, 0 replies; 45+ messages in thread
From: Viresh Kumar @ 2021-07-06  8:50 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Rafael J. Wysocki, Lukasz Luba, Rafael J. Wysocki,
	Peter Zijlstra, Vincent Guittot, Quentin Perret, Linux PM,
	Ionela Voinescu, Dietmar Eggemann

On 06-07-21, 09:43, Vincent Donnefort wrote:
> I was referring to how letting know a driver which registers ->target() that we
> want or not inefficient frequencies. My proposal was to rely on a
> cpufreq_policy's flag that the driver can read. 

I am bit confused at this point, lets see how it looks eventually as patches :)

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies
  2021-07-06  8:12                                         ` Vincent Donnefort
  2021-07-06  8:37                                           ` Viresh Kumar
@ 2021-07-06 12:11                                           ` Rafael J. Wysocki
  1 sibling, 0 replies; 45+ messages in thread
From: Rafael J. Wysocki @ 2021-07-06 12:11 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Rafael J. Wysocki, Lukasz Luba, Rafael J. Wysocki, Viresh Kumar,
	Peter Zijlstra, Vincent Guittot, Quentin Perret, Linux PM,
	Ionela Voinescu, Dietmar Eggemann

On Tue, Jul 6, 2021 at 10:13 AM Vincent Donnefort
<vincent.donnefort@arm.com> wrote:
>
> [...]
>
> > >
> > > What about a cpufreq_policy option that if sets would make
> > > cpufreq_frequency_table_target() skip inefficient OPPs while staying within
> > > the limit of max policy?
> >
> > That would work too, ->
> >
> > > Each governor could decide to set it or not, but
> > > it would hide the efficiency resolution to the governor and allow drivers
> > > that implements ->target() to also implements support for inefficient OPPs.
> >
> > -> but alternatively there could be an additional cpufreq driver flag
> > to be set by the drivers implementing ->target() and wanting to deal
> > with CPUFREQ_RELATION_EFFICIENT themselves (an opt-in of sorts).
> >
> > So the governors that want it may pass CPUFREQ_RELATION_EFFICIENT to
> > __cpufreq_driver_target() and then it will be passed to ->target()
> > depending on whether or not the new driver flag is set.
>
> Of course, I can implement this instead of a cpufreq_policy flag in v4.
> I suppose then right fallback for CPUFREQ_RELATION_EFFICIENT in case the
> driver doesn't opt-in is CPUFREQ_RELATION_L.
>
> >
> > > That flag could be set according to a new cpufreq_governor flag
> > > CPUFREQ_GOV_SKIP_INEFFICIENCIES?
> > >
> > > That could though modify behaviors like powersave_bias from ondemand. But if
> > > a frequency is inefficient, there's probably no power saving anyway.
> >
> > AFAICS, the userspace governor aside, using inefficient frequencies
> > only works with the powersave governor.  In the other cases,
> > RELATION_L (say) can be interpreted as "the closest efficient
> > frequency equal to or above the target" with the max policy limit
> > possibly causing inefficient frequencies to be used if they are closer
> > to the limit than the next efficient one.
> >
> > As a rule, the governors don't assume that there are any inefficient
> > frequencies in the table.  In fact, they don't make any assumptions
> > regarding the contents of the frequency table at all.  They don't even
> > assume that the driver uses a frequency table in the first place.
>
> So all the governors, beside powersave and userspace would replace their
> RELATION_L with RELATION_EFFICIENT. I'll add the changes in v4.
>
> So if I sum-up: new RELATION_EFFICIENT that resolves RELATION_L to an higher
> efficient frequency (if necessary) within the limits of policy->max.

Yes.

It can be called RELATION_E for brevity.

> CPUfreq drivers can opt-in by setting an appropriate flag. If they do not,
> RELATION_EFFICIENT will be rewritten in RELATION_L.

Yes, and cpufreq_frequency_table_target() will take RELATION_E into
account if set.

> All governors but userspace and powersave would use RELATION_EFFICIENT instead of RELATION_L.

Yes.

> If that works for you, I'll implement this in a v4, as well as some
> improvements for the CPUfreq/EM registration following the discussion with
> Viresh.

Sounds good, thanks!

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2021-07-06 12:35 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-04 11:05 [PATCH v3 0/6] EM / PM: Inefficient OPPs Vincent Donnefort
2021-06-04 11:05 ` [PATCH v3 1/6] PM / EM: Fix inefficient states detection Vincent Donnefort
2021-06-04 18:09   ` Matthias Kaehlcke
2021-06-04 11:05 ` [PATCH v3 2/6] PM / EM: Mark inefficient states Vincent Donnefort
2021-06-04 18:12   ` Matthias Kaehlcke
2021-06-04 11:05 ` [PATCH v3 3/6] cpufreq: Add an interface to mark inefficient frequencies Vincent Donnefort
2021-06-04 18:19   ` Matthias Kaehlcke
2021-06-14 13:40     ` Vincent Donnefort
2021-06-07  5:02   ` Viresh Kumar
2021-06-07 10:14     ` Lukasz Luba
2021-06-14  7:28   ` Viresh Kumar
2021-06-14 13:35     ` Vincent Donnefort
2021-06-15  5:02       ` Viresh Kumar
2021-06-15  8:44         ` Vincent Donnefort
2021-06-15 10:17           ` Viresh Kumar
2021-06-15 17:15             ` Vincent Donnefort
2021-06-16  7:35               ` Viresh Kumar
2021-06-16  9:03                 ` Lukasz Luba
2021-06-16  9:31                   ` Viresh Kumar
2021-06-16 10:33                     ` Lukasz Luba
2021-06-16 10:53                       ` Viresh Kumar
2021-06-16 12:45                         ` Lukasz Luba
2021-07-02 14:21                           ` Lukasz Luba
2021-07-02 15:46                             ` Rafael J. Wysocki
2021-07-02 16:04                               ` Rafael J. Wysocki
2021-07-02 16:08                                 ` Lukasz Luba
2021-07-02 17:53                                   ` Rafael J. Wysocki
2021-07-02 19:04                                     ` Lukasz Luba
2021-07-02 19:17                                     ` Vincent Donnefort
2021-07-05 14:09                                       ` Rafael J. Wysocki
2021-07-06  8:12                                         ` Vincent Donnefort
2021-07-06  8:37                                           ` Viresh Kumar
2021-07-06  8:43                                             ` Vincent Donnefort
2021-07-06  8:50                                               ` Viresh Kumar
2021-07-06 12:11                                           ` Rafael J. Wysocki
2021-07-02 16:13                               ` Vincent Donnefort
2021-07-02 17:38                                 ` Rafael J. Wysocki
2021-06-22  9:01             ` Quentin Perret
2021-06-22  9:25               ` Viresh Kumar
2021-06-04 11:05 ` [PATCH v3 4/6] cpufreq: Skip inefficient frequencies in cpufreq_driver_resolve_freq() Vincent Donnefort
2021-06-04 18:25   ` Matthias Kaehlcke
2021-06-04 11:06 ` [PATCH v3 5/6] cpufreq: Mark inefficient frequencies using the Energy Model Vincent Donnefort
2021-06-04 18:35   ` Matthias Kaehlcke
2021-06-04 11:06 ` [PATCH v3 6/6] PM / EM: Skip inefficient states Vincent Donnefort
2021-06-04 18:49   ` Matthias Kaehlcke

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.