Linux-PM Archive on lore.kernel.org
 help / color / Atom feed
* [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware
@ 2019-05-08 17:42 douglas.raillard
  2019-05-08 17:42 ` douglas.raillard
                   ` (8 more replies)
  0 siblings, 9 replies; 36+ messages in thread
From: douglas.raillard @ 2019-05-08 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, mingo, peterz, quentin.perret, douglas.raillard,
	patrick.bellasi, dietmar.eggemann

From: Douglas RAILLARD <douglas.raillard@arm.com>

Make schedutil cpufreq governor energy-aware.

- patch 1 introduces a function to retrieve a frequency given a base
  frequency and an energy cost margin.
- patch 2 links Energy Model perf_domain to sugov_policy.
- patch 3 updates get_next_freq() to make use of the Energy Model.
- patch 4 moves a static function around in cpufreq_schedutil.c to make
  it available for subsequent patch.
- patch 5 updates sugov_cpu_is_busy() to make it useable for shared
  cpufreq policies.
- patch 6 improves sugov_cpu_is_busy() to avoid some pitfalls when used
  from shared policies.
- patch 7 makes use of sugov_cpu_is_busy() for frequency selection of
  shared cpufreq policies.

The benefits of using the EM in schedutil are twofold:

1) Selecting the highest possible frequency for a given cost. Some
   platforms can have lower frequencies that are less efficient than
   higher ones, in which case they should be skipped for most purposes.
   They can still be useful to give more freedom to thermal throttling
   mechanisms, but not under normal circumstances.
   note: the EM framework will warn about such OPPs "hertz/watts ratio
   non-monotonically decreasing"

2) Drive the frequency selection with power in mind, in addition to
   maximizing the utilization of the non-idle CPUs in the system.

Point 1) is implemented in "PM: Introduce em_pd_get_higher_freq()" and
enabled in schedutil by
"sched/cpufreq: Hook em_pd_get_higher_power() into get_next_freq()".

Point 2) is enabled in
"sched/cpufreq: Boost schedutil frequency ramp up". It allows using
higher frequencies when a CPU is "busy" in the policy, reusing the
existing sugov_cpu_is_busy() schedutil heuristic.  "busy" is defined
here as not having any idle time since the last increase in frequency.
The benefits of that are:

* Boosting the frequency when it (seems) needed by a CPU to finish its
  allocated work. That shortens the frequency ramp up duration, which in
  turns allows the utilization signal to reach stable values quicker.
  Since the allowed frequency boost is bounded in energy, it will behave
  consistently across platforms, regardless of the OPP cost range.

* The boost is only transient, and should not impact a lot the energy
  consumed of workloads with very stable utilization signals.


This has been ligthly tested with a rtapp task ramping from 10% to 75%
utilisation on a big core. Improvements in frequency ramp-up time are
somehow diminished by the shape of the utilisation signal, which gives a
big oscillation to the signal after a fast ramp up with idle time. It
however improves the time it takes to reach the final frequency, but
some activations are still missed due to strong frequency decrease right
after ramping up.

Douglas RAILLARD (7):
  PM: Introduce em_pd_get_higher_freq()
  sched/cpufreq: Attach perf domain to sugov policy
  sched/cpufreq: Hook em_pd_get_higher_power() into get_next_freq()
  sched/cpufreq: Move up sugov_cpu_is_busy()
  sched/cpufreq: sugov_cpu_is_busy for shared policy
  sched/cpufreq: Improve sugov_cpu_is_busy accuracy
  sched/cpufreq: Boost schedutil frequency ramp up

 include/linux/energy_model.h     |  48 +++++++++++
 kernel/sched/cpufreq_schedutil.c | 136 +++++++++++++++++++++++++++----
 2 files changed, 166 insertions(+), 18 deletions(-)

-- 
2.21.0

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware
  2019-05-08 17:42 [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware douglas.raillard
@ 2019-05-08 17:42 ` douglas.raillard
  2019-05-08 17:42 ` [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq() douglas.raillard
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 36+ messages in thread
From: douglas.raillard @ 2019-05-08 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, mingo, peterz, quentin.perret, douglas.raillard,
	patrick.bellasi, dietmar.eggemann

From: Douglas RAILLARD <douglas.raillard@arm.com>

Make schedutil cpufreq governor energy-aware.

- patch 1 introduces a function to retrieve a frequency given a base
  frequency and an energy cost margin.
- patch 2 links Energy Model perf_domain to sugov_policy.
- patch 3 updates get_next_freq() to make use of the Energy Model.
- patch 4 moves a static function around in cpufreq_schedutil.c to make
  it available for subsequent patch.
- patch 5 updates sugov_cpu_is_busy() to make it useable for shared
  cpufreq policies.
- patch 6 improves sugov_cpu_is_busy() to avoid some pitfalls when used
  from shared policies.
- patch 7 makes use of sugov_cpu_is_busy() for frequency selection of
  shared cpufreq policies.

The benefits of using the EM in schedutil are twofold:

1) Selecting the highest possible frequency for a given cost. Some
   platforms can have lower frequencies that are less efficient than
   higher ones, in which case they should be skipped for most purposes.
   They can still be useful to give more freedom to thermal throttling
   mechanisms, but not under normal circumstances.
   note: the EM framework will warn about such OPPs "hertz/watts ratio
   non-monotonically decreasing"

2) Drive the frequency selection with power in mind, in addition to
   maximizing the utilization of the non-idle CPUs in the system.

Point 1) is implemented in "PM: Introduce em_pd_get_higher_freq()" and
enabled in schedutil by
"sched/cpufreq: Hook em_pd_get_higher_power() into get_next_freq()".

Point 2) is enabled in
"sched/cpufreq: Boost schedutil frequency ramp up". It allows using
higher frequencies when a CPU is "busy" in the policy, reusing the
existing sugov_cpu_is_busy() schedutil heuristic.  "busy" is defined
here as not having any idle time since the last increase in frequency.
The benefits of that are:

* Boosting the frequency when it (seems) needed by a CPU to finish its
  allocated work. That shortens the frequency ramp up duration, which in
  turns allows the utilization signal to reach stable values quicker.
  Since the allowed frequency boost is bounded in energy, it will behave
  consistently across platforms, regardless of the OPP cost range.

* The boost is only transient, and should not impact a lot the energy
  consumed of workloads with very stable utilization signals.


This has been ligthly tested with a rtapp task ramping from 10% to 75%
utilisation on a big core. Improvements in frequency ramp-up time are
somehow diminished by the shape of the utilisation signal, which gives a
big oscillation to the signal after a fast ramp up with idle time. It
however improves the time it takes to reach the final frequency, but
some activations are still missed due to strong frequency decrease right
after ramping up.

Douglas RAILLARD (7):
  PM: Introduce em_pd_get_higher_freq()
  sched/cpufreq: Attach perf domain to sugov policy
  sched/cpufreq: Hook em_pd_get_higher_power() into get_next_freq()
  sched/cpufreq: Move up sugov_cpu_is_busy()
  sched/cpufreq: sugov_cpu_is_busy for shared policy
  sched/cpufreq: Improve sugov_cpu_is_busy accuracy
  sched/cpufreq: Boost schedutil frequency ramp up

 include/linux/energy_model.h     |  48 +++++++++++
 kernel/sched/cpufreq_schedutil.c | 136 +++++++++++++++++++++++++++----
 2 files changed, 166 insertions(+), 18 deletions(-)

-- 
2.21.0

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq()
  2019-05-08 17:42 [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware douglas.raillard
  2019-05-08 17:42 ` douglas.raillard
@ 2019-05-08 17:42 ` douglas.raillard
  2019-05-08 17:42   ` douglas.raillard
  2019-05-16 12:42   ` Patrick Bellasi
  2019-05-08 17:42 ` [RFC PATCH 2/7] sched/cpufreq: Attach perf domain to sugov policy douglas.raillard
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 36+ messages in thread
From: douglas.raillard @ 2019-05-08 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, mingo, peterz, quentin.perret, douglas.raillard,
	patrick.bellasi, dietmar.eggemann

From: Douglas RAILLARD <douglas.raillard@arm.com>

em_pd_get_higher_freq() returns a frequency greater or equal to the
provided one while taking into account a given cost margin. It also
skips inefficient OPPs that have a higher cost than another one with a
higher frequency.

Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
---
 include/linux/energy_model.h | 48 ++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index aa027f7bcb3e..797b4e0f758c 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -159,6 +159,48 @@ static inline int em_pd_nr_cap_states(struct em_perf_domain *pd)
 	return pd->nr_cap_states;
 }
 
+/**
+ * em_pd_get_higher_freq() - Get the highest frequency that does not exceed the
+ * given cost margin compared to min_freq
+ * @pd		: performance domain for which this must be done
+ * @min_freq	: minimum frequency to return
+ * @cost_margin	: allowed margin compared to min_freq, as a per-1024 value.
+ *
+ * Return: the chosen frequency, guaranteed to be at least as high as min_freq.
+ */
+static inline unsigned long em_pd_get_higher_freq(struct em_perf_domain *pd,
+	unsigned long min_freq, unsigned long cost_margin)
+{
+	unsigned long max_cost = 0;
+	struct em_cap_state *cs;
+	int i;
+
+	if (!pd)
+		return min_freq;
+
+	/* Compute the maximum allowed cost */
+	for (i = 0; i < pd->nr_cap_states; i++) {
+		cs = &pd->table[i];
+		if (cs->frequency >= min_freq) {
+			max_cost = cs->cost + (cs->cost * cost_margin) / 1024;
+			break;
+		}
+	}
+
+	/* Find the highest frequency that will not exceed the cost margin */
+	for (i = pd->nr_cap_states-1; i >= 0; i--) {
+		cs = &pd->table[i];
+		if (cs->cost <= max_cost)
+			return cs->frequency;
+	}
+
+	/*
+	 * We should normally never reach here, unless min_freq was higher than
+	 * the highest available frequency, which is not expected to happen.
+	 */
+	return min_freq;
+}
+
 #else
 struct em_perf_domain {};
 struct em_data_callback {};
@@ -182,6 +224,12 @@ static inline int em_pd_nr_cap_states(struct em_perf_domain *pd)
 {
 	return 0;
 }
+
+static inline unsigned long em_pd_get_higher_freq(struct em_perf_domain *pd,
+	unsigned long min_freq, unsigned long cost_margin)
+{
+	return min_freq;
+}
 #endif
 
 #endif
-- 
2.21.0

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq()
  2019-05-08 17:42 ` [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq() douglas.raillard
@ 2019-05-08 17:42   ` douglas.raillard
  2019-05-16 12:42   ` Patrick Bellasi
  1 sibling, 0 replies; 36+ messages in thread
From: douglas.raillard @ 2019-05-08 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, mingo, peterz, quentin.perret, douglas.raillard,
	patrick.bellasi, dietmar.eggemann

From: Douglas RAILLARD <douglas.raillard@arm.com>

em_pd_get_higher_freq() returns a frequency greater or equal to the
provided one while taking into account a given cost margin. It also
skips inefficient OPPs that have a higher cost than another one with a
higher frequency.

Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
---
 include/linux/energy_model.h | 48 ++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index aa027f7bcb3e..797b4e0f758c 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -159,6 +159,48 @@ static inline int em_pd_nr_cap_states(struct em_perf_domain *pd)
 	return pd->nr_cap_states;
 }
 
+/**
+ * em_pd_get_higher_freq() - Get the highest frequency that does not exceed the
+ * given cost margin compared to min_freq
+ * @pd		: performance domain for which this must be done
+ * @min_freq	: minimum frequency to return
+ * @cost_margin	: allowed margin compared to min_freq, as a per-1024 value.
+ *
+ * Return: the chosen frequency, guaranteed to be at least as high as min_freq.
+ */
+static inline unsigned long em_pd_get_higher_freq(struct em_perf_domain *pd,
+	unsigned long min_freq, unsigned long cost_margin)
+{
+	unsigned long max_cost = 0;
+	struct em_cap_state *cs;
+	int i;
+
+	if (!pd)
+		return min_freq;
+
+	/* Compute the maximum allowed cost */
+	for (i = 0; i < pd->nr_cap_states; i++) {
+		cs = &pd->table[i];
+		if (cs->frequency >= min_freq) {
+			max_cost = cs->cost + (cs->cost * cost_margin) / 1024;
+			break;
+		}
+	}
+
+	/* Find the highest frequency that will not exceed the cost margin */
+	for (i = pd->nr_cap_states-1; i >= 0; i--) {
+		cs = &pd->table[i];
+		if (cs->cost <= max_cost)
+			return cs->frequency;
+	}
+
+	/*
+	 * We should normally never reach here, unless min_freq was higher than
+	 * the highest available frequency, which is not expected to happen.
+	 */
+	return min_freq;
+}
+
 #else
 struct em_perf_domain {};
 struct em_data_callback {};
@@ -182,6 +224,12 @@ static inline int em_pd_nr_cap_states(struct em_perf_domain *pd)
 {
 	return 0;
 }
+
+static inline unsigned long em_pd_get_higher_freq(struct em_perf_domain *pd,
+	unsigned long min_freq, unsigned long cost_margin)
+{
+	return min_freq;
+}
 #endif
 
 #endif
-- 
2.21.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC PATCH 2/7] sched/cpufreq: Attach perf domain to sugov policy
  2019-05-08 17:42 [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware douglas.raillard
  2019-05-08 17:42 ` douglas.raillard
  2019-05-08 17:42 ` [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq() douglas.raillard
@ 2019-05-08 17:42 ` douglas.raillard
  2019-05-08 17:42   ` douglas.raillard
  2019-05-08 17:42 ` [RFC PATCH 3/7] sched/cpufreq: Hook em_pd_get_higher_power() into get_next_freq() douglas.raillard
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 36+ messages in thread
From: douglas.raillard @ 2019-05-08 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, mingo, peterz, quentin.perret, douglas.raillard,
	patrick.bellasi, dietmar.eggemann

From: Douglas RAILLARD <douglas.raillard@arm.com>

Attach an Energy Model perf_domain to each sugov_policy to prepare the
ground for energy-aware schedutil.

Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
---
 kernel/sched/cpufreq_schedutil.c | 39 ++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index d0f4d2111bfe..82dbacf83649 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -39,6 +39,10 @@ struct sugov_policy {
 	bool			work_in_progress;
 
 	bool			need_freq_update;
+
+#ifdef CONFIG_ENERGY_MODEL
+	struct em_perf_domain *pd;
+#endif
 };
 
 struct sugov_cpu {
@@ -64,6 +68,38 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
 
 /************************ Governor internals ***********************/
 
+#ifdef CONFIG_ENERGY_MODEL
+static void sugov_policy_attach_pd(struct sugov_policy *sg_policy)
+{
+	struct em_perf_domain *pd;
+	struct cpufreq_policy *policy = sg_policy->policy;
+
+	sg_policy->pd = NULL;
+	pd = em_cpu_get(policy->cpu);
+	if (!pd)
+		return;
+
+	if (cpumask_equal(policy->related_cpus, to_cpumask(pd->cpus)))
+		sg_policy->pd = pd;
+	else
+		pr_warn("%s: Not all CPUs in schedutil policy %u share the same perf domain, no perf domain for that policy will be registered\n",
+			__func__, policy->cpu);
+}
+
+static struct em_perf_domain *sugov_policy_get_pd(
+						struct sugov_policy *sg_policy)
+{
+	return sg_policy->pd;
+}
+#else /* CONFIG_ENERGY_MODEL */
+static void sugov_policy_attach_pd(struct sugov_policy *sg_policy) {}
+static struct em_perf_domain *sugov_policy_get_pd(
+						struct sugov_policy *sg_policy)
+{
+	return NULL;
+}
+#endif /* CONFIG_ENERGY_MODEL */
+
 static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
 {
 	s64 delta_ns;
@@ -849,6 +885,9 @@ static int sugov_start(struct cpufreq_policy *policy)
 							sugov_update_shared :
 							sugov_update_single);
 	}
+
+	sugov_policy_attach_pd(sg_policy);
+
 	return 0;
 }
 
-- 
2.21.0

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC PATCH 2/7] sched/cpufreq: Attach perf domain to sugov policy
  2019-05-08 17:42 ` [RFC PATCH 2/7] sched/cpufreq: Attach perf domain to sugov policy douglas.raillard
@ 2019-05-08 17:42   ` douglas.raillard
  0 siblings, 0 replies; 36+ messages in thread
From: douglas.raillard @ 2019-05-08 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, mingo, peterz, quentin.perret, douglas.raillard,
	patrick.bellasi, dietmar.eggemann

From: Douglas RAILLARD <douglas.raillard@arm.com>

Attach an Energy Model perf_domain to each sugov_policy to prepare the
ground for energy-aware schedutil.

Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
---
 kernel/sched/cpufreq_schedutil.c | 39 ++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index d0f4d2111bfe..82dbacf83649 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -39,6 +39,10 @@ struct sugov_policy {
 	bool			work_in_progress;
 
 	bool			need_freq_update;
+
+#ifdef CONFIG_ENERGY_MODEL
+	struct em_perf_domain *pd;
+#endif
 };
 
 struct sugov_cpu {
@@ -64,6 +68,38 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
 
 /************************ Governor internals ***********************/
 
+#ifdef CONFIG_ENERGY_MODEL
+static void sugov_policy_attach_pd(struct sugov_policy *sg_policy)
+{
+	struct em_perf_domain *pd;
+	struct cpufreq_policy *policy = sg_policy->policy;
+
+	sg_policy->pd = NULL;
+	pd = em_cpu_get(policy->cpu);
+	if (!pd)
+		return;
+
+	if (cpumask_equal(policy->related_cpus, to_cpumask(pd->cpus)))
+		sg_policy->pd = pd;
+	else
+		pr_warn("%s: Not all CPUs in schedutil policy %u share the same perf domain, no perf domain for that policy will be registered\n",
+			__func__, policy->cpu);
+}
+
+static struct em_perf_domain *sugov_policy_get_pd(
+						struct sugov_policy *sg_policy)
+{
+	return sg_policy->pd;
+}
+#else /* CONFIG_ENERGY_MODEL */
+static void sugov_policy_attach_pd(struct sugov_policy *sg_policy) {}
+static struct em_perf_domain *sugov_policy_get_pd(
+						struct sugov_policy *sg_policy)
+{
+	return NULL;
+}
+#endif /* CONFIG_ENERGY_MODEL */
+
 static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
 {
 	s64 delta_ns;
@@ -849,6 +885,9 @@ static int sugov_start(struct cpufreq_policy *policy)
 							sugov_update_shared :
 							sugov_update_single);
 	}
+
+	sugov_policy_attach_pd(sg_policy);
+
 	return 0;
 }
 
-- 
2.21.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC PATCH 3/7] sched/cpufreq: Hook em_pd_get_higher_power() into get_next_freq()
  2019-05-08 17:42 [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware douglas.raillard
                   ` (2 preceding siblings ...)
  2019-05-08 17:42 ` [RFC PATCH 2/7] sched/cpufreq: Attach perf domain to sugov policy douglas.raillard
@ 2019-05-08 17:42 ` douglas.raillard
  2019-05-08 17:42   ` douglas.raillard
  2019-05-08 17:42 ` [RFC PATCH 4/7] sched/cpufreq: Move up sugov_cpu_is_busy() douglas.raillard
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 36+ messages in thread
From: douglas.raillard @ 2019-05-08 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, mingo, peterz, quentin.perret, douglas.raillard,
	patrick.bellasi, dietmar.eggemann

From: Douglas RAILLARD <douglas.raillard@arm.com>

Choose the highest OPP for a given energy cost, allowing to skip lower
frequencies that would not be cheaper in terms of consumed power. These
frequencies can still be interesting to keep in the energy model to give
more freedom to thermal throttling, but should not be selected under
normal circumstances.

This also prepares the ground for energy-aware frequency boosting.

Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
---
 kernel/sched/cpufreq_schedutil.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 82dbacf83649..f4173a7c0f01 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -10,6 +10,7 @@
 
 #include "sched.h"
 
+#include <linux/energy_model.h>
 #include <linux/sched/cpufreq.h>
 #include <trace/events/power.h>
 
@@ -200,9 +201,19 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 	struct cpufreq_policy *policy = sg_policy->policy;
 	unsigned int freq = arch_scale_freq_invariant() ?
 				policy->cpuinfo.max_freq : policy->cur;
+	struct em_perf_domain *pd = sugov_policy_get_pd(sg_policy);
+
+	/* Maximum power we are ready to spend. */
+	unsigned int cost_margin = 0;
 
 	freq = map_util_freq(util, freq, max);
 
+	/*
+	 * Try to get a higher frequency if one is available, given the extra
+	 * power we are ready to spend.
+	 */
+	freq = em_pd_get_higher_freq(pd, freq, cost_margin);
+
 	if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
 		return sg_policy->next_freq;
 
-- 
2.21.0

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC PATCH 3/7] sched/cpufreq: Hook em_pd_get_higher_power() into get_next_freq()
  2019-05-08 17:42 ` [RFC PATCH 3/7] sched/cpufreq: Hook em_pd_get_higher_power() into get_next_freq() douglas.raillard
@ 2019-05-08 17:42   ` douglas.raillard
  0 siblings, 0 replies; 36+ messages in thread
From: douglas.raillard @ 2019-05-08 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, mingo, peterz, quentin.perret, douglas.raillard,
	patrick.bellasi, dietmar.eggemann

From: Douglas RAILLARD <douglas.raillard@arm.com>

Choose the highest OPP for a given energy cost, allowing to skip lower
frequencies that would not be cheaper in terms of consumed power. These
frequencies can still be interesting to keep in the energy model to give
more freedom to thermal throttling, but should not be selected under
normal circumstances.

This also prepares the ground for energy-aware frequency boosting.

Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
---
 kernel/sched/cpufreq_schedutil.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 82dbacf83649..f4173a7c0f01 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -10,6 +10,7 @@
 
 #include "sched.h"
 
+#include <linux/energy_model.h>
 #include <linux/sched/cpufreq.h>
 #include <trace/events/power.h>
 
@@ -200,9 +201,19 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 	struct cpufreq_policy *policy = sg_policy->policy;
 	unsigned int freq = arch_scale_freq_invariant() ?
 				policy->cpuinfo.max_freq : policy->cur;
+	struct em_perf_domain *pd = sugov_policy_get_pd(sg_policy);
+
+	/* Maximum power we are ready to spend. */
+	unsigned int cost_margin = 0;
 
 	freq = map_util_freq(util, freq, max);
 
+	/*
+	 * Try to get a higher frequency if one is available, given the extra
+	 * power we are ready to spend.
+	 */
+	freq = em_pd_get_higher_freq(pd, freq, cost_margin);
+
 	if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
 		return sg_policy->next_freq;
 
-- 
2.21.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC PATCH 4/7] sched/cpufreq: Move up sugov_cpu_is_busy()
  2019-05-08 17:42 [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware douglas.raillard
                   ` (3 preceding siblings ...)
  2019-05-08 17:42 ` [RFC PATCH 3/7] sched/cpufreq: Hook em_pd_get_higher_power() into get_next_freq() douglas.raillard
@ 2019-05-08 17:42 ` douglas.raillard
  2019-05-08 17:42   ` douglas.raillard
  2019-05-08 17:42 ` [RFC PATCH 5/7] sched/cpufreq: sugov_cpu_is_busy for shared policy douglas.raillard
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 36+ messages in thread
From: douglas.raillard @ 2019-05-08 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, mingo, peterz, quentin.perret, douglas.raillard,
	patrick.bellasi, dietmar.eggemann

From: Douglas RAILLARD <douglas.raillard@arm.com>

Move sugov_cpu_is_busy() static function of cpufreq_schedutil.c higher
in the file so it can be used by other functions.

Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
---
 kernel/sched/cpufreq_schedutil.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index f4173a7c0f01..a52c66559321 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -173,6 +173,19 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
 	}
 }
 
+#ifdef CONFIG_NO_HZ_COMMON
+static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
+{
+	unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
+	bool ret = idle_calls == sg_cpu->saved_idle_calls;
+
+	sg_cpu->saved_idle_calls = idle_calls;
+	return ret;
+}
+#else
+static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
+#endif /* CONFIG_NO_HZ_COMMON */
+
 /**
  * get_next_freq - Compute a new frequency for a given cpufreq policy.
  * @sg_policy: schedutil policy object to compute the new frequency for.
@@ -462,19 +475,6 @@ static unsigned long sugov_iowait_apply(struct sugov_cpu *sg_cpu, u64 time,
 	return max(boost, util);
 }
 
-#ifdef CONFIG_NO_HZ_COMMON
-static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
-{
-	unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
-	bool ret = idle_calls == sg_cpu->saved_idle_calls;
-
-	sg_cpu->saved_idle_calls = idle_calls;
-	return ret;
-}
-#else
-static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
-#endif /* CONFIG_NO_HZ_COMMON */
-
 /*
  * Make sugov_should_update_freq() ignore the rate limit when DL
  * has increased the utilization.
-- 
2.21.0

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC PATCH 4/7] sched/cpufreq: Move up sugov_cpu_is_busy()
  2019-05-08 17:42 ` [RFC PATCH 4/7] sched/cpufreq: Move up sugov_cpu_is_busy() douglas.raillard
@ 2019-05-08 17:42   ` douglas.raillard
  0 siblings, 0 replies; 36+ messages in thread
From: douglas.raillard @ 2019-05-08 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, mingo, peterz, quentin.perret, douglas.raillard,
	patrick.bellasi, dietmar.eggemann

From: Douglas RAILLARD <douglas.raillard@arm.com>

Move sugov_cpu_is_busy() static function of cpufreq_schedutil.c higher
in the file so it can be used by other functions.

Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
---
 kernel/sched/cpufreq_schedutil.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index f4173a7c0f01..a52c66559321 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -173,6 +173,19 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
 	}
 }
 
+#ifdef CONFIG_NO_HZ_COMMON
+static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
+{
+	unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
+	bool ret = idle_calls == sg_cpu->saved_idle_calls;
+
+	sg_cpu->saved_idle_calls = idle_calls;
+	return ret;
+}
+#else
+static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
+#endif /* CONFIG_NO_HZ_COMMON */
+
 /**
  * get_next_freq - Compute a new frequency for a given cpufreq policy.
  * @sg_policy: schedutil policy object to compute the new frequency for.
@@ -462,19 +475,6 @@ static unsigned long sugov_iowait_apply(struct sugov_cpu *sg_cpu, u64 time,
 	return max(boost, util);
 }
 
-#ifdef CONFIG_NO_HZ_COMMON
-static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
-{
-	unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
-	bool ret = idle_calls == sg_cpu->saved_idle_calls;
-
-	sg_cpu->saved_idle_calls = idle_calls;
-	return ret;
-}
-#else
-static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
-#endif /* CONFIG_NO_HZ_COMMON */
-
 /*
  * Make sugov_should_update_freq() ignore the rate limit when DL
  * has increased the utilization.
-- 
2.21.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC PATCH 5/7] sched/cpufreq: sugov_cpu_is_busy for shared policy
  2019-05-08 17:42 [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware douglas.raillard
                   ` (4 preceding siblings ...)
  2019-05-08 17:42 ` [RFC PATCH 4/7] sched/cpufreq: Move up sugov_cpu_is_busy() douglas.raillard
@ 2019-05-08 17:42 ` douglas.raillard
  2019-05-08 17:42   ` douglas.raillard
  2019-05-08 17:43 ` [RFC PATCH 6/7] sched/cpufreq: Improve sugov_cpu_is_busy accuracy douglas.raillard
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 36+ messages in thread
From: douglas.raillard @ 2019-05-08 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, mingo, peterz, quentin.perret, douglas.raillard,
	patrick.bellasi, dietmar.eggemann

From: Douglas RAILLARD <douglas.raillard@arm.com>

Allow using sugov_cpu_is_busy() from sugov_update_shared(). This means
that the heuristic needs to return stable results across multiple calls
for a given CPU, even if there has been no utilization change since last
call.

sugov_cpu_is_busy() currently both checks business status and updates
the counters, so let's decouple the two actions.

Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
---
 kernel/sched/cpufreq_schedutil.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index a52c66559321..a12b7e5bc028 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -178,12 +178,17 @@ static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
 {
 	unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
 	bool ret = idle_calls == sg_cpu->saved_idle_calls;
+	return ret;
+}
 
+static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu)
+{
+	unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
 	sg_cpu->saved_idle_calls = idle_calls;
-	return ret;
 }
 #else
 static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
+static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu) {}
 #endif /* CONFIG_NO_HZ_COMMON */
 
 /**
@@ -503,6 +508,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 		return;
 
 	busy = sugov_cpu_is_busy(sg_cpu);
+	sugov_cpu_is_busy_update(sg_cpu);
 
 	util = sugov_get_util(sg_cpu);
 	max = sg_cpu->max;
-- 
2.21.0

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC PATCH 5/7] sched/cpufreq: sugov_cpu_is_busy for shared policy
  2019-05-08 17:42 ` [RFC PATCH 5/7] sched/cpufreq: sugov_cpu_is_busy for shared policy douglas.raillard
@ 2019-05-08 17:42   ` douglas.raillard
  0 siblings, 0 replies; 36+ messages in thread
From: douglas.raillard @ 2019-05-08 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, mingo, peterz, quentin.perret, douglas.raillard,
	patrick.bellasi, dietmar.eggemann

From: Douglas RAILLARD <douglas.raillard@arm.com>

Allow using sugov_cpu_is_busy() from sugov_update_shared(). This means
that the heuristic needs to return stable results across multiple calls
for a given CPU, even if there has been no utilization change since last
call.

sugov_cpu_is_busy() currently both checks business status and updates
the counters, so let's decouple the two actions.

Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
---
 kernel/sched/cpufreq_schedutil.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index a52c66559321..a12b7e5bc028 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -178,12 +178,17 @@ static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
 {
 	unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
 	bool ret = idle_calls == sg_cpu->saved_idle_calls;
+	return ret;
+}
 
+static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu)
+{
+	unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
 	sg_cpu->saved_idle_calls = idle_calls;
-	return ret;
 }
 #else
 static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
+static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu) {}
 #endif /* CONFIG_NO_HZ_COMMON */
 
 /**
@@ -503,6 +508,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 		return;
 
 	busy = sugov_cpu_is_busy(sg_cpu);
+	sugov_cpu_is_busy_update(sg_cpu);
 
 	util = sugov_get_util(sg_cpu);
 	max = sg_cpu->max;
-- 
2.21.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC PATCH 6/7] sched/cpufreq: Improve sugov_cpu_is_busy accuracy
  2019-05-08 17:42 [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware douglas.raillard
                   ` (5 preceding siblings ...)
  2019-05-08 17:42 ` [RFC PATCH 5/7] sched/cpufreq: sugov_cpu_is_busy for shared policy douglas.raillard
@ 2019-05-08 17:43 ` douglas.raillard
  2019-05-08 17:43   ` douglas.raillard
  2019-05-16 12:55   ` Patrick Bellasi
  2019-05-08 17:43 ` [RFC PATCH 7/7] sched/cpufreq: Boost schedutil frequency ramp up douglas.raillard
  2019-05-13  7:12 ` [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware Viresh Kumar
  8 siblings, 2 replies; 36+ messages in thread
From: douglas.raillard @ 2019-05-08 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, mingo, peterz, quentin.perret, douglas.raillard,
	patrick.bellasi, dietmar.eggemann

From: Douglas RAILLARD <douglas.raillard@arm.com>

Avoid assuming a CPU is busy when it has begun being idle before
get_next_freq() is called. This is achieved by making sure the CPU will
not be detected as busy by other CPUs whenever its utilization is
decreasing.

Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
---
 kernel/sched/cpufreq_schedutil.c | 42 ++++++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index a12b7e5bc028..ce4b90cafbb5 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -62,6 +62,7 @@ struct sugov_cpu {
 	/* The field below is for single-CPU policies only: */
 #ifdef CONFIG_NO_HZ_COMMON
 	unsigned long		saved_idle_calls;
+	unsigned long		previous_util;
 #endif
 };
 
@@ -181,14 +182,35 @@ static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
 	return ret;
 }
 
-static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu)
+static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu,
+				     unsigned long util)
 {
 	unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
 	sg_cpu->saved_idle_calls = idle_calls;
+
+	/*
+	 * Make sure that this CPU will not be immediately considered as busy in
+	 * cases where the CPU has already entered an idle state. In that case,
+	 * the number of idle_calls will not vary anymore until it exits idle,
+	 * which would lead sugov_cpu_is_busy() to say that this CPU is busy,
+	 * because it has not (re)entered idle since the last time we looked at
+	 * it.
+	 * Assuming cpu0 and cpu1 are in the same policy, that will make sure
+	 * this sequence of events leads to right cpu1 business status from
+	 * get_next_freq(cpu=1)
+	 * cpu0: [enter idle] -> [get_next_freq] -> [doing nothing] -> [wakeup]
+	 * cpu1:                ...              -> [get_next_freq] ->   ...
+	 */
+	if (util <= sg_cpu->previous_util)
+		sg_cpu->saved_idle_calls--;
+
+	sg_cpu->previous_util = util;
 }
 #else
 static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
-static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu) {}
+static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu
+				     unsigned long util)
+{}
 #endif /* CONFIG_NO_HZ_COMMON */
 
 /**
@@ -507,10 +529,9 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	if (!sugov_should_update_freq(sg_policy, time))
 		return;
 
-	busy = sugov_cpu_is_busy(sg_cpu);
-	sugov_cpu_is_busy_update(sg_cpu);
-
 	util = sugov_get_util(sg_cpu);
+	busy = sugov_cpu_is_busy(sg_cpu);
+	sugov_cpu_is_busy_update(sg_cpu, util);
 	max = sg_cpu->max;
 	util = sugov_iowait_apply(sg_cpu, time, util, max);
 	next_f = get_next_freq(sg_policy, util, max);
@@ -545,12 +566,15 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 	struct cpufreq_policy *policy = sg_policy->policy;
 	unsigned long util = 0, max = 1;
 	unsigned int j;
+	unsigned long sg_cpu_util = 0;
 
 	for_each_cpu(j, policy->cpus) {
 		struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
 		unsigned long j_util, j_max;
 
 		j_util = sugov_get_util(j_sg_cpu);
+		if (j_sg_cpu == sg_cpu)
+			sg_cpu_util = j_util;
 		j_max = j_sg_cpu->max;
 		j_util = sugov_iowait_apply(j_sg_cpu, time, j_util, j_max);
 
@@ -560,6 +584,14 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 		}
 	}
 
+	/*
+	 * Only update the business status if we are looking at the CPU for
+	 * which a utilization change triggered a call to get_next_freq(). This
+	 * way, we don't affect the "busy" status of CPUs that don't have any
+	 * change in utilization.
+	 */
+	sugov_cpu_is_busy_update(sg_cpu, sg_cpu_util);
+
 	return get_next_freq(sg_policy, util, max);
 }
 
-- 
2.21.0

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC PATCH 6/7] sched/cpufreq: Improve sugov_cpu_is_busy accuracy
  2019-05-08 17:43 ` [RFC PATCH 6/7] sched/cpufreq: Improve sugov_cpu_is_busy accuracy douglas.raillard
@ 2019-05-08 17:43   ` douglas.raillard
  2019-05-16 12:55   ` Patrick Bellasi
  1 sibling, 0 replies; 36+ messages in thread
From: douglas.raillard @ 2019-05-08 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, mingo, peterz, quentin.perret, douglas.raillard,
	patrick.bellasi, dietmar.eggemann

From: Douglas RAILLARD <douglas.raillard@arm.com>

Avoid assuming a CPU is busy when it has begun being idle before
get_next_freq() is called. This is achieved by making sure the CPU will
not be detected as busy by other CPUs whenever its utilization is
decreasing.

Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
---
 kernel/sched/cpufreq_schedutil.c | 42 ++++++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index a12b7e5bc028..ce4b90cafbb5 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -62,6 +62,7 @@ struct sugov_cpu {
 	/* The field below is for single-CPU policies only: */
 #ifdef CONFIG_NO_HZ_COMMON
 	unsigned long		saved_idle_calls;
+	unsigned long		previous_util;
 #endif
 };
 
@@ -181,14 +182,35 @@ static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
 	return ret;
 }
 
-static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu)
+static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu,
+				     unsigned long util)
 {
 	unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
 	sg_cpu->saved_idle_calls = idle_calls;
+
+	/*
+	 * Make sure that this CPU will not be immediately considered as busy in
+	 * cases where the CPU has already entered an idle state. In that case,
+	 * the number of idle_calls will not vary anymore until it exits idle,
+	 * which would lead sugov_cpu_is_busy() to say that this CPU is busy,
+	 * because it has not (re)entered idle since the last time we looked at
+	 * it.
+	 * Assuming cpu0 and cpu1 are in the same policy, that will make sure
+	 * this sequence of events leads to right cpu1 business status from
+	 * get_next_freq(cpu=1)
+	 * cpu0: [enter idle] -> [get_next_freq] -> [doing nothing] -> [wakeup]
+	 * cpu1:                ...              -> [get_next_freq] ->   ...
+	 */
+	if (util <= sg_cpu->previous_util)
+		sg_cpu->saved_idle_calls--;
+
+	sg_cpu->previous_util = util;
 }
 #else
 static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
-static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu) {}
+static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu
+				     unsigned long util)
+{}
 #endif /* CONFIG_NO_HZ_COMMON */
 
 /**
@@ -507,10 +529,9 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	if (!sugov_should_update_freq(sg_policy, time))
 		return;
 
-	busy = sugov_cpu_is_busy(sg_cpu);
-	sugov_cpu_is_busy_update(sg_cpu);
-
 	util = sugov_get_util(sg_cpu);
+	busy = sugov_cpu_is_busy(sg_cpu);
+	sugov_cpu_is_busy_update(sg_cpu, util);
 	max = sg_cpu->max;
 	util = sugov_iowait_apply(sg_cpu, time, util, max);
 	next_f = get_next_freq(sg_policy, util, max);
@@ -545,12 +566,15 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 	struct cpufreq_policy *policy = sg_policy->policy;
 	unsigned long util = 0, max = 1;
 	unsigned int j;
+	unsigned long sg_cpu_util = 0;
 
 	for_each_cpu(j, policy->cpus) {
 		struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
 		unsigned long j_util, j_max;
 
 		j_util = sugov_get_util(j_sg_cpu);
+		if (j_sg_cpu == sg_cpu)
+			sg_cpu_util = j_util;
 		j_max = j_sg_cpu->max;
 		j_util = sugov_iowait_apply(j_sg_cpu, time, j_util, j_max);
 
@@ -560,6 +584,14 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 		}
 	}
 
+	/*
+	 * Only update the business status if we are looking at the CPU for
+	 * which a utilization change triggered a call to get_next_freq(). This
+	 * way, we don't affect the "busy" status of CPUs that don't have any
+	 * change in utilization.
+	 */
+	sugov_cpu_is_busy_update(sg_cpu, sg_cpu_util);
+
 	return get_next_freq(sg_policy, util, max);
 }
 
-- 
2.21.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC PATCH 7/7] sched/cpufreq: Boost schedutil frequency ramp up
  2019-05-08 17:42 [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware douglas.raillard
                   ` (6 preceding siblings ...)
  2019-05-08 17:43 ` [RFC PATCH 6/7] sched/cpufreq: Improve sugov_cpu_is_busy accuracy douglas.raillard
@ 2019-05-08 17:43 ` douglas.raillard
  2019-05-08 17:43   ` douglas.raillard
  2019-05-13  7:12 ` [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware Viresh Kumar
  8 siblings, 1 reply; 36+ messages in thread
From: douglas.raillard @ 2019-05-08 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, mingo, peterz, quentin.perret, douglas.raillard,
	patrick.bellasi, dietmar.eggemann

From: Douglas RAILLARD <douglas.raillard@arm.com>

In some situations, it can be interesting to spend temporarily more
power if that can give a useful frequency boost.

The sugov_cpu_is_busy() heuristic is reused to check if there has been
some idle time on all CPUs in the considered perf domain since last call
to schedutil's get_next_freq(). If not, it is assumed that at least one
CPU is in a frequency ramp up phase and the domain will be allowed to
spend extra power to reach a stable OPP in a shorter amount of time.

Since the extra power expenditure is bounded, it cannot skyrocket even
on platforms with a large number of cores in the same frequency domain
and/or very high ratio between lowest and highest OPP cost.

Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
---
 kernel/sched/cpufreq_schedutil.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index ce4b90cafbb5..513b32bf14c5 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -218,6 +218,8 @@ static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu
  * @sg_policy: schedutil policy object to compute the new frequency for.
  * @util: Current CPU utilization.
  * @max: CPU capacity.
+ * @busy: true if at least one CPU in the policy is busy, which means it had no
+ *	idle time since its last frequency change.
  *
  * If the utilization is frequency-invariant, choose the new frequency to be
  * proportional to it, that is
@@ -231,20 +233,28 @@ static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu
  *
  * Take C = 1.25 for the frequency tipping point at (util / max) = 0.8.
  *
+ * An energy-aware boost is then applied if busy is true. The boost will allow
+ * selecting frequencies at most twice as costly in term of energy.
+ *
  * The lowest driver-supported frequency which is equal or greater than the raw
  * next_freq (as calculated above) is returned, subject to policy min/max and
  * cpufreq driver limitations.
  */
 static unsigned int get_next_freq(struct sugov_policy *sg_policy,
-				  unsigned long util, unsigned long max)
+				  unsigned long util, unsigned long max,
+				  bool busy)
 {
 	struct cpufreq_policy *policy = sg_policy->policy;
 	unsigned int freq = arch_scale_freq_invariant() ?
 				policy->cpuinfo.max_freq : policy->cur;
 	struct em_perf_domain *pd = sugov_policy_get_pd(sg_policy);
 
-	/* Maximum power we are ready to spend. */
-	unsigned int cost_margin = 0;
+	/*
+	 * Maximum power we are ready to spend.
+	 * When one CPU is busy in the policy, we apply a boost to help it reach
+	 * the needed frequency faster.
+	 */
+	unsigned int cost_margin = busy ? 1024/2 : 0;
 
 	freq = map_util_freq(util, freq, max);
 
@@ -534,7 +544,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	sugov_cpu_is_busy_update(sg_cpu, util);
 	max = sg_cpu->max;
 	util = sugov_iowait_apply(sg_cpu, time, util, max);
-	next_f = get_next_freq(sg_policy, util, max);
+	next_f = get_next_freq(sg_policy, util, max, busy);
 	/*
 	 * Do not reduce the frequency if the CPU has not been idle
 	 * recently, as the reduction is likely to be premature then.
@@ -567,6 +577,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 	unsigned long util = 0, max = 1;
 	unsigned int j;
 	unsigned long sg_cpu_util = 0;
+	bool busy = false;
 
 	for_each_cpu(j, policy->cpus) {
 		struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
@@ -577,6 +588,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 			sg_cpu_util = j_util;
 		j_max = j_sg_cpu->max;
 		j_util = sugov_iowait_apply(j_sg_cpu, time, j_util, j_max);
+		busy |= sugov_cpu_is_busy(j_sg_cpu);
 
 		if (j_util * max > j_max * util) {
 			util = j_util;
@@ -592,7 +604,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 	 */
 	sugov_cpu_is_busy_update(sg_cpu, sg_cpu_util);
 
-	return get_next_freq(sg_policy, util, max);
+	return get_next_freq(sg_policy, util, max, busy);
 }
 
 static void
-- 
2.21.0

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC PATCH 7/7] sched/cpufreq: Boost schedutil frequency ramp up
  2019-05-08 17:43 ` [RFC PATCH 7/7] sched/cpufreq: Boost schedutil frequency ramp up douglas.raillard
@ 2019-05-08 17:43   ` douglas.raillard
  0 siblings, 0 replies; 36+ messages in thread
From: douglas.raillard @ 2019-05-08 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, mingo, peterz, quentin.perret, douglas.raillard,
	patrick.bellasi, dietmar.eggemann

From: Douglas RAILLARD <douglas.raillard@arm.com>

In some situations, it can be interesting to spend temporarily more
power if that can give a useful frequency boost.

The sugov_cpu_is_busy() heuristic is reused to check if there has been
some idle time on all CPUs in the considered perf domain since last call
to schedutil's get_next_freq(). If not, it is assumed that at least one
CPU is in a frequency ramp up phase and the domain will be allowed to
spend extra power to reach a stable OPP in a shorter amount of time.

Since the extra power expenditure is bounded, it cannot skyrocket even
on platforms with a large number of cores in the same frequency domain
and/or very high ratio between lowest and highest OPP cost.

Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
---
 kernel/sched/cpufreq_schedutil.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index ce4b90cafbb5..513b32bf14c5 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -218,6 +218,8 @@ static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu
  * @sg_policy: schedutil policy object to compute the new frequency for.
  * @util: Current CPU utilization.
  * @max: CPU capacity.
+ * @busy: true if at least one CPU in the policy is busy, which means it had no
+ *	idle time since its last frequency change.
  *
  * If the utilization is frequency-invariant, choose the new frequency to be
  * proportional to it, that is
@@ -231,20 +233,28 @@ static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu
  *
  * Take C = 1.25 for the frequency tipping point at (util / max) = 0.8.
  *
+ * An energy-aware boost is then applied if busy is true. The boost will allow
+ * selecting frequencies at most twice as costly in term of energy.
+ *
  * The lowest driver-supported frequency which is equal or greater than the raw
  * next_freq (as calculated above) is returned, subject to policy min/max and
  * cpufreq driver limitations.
  */
 static unsigned int get_next_freq(struct sugov_policy *sg_policy,
-				  unsigned long util, unsigned long max)
+				  unsigned long util, unsigned long max,
+				  bool busy)
 {
 	struct cpufreq_policy *policy = sg_policy->policy;
 	unsigned int freq = arch_scale_freq_invariant() ?
 				policy->cpuinfo.max_freq : policy->cur;
 	struct em_perf_domain *pd = sugov_policy_get_pd(sg_policy);
 
-	/* Maximum power we are ready to spend. */
-	unsigned int cost_margin = 0;
+	/*
+	 * Maximum power we are ready to spend.
+	 * When one CPU is busy in the policy, we apply a boost to help it reach
+	 * the needed frequency faster.
+	 */
+	unsigned int cost_margin = busy ? 1024/2 : 0;
 
 	freq = map_util_freq(util, freq, max);
 
@@ -534,7 +544,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	sugov_cpu_is_busy_update(sg_cpu, util);
 	max = sg_cpu->max;
 	util = sugov_iowait_apply(sg_cpu, time, util, max);
-	next_f = get_next_freq(sg_policy, util, max);
+	next_f = get_next_freq(sg_policy, util, max, busy);
 	/*
 	 * Do not reduce the frequency if the CPU has not been idle
 	 * recently, as the reduction is likely to be premature then.
@@ -567,6 +577,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 	unsigned long util = 0, max = 1;
 	unsigned int j;
 	unsigned long sg_cpu_util = 0;
+	bool busy = false;
 
 	for_each_cpu(j, policy->cpus) {
 		struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
@@ -577,6 +588,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 			sg_cpu_util = j_util;
 		j_max = j_sg_cpu->max;
 		j_util = sugov_iowait_apply(j_sg_cpu, time, j_util, j_max);
+		busy |= sugov_cpu_is_busy(j_sg_cpu);
 
 		if (j_util * max > j_max * util) {
 			util = j_util;
@@ -592,7 +604,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 	 */
 	sugov_cpu_is_busy_update(sg_cpu, sg_cpu_util);
 
-	return get_next_freq(sg_policy, util, max);
+	return get_next_freq(sg_policy, util, max, busy);
 }
 
 static void
-- 
2.21.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware
  2019-05-08 17:42 [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware douglas.raillard
                   ` (7 preceding siblings ...)
  2019-05-08 17:43 ` [RFC PATCH 7/7] sched/cpufreq: Boost schedutil frequency ramp up douglas.raillard
@ 2019-05-13  7:12 ` Viresh Kumar
  2019-05-13  7:12   ` Viresh Kumar
  2019-05-13 13:52   ` Douglas Raillard
  8 siblings, 2 replies; 36+ messages in thread
From: Viresh Kumar @ 2019-05-13  7:12 UTC (permalink / raw)
  To: douglas.raillard
  Cc: Linux Kernel Mailing List, Linux PM list, Ingo Molnar,
	Peter Zijlstra, quentin.perret, patrick.bellasi,
	Dietmar Eggemann

On Wed, May 8, 2019 at 11:57 PM <douglas.raillard@arm.com> wrote:
>
> From: Douglas RAILLARD <douglas.raillard@arm.com>
>
> Make schedutil cpufreq governor energy-aware.

Hi Douglas,

I was wondering on why the cpufreq maintainers weren't cc'd for this set and
then I noticed that get_maintainers doesn't report us at all for schedutil :(

I have sent a patch to fix that, but please include us as well in the
future even
if get_maintainers doesn't report us :)

--
viresh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware
  2019-05-13  7:12 ` [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware Viresh Kumar
@ 2019-05-13  7:12   ` Viresh Kumar
  2019-05-13 13:52   ` Douglas Raillard
  1 sibling, 0 replies; 36+ messages in thread
From: Viresh Kumar @ 2019-05-13  7:12 UTC (permalink / raw)
  To: douglas.raillard
  Cc: Linux Kernel Mailing List, Linux PM list, Ingo Molnar,
	Peter Zijlstra, quentin.perret, patrick.bellasi,
	Dietmar Eggemann

On Wed, May 8, 2019 at 11:57 PM <douglas.raillard@arm.com> wrote:
>
> From: Douglas RAILLARD <douglas.raillard@arm.com>
>
> Make schedutil cpufreq governor energy-aware.

Hi Douglas,

I was wondering on why the cpufreq maintainers weren't cc'd for this set and
then I noticed that get_maintainers doesn't report us at all for schedutil :(

I have sent a patch to fix that, but please include us as well in the
future even
if get_maintainers doesn't report us :)

--
viresh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware
  2019-05-13  7:12 ` [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware Viresh Kumar
  2019-05-13  7:12   ` Viresh Kumar
@ 2019-05-13 13:52   ` Douglas Raillard
  2019-05-13 13:52     ` Douglas Raillard
  1 sibling, 1 reply; 36+ messages in thread
From: Douglas Raillard @ 2019-05-13 13:52 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Linux Kernel Mailing List, Linux PM list, Ingo Molnar,
	Peter Zijlstra, quentin.perret, patrick.bellasi,
	Dietmar Eggemann, rjw

Hi Viresh, Rafael,

On 5/13/19 8:12 AM, Viresh Kumar wrote:
> On Wed, May 8, 2019 at 11:57 PM <douglas.raillard@arm.com> wrote:
>>
>> From: Douglas RAILLARD <douglas.raillard@arm.com>
>>
>> Make schedutil cpufreq governor energy-aware.
> 
> Hi Douglas,
> 
> I was wondering on why the cpufreq maintainers weren't cc'd for this set and
> then I noticed that get_maintainers doesn't report us at all for schedutil :(
>
> I have sent a patch to fix that, but please include us as well in the
> future even
> if get_maintainers doesn't report us :)

Looks like it was a mix of me not checking my checklist and get_maintainers
not catching it, I'll make sure both of you are in CC next time :)

> 
> --
> viresh

Thanks,
Douglas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware
  2019-05-13 13:52   ` Douglas Raillard
@ 2019-05-13 13:52     ` Douglas Raillard
  0 siblings, 0 replies; 36+ messages in thread
From: Douglas Raillard @ 2019-05-13 13:52 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Linux Kernel Mailing List, Linux PM list, Ingo Molnar,
	Peter Zijlstra, quentin.perret, patrick.bellasi,
	Dietmar Eggemann, rjw

Hi Viresh, Rafael,

On 5/13/19 8:12 AM, Viresh Kumar wrote:
> On Wed, May 8, 2019 at 11:57 PM <douglas.raillard@arm.com> wrote:
>>
>> From: Douglas RAILLARD <douglas.raillard@arm.com>
>>
>> Make schedutil cpufreq governor energy-aware.
> 
> Hi Douglas,
> 
> I was wondering on why the cpufreq maintainers weren't cc'd for this set and
> then I noticed that get_maintainers doesn't report us at all for schedutil :(
>
> I have sent a patch to fix that, but please include us as well in the
> future even
> if get_maintainers doesn't report us :)

Looks like it was a mix of me not checking my checklist and get_maintainers
not catching it, I'll make sure both of you are in CC next time :)

> 
> --
> viresh

Thanks,
Douglas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq()
  2019-05-08 17:42 ` [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq() douglas.raillard
  2019-05-08 17:42   ` douglas.raillard
@ 2019-05-16 12:42   ` Patrick Bellasi
  2019-05-16 12:42     ` Patrick Bellasi
                       ` (2 more replies)
  1 sibling, 3 replies; 36+ messages in thread
From: Patrick Bellasi @ 2019-05-16 12:42 UTC (permalink / raw)
  To: douglas.raillard
  Cc: linux-kernel, linux-pm, mingo, peterz, quentin.perret, dietmar.eggemann

On 08-May 18:42, douglas.raillard@arm.com wrote:
> From: Douglas RAILLARD <douglas.raillard@arm.com>
> 
> em_pd_get_higher_freq() returns a frequency greater or equal to the
> provided one while taking into account a given cost margin. It also
> skips inefficient OPPs that have a higher cost than another one with a
> higher frequency.

It's worth to add a small description and definition of what we mean by
"OPP efficiency". Despite being just an RFC, it could help to better
understand what we are after.

[...]

> +/** + * em_pd_get_higher_freq() - Get the highest frequency that
> does not exceed the
> + * given cost margin compared to min_freq
> + * @pd		: performance domain for which this must be done
> + * @min_freq	: minimum frequency to return
> + * @cost_margin	: allowed margin compared to min_freq, as a per-1024 value.
                                                                    ^^^^^^^^
here...

> + *
> + * Return: the chosen frequency, guaranteed to be at least as high as min_freq.
> + */
> +static inline unsigned long em_pd_get_higher_freq(struct em_perf_domain *pd,
> +	unsigned long min_freq, unsigned long cost_margin)
> +{
> +	unsigned long max_cost = 0;
> +	struct em_cap_state *cs;
> +	int i;
> +
> +	if (!pd)
> +		return min_freq;
> +
> +	/* Compute the maximum allowed cost */
> +	for (i = 0; i < pd->nr_cap_states; i++) {
> +		cs = &pd->table[i];
> +		if (cs->frequency >= min_freq) {
> +			max_cost = cs->cost + (cs->cost * cost_margin) / 1024;
                                                                         ^^^^
... end here we should probably better use SCHED_CAPACITY_SCALE
instead of hard-coding in values, isn't it?

> +			break;
> +		}
> +	}
> +

[...]

Best,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq()
  2019-05-16 12:42   ` Patrick Bellasi
@ 2019-05-16 12:42     ` Patrick Bellasi
  2019-05-16 13:01     ` Quentin Perret
  2019-05-16 13:06     ` Douglas Raillard
  2 siblings, 0 replies; 36+ messages in thread
From: Patrick Bellasi @ 2019-05-16 12:42 UTC (permalink / raw)
  To: douglas.raillard
  Cc: linux-kernel, linux-pm, mingo, peterz, quentin.perret, dietmar.eggemann

On 08-May 18:42, douglas.raillard@arm.com wrote:
> From: Douglas RAILLARD <douglas.raillard@arm.com>
> 
> em_pd_get_higher_freq() returns a frequency greater or equal to the
> provided one while taking into account a given cost margin. It also
> skips inefficient OPPs that have a higher cost than another one with a
> higher frequency.

It's worth to add a small description and definition of what we mean by
"OPP efficiency". Despite being just an RFC, it could help to better
understand what we are after.

[...]

> +/** + * em_pd_get_higher_freq() - Get the highest frequency that
> does not exceed the
> + * given cost margin compared to min_freq
> + * @pd		: performance domain for which this must be done
> + * @min_freq	: minimum frequency to return
> + * @cost_margin	: allowed margin compared to min_freq, as a per-1024 value.
                                                                    ^^^^^^^^
here...

> + *
> + * Return: the chosen frequency, guaranteed to be at least as high as min_freq.
> + */
> +static inline unsigned long em_pd_get_higher_freq(struct em_perf_domain *pd,
> +	unsigned long min_freq, unsigned long cost_margin)
> +{
> +	unsigned long max_cost = 0;
> +	struct em_cap_state *cs;
> +	int i;
> +
> +	if (!pd)
> +		return min_freq;
> +
> +	/* Compute the maximum allowed cost */
> +	for (i = 0; i < pd->nr_cap_states; i++) {
> +		cs = &pd->table[i];
> +		if (cs->frequency >= min_freq) {
> +			max_cost = cs->cost + (cs->cost * cost_margin) / 1024;
                                                                         ^^^^
... end here we should probably better use SCHED_CAPACITY_SCALE
instead of hard-coding in values, isn't it?

> +			break;
> +		}
> +	}
> +

[...]

Best,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 6/7] sched/cpufreq: Improve sugov_cpu_is_busy accuracy
  2019-05-08 17:43 ` [RFC PATCH 6/7] sched/cpufreq: Improve sugov_cpu_is_busy accuracy douglas.raillard
  2019-05-08 17:43   ` douglas.raillard
@ 2019-05-16 12:55   ` Patrick Bellasi
  2019-05-16 12:55     ` Patrick Bellasi
  2019-06-19 16:19     ` Douglas Raillard
  1 sibling, 2 replies; 36+ messages in thread
From: Patrick Bellasi @ 2019-05-16 12:55 UTC (permalink / raw)
  To: douglas.raillard
  Cc: linux-kernel, linux-pm, mingo, peterz, quentin.perret, dietmar.eggemann

On 08-May 18:43, douglas.raillard@arm.com wrote:
> From: Douglas RAILLARD <douglas.raillard@arm.com>
> 
> Avoid assuming a CPU is busy when it has begun being idle before
> get_next_freq() is called. This is achieved by making sure the CPU will
> not be detected as busy by other CPUs whenever its utilization is
> decreasing.

If I understand it correctly, what you are after here is a "metric"
which tells you (in a shared performance domain) if a CPU has been
busy for a certain amount of time.
You do that by reworking the way idle_calls are accounted for the
sugov_update_single() case.

That approach could work but it looks a bit convoluted in the way it's
coded and it's also difficult to exclude there could be corner cases
with wired behaviors.
Isn't that why you "fix" the saved_idle_calls counter after all?

What about a different approach where we:

1. we annotate the timestamp a CPU wakes up from IDLE (last_wakeup_time)

2. we use that annotated last_wake_time and the rq->nr_running to
   define the "cpu is busy" heuristic.

Looking at a sibling CPU, I think we can end up with two main cases:

1. CPU's nr_running is == 0
   then we don't consider busy that CPU

2. CPU's nr_running is  > 0
   then the CPU is busy iff
      (current_time - last_wakeup_tim) >= busy_threshold

Notice that, when a CPU is active, its rq clock is periodically
updated, at least once per tick. Thus, provided a tick time is not too
long to declare busy a CPU, then the above logic should work.

Perhaps the busy_threshold can also be defined considering the PELT
dynamics and starting from an expected utilization increase converted
in time.

Could something like to above work or am I missing something?

> Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
> ---
>  kernel/sched/cpufreq_schedutil.c | 42 ++++++++++++++++++++++++++++----
>  1 file changed, 37 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index a12b7e5bc028..ce4b90cafbb5 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -62,6 +62,7 @@ struct sugov_cpu {
>  	/* The field below is for single-CPU policies only: */
>  #ifdef CONFIG_NO_HZ_COMMON
>  	unsigned long		saved_idle_calls;
> +	unsigned long		previous_util;
>  #endif
>  };
>  
> @@ -181,14 +182,35 @@ static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
>  	return ret;
>  }
>  
> -static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu)
> +static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu,
> +				     unsigned long util)
>  {
>  	unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
>  	sg_cpu->saved_idle_calls = idle_calls;
> +
> +	/*
> +	 * Make sure that this CPU will not be immediately considered as busy in
> +	 * cases where the CPU has already entered an idle state. In that case,
> +	 * the number of idle_calls will not vary anymore until it exits idle,
> +	 * which would lead sugov_cpu_is_busy() to say that this CPU is busy,
> +	 * because it has not (re)entered idle since the last time we looked at
> +	 * it.
> +	 * Assuming cpu0 and cpu1 are in the same policy, that will make sure
> +	 * this sequence of events leads to right cpu1 business status from
> +	 * get_next_freq(cpu=1)
> +	 * cpu0: [enter idle] -> [get_next_freq] -> [doing nothing] -> [wakeup]
> +	 * cpu1:                ...              -> [get_next_freq] ->   ...
> +	 */
> +	if (util <= sg_cpu->previous_util)
> +		sg_cpu->saved_idle_calls--;
> +
> +	sg_cpu->previous_util = util;
>  }
>  #else
>  static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
> -static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu) {}
> +static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu
> +				     unsigned long util)
> +{}
>  #endif /* CONFIG_NO_HZ_COMMON */
>  
>  /**
> @@ -507,10 +529,9 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>  	if (!sugov_should_update_freq(sg_policy, time))
>  		return;
>  
> -	busy = sugov_cpu_is_busy(sg_cpu);
> -	sugov_cpu_is_busy_update(sg_cpu);
> -
>  	util = sugov_get_util(sg_cpu);
> +	busy = sugov_cpu_is_busy(sg_cpu);
> +	sugov_cpu_is_busy_update(sg_cpu, util);
>  	max = sg_cpu->max;
>  	util = sugov_iowait_apply(sg_cpu, time, util, max);
>  	next_f = get_next_freq(sg_policy, util, max);
> @@ -545,12 +566,15 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>  	struct cpufreq_policy *policy = sg_policy->policy;
>  	unsigned long util = 0, max = 1;
>  	unsigned int j;
> +	unsigned long sg_cpu_util = 0;
>  
>  	for_each_cpu(j, policy->cpus) {
>  		struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
>  		unsigned long j_util, j_max;
>  
>  		j_util = sugov_get_util(j_sg_cpu);
> +		if (j_sg_cpu == sg_cpu)
> +			sg_cpu_util = j_util;
>  		j_max = j_sg_cpu->max;
>  		j_util = sugov_iowait_apply(j_sg_cpu, time, j_util, j_max);
>  
> @@ -560,6 +584,14 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>  		}
>  	}
>  
> +	/*
> +	 * Only update the business status if we are looking at the CPU for
> +	 * which a utilization change triggered a call to get_next_freq(). This
> +	 * way, we don't affect the "busy" status of CPUs that don't have any
> +	 * change in utilization.
> +	 */
> +	sugov_cpu_is_busy_update(sg_cpu, sg_cpu_util);
> +
>  	return get_next_freq(sg_policy, util, max);
>  }
>  
> -- 
> 2.21.0
> 

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 6/7] sched/cpufreq: Improve sugov_cpu_is_busy accuracy
  2019-05-16 12:55   ` Patrick Bellasi
@ 2019-05-16 12:55     ` Patrick Bellasi
  2019-06-19 16:19     ` Douglas Raillard
  1 sibling, 0 replies; 36+ messages in thread
From: Patrick Bellasi @ 2019-05-16 12:55 UTC (permalink / raw)
  To: douglas.raillard
  Cc: linux-kernel, linux-pm, mingo, peterz, quentin.perret, dietmar.eggemann

On 08-May 18:43, douglas.raillard@arm.com wrote:
> From: Douglas RAILLARD <douglas.raillard@arm.com>
> 
> Avoid assuming a CPU is busy when it has begun being idle before
> get_next_freq() is called. This is achieved by making sure the CPU will
> not be detected as busy by other CPUs whenever its utilization is
> decreasing.

If I understand it correctly, what you are after here is a "metric"
which tells you (in a shared performance domain) if a CPU has been
busy for a certain amount of time.
You do that by reworking the way idle_calls are accounted for the
sugov_update_single() case.

That approach could work but it looks a bit convoluted in the way it's
coded and it's also difficult to exclude there could be corner cases
with wired behaviors.
Isn't that why you "fix" the saved_idle_calls counter after all?

What about a different approach where we:

1. we annotate the timestamp a CPU wakes up from IDLE (last_wakeup_time)

2. we use that annotated last_wake_time and the rq->nr_running to
   define the "cpu is busy" heuristic.

Looking at a sibling CPU, I think we can end up with two main cases:

1. CPU's nr_running is == 0
   then we don't consider busy that CPU

2. CPU's nr_running is  > 0
   then the CPU is busy iff
      (current_time - last_wakeup_tim) >= busy_threshold

Notice that, when a CPU is active, its rq clock is periodically
updated, at least once per tick. Thus, provided a tick time is not too
long to declare busy a CPU, then the above logic should work.

Perhaps the busy_threshold can also be defined considering the PELT
dynamics and starting from an expected utilization increase converted
in time.

Could something like to above work or am I missing something?

> Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
> ---
>  kernel/sched/cpufreq_schedutil.c | 42 ++++++++++++++++++++++++++++----
>  1 file changed, 37 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index a12b7e5bc028..ce4b90cafbb5 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -62,6 +62,7 @@ struct sugov_cpu {
>  	/* The field below is for single-CPU policies only: */
>  #ifdef CONFIG_NO_HZ_COMMON
>  	unsigned long		saved_idle_calls;
> +	unsigned long		previous_util;
>  #endif
>  };
>  
> @@ -181,14 +182,35 @@ static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
>  	return ret;
>  }
>  
> -static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu)
> +static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu,
> +				     unsigned long util)
>  {
>  	unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
>  	sg_cpu->saved_idle_calls = idle_calls;
> +
> +	/*
> +	 * Make sure that this CPU will not be immediately considered as busy in
> +	 * cases where the CPU has already entered an idle state. In that case,
> +	 * the number of idle_calls will not vary anymore until it exits idle,
> +	 * which would lead sugov_cpu_is_busy() to say that this CPU is busy,
> +	 * because it has not (re)entered idle since the last time we looked at
> +	 * it.
> +	 * Assuming cpu0 and cpu1 are in the same policy, that will make sure
> +	 * this sequence of events leads to right cpu1 business status from
> +	 * get_next_freq(cpu=1)
> +	 * cpu0: [enter idle] -> [get_next_freq] -> [doing nothing] -> [wakeup]
> +	 * cpu1:                ...              -> [get_next_freq] ->   ...
> +	 */
> +	if (util <= sg_cpu->previous_util)
> +		sg_cpu->saved_idle_calls--;
> +
> +	sg_cpu->previous_util = util;
>  }
>  #else
>  static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
> -static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu) {}
> +static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu
> +				     unsigned long util)
> +{}
>  #endif /* CONFIG_NO_HZ_COMMON */
>  
>  /**
> @@ -507,10 +529,9 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>  	if (!sugov_should_update_freq(sg_policy, time))
>  		return;
>  
> -	busy = sugov_cpu_is_busy(sg_cpu);
> -	sugov_cpu_is_busy_update(sg_cpu);
> -
>  	util = sugov_get_util(sg_cpu);
> +	busy = sugov_cpu_is_busy(sg_cpu);
> +	sugov_cpu_is_busy_update(sg_cpu, util);
>  	max = sg_cpu->max;
>  	util = sugov_iowait_apply(sg_cpu, time, util, max);
>  	next_f = get_next_freq(sg_policy, util, max);
> @@ -545,12 +566,15 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>  	struct cpufreq_policy *policy = sg_policy->policy;
>  	unsigned long util = 0, max = 1;
>  	unsigned int j;
> +	unsigned long sg_cpu_util = 0;
>  
>  	for_each_cpu(j, policy->cpus) {
>  		struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
>  		unsigned long j_util, j_max;
>  
>  		j_util = sugov_get_util(j_sg_cpu);
> +		if (j_sg_cpu == sg_cpu)
> +			sg_cpu_util = j_util;
>  		j_max = j_sg_cpu->max;
>  		j_util = sugov_iowait_apply(j_sg_cpu, time, j_util, j_max);
>  
> @@ -560,6 +584,14 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>  		}
>  	}
>  
> +	/*
> +	 * Only update the business status if we are looking at the CPU for
> +	 * which a utilization change triggered a call to get_next_freq(). This
> +	 * way, we don't affect the "busy" status of CPUs that don't have any
> +	 * change in utilization.
> +	 */
> +	sugov_cpu_is_busy_update(sg_cpu, sg_cpu_util);
> +
>  	return get_next_freq(sg_policy, util, max);
>  }
>  
> -- 
> 2.21.0
> 

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq()
  2019-05-16 12:42   ` Patrick Bellasi
  2019-05-16 12:42     ` Patrick Bellasi
@ 2019-05-16 13:01     ` Quentin Perret
  2019-05-16 13:01       ` Quentin Perret
  2019-05-16 13:22       ` Patrick Bellasi
  2019-05-16 13:06     ` Douglas Raillard
  2 siblings, 2 replies; 36+ messages in thread
From: Quentin Perret @ 2019-05-16 13:01 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: douglas.raillard, linux-kernel, linux-pm, mingo, peterz,
	dietmar.eggemann

On Thursday 16 May 2019 at 13:42:00 (+0100), Patrick Bellasi wrote:
> > +static inline unsigned long em_pd_get_higher_freq(struct em_perf_domain *pd,
> > +	unsigned long min_freq, unsigned long cost_margin)
> > +{
> > +	unsigned long max_cost = 0;
> > +	struct em_cap_state *cs;
> > +	int i;
> > +
> > +	if (!pd)
> > +		return min_freq;
> > +
> > +	/* Compute the maximum allowed cost */
> > +	for (i = 0; i < pd->nr_cap_states; i++) {
> > +		cs = &pd->table[i];
> > +		if (cs->frequency >= min_freq) {
> > +			max_cost = cs->cost + (cs->cost * cost_margin) / 1024;
>                                                                          ^^^^
> ... end here we should probably better use SCHED_CAPACITY_SCALE
> instead of hard-coding in values, isn't it?

I'm not sure to agree. This isn't part of the scheduler per se, and the
cost thing isn't in units of capacity, but in units of power, so I don't
think SCHED_CAPACITY_SCALE is correct here.

But I agree these hard coded values (that one, and the 512 in one of the
following patches) could use some motivation :-)

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq()
  2019-05-16 13:01     ` Quentin Perret
@ 2019-05-16 13:01       ` Quentin Perret
  2019-05-16 13:22       ` Patrick Bellasi
  1 sibling, 0 replies; 36+ messages in thread
From: Quentin Perret @ 2019-05-16 13:01 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: douglas.raillard, linux-kernel, linux-pm, mingo, peterz,
	dietmar.eggemann

On Thursday 16 May 2019 at 13:42:00 (+0100), Patrick Bellasi wrote:
> > +static inline unsigned long em_pd_get_higher_freq(struct em_perf_domain *pd,
> > +	unsigned long min_freq, unsigned long cost_margin)
> > +{
> > +	unsigned long max_cost = 0;
> > +	struct em_cap_state *cs;
> > +	int i;
> > +
> > +	if (!pd)
> > +		return min_freq;
> > +
> > +	/* Compute the maximum allowed cost */
> > +	for (i = 0; i < pd->nr_cap_states; i++) {
> > +		cs = &pd->table[i];
> > +		if (cs->frequency >= min_freq) {
> > +			max_cost = cs->cost + (cs->cost * cost_margin) / 1024;
>                                                                          ^^^^
> ... end here we should probably better use SCHED_CAPACITY_SCALE
> instead of hard-coding in values, isn't it?

I'm not sure to agree. This isn't part of the scheduler per se, and the
cost thing isn't in units of capacity, but in units of power, so I don't
think SCHED_CAPACITY_SCALE is correct here.

But I agree these hard coded values (that one, and the 512 in one of the
following patches) could use some motivation :-)

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq()
  2019-05-16 12:42   ` Patrick Bellasi
  2019-05-16 12:42     ` Patrick Bellasi
  2019-05-16 13:01     ` Quentin Perret
@ 2019-05-16 13:06     ` Douglas Raillard
  2019-05-16 13:06       ` Douglas Raillard
  2 siblings, 1 reply; 36+ messages in thread
From: Douglas Raillard @ 2019-05-16 13:06 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, mingo, peterz, quentin.perret,
	dietmar.eggemann, rjw, Viresh Kumar

Hi Patrick,

On 5/16/19 1:42 PM, Patrick Bellasi wrote:
> On 08-May 18:42, douglas.raillard@arm.com wrote:
>> From: Douglas RAILLARD <douglas.raillard@arm.com>
>>
>> em_pd_get_higher_freq() returns a frequency greater or equal to the
>> provided one while taking into account a given cost margin. It also
>> skips inefficient OPPs that have a higher cost than another one with a
>> higher frequency.
> 
> It's worth to add a small description and definition of what we mean by
> "OPP efficiency". Despite being just an RFC, it could help to better
> understand what we are after.

Right, here efficiency=capacity/power.

> 
> [...]
> 
>> +/** + * em_pd_get_higher_freq() - Get the highest frequency that
>> does not exceed the
>> + * given cost margin compared to min_freq
>> + * @pd		: performance domain for which this must be done
>> + * @min_freq	: minimum frequency to return
>> + * @cost_margin	: allowed margin compared to min_freq, as a per-1024 value.
>                                                                      ^^^^^^^^
> here...
> 
>> + *
>> + * Return: the chosen frequency, guaranteed to be at least as high as min_freq.
>> + */
>> +static inline unsigned long em_pd_get_higher_freq(struct em_perf_domain *pd,
>> +	unsigned long min_freq, unsigned long cost_margin)
>> +{
>> +	unsigned long max_cost = 0;
>> +	struct em_cap_state *cs;
>> +	int i;
>> +
>> +	if (!pd)
>> +		return min_freq;
>> +
>> +	/* Compute the maximum allowed cost */
>> +	for (i = 0; i < pd->nr_cap_states; i++) {
>> +		cs = &pd->table[i];
>> +		if (cs->frequency >= min_freq) {
>> +			max_cost = cs->cost + (cs->cost * cost_margin) / 1024;
>                                                                           ^^^^
> ... end here we should probably better use SCHED_CAPACITY_SCALE
> instead of hard-coding in values, isn't it?

"cs->cost*cost_margin/1024" is not a capacity, it's a cost as defined here:
https://elixir.bootlin.com/linux/latest/source/include/linux/energy_model.h#L17

Actually, it's in milliwatts, but it's not better the better way to look at
it to understand it IMHO.

The margin is expressed as a "per-1024" value just like we use percents'
in everyday life, so it has no unit. If we want to avoid hard-coded values
here, I can introduce an ENERGY_COST_MARGIN_SCALE macro.

>> +			break;
>> +		}
>> +	}
>> +
> 
> [...]
> 
> Best,
> Patrick

Thanks,
Douglas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq()
  2019-05-16 13:06     ` Douglas Raillard
@ 2019-05-16 13:06       ` Douglas Raillard
  0 siblings, 0 replies; 36+ messages in thread
From: Douglas Raillard @ 2019-05-16 13:06 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, mingo, peterz, quentin.perret,
	dietmar.eggemann, rjw, Viresh Kumar

Hi Patrick,

On 5/16/19 1:42 PM, Patrick Bellasi wrote:
> On 08-May 18:42, douglas.raillard@arm.com wrote:
>> From: Douglas RAILLARD <douglas.raillard@arm.com>
>>
>> em_pd_get_higher_freq() returns a frequency greater or equal to the
>> provided one while taking into account a given cost margin. It also
>> skips inefficient OPPs that have a higher cost than another one with a
>> higher frequency.
> 
> It's worth to add a small description and definition of what we mean by
> "OPP efficiency". Despite being just an RFC, it could help to better
> understand what we are after.

Right, here efficiency=capacity/power.

> 
> [...]
> 
>> +/** + * em_pd_get_higher_freq() - Get the highest frequency that
>> does not exceed the
>> + * given cost margin compared to min_freq
>> + * @pd		: performance domain for which this must be done
>> + * @min_freq	: minimum frequency to return
>> + * @cost_margin	: allowed margin compared to min_freq, as a per-1024 value.
>                                                                      ^^^^^^^^
> here...
> 
>> + *
>> + * Return: the chosen frequency, guaranteed to be at least as high as min_freq.
>> + */
>> +static inline unsigned long em_pd_get_higher_freq(struct em_perf_domain *pd,
>> +	unsigned long min_freq, unsigned long cost_margin)
>> +{
>> +	unsigned long max_cost = 0;
>> +	struct em_cap_state *cs;
>> +	int i;
>> +
>> +	if (!pd)
>> +		return min_freq;
>> +
>> +	/* Compute the maximum allowed cost */
>> +	for (i = 0; i < pd->nr_cap_states; i++) {
>> +		cs = &pd->table[i];
>> +		if (cs->frequency >= min_freq) {
>> +			max_cost = cs->cost + (cs->cost * cost_margin) / 1024;
>                                                                           ^^^^
> ... end here we should probably better use SCHED_CAPACITY_SCALE
> instead of hard-coding in values, isn't it?

"cs->cost*cost_margin/1024" is not a capacity, it's a cost as defined here:
https://elixir.bootlin.com/linux/latest/source/include/linux/energy_model.h#L17

Actually, it's in milliwatts, but it's not better the better way to look at
it to understand it IMHO.

The margin is expressed as a "per-1024" value just like we use percents'
in everyday life, so it has no unit. If we want to avoid hard-coded values
here, I can introduce an ENERGY_COST_MARGIN_SCALE macro.

>> +			break;
>> +		}
>> +	}
>> +
> 
> [...]
> 
> Best,
> Patrick

Thanks,
Douglas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq()
  2019-05-16 13:01     ` Quentin Perret
  2019-05-16 13:01       ` Quentin Perret
@ 2019-05-16 13:22       ` Patrick Bellasi
  2019-05-16 13:22         ` Patrick Bellasi
  2019-06-19 16:08         ` Douglas Raillard
  1 sibling, 2 replies; 36+ messages in thread
From: Patrick Bellasi @ 2019-05-16 13:22 UTC (permalink / raw)
  To: Quentin Perret
  Cc: douglas.raillard, linux-kernel, linux-pm, mingo, peterz,
	dietmar.eggemann

On 16-May 14:01, Quentin Perret wrote:
> On Thursday 16 May 2019 at 13:42:00 (+0100), Patrick Bellasi wrote:
> > > +static inline unsigned long em_pd_get_higher_freq(struct em_perf_domain *pd,
> > > +	unsigned long min_freq, unsigned long cost_margin)
> > > +{
> > > +	unsigned long max_cost = 0;
> > > +	struct em_cap_state *cs;
> > > +	int i;
> > > +
> > > +	if (!pd)
> > > +		return min_freq;
> > > +
> > > +	/* Compute the maximum allowed cost */
> > > +	for (i = 0; i < pd->nr_cap_states; i++) {
> > > +		cs = &pd->table[i];
> > > +		if (cs->frequency >= min_freq) {
> > > +			max_cost = cs->cost + (cs->cost * cost_margin) / 1024;
> >                                                                          ^^^^
> > ... end here we should probably better use SCHED_CAPACITY_SCALE
> > instead of hard-coding in values, isn't it?
> 
> I'm not sure to agree. This isn't part of the scheduler per se, and the
> cost thing isn't in units of capacity, but in units of power, so I don't
> think SCHED_CAPACITY_SCALE is correct here.

Right, I get the units do not match and it would not be elegant to use
it here...

> But I agree these hard coded values (that one, and the 512 in one of the
> following patches) could use some motivation :-)

... ultimately SCHED_CAPACITY_SCALE is just SCHED_FIXEDPOINT_SCALE,
which is adimensional. Perhaps we should use that or yet another alias
for the same.

> Thanks,
> Quentin

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq()
  2019-05-16 13:22       ` Patrick Bellasi
@ 2019-05-16 13:22         ` Patrick Bellasi
  2019-06-19 16:08         ` Douglas Raillard
  1 sibling, 0 replies; 36+ messages in thread
From: Patrick Bellasi @ 2019-05-16 13:22 UTC (permalink / raw)
  To: Quentin Perret
  Cc: douglas.raillard, linux-kernel, linux-pm, mingo, peterz,
	dietmar.eggemann

On 16-May 14:01, Quentin Perret wrote:
> On Thursday 16 May 2019 at 13:42:00 (+0100), Patrick Bellasi wrote:
> > > +static inline unsigned long em_pd_get_higher_freq(struct em_perf_domain *pd,
> > > +	unsigned long min_freq, unsigned long cost_margin)
> > > +{
> > > +	unsigned long max_cost = 0;
> > > +	struct em_cap_state *cs;
> > > +	int i;
> > > +
> > > +	if (!pd)
> > > +		return min_freq;
> > > +
> > > +	/* Compute the maximum allowed cost */
> > > +	for (i = 0; i < pd->nr_cap_states; i++) {
> > > +		cs = &pd->table[i];
> > > +		if (cs->frequency >= min_freq) {
> > > +			max_cost = cs->cost + (cs->cost * cost_margin) / 1024;
> >                                                                          ^^^^
> > ... end here we should probably better use SCHED_CAPACITY_SCALE
> > instead of hard-coding in values, isn't it?
> 
> I'm not sure to agree. This isn't part of the scheduler per se, and the
> cost thing isn't in units of capacity, but in units of power, so I don't
> think SCHED_CAPACITY_SCALE is correct here.

Right, I get the units do not match and it would not be elegant to use
it here...

> But I agree these hard coded values (that one, and the 512 in one of the
> following patches) could use some motivation :-)

... ultimately SCHED_CAPACITY_SCALE is just SCHED_FIXEDPOINT_SCALE,
which is adimensional. Perhaps we should use that or yet another alias
for the same.

> Thanks,
> Quentin

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq()
  2019-05-16 13:22       ` Patrick Bellasi
  2019-05-16 13:22         ` Patrick Bellasi
@ 2019-06-19 16:08         ` Douglas Raillard
  2019-06-20 13:04           ` Patrick Bellasi
  1 sibling, 1 reply; 36+ messages in thread
From: Douglas Raillard @ 2019-06-19 16:08 UTC (permalink / raw)
  To: Patrick Bellasi, Quentin Perret
  Cc: linux-kernel, linux-pm, mingo, peterz, dietmar.eggemann

Hi Patrick,

On 5/16/19 2:22 PM, Patrick Bellasi wrote:
> On 16-May 14:01, Quentin Perret wrote:
>> On Thursday 16 May 2019 at 13:42:00 (+0100), Patrick Bellasi wrote:
>>>> +static inline unsigned long em_pd_get_higher_freq(struct em_perf_domain *pd,
>>>> +	unsigned long min_freq, unsigned long cost_margin)
>>>> +{
>>>> +	unsigned long max_cost = 0;
>>>> +	struct em_cap_state *cs;
>>>> +	int i;
>>>> +
>>>> +	if (!pd)
>>>> +		return min_freq;
>>>> +
>>>> +	/* Compute the maximum allowed cost */
>>>> +	for (i = 0; i < pd->nr_cap_states; i++) {
>>>> +		cs = &pd->table[i];
>>>> +		if (cs->frequency >= min_freq) {
>>>> +			max_cost = cs->cost + (cs->cost * cost_margin) / 1024;
>>>                                                                           ^^^^
>>> ... end here we should probably better use SCHED_CAPACITY_SCALE
>>> instead of hard-coding in values, isn't it?
>>
>> I'm not sure to agree. This isn't part of the scheduler per se, and the
>> cost thing isn't in units of capacity, but in units of power, so I don't
>> think SCHED_CAPACITY_SCALE is correct here.
> 
> Right, I get the units do not match and it would not be elegant to use
> it here...
> 
>> But I agree these hard coded values (that one, and the 512 in one of the
>> following patches) could use some motivation :-)
> 
> ... ultimately SCHED_CAPACITY_SCALE is just SCHED_FIXEDPOINT_SCALE,
> which is adimensional. Perhaps we should use that or yet another alias
> for the same.

Would it be a good idea to use SCHED_FIXEDPOINT_SCALE in energy.c ?
Since it's not part of the scheduler, maybe there is a scale covering a wider scope,
or we can introduce a similar ENERGY_FIXEDPOINT_SCALE in energy_model.h.


>> Thanks,
>> Quentin
> 

Thanks,
Douglas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 6/7] sched/cpufreq: Improve sugov_cpu_is_busy accuracy
  2019-05-16 12:55   ` Patrick Bellasi
  2019-05-16 12:55     ` Patrick Bellasi
@ 2019-06-19 16:19     ` Douglas Raillard
  2019-06-20 11:05       ` Patrick Bellasi
  1 sibling, 1 reply; 36+ messages in thread
From: Douglas Raillard @ 2019-06-19 16:19 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, mingo, peterz, quentin.perret, dietmar.eggemann

Hi Patrick,

On 5/16/19 1:55 PM, Patrick Bellasi wrote:
> On 08-May 18:43, douglas.raillard@arm.com wrote:
>> From: Douglas RAILLARD <douglas.raillard@arm.com>
>>
>> Avoid assuming a CPU is busy when it has begun being idle before
>> get_next_freq() is called. This is achieved by making sure the CPU will
>> not be detected as busy by other CPUs whenever its utilization is
>> decreasing.
> 
> If I understand it correctly, what you are after here is a "metric"
> which tells you (in a shared performance domain) if a CPU has been
> busy for a certain amount of time.
> You do that by reworking the way idle_calls are accounted for the
> sugov_update_single() case.
> 
> That approach could work but it looks a bit convoluted in the way it's
> coded and it's also difficult to exclude there could be corner cases
> with wired behaviors.
> Isn't that why you "fix" the saved_idle_calls counter after all?
> 
> What about a different approach where we:
> 
> 1. we annotate the timestamp a CPU wakes up from IDLE (last_wakeup_time)
> 
> 2. we use that annotated last_wake_time and the rq->nr_running to
>     define the "cpu is busy" heuristic.
> 
> Looking at a sibling CPU, I think we can end up with two main cases:
> 
> 1. CPU's nr_running is == 0
>     then we don't consider busy that CPU
> 
> 2. CPU's nr_running is  > 0
>     then the CPU is busy iff
>        (current_time - last_wakeup_tim) >= busy_threshold
> 
> Notice that, when a CPU is active, its rq clock is periodically
> updated, at least once per tick. Thus, provided a tick time is not too
> long to declare busy a CPU, then the above logic should work.
> 
> Perhaps the busy_threshold can also be defined considering the PELT
> dynamics and starting from an expected utilization increase converted
> in time.

After experimenting with quite a few combinations, I managed to get a heuristic
based on util_avg and util_est_enqueued that seems to work better in my case.
Key differences are:
* this new heuristic only really takes into account CFS signals
  (current one based on idle calls takes into account everything that executes
   on a given CPU.)
* it will mark a CPU as busy less often, since it should only trigger when
   there is a change in the utilization of a currently enqueued tasks.
   Util changes due to enqueue/dequeue will not trigger it, which IMHO
   is desirable, since we only want to bias frequency selection
   when we know that we don't have precise utilization values for the
   enqueued tasks (because the task has changed its behavior).

That change will be part of v2 posting of this series.
  
> Could something like to above work or am I missing something?
> 
>> Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
>> ---
>>   kernel/sched/cpufreq_schedutil.c | 42 ++++++++++++++++++++++++++++----
>>   1 file changed, 37 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> index a12b7e5bc028..ce4b90cafbb5 100644
>> --- a/kernel/sched/cpufreq_schedutil.c
>> +++ b/kernel/sched/cpufreq_schedutil.c
>> @@ -62,6 +62,7 @@ struct sugov_cpu {
>>   	/* The field below is for single-CPU policies only: */
>>   #ifdef CONFIG_NO_HZ_COMMON
>>   	unsigned long		saved_idle_calls;
>> +	unsigned long		previous_util;
>>   #endif
>>   };
>>   
>> @@ -181,14 +182,35 @@ static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
>>   	return ret;
>>   }
>>   
>> -static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu)
>> +static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu,
>> +				     unsigned long util)
>>   {
>>   	unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
>>   	sg_cpu->saved_idle_calls = idle_calls;
>> +
>> +	/*
>> +	 * Make sure that this CPU will not be immediately considered as busy in
>> +	 * cases where the CPU has already entered an idle state. In that case,
>> +	 * the number of idle_calls will not vary anymore until it exits idle,
>> +	 * which would lead sugov_cpu_is_busy() to say that this CPU is busy,
>> +	 * because it has not (re)entered idle since the last time we looked at
>> +	 * it.
>> +	 * Assuming cpu0 and cpu1 are in the same policy, that will make sure
>> +	 * this sequence of events leads to right cpu1 business status from
>> +	 * get_next_freq(cpu=1)
>> +	 * cpu0: [enter idle] -> [get_next_freq] -> [doing nothing] -> [wakeup]
>> +	 * cpu1:                ...              -> [get_next_freq] ->   ...
>> +	 */
>> +	if (util <= sg_cpu->previous_util)
>> +		sg_cpu->saved_idle_calls--;
>> +
>> +	sg_cpu->previous_util = util;
>>   }
>>   #else
>>   static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
>> -static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu) {}
>> +static void sugov_cpu_is_busy_update(struct sugov_cpu *sg_cpu
>> +				     unsigned long util)
>> +{}
>>   #endif /* CONFIG_NO_HZ_COMMON */
>>   
>>   /**
>> @@ -507,10 +529,9 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>>   	if (!sugov_should_update_freq(sg_policy, time))
>>   		return;
>>   
>> -	busy = sugov_cpu_is_busy(sg_cpu);
>> -	sugov_cpu_is_busy_update(sg_cpu);
>> -
>>   	util = sugov_get_util(sg_cpu);
>> +	busy = sugov_cpu_is_busy(sg_cpu);
>> +	sugov_cpu_is_busy_update(sg_cpu, util);
>>   	max = sg_cpu->max;
>>   	util = sugov_iowait_apply(sg_cpu, time, util, max);
>>   	next_f = get_next_freq(sg_policy, util, max);
>> @@ -545,12 +566,15 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>>   	struct cpufreq_policy *policy = sg_policy->policy;
>>   	unsigned long util = 0, max = 1;
>>   	unsigned int j;
>> +	unsigned long sg_cpu_util = 0;
>>   
>>   	for_each_cpu(j, policy->cpus) {
>>   		struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
>>   		unsigned long j_util, j_max;
>>   
>>   		j_util = sugov_get_util(j_sg_cpu);
>> +		if (j_sg_cpu == sg_cpu)
>> +			sg_cpu_util = j_util;
>>   		j_max = j_sg_cpu->max;
>>   		j_util = sugov_iowait_apply(j_sg_cpu, time, j_util, j_max);
>>   
>> @@ -560,6 +584,14 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>>   		}
>>   	}
>>   
>> +	/*
>> +	 * Only update the business status if we are looking at the CPU for
>> +	 * which a utilization change triggered a call to get_next_freq(). This
>> +	 * way, we don't affect the "busy" status of CPUs that don't have any
>> +	 * change in utilization.
>> +	 */
>> +	sugov_cpu_is_busy_update(sg_cpu, sg_cpu_util);
>> +
>>   	return get_next_freq(sg_policy, util, max);
>>   }
>>   
>> -- 
>> 2.21.0
>>
> 

Thanks,
Douglas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 6/7] sched/cpufreq: Improve sugov_cpu_is_busy accuracy
  2019-06-19 16:19     ` Douglas Raillard
@ 2019-06-20 11:05       ` Patrick Bellasi
  0 siblings, 0 replies; 36+ messages in thread
From: Patrick Bellasi @ 2019-06-20 11:05 UTC (permalink / raw)
  To: Douglas Raillard
  Cc: linux-kernel, linux-pm, mingo, peterz, quentin.perret, dietmar.eggemann

On 19-Jun 17:19, Douglas Raillard wrote:
> Hi Patrick,

Hi!

> On 5/16/19 1:55 PM, Patrick Bellasi wrote:
> > On 08-May 18:43, douglas.raillard@arm.com wrote:
> > > From: Douglas RAILLARD <douglas.raillard@arm.com>
> > > 
> > > Avoid assuming a CPU is busy when it has begun being idle before
> > > get_next_freq() is called. This is achieved by making sure the CPU will
> > > not be detected as busy by other CPUs whenever its utilization is
> > > decreasing.
> > 
> > If I understand it correctly, what you are after here is a "metric"
> > which tells you (in a shared performance domain) if a CPU has been
> > busy for a certain amount of time.
> > You do that by reworking the way idle_calls are accounted for the
> > sugov_update_single() case.
> > 
> > That approach could work but it looks a bit convoluted in the way it's
> > coded and it's also difficult to exclude there could be corner cases
> > with wired behaviors.
> > Isn't that why you "fix" the saved_idle_calls counter after all?
> > 
> > What about a different approach where we:
> > 
> > 1. we annotate the timestamp a CPU wakes up from IDLE (last_wakeup_time)
> > 
> > 2. we use that annotated last_wake_time and the rq->nr_running to
> >     define the "cpu is busy" heuristic.
> > 
> > Looking at a sibling CPU, I think we can end up with two main cases:
> > 
> > 1. CPU's nr_running is == 0
> >     then we don't consider busy that CPU
> > 
> > 2. CPU's nr_running is  > 0
> >     then the CPU is busy iff
> >        (current_time - last_wakeup_tim) >= busy_threshold
> > 
> > Notice that, when a CPU is active, its rq clock is periodically
> > updated, at least once per tick. Thus, provided a tick time is not too
> > long to declare busy a CPU, then the above logic should work.
> > 
> > Perhaps the busy_threshold can also be defined considering the PELT
> > dynamics and starting from an expected utilization increase converted
> > in time.
> 
> After experimenting with quite a few combinations, I managed to get a heuristic
> based on util_avg and util_est_enqueued that seems to work better in my case.
> Key differences are:
> * this new heuristic only really takes into account CFS signals
>  (current one based on idle calls takes into account everything that executes
>   on a given CPU.)

Right, that should not be an issue. You are after boosting for CFS
tasks at the end, while for RT and DL we don't need boosting since
they have their own mechanisms.

> * it will mark a CPU as busy less often, since it should only trigger when
>   there is a change in the utilization of a currently enqueued tasks.

That sounds right too.

>   Util changes due to enqueue/dequeue will not trigger it, which IMHO
>   is desirable, since we only want to bias frequency selection
>   when we know that we don't have precise utilization values for the
>   enqueued tasks (because the task has changed its behavior).

Agree.

Best,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq()
  2019-06-19 16:08         ` Douglas Raillard
@ 2019-06-20 13:04           ` Patrick Bellasi
  2019-06-21 10:17             ` Quentin Perret
  0 siblings, 1 reply; 36+ messages in thread
From: Patrick Bellasi @ 2019-06-20 13:04 UTC (permalink / raw)
  To: Douglas Raillard
  Cc: Quentin Perret, linux-kernel, linux-pm, mingo, peterz, dietmar.eggemann

On 19-Jun 17:08, Douglas Raillard wrote:
> Hi Patrick,
> 
> On 5/16/19 2:22 PM, Patrick Bellasi wrote:
> > On 16-May 14:01, Quentin Perret wrote:
> > > On Thursday 16 May 2019 at 13:42:00 (+0100), Patrick Bellasi wrote:
> > > > > +static inline unsigned long em_pd_get_higher_freq(struct em_perf_domain *pd,
> > > > > +	unsigned long min_freq, unsigned long cost_margin)
> > > > > +{
> > > > > +	unsigned long max_cost = 0;
> > > > > +	struct em_cap_state *cs;
> > > > > +	int i;
> > > > > +
> > > > > +	if (!pd)
> > > > > +		return min_freq;
> > > > > +
> > > > > +	/* Compute the maximum allowed cost */
> > > > > +	for (i = 0; i < pd->nr_cap_states; i++) {
> > > > > +		cs = &pd->table[i];
> > > > > +		if (cs->frequency >= min_freq) {
> > > > > +			max_cost = cs->cost + (cs->cost * cost_margin) / 1024;
> > > >                                                                           ^^^^
> > > > ... end here we should probably better use SCHED_CAPACITY_SCALE
> > > > instead of hard-coding in values, isn't it?
> > > 
> > > I'm not sure to agree. This isn't part of the scheduler per se, and the
> > > cost thing isn't in units of capacity, but in units of power, so I don't
> > > think SCHED_CAPACITY_SCALE is correct here.
> > 
> > Right, I get the units do not match and it would not be elegant to use
> > it here...
> > 
> > > But I agree these hard coded values (that one, and the 512 in one of the
> > > following patches) could use some motivation :-)
> > 
> > ... ultimately SCHED_CAPACITY_SCALE is just SCHED_FIXEDPOINT_SCALE,
> > which is adimensional. Perhaps we should use that or yet another alias
> > for the same.
> 
> Would it be a good idea to use SCHED_FIXEDPOINT_SCALE in energy.c ?
> Since it's not part of the scheduler, maybe there is a scale covering a wider scope,
> or we can introduce a similar ENERGY_FIXEDPOINT_SCALE in energy_model.h.

Well, in energy_model.c we have references to "capacity" and
"utilization" which are all SCHED_FIXEDPOINT_SCALE range values.
That symbol is defined in <linux/sched.h> and we already pull
in other <linux/sched/*> headers.

So, to me it seems it's not unreasonable to say that we use scheduler
related concepts and it makes more sense than introducing yet another
scaling factor.

But that's just my two cents ;)

Best,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq()
  2019-06-20 13:04           ` Patrick Bellasi
@ 2019-06-21 10:17             ` Quentin Perret
  2019-06-21 10:22               ` Quentin Perret
  0 siblings, 1 reply; 36+ messages in thread
From: Quentin Perret @ 2019-06-21 10:17 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Douglas Raillard, linux-kernel, linux-pm, mingo, peterz,
	dietmar.eggemann

On Thursday 20 Jun 2019 at 14:04:39 (+0100), Patrick Bellasi wrote:
> On 19-Jun 17:08, Douglas Raillard wrote:
> > Hi Patrick,
> > 
> > On 5/16/19 2:22 PM, Patrick Bellasi wrote:
> > > On 16-May 14:01, Quentin Perret wrote:
> > > > On Thursday 16 May 2019 at 13:42:00 (+0100), Patrick Bellasi wrote:
> > > > > > +static inline unsigned long em_pd_get_higher_freq(struct em_perf_domain *pd,
> > > > > > +	unsigned long min_freq, unsigned long cost_margin)
> > > > > > +{
> > > > > > +	unsigned long max_cost = 0;
> > > > > > +	struct em_cap_state *cs;
> > > > > > +	int i;
> > > > > > +
> > > > > > +	if (!pd)
> > > > > > +		return min_freq;
> > > > > > +
> > > > > > +	/* Compute the maximum allowed cost */
> > > > > > +	for (i = 0; i < pd->nr_cap_states; i++) {
> > > > > > +		cs = &pd->table[i];
> > > > > > +		if (cs->frequency >= min_freq) {
> > > > > > +			max_cost = cs->cost + (cs->cost * cost_margin) / 1024;
> > > > >                                                                           ^^^^
> > > > > ... end here we should probably better use SCHED_CAPACITY_SCALE
> > > > > instead of hard-coding in values, isn't it?
> > > > 
> > > > I'm not sure to agree. This isn't part of the scheduler per se, and the
> > > > cost thing isn't in units of capacity, but in units of power, so I don't
> > > > think SCHED_CAPACITY_SCALE is correct here.
> > > 
> > > Right, I get the units do not match and it would not be elegant to use
> > > it here...
> > > 
> > > > But I agree these hard coded values (that one, and the 512 in one of the
> > > > following patches) could use some motivation :-)
> > > 
> > > ... ultimately SCHED_CAPACITY_SCALE is just SCHED_FIXEDPOINT_SCALE,
> > > which is adimensional. Perhaps we should use that or yet another alias
> > > for the same.
> > 
> > Would it be a good idea to use SCHED_FIXEDPOINT_SCALE in energy.c ?
> > Since it's not part of the scheduler, maybe there is a scale covering a wider scope,
> > or we can introduce a similar ENERGY_FIXEDPOINT_SCALE in energy_model.h.
> 
> Well, in energy_model.c we have references to "capacity" and
> "utilization" which are all SCHED_FIXEDPOINT_SCALE range values.
> That symbol is defined in <linux/sched.h> and we already pull
> in other <linux/sched/*> headers.
> 
> So, to me it seems it's not unreasonable to say that we use scheduler
> related concepts and it makes more sense than introducing yet another
> scaling factor.
> 
> But that's just my two cents ;)

Perhaps use this ?

https://elixir.bootlin.com/linux/latest/source/include/linux/energy_model.h#L43

Thanks,

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq()
  2019-06-21 10:17             ` Quentin Perret
@ 2019-06-21 10:22               ` Quentin Perret
  0 siblings, 0 replies; 36+ messages in thread
From: Quentin Perret @ 2019-06-21 10:22 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Douglas Raillard, linux-kernel, linux-pm, mingo, peterz,
	dietmar.eggemann

On Friday 21 Jun 2019 at 11:17:05 (+0100), Quentin Perret wrote:
> On Thursday 20 Jun 2019 at 14:04:39 (+0100), Patrick Bellasi wrote:
> > On 19-Jun 17:08, Douglas Raillard wrote:
> > > Hi Patrick,
> > > 
> > > On 5/16/19 2:22 PM, Patrick Bellasi wrote:
> > > > On 16-May 14:01, Quentin Perret wrote:
> > > > > On Thursday 16 May 2019 at 13:42:00 (+0100), Patrick Bellasi wrote:
> > > > > > > +static inline unsigned long em_pd_get_higher_freq(struct em_perf_domain *pd,
> > > > > > > +	unsigned long min_freq, unsigned long cost_margin)
> > > > > > > +{
> > > > > > > +	unsigned long max_cost = 0;
> > > > > > > +	struct em_cap_state *cs;
> > > > > > > +	int i;
> > > > > > > +
> > > > > > > +	if (!pd)
> > > > > > > +		return min_freq;
> > > > > > > +
> > > > > > > +	/* Compute the maximum allowed cost */
> > > > > > > +	for (i = 0; i < pd->nr_cap_states; i++) {
> > > > > > > +		cs = &pd->table[i];
> > > > > > > +		if (cs->frequency >= min_freq) {
> > > > > > > +			max_cost = cs->cost + (cs->cost * cost_margin) / 1024;
> > > > > >                                                                           ^^^^
> > > > > > ... end here we should probably better use SCHED_CAPACITY_SCALE
> > > > > > instead of hard-coding in values, isn't it?
> > > > > 
> > > > > I'm not sure to agree. This isn't part of the scheduler per se, and the
> > > > > cost thing isn't in units of capacity, but in units of power, so I don't
> > > > > think SCHED_CAPACITY_SCALE is correct here.
> > > > 
> > > > Right, I get the units do not match and it would not be elegant to use
> > > > it here...
> > > > 
> > > > > But I agree these hard coded values (that one, and the 512 in one of the
> > > > > following patches) could use some motivation :-)
> > > > 
> > > > ... ultimately SCHED_CAPACITY_SCALE is just SCHED_FIXEDPOINT_SCALE,
> > > > which is adimensional. Perhaps we should use that or yet another alias
> > > > for the same.
> > > 
> > > Would it be a good idea to use SCHED_FIXEDPOINT_SCALE in energy.c ?
> > > Since it's not part of the scheduler, maybe there is a scale covering a wider scope,
> > > or we can introduce a similar ENERGY_FIXEDPOINT_SCALE in energy_model.h.
> > 
> > Well, in energy_model.c we have references to "capacity" and
> > "utilization" which are all SCHED_FIXEDPOINT_SCALE range values.
> > That symbol is defined in <linux/sched.h> and we already pull
> > in other <linux/sched/*> headers.
> > 
> > So, to me it seems it's not unreasonable to say that we use scheduler
> > related concepts and it makes more sense than introducing yet another
> > scaling factor.
> > 
> > But that's just my two cents ;)
> 
> Perhaps use this ?
> 
> https://elixir.bootlin.com/linux/latest/source/include/linux/energy_model.h#L43
> 

Nah, bad idea actually ... Sorry for the noise

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, back to index

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-08 17:42 [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware douglas.raillard
2019-05-08 17:42 ` douglas.raillard
2019-05-08 17:42 ` [RFC PATCH 1/7] PM: Introduce em_pd_get_higher_freq() douglas.raillard
2019-05-08 17:42   ` douglas.raillard
2019-05-16 12:42   ` Patrick Bellasi
2019-05-16 12:42     ` Patrick Bellasi
2019-05-16 13:01     ` Quentin Perret
2019-05-16 13:01       ` Quentin Perret
2019-05-16 13:22       ` Patrick Bellasi
2019-05-16 13:22         ` Patrick Bellasi
2019-06-19 16:08         ` Douglas Raillard
2019-06-20 13:04           ` Patrick Bellasi
2019-06-21 10:17             ` Quentin Perret
2019-06-21 10:22               ` Quentin Perret
2019-05-16 13:06     ` Douglas Raillard
2019-05-16 13:06       ` Douglas Raillard
2019-05-08 17:42 ` [RFC PATCH 2/7] sched/cpufreq: Attach perf domain to sugov policy douglas.raillard
2019-05-08 17:42   ` douglas.raillard
2019-05-08 17:42 ` [RFC PATCH 3/7] sched/cpufreq: Hook em_pd_get_higher_power() into get_next_freq() douglas.raillard
2019-05-08 17:42   ` douglas.raillard
2019-05-08 17:42 ` [RFC PATCH 4/7] sched/cpufreq: Move up sugov_cpu_is_busy() douglas.raillard
2019-05-08 17:42   ` douglas.raillard
2019-05-08 17:42 ` [RFC PATCH 5/7] sched/cpufreq: sugov_cpu_is_busy for shared policy douglas.raillard
2019-05-08 17:42   ` douglas.raillard
2019-05-08 17:43 ` [RFC PATCH 6/7] sched/cpufreq: Improve sugov_cpu_is_busy accuracy douglas.raillard
2019-05-08 17:43   ` douglas.raillard
2019-05-16 12:55   ` Patrick Bellasi
2019-05-16 12:55     ` Patrick Bellasi
2019-06-19 16:19     ` Douglas Raillard
2019-06-20 11:05       ` Patrick Bellasi
2019-05-08 17:43 ` [RFC PATCH 7/7] sched/cpufreq: Boost schedutil frequency ramp up douglas.raillard
2019-05-08 17:43   ` douglas.raillard
2019-05-13  7:12 ` [RFC PATCH 0/7] sched/cpufreq: Make schedutil energy aware Viresh Kumar
2019-05-13  7:12   ` Viresh Kumar
2019-05-13 13:52   ` Douglas Raillard
2019-05-13 13:52     ` Douglas Raillard

Linux-PM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-pm/0 linux-pm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-pm linux-pm/ https://lore.kernel.org/linux-pm \
		linux-pm@vger.kernel.org linux-pm@archiver.kernel.org
	public-inbox-index linux-pm


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-pm


AGPL code for this site: git clone https://public-inbox.org/ public-inbox