linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] cpufreq: avoid redundant driver calls in schedutil
@ 2016-05-26  2:52 Steve Muckle
  2016-05-26  2:52 ` [PATCH v2 1/3] cpufreq: add resolve_freq driver callback Steve Muckle
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Steve Muckle @ 2016-05-26  2:52 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, Viresh Kumar
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette

In the series [0] I included a patch which attempted to avoid redundant driver
calls in the schedutil governor by mapping the raw required CPU frequencies to
driver frequencies. This vastly increases the likelihood of detecting a
redundant cpufreq driver call, i.e. one which will end up attempting to set the
CPU frequency to the frequency it is already running at. The redundant call can
be avoided. This is especially valuable in the case of drivers which do not
support fast path updates or if remote CPU cpufreq callbacks are implemented.

Unfortunately the implementation of this in [0] walked the frequency table
directly in schedutil. Rafael pointed out that not all drivers may have a
frequency table and that a driver callback might be implemented to return the
driver frequency associated with a particular target frequency. The driver
could then also cache this lookup and possibly use it on an ensuing
fast_switch. This series implements that approach.

Given that this change is beneficial on its own I've split it out into its own
series from the remote callback support.

[0] https://lkml.org/lkml/2016/5/9/853

Steve Muckle (3):
  cpufreq: add resolve_freq driver callback
  cpufreq: acpi-cpufreq: add resolve_freq callback
  cpufreq: schedutil: map raw required frequency to driver frequency

 drivers/cpufreq/acpi-cpufreq.c   | 56 ++++++++++++++++++++++++++++++----------
 drivers/cpufreq/cpufreq.c        | 25 ++++++++++++++++++
 include/linux/cpufreq.h          | 11 ++++++++
 kernel/sched/cpufreq_schedutil.c | 30 +++++++++++++++------
 4 files changed, 101 insertions(+), 21 deletions(-)

-- 
2.4.10

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 1/3] cpufreq: add resolve_freq driver callback
  2016-05-26  2:52 [PATCH v2 0/3] cpufreq: avoid redundant driver calls in schedutil Steve Muckle
@ 2016-05-26  2:52 ` Steve Muckle
  2016-05-26  6:25   ` Viresh Kumar
  2016-05-31 11:14   ` Viresh Kumar
  2016-05-26  2:53 ` [PATCH v2 2/3] cpufreq: acpi-cpufreq: add resolve_freq callback Steve Muckle
  2016-05-26  2:53 ` [PATCH v2 3/3] cpufreq: schedutil: map raw required frequency to driver frequency Steve Muckle
  2 siblings, 2 replies; 25+ messages in thread
From: Steve Muckle @ 2016-05-26  2:52 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, Viresh Kumar
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette

Cpufreq governors may need to know what a particular target frequency
maps to in the driver without necessarily wanting to set the frequency.
Support this operation via a new cpufreq API,
cpufreq_driver_resolve_freq().

The above API will call a new cpufreq driver callback, resolve_freq(),
if it has been registered by the driver. If that callback has not been
registered and a frequency table is available then the frequency table
is walked using cpufreq_frequency_table_target().

UINT_MAX is returned if no driver callback or frequency table is
available.

Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 drivers/cpufreq/cpufreq.c | 25 +++++++++++++++++++++++++
 include/linux/cpufreq.h   | 11 +++++++++++
 2 files changed, 36 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 77d77a4e3b74..3b44f4bdc071 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1849,6 +1849,31 @@ unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
 }
 EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
 
+unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
+					 unsigned int target_freq)
+{
+	struct cpufreq_frequency_table *freq_table;
+	int index, retval;
+
+	clamp_val(target_freq, policy->min, policy->max);
+
+	if (cpufreq_driver->resolve_freq)
+		return cpufreq_driver->resolve_freq(policy, target_freq);
+
+	freq_table = cpufreq_frequency_get_table(policy->cpu);
+	if (!freq_table)
+		return UINT_MAX;
+
+	retval = cpufreq_frequency_table_target(policy, freq_table,
+						target_freq, CPUFREQ_RELATION_L,
+						&index);
+	if (retval)
+		return UINT_MAX;
+
+	return freq_table[index].frequency;
+}
+EXPORT_SYMBOL_GPL(cpufreq_driver_resolve_freq);
+
 /* Must set freqs->new to intermediate frequency */
 static int __target_intermediate(struct cpufreq_policy *policy,
 				 struct cpufreq_freqs *freqs, int index)
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 4e81e08db752..675f17f98e75 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -271,6 +271,13 @@ struct cpufreq_driver {
 	int		(*target_intermediate)(struct cpufreq_policy *policy,
 					       unsigned int index);
 
+	/*
+	 * Return the driver-supported frequency that a particular target
+	 * frequency maps to (does not set the new frequency).
+	 */
+	unsigned int	(*resolve_freq)(struct cpufreq_policy *policy,
+					unsigned int target_freq);
+
 	/* should be defined, if possible */
 	unsigned int	(*get)(unsigned int cpu);
 
@@ -487,6 +494,10 @@ int cpufreq_driver_target(struct cpufreq_policy *policy,
 int __cpufreq_driver_target(struct cpufreq_policy *policy,
 				   unsigned int target_freq,
 				   unsigned int relation);
+
+unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
+					 unsigned int target_freq);
+
 int cpufreq_register_governor(struct cpufreq_governor *governor);
 void cpufreq_unregister_governor(struct cpufreq_governor *governor);
 
-- 
2.4.10

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 2/3] cpufreq: acpi-cpufreq: add resolve_freq callback
  2016-05-26  2:52 [PATCH v2 0/3] cpufreq: avoid redundant driver calls in schedutil Steve Muckle
  2016-05-26  2:52 ` [PATCH v2 1/3] cpufreq: add resolve_freq driver callback Steve Muckle
@ 2016-05-26  2:53 ` Steve Muckle
  2016-05-26  6:43   ` Viresh Kumar
  2016-05-26  2:53 ` [PATCH v2 3/3] cpufreq: schedutil: map raw required frequency to driver frequency Steve Muckle
  2 siblings, 1 reply; 25+ messages in thread
From: Steve Muckle @ 2016-05-26  2:53 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, Viresh Kumar
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette

Support the new resolve_freq cpufreq callback which resolves a target
frequency to a driver-supported frequency without actually setting it.

The target frequency and resolved frequency table entry are cached so
that a subsequent fast_switch operation may avoid the frequency table
walk assuming the requested target frequency is the same.

Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 drivers/cpufreq/acpi-cpufreq.c | 56 ++++++++++++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 13 deletions(-)

diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 7f38fb55f223..d87962eda1ed 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -66,6 +66,8 @@ enum {
 
 struct acpi_cpufreq_data {
 	struct cpufreq_frequency_table *freq_table;
+	unsigned int cached_lookup_freq;
+	struct cpufreq_frequency_table *cached_lookup_entry;
 	unsigned int resume;
 	unsigned int cpu_feature;
 	unsigned int acpi_perf_cpu;
@@ -458,26 +460,53 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
 	return result;
 }
 
-unsigned int acpi_cpufreq_fast_switch(struct cpufreq_policy *policy,
-				      unsigned int target_freq)
+/*
+ * Find the closest frequency above target_freq.
+ *
+ * The table is sorted in the reverse order with respect to the
+ * frequency and all of the entries are valid (see the initialization).
+ */
+static inline struct cpufreq_frequency_table
+*lookup_freq(struct cpufreq_frequency_table *table, unsigned int target_freq)
 {
-	struct acpi_cpufreq_data *data = policy->driver_data;
-	struct acpi_processor_performance *perf;
-	struct cpufreq_frequency_table *entry;
-	unsigned int next_perf_state, next_freq, freq;
+	struct cpufreq_frequency_table *entry = table;
+	unsigned int freq;
 
-	/*
-	 * Find the closest frequency above target_freq.
-	 *
-	 * The table is sorted in the reverse order with respect to the
-	 * frequency and all of the entries are valid (see the initialization).
-	 */
-	entry = data->freq_table;
 	do {
 		entry++;
 		freq = entry->frequency;
 	} while (freq >= target_freq && freq != CPUFREQ_TABLE_END);
 	entry--;
+
+	return entry;
+}
+
+unsigned int acpi_cpufreq_resolve_freq(struct cpufreq_policy *policy,
+				       unsigned int target_freq)
+{
+	struct acpi_cpufreq_data *data = policy->driver_data;
+	struct cpufreq_frequency_table *entry;
+
+	data->cached_lookup_freq = target_freq;
+	entry = lookup_freq(data->freq_table, target_freq);
+	data->cached_lookup_entry = entry;
+
+	return entry->frequency;
+}
+
+unsigned int acpi_cpufreq_fast_switch(struct cpufreq_policy *policy,
+				      unsigned int target_freq)
+{
+	struct acpi_cpufreq_data *data = policy->driver_data;
+	struct acpi_processor_performance *perf;
+	struct cpufreq_frequency_table *entry;
+	unsigned int next_perf_state, next_freq;
+
+	if (data->cached_lookup_entry &&
+	    data->cached_lookup_freq == target_freq)
+		entry = data->cached_lookup_entry;
+	else
+		entry = lookup_freq(data->freq_table, target_freq);
 	next_freq = entry->frequency;
 	next_perf_state = entry->driver_data;
 
@@ -918,6 +947,7 @@ static struct cpufreq_driver acpi_cpufreq_driver = {
 	.verify		= cpufreq_generic_frequency_table_verify,
 	.target_index	= acpi_cpufreq_target,
 	.fast_switch	= acpi_cpufreq_fast_switch,
+	.resolve_freq	= acpi_cpufreq_resolve_freq,
 	.bios_limit	= acpi_processor_get_bios_limit,
 	.init		= acpi_cpufreq_cpu_init,
 	.exit		= acpi_cpufreq_cpu_exit,
-- 
2.4.10

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 3/3] cpufreq: schedutil: map raw required frequency to driver frequency
  2016-05-26  2:52 [PATCH v2 0/3] cpufreq: avoid redundant driver calls in schedutil Steve Muckle
  2016-05-26  2:52 ` [PATCH v2 1/3] cpufreq: add resolve_freq driver callback Steve Muckle
  2016-05-26  2:53 ` [PATCH v2 2/3] cpufreq: acpi-cpufreq: add resolve_freq callback Steve Muckle
@ 2016-05-26  2:53 ` Steve Muckle
  2016-05-26  7:16   ` Viresh Kumar
  2016-05-27  5:41   ` Wanpeng Li
  2 siblings, 2 replies; 25+ messages in thread
From: Steve Muckle @ 2016-05-26  2:53 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, Viresh Kumar
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette

The slow-path frequency transition path is relatively expensive as it
requires waking up a thread to do work. Should support be added for
remote CPU cpufreq updates that is also expensive since it requires an
IPI. These activities should be avoided if they are not necessary.

To that end, calculate the actual driver-supported frequency required by
the new utilization value in schedutil by using the recently added
cpufreq_driver_resolve_freq callback. If it is the same as the
previously requested driver frequency then there is no need to continue
with the update assuming the cpu frequency limits have not changed. This
will have additional benefits should the semantics of the rate limit be
changed to apply solely to frequency transitions rather than to
frequency calculations in schedutil.

The last raw required frequency is cached. This allows the driver
frequency lookup to be skipped in the event that the new raw required
frequency matches the last one, assuming a frequency update has not been
forced due to limits changing (indicated by a next_freq value of
UINT_MAX, see sugov_should_update_freq).

Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 154ae3a51e86..ef73ca4d8d78 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -45,6 +45,8 @@ struct sugov_cpu {
 	struct update_util_data update_util;
 	struct sugov_policy *sg_policy;
 
+	unsigned int cached_raw_freq;
+
 	/* The fields below are only needed when sharing a policy. */
 	unsigned long util;
 	unsigned long max;
@@ -104,7 +106,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
 
 /**
  * get_next_freq - Compute a new frequency for a given cpufreq policy.
- * @policy: cpufreq policy object to compute the new frequency for.
+ * @sg_cpu: schedutil cpu object to compute the new frequency for.
  * @util: Current CPU utilization.
  * @max: CPU capacity.
  *
@@ -119,14 +121,24 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
  * next_freq = C * curr_freq * util_raw / max
  *
  * Take C = 1.25 for the frequency tipping point at (util / max) = 0.8.
+ *
+ * The lowest driver-supported frequency which is equal or greater than the raw
+ * next_freq (as calculated above) is returned.
  */
-static unsigned int get_next_freq(struct cpufreq_policy *policy,
-				  unsigned long util, unsigned long max)
+static unsigned int get_next_freq(struct sugov_cpu *sg_cpu, unsigned long util,
+				  unsigned long max)
 {
+	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
+	struct cpufreq_policy *policy = sg_policy->policy;
 	unsigned int freq = arch_scale_freq_invariant() ?
 				policy->cpuinfo.max_freq : policy->cur;
 
-	return (freq + (freq >> 2)) * util / max;
+	freq = (freq + (freq >> 2)) * util / max;
+
+	if (freq == sg_cpu->cached_raw_freq && sg_policy->next_freq != UINT_MAX)
+		return sg_policy->next_freq;
+	sg_cpu->cached_raw_freq = freq;
+	return cpufreq_driver_resolve_freq(policy, freq);
 }
 
 static void sugov_update_single(struct update_util_data *hook, u64 time,
@@ -141,13 +153,14 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 		return;
 
 	next_f = util == ULONG_MAX ? policy->cpuinfo.max_freq :
-			get_next_freq(policy, util, max);
+			get_next_freq(sg_cpu, util, max);
 	sugov_update_commit(sg_policy, time, next_f);
 }
 
-static unsigned int sugov_next_freq_shared(struct sugov_policy *sg_policy,
+static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu,
 					   unsigned long util, unsigned long max)
 {
+	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	struct cpufreq_policy *policy = sg_policy->policy;
 	unsigned int max_f = policy->cpuinfo.max_freq;
 	u64 last_freq_update_time = sg_policy->last_freq_update_time;
@@ -187,7 +200,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_policy *sg_policy,
 		}
 	}
 
-	return get_next_freq(policy, util, max);
+	return get_next_freq(sg_cpu, util, max);
 }
 
 static void sugov_update_shared(struct update_util_data *hook, u64 time,
@@ -204,7 +217,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 	sg_cpu->last_update = time;
 
 	if (sugov_should_update_freq(sg_policy, time)) {
-		next_f = sugov_next_freq_shared(sg_policy, util, max);
+		next_f = sugov_next_freq_shared(sg_cpu, util, max);
 		sugov_update_commit(sg_policy, time, next_f);
 	}
 
@@ -432,6 +445,7 @@ static int sugov_start(struct cpufreq_policy *policy)
 			sg_cpu->util = ULONG_MAX;
 			sg_cpu->max = 0;
 			sg_cpu->last_update = 0;
+			sg_cpu->cached_raw_freq = 0;
 			cpufreq_add_update_util_hook(cpu, &sg_cpu->update_util,
 						     sugov_update_shared);
 		} else {
-- 
2.4.10

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/3] cpufreq: add resolve_freq driver callback
  2016-05-26  2:52 ` [PATCH v2 1/3] cpufreq: add resolve_freq driver callback Steve Muckle
@ 2016-05-26  6:25   ` Viresh Kumar
  2016-05-30 15:31     ` Steve Muckle
  2016-05-31 11:14   ` Viresh Kumar
  1 sibling, 1 reply; 25+ messages in thread
From: Viresh Kumar @ 2016-05-26  6:25 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette

On 25-05-16, 19:52, Steve Muckle wrote:
> Cpufreq governors may need to know what a particular target frequency
> maps to in the driver without necessarily wanting to set the frequency.
> Support this operation via a new cpufreq API,
> cpufreq_driver_resolve_freq().
> 
> The above API will call a new cpufreq driver callback, resolve_freq(),
> if it has been registered by the driver. If that callback has not been
> registered and a frequency table is available then the frequency table
> is walked using cpufreq_frequency_table_target().
> 
> UINT_MAX is returned if no driver callback or frequency table is
> available.

Why should we return UINT_MAX here? We should return target_freq, no ?

> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Steve Muckle <smuckle@linaro.org>
> ---
>  drivers/cpufreq/cpufreq.c | 25 +++++++++++++++++++++++++
>  include/linux/cpufreq.h   | 11 +++++++++++
>  2 files changed, 36 insertions(+)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 77d77a4e3b74..3b44f4bdc071 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1849,6 +1849,31 @@ unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
>  }
>  EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
>  
> +unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
> +					 unsigned int target_freq)
> +{
> +	struct cpufreq_frequency_table *freq_table;
> +	int index, retval;
> +
> +	clamp_val(target_freq, policy->min, policy->max);
> +
> +	if (cpufreq_driver->resolve_freq)
> +		return cpufreq_driver->resolve_freq(policy, target_freq);
> +
> +	freq_table = cpufreq_frequency_get_table(policy->cpu);

I have sent a separate patch to provide a light weight alternative to
this. If that gets accepted, we can switch over to using it.

> +	if (!freq_table)
> +		return UINT_MAX;
> +
> +	retval = cpufreq_frequency_table_target(policy, freq_table,
> +						target_freq, CPUFREQ_RELATION_L,
> +						&index);
> +	if (retval)
> +		return UINT_MAX;
> +
> +	return freq_table[index].frequency;
> +}
> +EXPORT_SYMBOL_GPL(cpufreq_driver_resolve_freq);
> +
>  /* Must set freqs->new to intermediate frequency */
>  static int __target_intermediate(struct cpufreq_policy *policy,
>  				 struct cpufreq_freqs *freqs, int index)
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 4e81e08db752..675f17f98e75 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -271,6 +271,13 @@ struct cpufreq_driver {
>  	int		(*target_intermediate)(struct cpufreq_policy *policy,
>  					       unsigned int index);
>  
> +	/*
> +	 * Return the driver-supported frequency that a particular target
> +	 * frequency maps to (does not set the new frequency).
> +	 */
> +	unsigned int	(*resolve_freq)(struct cpufreq_policy *policy,
> +					unsigned int target_freq);

We have 3 categories of cpufreq-drivers today:
1. setpolicy drivers: They don't use the cpufreq governors we are
   working on.
2. non-setpolicy drivers:
  A. with ->target_index() callback, these will always provide a
     freq-table.
  B. with ->target() callback, ONLY these should be allowed to provide
     the ->resolve_freq() callback and no one else.

And so I would suggest adding an additional check in
cpufreq_register_driver() to catch incorrect usage of this callback.

-- 
viresh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/3] cpufreq: acpi-cpufreq: add resolve_freq callback
  2016-05-26  2:53 ` [PATCH v2 2/3] cpufreq: acpi-cpufreq: add resolve_freq callback Steve Muckle
@ 2016-05-26  6:43   ` Viresh Kumar
  2016-05-30 16:20     ` Steve Muckle
  0 siblings, 1 reply; 25+ messages in thread
From: Viresh Kumar @ 2016-05-26  6:43 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette

On 25-05-16, 19:53, Steve Muckle wrote:
> Support the new resolve_freq cpufreq callback which resolves a target
> frequency to a driver-supported frequency without actually setting it.

And here is the first abuser of this API as I was talking about in the
earlier patch :)

But, I know why you are doing it and I think we can do it differently.

So, lets assume that the ->resolve_freq() callback will ONLY be
provided by the drivers which also provide a ->target() callback.

i.e. not by acpi-cpufreq at least.

> The target frequency and resolved frequency table entry are cached so
> that a subsequent fast_switch operation may avoid the frequency table
> walk assuming the requested target frequency is the same.
> 
> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Steve Muckle <smuckle@linaro.org>
> ---
>  drivers/cpufreq/acpi-cpufreq.c | 56 ++++++++++++++++++++++++++++++++----------
>  1 file changed, 43 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
> index 7f38fb55f223..d87962eda1ed 100644
> --- a/drivers/cpufreq/acpi-cpufreq.c
> +++ b/drivers/cpufreq/acpi-cpufreq.c
> @@ -66,6 +66,8 @@ enum {
>  
>  struct acpi_cpufreq_data {
>  	struct cpufreq_frequency_table *freq_table;
> +	unsigned int cached_lookup_freq;
> +	struct cpufreq_frequency_table *cached_lookup_entry;

This could have been an integer value 'Index', which could have been
used together with the freq-table to get the freq we wanted.

And, then we can move these two fields into the cpufreq_policy
structure and make them part of the first patch itself.

>  	unsigned int resume;
>  	unsigned int cpu_feature;
>  	unsigned int acpi_perf_cpu;
> @@ -458,26 +460,53 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
>  	return result;
>  }
>  
> -unsigned int acpi_cpufreq_fast_switch(struct cpufreq_policy *policy,
> -				      unsigned int target_freq)
> +/*
> + * Find the closest frequency above target_freq.
> + *
> + * The table is sorted in the reverse order with respect to the
> + * frequency and all of the entries are valid (see the initialization).
> + */
> +static inline struct cpufreq_frequency_table
> +*lookup_freq(struct cpufreq_frequency_table *table, unsigned int target_freq)
>  {
> -	struct acpi_cpufreq_data *data = policy->driver_data;
> -	struct acpi_processor_performance *perf;
> -	struct cpufreq_frequency_table *entry;
> -	unsigned int next_perf_state, next_freq, freq;
> +	struct cpufreq_frequency_table *entry = table;
> +	unsigned int freq;
>  
> -	/*
> -	 * Find the closest frequency above target_freq.
> -	 *
> -	 * The table is sorted in the reverse order with respect to the
> -	 * frequency and all of the entries are valid (see the initialization).
> -	 */
> -	entry = data->freq_table;
>  	do {
>  		entry++;
>  		freq = entry->frequency;
>  	} while (freq >= target_freq && freq != CPUFREQ_TABLE_END);
>  	entry--;
> +
> +	return entry;
> +}
> +
> +unsigned int acpi_cpufreq_resolve_freq(struct cpufreq_policy *policy,
> +				       unsigned int target_freq)
> +{
> +	struct acpi_cpufreq_data *data = policy->driver_data;
> +	struct cpufreq_frequency_table *entry;
> +
> +	data->cached_lookup_freq = target_freq;
> +	entry = lookup_freq(data->freq_table, target_freq);
> +	data->cached_lookup_entry = entry;
> +
> +	return entry->frequency;
> +}
> +

And then we could remove this callback from acpi-cpufreq.

> +unsigned int acpi_cpufreq_fast_switch(struct cpufreq_policy *policy,
> +				      unsigned int target_freq)
> +{
> +	struct acpi_cpufreq_data *data = policy->driver_data;
> +	struct acpi_processor_performance *perf;
> +	struct cpufreq_frequency_table *entry;
> +	unsigned int next_perf_state, next_freq;
> +
> +	if (data->cached_lookup_entry &&
> +	    data->cached_lookup_freq == target_freq)
> +		entry = data->cached_lookup_entry;
> +	else
> +		entry = lookup_freq(data->freq_table, target_freq);

And a static inline callback in cpufreq.h to get this entry.

>  	next_freq = entry->frequency;
>  	next_perf_state = entry->driver_data;
>  
> @@ -918,6 +947,7 @@ static struct cpufreq_driver acpi_cpufreq_driver = {
>  	.verify		= cpufreq_generic_frequency_table_verify,
>  	.target_index	= acpi_cpufreq_target,
>  	.fast_switch	= acpi_cpufreq_fast_switch,
> +	.resolve_freq	= acpi_cpufreq_resolve_freq,
>  	.bios_limit	= acpi_processor_get_bios_limit,
>  	.init		= acpi_cpufreq_cpu_init,
>  	.exit		= acpi_cpufreq_cpu_exit,

Sounds reasonable ?

-- 
viresh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] cpufreq: schedutil: map raw required frequency to driver frequency
  2016-05-26  2:53 ` [PATCH v2 3/3] cpufreq: schedutil: map raw required frequency to driver frequency Steve Muckle
@ 2016-05-26  7:16   ` Viresh Kumar
  2016-05-29  0:40     ` Rafael J. Wysocki
  2016-05-30 16:35     ` Steve Muckle
  2016-05-27  5:41   ` Wanpeng Li
  1 sibling, 2 replies; 25+ messages in thread
From: Viresh Kumar @ 2016-05-26  7:16 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette

On 25-05-16, 19:53, Steve Muckle wrote:
> The slow-path frequency transition path is relatively expensive as it
> requires waking up a thread to do work. Should support be added for
> remote CPU cpufreq updates that is also expensive since it requires an
> IPI. These activities should be avoided if they are not necessary.
> 
> To that end, calculate the actual driver-supported frequency required by
> the new utilization value in schedutil by using the recently added
> cpufreq_driver_resolve_freq callback. If it is the same as the
> previously requested driver frequency then there is no need to continue
> with the update assuming the cpu frequency limits have not changed. This
> will have additional benefits should the semantics of the rate limit be
> changed to apply solely to frequency transitions rather than to
> frequency calculations in schedutil.

Maybe mention here that this patch isn't avoiding those IPIs yet, I
got an impression earlier that they are avoided with it.

> The last raw required frequency is cached. This allows the driver
> frequency lookup to be skipped in the event that the new raw required
> frequency matches the last one, assuming a frequency update has not been
> forced due to limits changing (indicated by a next_freq value of
> UINT_MAX, see sugov_should_update_freq).

I am not sure this is required, see below..

> Signed-off-by: Steve Muckle <smuckle@linaro.org>
> ---
>  kernel/sched/cpufreq_schedutil.c | 30 ++++++++++++++++++++++--------
>  1 file changed, 22 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 154ae3a51e86..ef73ca4d8d78 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -45,6 +45,8 @@ struct sugov_cpu {
>  	struct update_util_data update_util;
>  	struct sugov_policy *sg_policy;
>  
> +	unsigned int cached_raw_freq;
> +
>  	/* The fields below are only needed when sharing a policy. */
>  	unsigned long util;
>  	unsigned long max;
> @@ -104,7 +106,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
>  
>  /**
>   * get_next_freq - Compute a new frequency for a given cpufreq policy.
> - * @policy: cpufreq policy object to compute the new frequency for.
> + * @sg_cpu: schedutil cpu object to compute the new frequency for.
>   * @util: Current CPU utilization.
>   * @max: CPU capacity.
>   *
> @@ -119,14 +121,24 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
>   * next_freq = C * curr_freq * util_raw / max
>   *
>   * Take C = 1.25 for the frequency tipping point at (util / max) = 0.8.
> + *
> + * The lowest driver-supported frequency which is equal or greater than the raw
> + * next_freq (as calculated above) is returned.
>   */
> -static unsigned int get_next_freq(struct cpufreq_policy *policy,
> -				  unsigned long util, unsigned long max)
> +static unsigned int get_next_freq(struct sugov_cpu *sg_cpu, unsigned long util,
> +				  unsigned long max)
>  {
> +	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> +	struct cpufreq_policy *policy = sg_policy->policy;
>  	unsigned int freq = arch_scale_freq_invariant() ?
>  				policy->cpuinfo.max_freq : policy->cur;
>  
> -	return (freq + (freq >> 2)) * util / max;
> +	freq = (freq + (freq >> 2)) * util / max;
> +
> +	if (freq == sg_cpu->cached_raw_freq && sg_policy->next_freq != UINT_MAX)
> +		return sg_policy->next_freq;

I am not sure what the benefit of the second comparison to UINT_MAX
is. The output of below code will be same if the freq was ==
cached_raw_freq, no matter what.

I also have a doubt (I am quite sure Rafael will have a reason for
that, which I am failing to understand now), on why we are doing
next_freq == UINT_MAX in sugov_should_update_freq().

I understand that because the limits might have changed,
need_freq_update would have been set to true. We should evaluate
next-freq again without worrying about the load or the time since last
evaluation.

But what will happen by forcefully calling the cpufreq routines to
change the frequency, if next_freq hasn't changed even after limits
updates? Wouldn't that call always return early because the new freq
and the current freq are going to be same ?

@Rafael: Sorry for asking this so late :(

-- 
viresh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] cpufreq: schedutil: map raw required frequency to driver frequency
  2016-05-26  2:53 ` [PATCH v2 3/3] cpufreq: schedutil: map raw required frequency to driver frequency Steve Muckle
  2016-05-26  7:16   ` Viresh Kumar
@ 2016-05-27  5:41   ` Wanpeng Li
  2016-05-30 16:48     ` Steve Muckle
  1 sibling, 1 reply; 25+ messages in thread
From: Wanpeng Li @ 2016-05-27  5:41 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, Viresh Kumar,
	linux-kernel, Linux PM list, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette

2016-05-26 10:53 GMT+08:00 Steve Muckle <steve.muckle@linaro.org>:
> The slow-path frequency transition path is relatively expensive as it
> requires waking up a thread to do work. Should support be added for
> remote CPU cpufreq updates that is also expensive since it requires an
> IPI. These activities should be avoided if they are not necessary.
>
> To that end, calculate the actual driver-supported frequency required by
> the new utilization value in schedutil by using the recently added
> cpufreq_driver_resolve_freq callback. If it is the same as the
> previously requested driver frequency then there is no need to continue
> with the update assuming the cpu frequency limits have not changed. This
> will have additional benefits should the semantics of the rate limit be
> changed to apply solely to frequency transitions rather than to
> frequency calculations in schedutil.

sugov_should_update_freq() still be called before get_nex_freq() after
the patch applied, so rate limit still apply to both frequency
transitions and frequency calculations, right?

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] cpufreq: schedutil: map raw required frequency to driver frequency
  2016-05-26  7:16   ` Viresh Kumar
@ 2016-05-29  0:40     ` Rafael J. Wysocki
  2016-05-30 10:18       ` Viresh Kumar
  2016-05-30 16:35     ` Steve Muckle
  1 sibling, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-05-29  0:40 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette

On Thu, May 26, 2016 at 9:16 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 25-05-16, 19:53, Steve Muckle wrote:
>> The slow-path frequency transition path is relatively expensive as it
>> requires waking up a thread to do work. Should support be added for
>> remote CPU cpufreq updates that is also expensive since it requires an
>> IPI. These activities should be avoided if they are not necessary.
>>
>> To that end, calculate the actual driver-supported frequency required by
>> the new utilization value in schedutil by using the recently added
>> cpufreq_driver_resolve_freq callback. If it is the same as the
>> previously requested driver frequency then there is no need to continue
>> with the update assuming the cpu frequency limits have not changed. This
>> will have additional benefits should the semantics of the rate limit be
>> changed to apply solely to frequency transitions rather than to
>> frequency calculations in schedutil.

[cut]

> I also have a doubt (I am quite sure Rafael will have a reason for
> that, which I am failing to understand now), on why we are doing
> next_freq == UINT_MAX in sugov_should_update_freq().
>
> I understand that because the limits might have changed,
> need_freq_update would have been set to true. We should evaluate
> next-freq again without worrying about the load or the time since last
> evaluation.

This is in response to the "limits" event (or to the ->limits call
after my recent patches).  That event basically means "something has
changed, so if you have cached anything, invalidate it" to the
governor.  Accordingly, it invalidates next_freq, because that's a
cached value.

> But what will happen by forcefully calling the cpufreq routines to
> change the frequency, if next_freq hasn't changed even after limits
> updates?

I can't really parse the above question, so I'm not going to try to
answer it. :-)

> Wouldn't that call always return early because the new freq
> and the current freq are going to be same ?
>
> @Rafael: Sorry for asking this so late :(

It is not too late.  If there's a problem somewhere, it needs to be
fixed, but at this point I have no idea what you are asking about.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] cpufreq: schedutil: map raw required frequency to driver frequency
  2016-05-29  0:40     ` Rafael J. Wysocki
@ 2016-05-30 10:18       ` Viresh Kumar
  2016-05-30 14:25         ` Rafael J. Wysocki
  0 siblings, 1 reply; 25+ messages in thread
From: Viresh Kumar @ 2016-05-30 10:18 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette

On 29-05-16, 02:40, Rafael J. Wysocki wrote:
> I can't really parse the above question, so I'm not going to try to
> answer it. :-)

Sorry about that :(

IOW, I think that we should make this change into the sched-governor (I will
send a patch separately if you agree to this):

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 14c4aa25cc45..5934b14aa21c 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -66,11 +66,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
 
        if (unlikely(sg_policy->need_freq_update)) {
                sg_policy->need_freq_update = false;
-               /*
-                * This happens when limits change, so forget the previous
-                * next_freq value and force an update.
-                */
-               sg_policy->next_freq = UINT_MAX;
                return true;
        }

And here is my reasoning behind this.

Can you show me any case, where the above code (as present in mainline
today) leads to a freq-change? I couldn't find any.

Let me demonstrate.

Following only talks about the fast-switch path, the other path is
same as well.

Suppose this is the current range of frequencies supported by a
driver: 200, 400, 600, 800, 1000 (in MHz).

And policy->cur = next_freq = 400 MHz.

A.) Suppose that we change policy->min to 400 MHz from userspace.
    -> sugov_limits()
       This will find everything in order and simply set
       need_freq_update, without updating the frequency.

    On next util-callback, we will forcefully return true from
    sugov_should_update_freq() and reach sugov_update_commit().

    We calculate next_freq and that comes to 400 MHz again (that's the
    case we are trying to target with the above code).

    With the current code, we will forcefully end up calling
    cpufreq_driver_fast_switch().

    Because the new and current frequencies are same,
    cpufreq_driver->fast_switch() will simply return.

    NOTE: I also think that cpufreq_driver_fast_switch() should have a
    check like (policy->cur == target_freq). I will add that too, in
    case you agree.

    So, forcefully updating next_freq to UINT_MAX will end up wasting
    some cycles, but wouldn't do any useful stuff.

B.) Suppose that we change policy->min to 600 MHz from userspace.
    -> sugov_limits()
       This will find that policy->cur is less than 600 and will set
       that to 600 MHz by calling __cpufreq_driver_target(). We will
       also set need_freq_update.

       Note that next_freq and policy->cur are not in sync anymore and
       perhaps this is the most important case for the above code.

    On next util-callback, we will forcefully return true from
    sugov_should_update_freq() and reach sugov_update_commit().

    We calculate next_freq and lets say that comes to 400 MHz again
    (as that's the case we are trying to target with the above code).

    With the current code, we will forcefully end up calling
    cpufreq_driver_fast_switch().

    Because next_freq() is not part of the new range, we will clamp it
    and set it to 600 MHz eventually. Again, next and current
    frequencies are same, cpufreq_driver->fast_switch() will simply
    return.

    So, forcefully updating next_freq to UINT_MAX will end up wasting
    some cycles here as well, but would do any useful stuff.


What am I missing ?

-- 
viresh

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] cpufreq: schedutil: map raw required frequency to driver frequency
  2016-05-30 10:18       ` Viresh Kumar
@ 2016-05-30 14:25         ` Rafael J. Wysocki
  2016-05-30 15:32           ` Viresh Kumar
  2016-05-31  1:49           ` Wanpeng Li
  0 siblings, 2 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-05-30 14:25 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Steve Muckle, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette

On Mon, May 30, 2016 at 12:18 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 29-05-16, 02:40, Rafael J. Wysocki wrote:
>> I can't really parse the above question, so I'm not going to try to
>> answer it. :-)
>
> Sorry about that :(
>
> IOW, I think that we should make this change into the sched-governor (I will
> send a patch separately if you agree to this):

I don't.

> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 14c4aa25cc45..5934b14aa21c 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -66,11 +66,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>
>         if (unlikely(sg_policy->need_freq_update)) {
>                 sg_policy->need_freq_update = false;
> -               /*
> -                * This happens when limits change, so forget the previous
> -                * next_freq value and force an update.
> -                */
> -               sg_policy->next_freq = UINT_MAX;
>                 return true;
>         }
>
> And here is my reasoning behind this.
>
> Can you show me any case, where the above code (as present in mainline
> today) leads to a freq-change? I couldn't find any.
>
> Let me demonstrate.
>
> Following only talks about the fast-switch path, the other path is
> same as well.
>
> Suppose this is the current range of frequencies supported by a
> driver: 200, 400, 600, 800, 1000 (in MHz).
>
> And policy->cur = next_freq = 400 MHz.
>
> A.) Suppose that we change policy->min to 400 MHz from userspace.
>     -> sugov_limits()
>        This will find everything in order and simply set
>        need_freq_update, without updating the frequency.
>
>     On next util-callback, we will forcefully return true from
>     sugov_should_update_freq() and reach sugov_update_commit().
>
>     We calculate next_freq and that comes to 400 MHz again (that's the
>     case we are trying to target with the above code).
>
>     With the current code, we will forcefully end up calling
>     cpufreq_driver_fast_switch().
>
>     Because the new and current frequencies are same,
>     cpufreq_driver->fast_switch() will simply return.
>
>     NOTE: I also think that cpufreq_driver_fast_switch() should have a
>     check like (policy->cur == target_freq). I will add that too, in
>     case you agree.
>
>     So, forcefully updating next_freq to UINT_MAX will end up wasting
>     some cycles, but wouldn't do any useful stuff.

It will, but there's no way to distinguish this case from B in the
governor with the current min/max synchronization mechanism.  That is,
it only knows that something has changed, but checking what exactly
has changed would be racy.

> B.) Suppose that we change policy->min to 600 MHz from userspace.
>     -> sugov_limits()
>        This will find that policy->cur is less than 600 and will set
>        that to 600 MHz by calling __cpufreq_driver_target(). We will
>        also set need_freq_update.
>
>        Note that next_freq and policy->cur are not in sync anymore and
>        perhaps this is the most important case for the above code.

It is.

Moreover, please note that __cpufreq_driver_target() is only called in
sugov_limits() when policy->fast_switch_enabled is unset.

>     On next util-callback, we will forcefully return true from
>     sugov_should_update_freq() and reach sugov_update_commit().
>
>     We calculate next_freq and lets say that comes to 400 MHz again
>     (as that's the case we are trying to target with the above code).
>
>     With the current code, we will forcefully end up calling
>     cpufreq_driver_fast_switch().
>
>     Because next_freq() is not part of the new range, we will clamp it
>     and set it to 600 MHz eventually. Again, next and current
>     frequencies are same, cpufreq_driver->fast_switch() will simply
>     return.

Not really (as per the above).

And even in the !fast_switch_enabled case, if next_freq stays at 400
MHz (which is different from policy->cur), it may lead to suboptimal
decisions going forward (eg. if it goes to 600 MHz next time and the
governor will think that the frequency has changed, although in fact
it hasn't).

I guess the !fast_switch_enabled case might be optimized by comparing
next_freq with policy->cur at one point, but next_freq should be
updated regardless IMO.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/3] cpufreq: add resolve_freq driver callback
  2016-05-26  6:25   ` Viresh Kumar
@ 2016-05-30 15:31     ` Steve Muckle
  2016-05-31  5:30       ` Viresh Kumar
  0 siblings, 1 reply; 25+ messages in thread
From: Steve Muckle @ 2016-05-30 15:31 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki,
	linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette

On Thu, May 26, 2016 at 11:55:14AM +0530, Viresh Kumar wrote:
> On 25-05-16, 19:52, Steve Muckle wrote:
> > Cpufreq governors may need to know what a particular target frequency
> > maps to in the driver without necessarily wanting to set the frequency.
> > Support this operation via a new cpufreq API,
> > cpufreq_driver_resolve_freq().
> > 
> > The above API will call a new cpufreq driver callback, resolve_freq(),
> > if it has been registered by the driver. If that callback has not been
> > registered and a frequency table is available then the frequency table
> > is walked using cpufreq_frequency_table_target().
> > 
> > UINT_MAX is returned if no driver callback or frequency table is
> > available.
> 
> Why should we return UINT_MAX here? We should return target_freq, no ?

My goal here was to have the system operate in this case in a manner
that is obviously not optimized (running at fmax), so the platform owner
realizes that the cpufreq driver doesn't fully support the schedutil
governor.

I was originally going to just return an error code but that also means
having to check for it which would be nice to avoid if possible on this
fast path.

> 
> > Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Signed-off-by: Steve Muckle <smuckle@linaro.org>
> > ---
> >  drivers/cpufreq/cpufreq.c | 25 +++++++++++++++++++++++++
> >  include/linux/cpufreq.h   | 11 +++++++++++
> >  2 files changed, 36 insertions(+)
> > 
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 77d77a4e3b74..3b44f4bdc071 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -1849,6 +1849,31 @@ unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
> >  }
> >  EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
> >  
> > +unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
> > +					 unsigned int target_freq)
> > +{
> > +	struct cpufreq_frequency_table *freq_table;
> > +	int index, retval;
> > +
> > +	clamp_val(target_freq, policy->min, policy->max);
> > +
> > +	if (cpufreq_driver->resolve_freq)
> > +		return cpufreq_driver->resolve_freq(policy, target_freq);
> > +
> > +	freq_table = cpufreq_frequency_get_table(policy->cpu);
> 
> I have sent a separate patch to provide a light weight alternative to
> this. If that gets accepted, we can switch over to using it.

Sure.

> 
> > +	if (!freq_table)
> > +		return UINT_MAX;
> > +
> > +	retval = cpufreq_frequency_table_target(policy, freq_table,
> > +						target_freq, CPUFREQ_RELATION_L,
> > +						&index);
> > +	if (retval)
> > +		return UINT_MAX;
> > +
> > +	return freq_table[index].frequency;
> > +}
> > +EXPORT_SYMBOL_GPL(cpufreq_driver_resolve_freq);
> > +
> >  /* Must set freqs->new to intermediate frequency */
> >  static int __target_intermediate(struct cpufreq_policy *policy,
> >  				 struct cpufreq_freqs *freqs, int index)
> > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> > index 4e81e08db752..675f17f98e75 100644
> > --- a/include/linux/cpufreq.h
> > +++ b/include/linux/cpufreq.h
> > @@ -271,6 +271,13 @@ struct cpufreq_driver {
> >  	int		(*target_intermediate)(struct cpufreq_policy *policy,
> >  					       unsigned int index);
> >  
> > +	/*
> > +	 * Return the driver-supported frequency that a particular target
> > +	 * frequency maps to (does not set the new frequency).
> > +	 */
> > +	unsigned int	(*resolve_freq)(struct cpufreq_policy *policy,
> > +					unsigned int target_freq);
> 
> We have 3 categories of cpufreq-drivers today:
> 1. setpolicy drivers: They don't use the cpufreq governors we are
>    working on.
> 2. non-setpolicy drivers:
>   A. with ->target_index() callback, these will always provide a
>      freq-table.
>   B. with ->target() callback, ONLY these should be allowed to provide
>      the ->resolve_freq() callback and no one else.
> 
> And so I would suggest adding an additional check in
> cpufreq_register_driver() to catch incorrect usage of this callback.

I'll reply to this in the next email on patch 2...

thanks,
Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] cpufreq: schedutil: map raw required frequency to driver frequency
  2016-05-30 14:25         ` Rafael J. Wysocki
@ 2016-05-30 15:32           ` Viresh Kumar
  2016-05-30 19:08             ` Rafael J. Wysocki
  2016-05-31  9:02             ` Peter Zijlstra
  2016-05-31  1:49           ` Wanpeng Li
  1 sibling, 2 replies; 25+ messages in thread
From: Viresh Kumar @ 2016-05-30 15:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette

I clearly missed the !policy->fast_switch_enabled check in sugov_limit() and so
the confusion.

On 30-05-16, 16:25, Rafael J. Wysocki wrote:
> On Mon, May 30, 2016 at 12:18 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > Suppose this is the current range of frequencies supported by a
> > driver: 200, 400, 600, 800, 1000 (in MHz).
> >
> > And policy->cur = next_freq = 400 MHz.
> >
> > A.) Suppose that we change policy->min to 400 MHz from userspace.
> >     -> sugov_limits()
> >        This will find everything in order and simply set
> >        need_freq_update, without updating the frequency.
> >
> >     On next util-callback, we will forcefully return true from
> >     sugov_should_update_freq() and reach sugov_update_commit().
> >
> >     We calculate next_freq and that comes to 400 MHz again (that's the
> >     case we are trying to target with the above code).
> >
> >     With the current code, we will forcefully end up calling
> >     cpufreq_driver_fast_switch().
> >
> >     Because the new and current frequencies are same,
> >     cpufreq_driver->fast_switch() will simply return.
> >
> >     NOTE: I also think that cpufreq_driver_fast_switch() should have a
> >     check like (policy->cur == target_freq). I will add that too, in
> >     case you agree.
> >
> >     So, forcefully updating next_freq to UINT_MAX will end up wasting
> >     some cycles, but wouldn't do any useful stuff.
> 
> It will, but there's no way to distinguish this case from B in the
> governor with the current min/max synchronization mechanism.  That is,
> it only knows that something has changed, but checking what exactly
> has changed would be racy.
> 
> > B.) Suppose that we change policy->min to 600 MHz from userspace.
> >     -> sugov_limits()
> >        This will find that policy->cur is less than 600 and will set
> >        that to 600 MHz by calling __cpufreq_driver_target(). We will
> >        also set need_freq_update.
> >
> >        Note that next_freq and policy->cur are not in sync anymore and
> >        perhaps this is the most important case for the above code.
> 
> It is.
> 
> Moreover, please note that __cpufreq_driver_target() is only called in
> sugov_limits() when policy->fast_switch_enabled is unset.

Yep, I missed it.

I am not sure how harmful it can be, but we are returning from sugov_limits()
without making sure that policy->cur is in valid range currently. I also know
that you left it out because of the possible races with the util handler.

But this is something that is fundamentally broken for now. The user writes
updates the policy->max/min, we return the call to the user thinks that it has
successfully written to the file and everything is aligned. But we may be
running at an frequency from invalid range. Yes, that will happen very soon, but
its broken.

-- 
viresh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/3] cpufreq: acpi-cpufreq: add resolve_freq callback
  2016-05-26  6:43   ` Viresh Kumar
@ 2016-05-30 16:20     ` Steve Muckle
  2016-05-31 11:38       ` Viresh Kumar
  0 siblings, 1 reply; 25+ messages in thread
From: Steve Muckle @ 2016-05-30 16:20 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki,
	linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette

On Thu, May 26, 2016 at 12:13:41PM +0530, Viresh Kumar wrote:
> On 25-05-16, 19:53, Steve Muckle wrote:
> > Support the new resolve_freq cpufreq callback which resolves a target
> > frequency to a driver-supported frequency without actually setting it.
> 
> And here is the first abuser of this API as I was talking about in the
> earlier patch :)
> 
> But, I know why you are doing it and I think we can do it differently.
> 
> So, lets assume that the ->resolve_freq() callback will ONLY be
> provided by the drivers which also provide a ->target() callback.
> 
> i.e. not by acpi-cpufreq at least.
> 
> > The target frequency and resolved frequency table entry are cached so
> > that a subsequent fast_switch operation may avoid the frequency table
> > walk assuming the requested target frequency is the same.
> > 
> > Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Signed-off-by: Steve Muckle <smuckle@linaro.org>
> > ---
> >  drivers/cpufreq/acpi-cpufreq.c | 56 ++++++++++++++++++++++++++++++++----------
> >  1 file changed, 43 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
> > index 7f38fb55f223..d87962eda1ed 100644
> > --- a/drivers/cpufreq/acpi-cpufreq.c
> > +++ b/drivers/cpufreq/acpi-cpufreq.c
> > @@ -66,6 +66,8 @@ enum {
> >  
> >  struct acpi_cpufreq_data {
> >  	struct cpufreq_frequency_table *freq_table;
> > +	unsigned int cached_lookup_freq;
> > +	struct cpufreq_frequency_table *cached_lookup_entry;
> 
> This could have been an integer value 'Index', which could have been
> used together with the freq-table to get the freq we wanted.
> 
> And, then we can move these two fields into the cpufreq_policy
> structure and make them part of the first patch itself.
> 
> >  	unsigned int resume;
> >  	unsigned int cpu_feature;
> >  	unsigned int acpi_perf_cpu;
> > @@ -458,26 +460,53 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
> >  	return result;
> >  }
> >  
> > -unsigned int acpi_cpufreq_fast_switch(struct cpufreq_policy *policy,
> > -				      unsigned int target_freq)
> > +/*
> > + * Find the closest frequency above target_freq.
> > + *
> > + * The table is sorted in the reverse order with respect to the
> > + * frequency and all of the entries are valid (see the initialization).
> > + */
> > +static inline struct cpufreq_frequency_table
> > +*lookup_freq(struct cpufreq_frequency_table *table, unsigned int target_freq)
> >  {
> > -	struct acpi_cpufreq_data *data = policy->driver_data;
> > -	struct acpi_processor_performance *perf;
> > -	struct cpufreq_frequency_table *entry;
> > -	unsigned int next_perf_state, next_freq, freq;
> > +	struct cpufreq_frequency_table *entry = table;
> > +	unsigned int freq;
> >  
> > -	/*
> > -	 * Find the closest frequency above target_freq.
> > -	 *
> > -	 * The table is sorted in the reverse order with respect to the
> > -	 * frequency and all of the entries are valid (see the initialization).
> > -	 */
> > -	entry = data->freq_table;
> >  	do {
> >  		entry++;
> >  		freq = entry->frequency;
> >  	} while (freq >= target_freq && freq != CPUFREQ_TABLE_END);
> >  	entry--;
> > +
> > +	return entry;
> > +}
> > +
> > +unsigned int acpi_cpufreq_resolve_freq(struct cpufreq_policy *policy,
> > +				       unsigned int target_freq)
> > +{
> > +	struct acpi_cpufreq_data *data = policy->driver_data;
> > +	struct cpufreq_frequency_table *entry;
> > +
> > +	data->cached_lookup_freq = target_freq;
> > +	entry = lookup_freq(data->freq_table, target_freq);
> > +	data->cached_lookup_entry = entry;
> > +
> > +	return entry->frequency;
> > +}
> > +
> 
> And then we could remove this callback from acpi-cpufreq.
> 
> > +unsigned int acpi_cpufreq_fast_switch(struct cpufreq_policy *policy,
> > +				      unsigned int target_freq)
> > +{
> > +	struct acpi_cpufreq_data *data = policy->driver_data;
> > +	struct acpi_processor_performance *perf;
> > +	struct cpufreq_frequency_table *entry;
> > +	unsigned int next_perf_state, next_freq;
> > +
> > +	if (data->cached_lookup_entry &&
> > +	    data->cached_lookup_freq == target_freq)
> > +		entry = data->cached_lookup_entry;
> > +	else
> > +		entry = lookup_freq(data->freq_table, target_freq);
> 
> And a static inline callback in cpufreq.h to get this entry.
> 
> >  	next_freq = entry->frequency;
> >  	next_perf_state = entry->driver_data;
> >  
> > @@ -918,6 +947,7 @@ static struct cpufreq_driver acpi_cpufreq_driver = {
> >  	.verify		= cpufreq_generic_frequency_table_verify,
> >  	.target_index	= acpi_cpufreq_target,
> >  	.fast_switch	= acpi_cpufreq_fast_switch,
> > +	.resolve_freq	= acpi_cpufreq_resolve_freq,
> >  	.bios_limit	= acpi_processor_get_bios_limit,
> >  	.init		= acpi_cpufreq_cpu_init,
> >  	.exit		= acpi_cpufreq_cpu_exit,
> 
> Sounds reasonable ?

A couple concerns... One is that if we do the lookup in
cpufreq_driver_resolve_freq() for drivers which implement target_index()
then it means using cpufreq_frequency_table_target() there.  This is a
heavier weight function that can't take advantage of driver-specific
knowledge that the freq table is sorted a particular way. So for
acpi-cpufreq we'd now be having to walk the whole table for every
fast_switch.

Another is that it'll be a a bit odd that the logic used to lookup the
driver frequency will be different in the cached and uncached
fast_switch cases. In the cached case it will have been determined by
code in cpufreq_driver_resolve_freq() whereas in the uncached case it
will be logic in the driver, in its fast_switch routine.

I think at least the first issue would need to be solved to use this
approach as it would impact performance for every fast_switch call.

(Thanks for the review btw!)

thanks,
Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] cpufreq: schedutil: map raw required frequency to driver frequency
  2016-05-26  7:16   ` Viresh Kumar
  2016-05-29  0:40     ` Rafael J. Wysocki
@ 2016-05-30 16:35     ` Steve Muckle
  2016-06-01 10:50       ` Viresh Kumar
  1 sibling, 1 reply; 25+ messages in thread
From: Steve Muckle @ 2016-05-30 16:35 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki,
	linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette

On Thu, May 26, 2016 at 12:46:29PM +0530, Viresh Kumar wrote:
> On 25-05-16, 19:53, Steve Muckle wrote:
> > The slow-path frequency transition path is relatively expensive as it
> > requires waking up a thread to do work. Should support be added for
> > remote CPU cpufreq updates that is also expensive since it requires an
> > IPI. These activities should be avoided if they are not necessary.
> > 
> > To that end, calculate the actual driver-supported frequency required by
> > the new utilization value in schedutil by using the recently added
> > cpufreq_driver_resolve_freq callback. If it is the same as the
> > previously requested driver frequency then there is no need to continue
> > with the update assuming the cpu frequency limits have not changed. This
> > will have additional benefits should the semantics of the rate limit be
> > changed to apply solely to frequency transitions rather than to
> > frequency calculations in schedutil.
> 
> Maybe mention here that this patch isn't avoiding those IPIs yet, I
> got an impression earlier that they are avoided with it.

Perhaps the problem is that my cover letter should have more clearly
specified that the earlier referenced series was an unaccepted draft?
I'll be more careful to note that in the future.

The text above (the second sentence) seems okay to me in that it
mentions remote updates are not currently supported. Let me know if
there is specific text you would like included.

thanks,
Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] cpufreq: schedutil: map raw required frequency to driver frequency
  2016-05-27  5:41   ` Wanpeng Li
@ 2016-05-30 16:48     ` Steve Muckle
  0 siblings, 0 replies; 25+ messages in thread
From: Steve Muckle @ 2016-05-30 16:48 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki,
	Viresh Kumar, linux-kernel, Linux PM list, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette

On Fri, May 27, 2016 at 01:41:02PM +0800, Wanpeng Li wrote:
> 2016-05-26 10:53 GMT+08:00 Steve Muckle <steve.muckle@linaro.org>:
> > The slow-path frequency transition path is relatively expensive as it
> > requires waking up a thread to do work. Should support be added for
> > remote CPU cpufreq updates that is also expensive since it requires an
> > IPI. These activities should be avoided if they are not necessary.
> >
> > To that end, calculate the actual driver-supported frequency required by
> > the new utilization value in schedutil by using the recently added
> > cpufreq_driver_resolve_freq callback. If it is the same as the
> > previously requested driver frequency then there is no need to continue
> > with the update assuming the cpu frequency limits have not changed. This
> > will have additional benefits should the semantics of the rate limit be
> > changed to apply solely to frequency transitions rather than to
> > frequency calculations in schedutil.
> 
> sugov_should_update_freq() still be called before get_nex_freq() after
> the patch applied, so rate limit still apply to both frequency
> transitions and frequency calculations, right?

Yes this series does not change the semantics of the rate limit. It
only includes the changes required for resolving raw target frequencies
to driver-supported frequencies so redundant operations can be avoided.

FWIW the approach I took to change the rate_limit semantics [0] just
moves where last_freq_update_time is set. I was going to return to that
once these changes are complete.

[0]: https://lkml.org/lkml/2016/5/9/857

thanks,
Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] cpufreq: schedutil: map raw required frequency to driver frequency
  2016-05-30 15:32           ` Viresh Kumar
@ 2016-05-30 19:08             ` Rafael J. Wysocki
  2016-05-31  9:02             ` Peter Zijlstra
  1 sibling, 0 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-05-30 19:08 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Steve Muckle, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette

On Mon, May 30, 2016 at 5:32 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> I clearly missed the !policy->fast_switch_enabled check in sugov_limit() and so
> the confusion.
>
> On 30-05-16, 16:25, Rafael J. Wysocki wrote:
>> On Mon, May 30, 2016 at 12:18 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> > Suppose this is the current range of frequencies supported by a
>> > driver: 200, 400, 600, 800, 1000 (in MHz).

[cut]

> I am not sure how harmful it can be, but we are returning from sugov_limits()
> without making sure that policy->cur is in valid range currently. I also know
> that you left it out because of the possible races with the util handler.
>
> But this is something that is fundamentally broken for now.

To be clear, this is your opinion that I disagree with.

> The user writes updates the policy->max/min, we return the call to the user thinks that it has
> successfully written to the file and everything is aligned. But we may be
> running at an frequency from invalid range. Yes, that will happen very soon, but
> its broken.

Does it really matter that the call may return before the new limit
takes effect?  I don't think so.

Moreover, if the limit is updated cross-CPU (ie. the CPU running the
call is not the one whose limit is updated) and the target CPU is
idle, waking it up just in order to make sure that the frequency limit
took effect is not particularly smart.

What really matters is how soon the limit can take effect and that in
both "fast switch" and "traditional" cases is about the same time, so
I don't see anything problematic here.

Actually, making sugov_limits() wait for the limit to take effect
would be a matter of adding three lines of code to it, but that would
change a relatively lightweight call into a really heavyweight one, so
I didn't add them, because I didn't see a point in doing that.

I still don't see a point in doing that for that matter.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] cpufreq: schedutil: map raw required frequency to driver frequency
  2016-05-30 14:25         ` Rafael J. Wysocki
  2016-05-30 15:32           ` Viresh Kumar
@ 2016-05-31  1:49           ` Wanpeng Li
  1 sibling, 0 replies; 25+ messages in thread
From: Wanpeng Li @ 2016-05-31  1:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Viresh Kumar, Steve Muckle, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette

2016-05-30 22:25 GMT+08:00 Rafael J. Wysocki <rafael@kernel.org>:
> On Mon, May 30, 2016 at 12:18 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> On 29-05-16, 02:40, Rafael J. Wysocki wrote:
>>> I can't really parse the above question, so I'm not going to try to
>>> answer it. :-)
>>
>> Sorry about that :(
>>
>> IOW, I think that we should make this change into the sched-governor (I will
>> send a patch separately if you agree to this):
>
> I don't.
>
>> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> index 14c4aa25cc45..5934b14aa21c 100644
>> --- a/kernel/sched/cpufreq_schedutil.c
>> +++ b/kernel/sched/cpufreq_schedutil.c
>> @@ -66,11 +66,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>>
>>         if (unlikely(sg_policy->need_freq_update)) {
>>                 sg_policy->need_freq_update = false;
>> -               /*
>> -                * This happens when limits change, so forget the previous
>> -                * next_freq value and force an update.
>> -                */
>> -               sg_policy->next_freq = UINT_MAX;
>>                 return true;
>>         }
>>
>> And here is my reasoning behind this.
>>
>> Can you show me any case, where the above code (as present in mainline
>> today) leads to a freq-change? I couldn't find any.
>>
>> Let me demonstrate.
>>
>> Following only talks about the fast-switch path, the other path is
>> same as well.
>>
>> Suppose this is the current range of frequencies supported by a
>> driver: 200, 400, 600, 800, 1000 (in MHz).
>>
>> And policy->cur = next_freq = 400 MHz.
>>
>> A.) Suppose that we change policy->min to 400 MHz from userspace.
>>     -> sugov_limits()
>>        This will find everything in order and simply set
>>        need_freq_update, without updating the frequency.
>>
>>     On next util-callback, we will forcefully return true from
>>     sugov_should_update_freq() and reach sugov_update_commit().
>>
>>     We calculate next_freq and that comes to 400 MHz again (that's the
>>     case we are trying to target with the above code).
>>
>>     With the current code, we will forcefully end up calling
>>     cpufreq_driver_fast_switch().
>>
>>     Because the new and current frequencies are same,
>>     cpufreq_driver->fast_switch() will simply return.
>>
>>     NOTE: I also think that cpufreq_driver_fast_switch() should have a
>>     check like (policy->cur == target_freq). I will add that too, in
>>     case you agree.
>>
>>     So, forcefully updating next_freq to UINT_MAX will end up wasting
>>     some cycles, but wouldn't do any useful stuff.
>
> It will, but there's no way to distinguish this case from B in the
> governor with the current min/max synchronization mechanism.  That is,
> it only knows that something has changed, but checking what exactly
> has changed would be racy.
>
>> B.) Suppose that we change policy->min to 600 MHz from userspace.
>>     -> sugov_limits()
>>        This will find that policy->cur is less than 600 and will set
>>        that to 600 MHz by calling __cpufreq_driver_target(). We will
>>        also set need_freq_update.
>>
>>        Note that next_freq and policy->cur are not in sync anymore and
>>        perhaps this is the most important case for the above code.
>
> It is.
>
> Moreover, please note that __cpufreq_driver_target() is only called in
> sugov_limits() when policy->fast_switch_enabled is unset.
>
>>     On next util-callback, we will forcefully return true from
>>     sugov_should_update_freq() and reach sugov_update_commit().
>>
>>     We calculate next_freq and lets say that comes to 400 MHz again
>>     (as that's the case we are trying to target with the above code).
>>
>>     With the current code, we will forcefully end up calling
>>     cpufreq_driver_fast_switch().
>>
>>     Because next_freq() is not part of the new range, we will clamp it
>>     and set it to 600 MHz eventually. Again, next and current
>>     frequencies are same, cpufreq_driver->fast_switch() will simply
>>     return.
>
> Not really (as per the above).
>
> And even in the !fast_switch_enabled case, if next_freq stays at 400
> MHz (which is different from policy->cur), it may lead to suboptimal
> decisions going forward (eg. if it goes to 600 MHz next time and the
> governor will think that the frequency has changed, although in fact
> it hasn't).

Does set next_freq = UNIT_MAX has same effect as next_freq stays at
400MHz since both means that frequency has changed?

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/3] cpufreq: add resolve_freq driver callback
  2016-05-30 15:31     ` Steve Muckle
@ 2016-05-31  5:30       ` Viresh Kumar
  2016-05-31 18:48         ` Steve Muckle
  0 siblings, 1 reply; 25+ messages in thread
From: Viresh Kumar @ 2016-05-31  5:30 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette

On 30-05-16, 08:31, Steve Muckle wrote:
> My goal here was to have the system operate in this case in a manner
> that is obviously not optimized (running at fmax), so the platform owner
> realizes that the cpufreq driver doesn't fully support the schedutil
> governor.
> 
> I was originally going to just return an error code but that also means
> having to check for it which would be nice to avoid if possible on this
> fast path.

Okay, I get what you are saying.

But all we are doing here is to make things fast by not sending IPIs,
etc. That should *not* lead to a behavior where the frequency stays at
MAX all the time even if the driver doesn't provide this callback or
the freq-table.

If we just return the target_freq in this case instead of UINT_MAX,
the platform may eventually have some unnecessary IPIs, wakeups, etc,
but its frequency will still be switched properly.

Wouldn't that be a better choice ?

-- 
viresh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] cpufreq: schedutil: map raw required frequency to driver frequency
  2016-05-30 15:32           ` Viresh Kumar
  2016-05-30 19:08             ` Rafael J. Wysocki
@ 2016-05-31  9:02             ` Peter Zijlstra
  1 sibling, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2016-05-31  9:02 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Steve Muckle, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette

On Mon, May 30, 2016 at 09:02:33PM +0530, Viresh Kumar wrote:
> But this is something that is fundamentally broken for now. The user writes
> updates the policy->max/min, we return the call to the user thinks that it has

IMO the presence of these user min/max files is the broken part.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/3] cpufreq: add resolve_freq driver callback
  2016-05-26  2:52 ` [PATCH v2 1/3] cpufreq: add resolve_freq driver callback Steve Muckle
  2016-05-26  6:25   ` Viresh Kumar
@ 2016-05-31 11:14   ` Viresh Kumar
  2016-05-31 18:12     ` Steve Muckle
  1 sibling, 1 reply; 25+ messages in thread
From: Viresh Kumar @ 2016-05-31 11:14 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette

On 25-05-16, 19:52, Steve Muckle wrote:
> +unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
> +					 unsigned int target_freq)
> +{
> +	struct cpufreq_frequency_table *freq_table;
> +	int index, retval;
> +
> +	clamp_val(target_freq, policy->min, policy->max);

Rafael will kill me for this, as I have fallen into the same trap and
copied your *incorrect* code :(

This is a useless statement unless you do:

target_freq = clamp_val(target_freq, policy->min, policy->max);

-- 
viresh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/3] cpufreq: acpi-cpufreq: add resolve_freq callback
  2016-05-30 16:20     ` Steve Muckle
@ 2016-05-31 11:38       ` Viresh Kumar
  0 siblings, 0 replies; 25+ messages in thread
From: Viresh Kumar @ 2016-05-31 11:38 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette

On 30-05-16, 09:20, Steve Muckle wrote:
> A couple concerns... One is that if we do the lookup in
> cpufreq_driver_resolve_freq() for drivers which implement target_index()
> then it means using cpufreq_frequency_table_target() there.  This is a
> heavier weight function that can't take advantage of driver-specific
> knowledge that the freq table is sorted a particular way.

I completely agree.

> So for
> acpi-cpufreq we'd now be having to walk the whole table for every
> fast_switch.

I have just tried to address that with following set:

[PATCH 0/2] cpufreq: Use sorted frequency tables

Lets see what Rafael has to say about that.

> Another is that it'll be a a bit odd that the logic used to lookup the
> driver frequency will be different in the cached and uncached
> fast_switch cases. In the cached case it will have been determined by
> code in cpufreq_driver_resolve_freq() whereas in the uncached case it
> will be logic in the driver, in its fast_switch routine.

We can make both of them refer the above code then. Lets see.

-- 
viresh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/3] cpufreq: add resolve_freq driver callback
  2016-05-31 11:14   ` Viresh Kumar
@ 2016-05-31 18:12     ` Steve Muckle
  0 siblings, 0 replies; 25+ messages in thread
From: Steve Muckle @ 2016-05-31 18:12 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki,
	linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette

On Tue, May 31, 2016 at 04:44:51PM +0530, Viresh Kumar wrote:
> On 25-05-16, 19:52, Steve Muckle wrote:
> > +unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
> > +					 unsigned int target_freq)
> > +{
> > +	struct cpufreq_frequency_table *freq_table;
> > +	int index, retval;
> > +
> > +	clamp_val(target_freq, policy->min, policy->max);
> 
> Rafael will kill me for this, as I have fallen into the same trap and
> copied your *incorrect* code :(
> 
> This is a useless statement unless you do:
> 
> target_freq = clamp_val(target_freq, policy->min, policy->max);

Well I wouldn't worry too much, I copied it from Rafael's code in
cpufreq_driver_fast_switch() which also has this problem and needs to be
fixed.

Perhaps clamp_val could be changed to force the compiler to check that
its return value is being used.

thanks,
Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/3] cpufreq: add resolve_freq driver callback
  2016-05-31  5:30       ` Viresh Kumar
@ 2016-05-31 18:48         ` Steve Muckle
  0 siblings, 0 replies; 25+ messages in thread
From: Steve Muckle @ 2016-05-31 18:48 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki,
	linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette

On Tue, May 31, 2016 at 11:00:11AM +0530, Viresh Kumar wrote:
> On 30-05-16, 08:31, Steve Muckle wrote:
> > My goal here was to have the system operate in this case in a manner
> > that is obviously not optimized (running at fmax), so the platform owner
> > realizes that the cpufreq driver doesn't fully support the schedutil
> > governor.
> > 
> > I was originally going to just return an error code but that also means
> > having to check for it which would be nice to avoid if possible on this
> > fast path.
> 
> Okay, I get what you are saying.
> 
> But all we are doing here is to make things fast by not sending IPIs,
> etc. That should *not* lead to a behavior where the frequency stays at
> MAX all the time even if the driver doesn't provide this callback or
> the freq-table.
> 
> If we just return the target_freq in this case instead of UINT_MAX,
> the platform may eventually have some unnecessary IPIs, wakeups, etc,
> but its frequency will still be switched properly.
> 
> Wouldn't that be a better choice ?

I'm still concerned that a platform owner may use this and accept
suboptimal perf/power because they aren't aware their cpufreq driver is
not fully compliant. But I agree it'd be better to have it work as well
as it can. I will make the change.

Maybe a warning message can be added when schedutil initializes if
resolve_freq is not supported.

thanks,
Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] cpufreq: schedutil: map raw required frequency to driver frequency
  2016-05-30 16:35     ` Steve Muckle
@ 2016-06-01 10:50       ` Viresh Kumar
  0 siblings, 0 replies; 25+ messages in thread
From: Viresh Kumar @ 2016-06-01 10:50 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, linux-kernel,
	linux-pm, Vincent Guittot, Morten Rasmussen, Dietmar Eggemann,
	Juri Lelli, Patrick Bellasi, Michael Turquette

On 30-05-16, 09:35, Steve Muckle wrote:
> The text above (the second sentence) seems okay to me in that it
> mentions remote updates are not currently supported. Let me know if
> there is specific text you would like included.

Perhaps I got confused between cover-letter and this log. Leave that.
Its fine.

-- 
viresh

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2016-06-01 10:50 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-26  2:52 [PATCH v2 0/3] cpufreq: avoid redundant driver calls in schedutil Steve Muckle
2016-05-26  2:52 ` [PATCH v2 1/3] cpufreq: add resolve_freq driver callback Steve Muckle
2016-05-26  6:25   ` Viresh Kumar
2016-05-30 15:31     ` Steve Muckle
2016-05-31  5:30       ` Viresh Kumar
2016-05-31 18:48         ` Steve Muckle
2016-05-31 11:14   ` Viresh Kumar
2016-05-31 18:12     ` Steve Muckle
2016-05-26  2:53 ` [PATCH v2 2/3] cpufreq: acpi-cpufreq: add resolve_freq callback Steve Muckle
2016-05-26  6:43   ` Viresh Kumar
2016-05-30 16:20     ` Steve Muckle
2016-05-31 11:38       ` Viresh Kumar
2016-05-26  2:53 ` [PATCH v2 3/3] cpufreq: schedutil: map raw required frequency to driver frequency Steve Muckle
2016-05-26  7:16   ` Viresh Kumar
2016-05-29  0:40     ` Rafael J. Wysocki
2016-05-30 10:18       ` Viresh Kumar
2016-05-30 14:25         ` Rafael J. Wysocki
2016-05-30 15:32           ` Viresh Kumar
2016-05-30 19:08             ` Rafael J. Wysocki
2016-05-31  9:02             ` Peter Zijlstra
2016-05-31  1:49           ` Wanpeng Li
2016-05-30 16:35     ` Steve Muckle
2016-06-01 10:50       ` Viresh Kumar
2016-05-27  5:41   ` Wanpeng Li
2016-05-30 16:48     ` Steve Muckle

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).