linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] cpufreq: schedutil: Fix 4.12 regressions
@ 2017-06-09 10:15 Viresh Kumar
  2017-06-09 10:15 ` [PATCH 1/3] cpufreq: schedutil: Restore cached_raw_freq behavior Viresh Kumar
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Viresh Kumar @ 2017-06-09 10:15 UTC (permalink / raw)
  To: Rafael Wysocki
  Cc: Viresh Kumar, linaro-kernel, linux-pm, linux-kernel,
	Vincent Guittot, Juri Lelli, Ingo Molnar, Peter Zijlstra,
	patrick.bellasi, john.ettedgui, Srinivas Pandruvada,
	Joel Fernandes, Morten Rasmussen

Hi Rafael,

I have identified some regressions with the schedutil governor which
happen due to one of your patches that got merged in 4.12-rc1.

This series fixes all the drivers which provide a ->target_index()
callback but doesn't fix the drivers which provide ->target() callback.

Such platforms need to implement the ->resolve_freq() callback in order
to get this fixed and I only had hardware for testing intel_pstate,
which I fixed in this series.

I am wondering if there is another way to fix this issue (than what I
tried) or if we should revert the offending commit (39b64aa1c007) and
look for other solutions.

Anyway, this series has the necessary patches to fix it.

Viresh Kumar (3):
  cpufreq: schedutil: Restore cached_raw_freq behavior
  cpufreq: schedutil: Fix selection algorithm while reducing frequency
  cpufreq: intel_pstate: Provide resolve_freq() to fix regression

 drivers/cpufreq/intel_pstate.c   | 14 +++++++++++++
 kernel/sched/cpufreq_schedutil.c | 45 +++++++++++++++++++++++++++++++++-------
 2 files changed, 52 insertions(+), 7 deletions(-)

-- 
2.13.0.70.g6367777092d9

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/3] cpufreq: schedutil: Restore cached_raw_freq behavior
  2017-06-09 10:15 [PATCH 0/3] cpufreq: schedutil: Fix 4.12 regressions Viresh Kumar
@ 2017-06-09 10:15 ` Viresh Kumar
  2017-06-09 10:15 ` [PATCH 2/3] cpufreq: schedutil: Fix selection algorithm while reducing frequency Viresh Kumar
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Viresh Kumar @ 2017-06-09 10:15 UTC (permalink / raw)
  To: Rafael Wysocki, Ingo Molnar, Peter Zijlstra
  Cc: Viresh Kumar, linaro-kernel, linux-pm, linux-kernel,
	Vincent Guittot, Juri Lelli, patrick.bellasi, john.ettedgui,
	Srinivas Pandruvada, Joel Fernandes, Morten Rasmussen

The purpose of the "cached_raw_freq" field is to avoid a call to
cpufreq_driver_resolve_freq() when the next selected frequency is same
as the one selected last time and we can use sg_policy->next_freq then.

With the recent changes (reduce frequencies slower), we update the next
frequency at the very last moment from sugov_update_commit() and that
breaks the working of cached_raw_freq somewhat.

Here is an example to illustrate what happens now.
Min freq: 1 GHz
Max and current freq: 4 GHz

- get_next_freq() gets called and it calculates the next frequency to be
  1 GHz and so it updates cached_raw_freq as 1 GHz as well. It then
  calls cpufreq_driver_resolve_freq() and that also returns 1 GHz.

- We then call sugov_update_commit() and it updates the
  sg_policy->next_freq to 2.5 GHz.

- get_next_freq() gets called again and this time it calculates the next
  frequency as 2.5 GHz. Even when the previous next_freq was set to 2.5
  GHz, we end up calling cpufreq_driver_resolve_freq() as
  cached_raw_freq was set to 1 GHz.

Moreover, it is not right to update the target frequency after we have
called cpufreq_driver_resolve_freq() as that was called to map the
target frequency to the driver supported one, so that we can avoid the
update completely if we are already at the driver supported frequency.

Fix this by moving the newly added code to get_next_freq() before the
cached_raw_freq is accessed/updated.

Also add a minor comment above that code to explain why it is done.

We cannot take a simple average anymore as the "freq" here can be well
below policy->min and we may end up reducing the frequency drastically.
Take care of that by making sure "freq" is at least as much as
policy->min.

Fixes: 39b64aa1c007 ("cpufreq: schedutil: Reduce frequencies slower")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 622eed1b7658..1852bd73d903 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -101,9 +101,6 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
 	if (sg_policy->next_freq == next_freq)
 		return;
 
-	if (sg_policy->next_freq > next_freq)
-		next_freq = (sg_policy->next_freq + next_freq) >> 1;
-
 	sg_policy->next_freq = next_freq;
 	sg_policy->last_freq_update_time = time;
 
@@ -151,6 +148,17 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 
 	freq = (freq + (freq >> 2)) * util / max;
 
+	/*
+	 * Reduce frequency gradually to avoid undesirable performance drops.
+	 * Before that we need to make sure that freq >= policy->min, else we
+	 * may still end up reducing frequency rapidly.
+	 */
+	if (freq < policy->min)
+		freq = policy->min;
+
+	if (sg_policy->next_freq > freq)
+		freq = (sg_policy->next_freq + freq) >> 1;
+
 	if (freq == sg_policy->cached_raw_freq && sg_policy->next_freq != UINT_MAX)
 		return sg_policy->next_freq;
 	sg_policy->cached_raw_freq = freq;
-- 
2.13.0.70.g6367777092d9

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/3] cpufreq: schedutil: Fix selection algorithm while reducing frequency
  2017-06-09 10:15 [PATCH 0/3] cpufreq: schedutil: Fix 4.12 regressions Viresh Kumar
  2017-06-09 10:15 ` [PATCH 1/3] cpufreq: schedutil: Restore cached_raw_freq behavior Viresh Kumar
@ 2017-06-09 10:15 ` Viresh Kumar
  2017-06-10  9:11   ` Joel Fernandes
  2017-06-09 10:15 ` [PATCH 3/3] cpufreq: intel_pstate: Provide resolve_freq() to fix regression Viresh Kumar
  2017-06-09 12:24 ` [PATCH 0/3] cpufreq: schedutil: Fix 4.12 regressions Rafael J. Wysocki
  3 siblings, 1 reply; 13+ messages in thread
From: Viresh Kumar @ 2017-06-09 10:15 UTC (permalink / raw)
  To: Rafael Wysocki, Ingo Molnar, Peter Zijlstra
  Cc: Viresh Kumar, linaro-kernel, linux-pm, linux-kernel,
	Vincent Guittot, Juri Lelli, patrick.bellasi, john.ettedgui,
	Srinivas Pandruvada, Joel Fernandes, Morten Rasmussen

While reducing frequency if there are no frequencies available between
"current" and "next" calculated frequency, then the core will never
select the "next" frequency.

For example, consider the possible range of frequencies as 900 MHz, 1
GHz, 1.1 GHz, and 1.2 GHz. If the current frequency is 1.1 GHz and the
next frequency (based on current utilization) is 1 GHz, then the
schedutil governor will try to set the average of these as the next
frequency (i.e. 1.05 GHz).

Because we always try to find the lowest frequency greater than equal to
the target frequency, cpufreq_driver_resolve_freq() will end up
returning 1.1 GHz only. And we will not be able to reduce the frequency
eventually. The worst hit is the policy->min frequency as that will
never get selected after the frequency is increased once.

This affects all the drivers that provide ->target() or ->target_index()
callbacks.

Though for cpufreq drivers, like intel_pstate, which provide ->target()
but not ->resolve_freq() (i.e.  cpufreq_driver_resolve_freq() simply
returns the next frequency), sg_policy->next_freq gets updated with the
average frequency. And so we will finally select the min frequency when
the next_freq is 1 more than the min frequency as the average then will
be equal to the min frequency. But that will also take lots of
iterations of the schedutil update callbacks to happen.

Fix that by not using the average value for the next_freq in such cases.

Note that this still doesn't fix the drivers which provide ->target()
but don't provide ->resolve_freq() (e.g. intel_pstate) and such drivers
need to be updated to provide the ->resolve_freq() callbacks as well in
order to fix this.

Fixes: 39b64aa1c007 ("cpufreq: schedutil: Reduce frequencies slower")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 1852bd73d903..30e6a62d227c 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -117,6 +117,17 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
 	}
 }
 
+static unsigned int resolve_freq(struct sugov_policy *sg_policy,
+				 unsigned int freq)
+{
+	if (freq == sg_policy->cached_raw_freq &&
+	    sg_policy->next_freq != UINT_MAX)
+		return sg_policy->next_freq;
+
+	sg_policy->cached_raw_freq = freq;
+	return cpufreq_driver_resolve_freq(sg_policy->policy, freq);
+}
+
 /**
  * get_next_freq - Compute a new frequency for a given cpufreq policy.
  * @sg_policy: schedutil policy object to compute the new frequency for.
@@ -145,6 +156,7 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 	struct cpufreq_policy *policy = sg_policy->policy;
 	unsigned int freq = arch_scale_freq_invariant() ?
 				policy->cpuinfo.max_freq : policy->cur;
+	unsigned int target, original = 0;
 
 	freq = (freq + (freq >> 2)) * util / max;
 
@@ -156,13 +168,24 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 	if (freq < policy->min)
 		freq = policy->min;
 
-	if (sg_policy->next_freq > freq)
+	if (sg_policy->next_freq > freq) {
+		original = freq;
 		freq = (sg_policy->next_freq + freq) >> 1;
+	}
 
-	if (freq == sg_policy->cached_raw_freq && sg_policy->next_freq != UINT_MAX)
-		return sg_policy->next_freq;
-	sg_policy->cached_raw_freq = freq;
-	return cpufreq_driver_resolve_freq(policy, freq);
+	target = resolve_freq(sg_policy, freq);
+
+	/*
+	 * While reducing frequency if there are no frequencies available
+	 * between "original" and "next_freq", resolve_freq() will return
+	 * next_freq because we always try to find the lowest frequency greater
+	 * than equal to the "freq". Fix that by going directly to the
+	 * "original" frequency in that case.
+	 */
+	if (unlikely(original && target == sg_policy->next_freq))
+		target = resolve_freq(sg_policy, original);
+
+	return target;
 }
 
 static void sugov_get_util(unsigned long *util, unsigned long *max)
-- 
2.13.0.70.g6367777092d9

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/3] cpufreq: intel_pstate: Provide resolve_freq() to fix regression
  2017-06-09 10:15 [PATCH 0/3] cpufreq: schedutil: Fix 4.12 regressions Viresh Kumar
  2017-06-09 10:15 ` [PATCH 1/3] cpufreq: schedutil: Restore cached_raw_freq behavior Viresh Kumar
  2017-06-09 10:15 ` [PATCH 2/3] cpufreq: schedutil: Fix selection algorithm while reducing frequency Viresh Kumar
@ 2017-06-09 10:15 ` Viresh Kumar
  2017-06-10  9:26   ` Joel Fernandes
  2017-06-09 12:24 ` [PATCH 0/3] cpufreq: schedutil: Fix 4.12 regressions Rafael J. Wysocki
  3 siblings, 1 reply; 13+ messages in thread
From: Viresh Kumar @ 2017-06-09 10:15 UTC (permalink / raw)
  To: Rafael Wysocki, Srinivas Pandruvada, Len Brown, Viresh Kumar
  Cc: linaro-kernel, linux-pm, linux-kernel, Vincent Guittot,
	Juri Lelli, Ingo Molnar, Peter Zijlstra, patrick.bellasi,
	john.ettedgui, Joel Fernandes, Morten Rasmussen

When the schedutil governor calls cpufreq_driver_resolve_freq() for the
intel_pstate (in passive mode) driver, it simply returns the requested
frequency as there is no ->resolve_freq() callback provided.

The result is that get_next_freq() doesn't get a chance to know the
frequency which will be set eventually and we can hit a potential
regression as explained in the following paragraph.

For example, consider the possible range of frequencies as 900 MHz, 1
GHz, 1.1 GHz, and 1.2 GHz. If the current frequency is 1.1 GHz and the
next frequency (based on current utilization) is 1 GHz, then the
schedutil governor will try to set the average of these as the next
frequency (i.e. 1.05 GHz).

Because we always try to find the lowest frequency greater than equal to
the target frequency, the intel_pstate driver will end up setting the
frequency as 1.1 GHz.

Though the sg_policy->next_freq field gets updated with the average
frequency only. And so we will finally select the min frequency when the
next_freq is 1 more than the min frequency as the average then will be
equal to the min frequency. But that will also take lots of iterations
of the schedutil update callbacks.

Fix that by providing a resolve_freq() callback.

Tested on desktop with Intel Skylake processors.

Fixes: 39b64aa1c007 ("cpufreq: schedutil: Reduce frequencies slower")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/intel_pstate.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 029a93bfb558..e177352180c3 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2213,6 +2213,19 @@ static int intel_cpufreq_target(struct cpufreq_policy *policy,
 	return 0;
 }
 
+unsigned int intel_cpufreq_resolve_freq(struct cpufreq_policy *policy,
+					unsigned int target_freq)
+{
+	struct cpudata *cpu = all_cpu_data[policy->cpu];
+	int target_pstate;
+
+	update_turbo_state();
+
+	target_pstate = DIV_ROUND_UP(target_freq, cpu->pstate.scaling);
+	target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
+	return target_pstate * cpu->pstate.scaling;
+}
+
 static unsigned int intel_cpufreq_fast_switch(struct cpufreq_policy *policy,
 					      unsigned int target_freq)
 {
@@ -2246,6 +2259,7 @@ static struct cpufreq_driver intel_cpufreq = {
 	.flags		= CPUFREQ_CONST_LOOPS,
 	.verify		= intel_cpufreq_verify_policy,
 	.target		= intel_cpufreq_target,
+	.resolve_freq	= intel_cpufreq_resolve_freq,
 	.fast_switch	= intel_cpufreq_fast_switch,
 	.init		= intel_cpufreq_cpu_init,
 	.exit		= intel_pstate_cpu_exit,
-- 
2.13.0.70.g6367777092d9

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] cpufreq: schedutil: Fix 4.12 regressions
  2017-06-09 10:15 [PATCH 0/3] cpufreq: schedutil: Fix 4.12 regressions Viresh Kumar
                   ` (2 preceding siblings ...)
  2017-06-09 10:15 ` [PATCH 3/3] cpufreq: intel_pstate: Provide resolve_freq() to fix regression Viresh Kumar
@ 2017-06-09 12:24 ` Rafael J. Wysocki
  2017-06-09 12:32   ` Viresh Kumar
  3 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2017-06-09 12:24 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Lists linaro-kernel, Linux PM,
	Linux Kernel Mailing List, Vincent Guittot, Juri Lelli,
	Ingo Molnar, Peter Zijlstra, Patrick Bellasi, John,
	Srinivas Pandruvada, Joel Fernandes, Morten Rasmussen

Hi,

On Fri, Jun 9, 2017 at 12:15 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> Hi Rafael,
>
> I have identified some regressions with the schedutil governor which
> happen due to one of your patches that got merged in 4.12-rc1.
>
> This series fixes all the drivers which provide a ->target_index()
> callback but doesn't fix the drivers which provide ->target() callback.
>
> Such platforms need to implement the ->resolve_freq() callback in order
> to get this fixed and I only had hardware for testing intel_pstate,
> which I fixed in this series.
>
> I am wondering if there is another way to fix this issue (than what I
> tried) or if we should revert the offending commit (39b64aa1c007) and
> look for other solutions.

To my eyes, patch [1/3] fixes the problem and then the remaining ones
deal with the issues resulting from that.

I'd rather revert and revisit at this point.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] cpufreq: schedutil: Fix 4.12 regressions
  2017-06-09 12:24 ` [PATCH 0/3] cpufreq: schedutil: Fix 4.12 regressions Rafael J. Wysocki
@ 2017-06-09 12:32   ` Viresh Kumar
  2017-06-09 13:25     ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: Viresh Kumar @ 2017-06-09 12:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael Wysocki, Lists linaro-kernel, Linux PM,
	Linux Kernel Mailing List, Vincent Guittot, Juri Lelli,
	Ingo Molnar, Peter Zijlstra, Patrick Bellasi, John,
	Srinivas Pandruvada, Joel Fernandes, Morten Rasmussen

On 9 June 2017 at 17:54, Rafael J. Wysocki <rafael@kernel.org> wrote:
> Hi,
>
> On Fri, Jun 9, 2017 at 12:15 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> Hi Rafael,
>>
>> I have identified some regressions with the schedutil governor which
>> happen due to one of your patches that got merged in 4.12-rc1.
>>
>> This series fixes all the drivers which provide a ->target_index()
>> callback but doesn't fix the drivers which provide ->target() callback.
>>
>> Such platforms need to implement the ->resolve_freq() callback in order
>> to get this fixed and I only had hardware for testing intel_pstate,
>> which I fixed in this series.
>>
>> I am wondering if there is another way to fix this issue (than what I
>> tried) or if we should revert the offending commit (39b64aa1c007) and
>> look for other solutions.
>
> To my eyes, patch [1/3] fixes the problem and then the remaining ones
> deal with the issues resulting from that.

So I saw the issue reported and fixed by 2/3 first and noticed 1/3 while
doing code reviews. So, 1/3 isn't the culprit really as the problem happens
without it as well.

--
viresh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] cpufreq: schedutil: Fix 4.12 regressions
  2017-06-09 12:32   ` Viresh Kumar
@ 2017-06-09 13:25     ` Rafael J. Wysocki
  0 siblings, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2017-06-09 13:25 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Rafael Wysocki, Lists linaro-kernel, Linux PM,
	Linux Kernel Mailing List, Vincent Guittot, Juri Lelli,
	Ingo Molnar, Peter Zijlstra, Patrick Bellasi, John,
	Srinivas Pandruvada, Joel Fernandes, Morten Rasmussen

On Fri, Jun 9, 2017 at 2:32 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 9 June 2017 at 17:54, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> Hi,
>>
>> On Fri, Jun 9, 2017 at 12:15 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>> Hi Rafael,
>>>
>>> I have identified some regressions with the schedutil governor which
>>> happen due to one of your patches that got merged in 4.12-rc1.
>>>
>>> This series fixes all the drivers which provide a ->target_index()
>>> callback but doesn't fix the drivers which provide ->target() callback.
>>>
>>> Such platforms need to implement the ->resolve_freq() callback in order
>>> to get this fixed and I only had hardware for testing intel_pstate,
>>> which I fixed in this series.
>>>
>>> I am wondering if there is another way to fix this issue (than what I
>>> tried) or if we should revert the offending commit (39b64aa1c007) and
>>> look for other solutions.
>>
>> To my eyes, patch [1/3] fixes the problem and then the remaining ones
>> deal with the issues resulting from that.
>
> So I saw the issue reported and fixed by 2/3 first and noticed 1/3 while
> doing code reviews. So, 1/3 isn't the culprit really as the problem happens
> without it as well.

I see.

OK, let me consider that.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] cpufreq: schedutil: Fix selection algorithm while reducing frequency
  2017-06-09 10:15 ` [PATCH 2/3] cpufreq: schedutil: Fix selection algorithm while reducing frequency Viresh Kumar
@ 2017-06-10  9:11   ` Joel Fernandes
  2017-06-11  6:21     ` Joel Fernandes
  0 siblings, 1 reply; 13+ messages in thread
From: Joel Fernandes @ 2017-06-10  9:11 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Ingo Molnar, Peter Zijlstra, linaro-kernel,
	Linux PM, LKML, Vincent Guittot, Juri Lelli, Patrick Bellasi,
	john.ettedgui, Srinivas Pandruvada, Morten Rasmussen

On Fri, Jun 9, 2017 at 3:15 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> While reducing frequency if there are no frequencies available between
> "current" and "next" calculated frequency, then the core will never
> select the "next" frequency.
>
> For example, consider the possible range of frequencies as 900 MHz, 1
> GHz, 1.1 GHz, and 1.2 GHz. If the current frequency is 1.1 GHz and the
> next frequency (based on current utilization) is 1 GHz, then the
> schedutil governor will try to set the average of these as the next
> frequency (i.e. 1.05 GHz).
>
> Because we always try to find the lowest frequency greater than equal to
> the target frequency, cpufreq_driver_resolve_freq() will end up
> returning 1.1 GHz only. And we will not be able to reduce the frequency
> eventually. The worst hit is the policy->min frequency as that will
> never get selected after the frequency is increased once.

But once utilization goes to 0, it will select the min frequency
(because it selects lowest frequency >= target)?

>
> This affects all the drivers that provide ->target() or ->target_index()
> callbacks.
>
> Though for cpufreq drivers, like intel_pstate, which provide ->target()
> but not ->resolve_freq() (i.e.  cpufreq_driver_resolve_freq() simply
> returns the next frequency), sg_policy->next_freq gets updated with the
> average frequency. And so we will finally select the min frequency when
> the next_freq is 1 more than the min frequency as the average then will
> be equal to the min frequency. But that will also take lots of
> iterations of the schedutil update callbacks to happen.
>
> Fix that by not using the average value for the next_freq in such cases.
>
> Note that this still doesn't fix the drivers which provide ->target()
> but don't provide ->resolve_freq() (e.g. intel_pstate) and such drivers
> need to be updated to provide the ->resolve_freq() callbacks as well in
> order to fix this.
>
> Fixes: 39b64aa1c007 ("cpufreq: schedutil: Reduce frequencies slower")
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  kernel/sched/cpufreq_schedutil.c | 33 ++++++++++++++++++++++++++++-----
>  1 file changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 1852bd73d903..30e6a62d227c 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -117,6 +117,17 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
>         }
>  }
>
> +static unsigned int resolve_freq(struct sugov_policy *sg_policy,
> +                                unsigned int freq)
> +{
> +       if (freq == sg_policy->cached_raw_freq &&
> +           sg_policy->next_freq != UINT_MAX)
> +               return sg_policy->next_freq;
> +
> +       sg_policy->cached_raw_freq = freq;
> +       return cpufreq_driver_resolve_freq(sg_policy->policy, freq);
> +}
> +
>  /**
>   * get_next_freq - Compute a new frequency for a given cpufreq policy.
>   * @sg_policy: schedutil policy object to compute the new frequency for.
> @@ -145,6 +156,7 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
>         struct cpufreq_policy *policy = sg_policy->policy;
>         unsigned int freq = arch_scale_freq_invariant() ?
>                                 policy->cpuinfo.max_freq : policy->cur;
> +       unsigned int target, original = 0;
>
>         freq = (freq + (freq >> 2)) * util / max;
>
> @@ -156,13 +168,24 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
>         if (freq < policy->min)
>                 freq = policy->min;
>
> -       if (sg_policy->next_freq > freq)
> +       if (sg_policy->next_freq > freq) {
> +               original = freq;
>                 freq = (sg_policy->next_freq + freq) >> 1;
> +       }
>
> -       if (freq == sg_policy->cached_raw_freq && sg_policy->next_freq != UINT_MAX)
> -               return sg_policy->next_freq;
> -       sg_policy->cached_raw_freq = freq;
> -       return cpufreq_driver_resolve_freq(policy, freq);
> +       target = resolve_freq(sg_policy, freq);
> +
> +       /*
> +        * While reducing frequency if there are no frequencies available
> +        * between "original" and "next_freq", resolve_freq() will return
> +        * next_freq because we always try to find the lowest frequency greater
> +        * than equal to the "freq". Fix that by going directly to the
> +        * "original" frequency in that case.
> +        */
> +       if (unlikely(original && target == sg_policy->next_freq))
> +               target = resolve_freq(sg_policy, original);
> +
> +       return target;
>  }

I thought its confusing to have a special case like this. On one hand
we're saying we'd like to select next frequency to be lowest frequency
>= target, on the other hand we're saying if the target wasn't low
enough to trigger an OPP change, then we'd just rather drop the
frequency to the lower OPP. I get why you'd like to do that, because
with patch 1/3 you're lowering frequency more slower before doing the
cpufreq_resolve, but what if the reduction in utilization was really
small to begin with and not because the "reduce frequencies more
slowly" stuff that you moved in patch 1/3? Then in that case you'd be
falsely dropping frequency when the right thing to do would be to
select the lowest freq >= target?

Thanks,
Joel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] cpufreq: intel_pstate: Provide resolve_freq() to fix regression
  2017-06-09 10:15 ` [PATCH 3/3] cpufreq: intel_pstate: Provide resolve_freq() to fix regression Viresh Kumar
@ 2017-06-10  9:26   ` Joel Fernandes
  2017-06-12  3:41     ` Viresh Kumar
  0 siblings, 1 reply; 13+ messages in thread
From: Joel Fernandes @ 2017-06-10  9:26 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Srinivas Pandruvada, Len Brown, linaro-kernel,
	Linux PM, LKML, Vincent Guittot, Juri Lelli, Ingo Molnar,
	Peter Zijlstra, Patrick Bellasi, john.ettedgui, Morten Rasmussen

On Fri, Jun 9, 2017 at 3:15 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> When the schedutil governor calls cpufreq_driver_resolve_freq() for the
> intel_pstate (in passive mode) driver, it simply returns the requested
> frequency as there is no ->resolve_freq() callback provided.
>
> The result is that get_next_freq() doesn't get a chance to know the
> frequency which will be set eventually and we can hit a potential
> regression as explained in the following paragraph.
>
> For example, consider the possible range of frequencies as 900 MHz, 1
> GHz, 1.1 GHz, and 1.2 GHz. If the current frequency is 1.1 GHz and the
> next frequency (based on current utilization) is 1 GHz, then the
> schedutil governor will try to set the average of these as the next
> frequency (i.e. 1.05 GHz).
>
> Because we always try to find the lowest frequency greater than equal to
> the target frequency, the intel_pstate driver will end up setting the
> frequency as 1.1 GHz.
>
> Though the sg_policy->next_freq field gets updated with the average
> frequency only. And so we will finally select the min frequency when the
> next_freq is 1 more than the min frequency as the average then will be
> equal to the min frequency. But that will also take lots of iterations
> of the schedutil update callbacks.
>
> Fix that by providing a resolve_freq() callback.
>
> Tested on desktop with Intel Skylake processors.
>
> Fixes: 39b64aa1c007 ("cpufreq: schedutil: Reduce frequencies slower")
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/cpufreq/intel_pstate.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 029a93bfb558..e177352180c3 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -2213,6 +2213,19 @@ static int intel_cpufreq_target(struct cpufreq_policy *policy,
>         return 0;
>  }
>
> +unsigned int intel_cpufreq_resolve_freq(struct cpufreq_policy *policy,
> +                                       unsigned int target_freq)

Should be defined as static?

Thanks,
Joel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] cpufreq: schedutil: Fix selection algorithm while reducing frequency
  2017-06-10  9:11   ` Joel Fernandes
@ 2017-06-11  6:21     ` Joel Fernandes
  2017-06-12  3:44       ` Viresh Kumar
  0 siblings, 1 reply; 13+ messages in thread
From: Joel Fernandes @ 2017-06-11  6:21 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Ingo Molnar, Peter Zijlstra, linaro-kernel,
	Linux PM, LKML, Vincent Guittot, Juri Lelli, Patrick Bellasi,
	John Ettedgui, Srinivas Pandruvada, Morten Rasmussen

On Sat, Jun 10, 2017 at 2:11 AM, Joel Fernandes <joelaf@google.com> wrote:
> On Fri, Jun 9, 2017 at 3:15 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> While reducing frequency if there are no frequencies available between
>> "current" and "next" calculated frequency, then the core will never
>> select the "next" frequency.
>>
>> For example, consider the possible range of frequencies as 900 MHz, 1
>> GHz, 1.1 GHz, and 1.2 GHz. If the current frequency is 1.1 GHz and the
>> next frequency (based on current utilization) is 1 GHz, then the
>> schedutil governor will try to set the average of these as the next
>> frequency (i.e. 1.05 GHz).
>>
>> Because we always try to find the lowest frequency greater than equal to
>> the target frequency, cpufreq_driver_resolve_freq() will end up
>> returning 1.1 GHz only. And we will not be able to reduce the frequency
>> eventually. The worst hit is the policy->min frequency as that will
>> never get selected after the frequency is increased once.
>
> But once utilization goes to 0, it will select the min frequency
> (because it selects lowest frequency >= target)?

Never mind my comment about util 0, I see the problem you mention.
However I feel that this entire series adds complexity all to handle
the case of a false cache-miss which I think might not be that bad,
and the tradeoff with complexity/readability of the code kind of
negates the benefit. That's just my opinion about it fwiw.

Thanks,
Joel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] cpufreq: intel_pstate: Provide resolve_freq() to fix regression
  2017-06-10  9:26   ` Joel Fernandes
@ 2017-06-12  3:41     ` Viresh Kumar
  0 siblings, 0 replies; 13+ messages in thread
From: Viresh Kumar @ 2017-06-12  3:41 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Rafael Wysocki, Srinivas Pandruvada, Len Brown, linaro-kernel,
	Linux PM, LKML, Vincent Guittot, Juri Lelli, Ingo Molnar,
	Peter Zijlstra, Patrick Bellasi, john.ettedgui, Morten Rasmussen

On 10-06-17, 02:26, Joel Fernandes wrote:
> On Fri, Jun 9, 2017 at 3:15 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > When the schedutil governor calls cpufreq_driver_resolve_freq() for the
> > intel_pstate (in passive mode) driver, it simply returns the requested
> > frequency as there is no ->resolve_freq() callback provided.
> >
> > The result is that get_next_freq() doesn't get a chance to know the
> > frequency which will be set eventually and we can hit a potential
> > regression as explained in the following paragraph.
> >
> > For example, consider the possible range of frequencies as 900 MHz, 1
> > GHz, 1.1 GHz, and 1.2 GHz. If the current frequency is 1.1 GHz and the
> > next frequency (based on current utilization) is 1 GHz, then the
> > schedutil governor will try to set the average of these as the next
> > frequency (i.e. 1.05 GHz).
> >
> > Because we always try to find the lowest frequency greater than equal to
> > the target frequency, the intel_pstate driver will end up setting the
> > frequency as 1.1 GHz.
> >
> > Though the sg_policy->next_freq field gets updated with the average
> > frequency only. And so we will finally select the min frequency when the
> > next_freq is 1 more than the min frequency as the average then will be
> > equal to the min frequency. But that will also take lots of iterations
> > of the schedutil update callbacks.
> >
> > Fix that by providing a resolve_freq() callback.
> >
> > Tested on desktop with Intel Skylake processors.
> >
> > Fixes: 39b64aa1c007 ("cpufreq: schedutil: Reduce frequencies slower")
> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> > ---
> >  drivers/cpufreq/intel_pstate.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> > index 029a93bfb558..e177352180c3 100644
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -2213,6 +2213,19 @@ static int intel_cpufreq_target(struct cpufreq_policy *policy,
> >         return 0;
> >  }
> >
> > +unsigned int intel_cpufreq_resolve_freq(struct cpufreq_policy *policy,
> > +                                       unsigned int target_freq)
> 
> Should be defined as static?

Yes.

-- 
viresh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] cpufreq: schedutil: Fix selection algorithm while reducing frequency
  2017-06-11  6:21     ` Joel Fernandes
@ 2017-06-12  3:44       ` Viresh Kumar
  2017-06-12 12:33         ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: Viresh Kumar @ 2017-06-12  3:44 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Rafael Wysocki, Ingo Molnar, Peter Zijlstra, linaro-kernel,
	Linux PM, LKML, Vincent Guittot, Juri Lelli, Patrick Bellasi,
	John Ettedgui, Srinivas Pandruvada, Morten Rasmussen

On 10-06-17, 23:21, Joel Fernandes wrote:
> On Sat, Jun 10, 2017 at 2:11 AM, Joel Fernandes <joelaf@google.com> wrote:
> > On Fri, Jun 9, 2017 at 3:15 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >> While reducing frequency if there are no frequencies available between
> >> "current" and "next" calculated frequency, then the core will never
> >> select the "next" frequency.
> >>
> >> For example, consider the possible range of frequencies as 900 MHz, 1
> >> GHz, 1.1 GHz, and 1.2 GHz. If the current frequency is 1.1 GHz and the
> >> next frequency (based on current utilization) is 1 GHz, then the
> >> schedutil governor will try to set the average of these as the next
> >> frequency (i.e. 1.05 GHz).
> >>
> >> Because we always try to find the lowest frequency greater than equal to
> >> the target frequency, cpufreq_driver_resolve_freq() will end up
> >> returning 1.1 GHz only. And we will not be able to reduce the frequency
> >> eventually. The worst hit is the policy->min frequency as that will
> >> never get selected after the frequency is increased once.
> >
> > But once utilization goes to 0, it will select the min frequency
> > (because it selects lowest frequency >= target)?
> 
> Never mind my comment about util 0, I see the problem you mention.
> However I feel that this entire series adds complexity all to handle
> the case of a false cache-miss which I think might not be that bad,
> and the tradeoff with complexity/readability of the code kind of
> negates the benefit. That's just my opinion about it fwiw.

Right and that's why I said in the cover letter that we may want to revert the
offending commit for the time being as the solutions provided here have too much
dependency on the resolve_freq() callback.

-- 
viresh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] cpufreq: schedutil: Fix selection algorithm while reducing frequency
  2017-06-12  3:44       ` Viresh Kumar
@ 2017-06-12 12:33         ` Rafael J. Wysocki
  0 siblings, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2017-06-12 12:33 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Joel Fernandes, Rafael Wysocki, Ingo Molnar, Peter Zijlstra,
	Lists linaro-kernel, Linux PM, LKML, Vincent Guittot, Juri Lelli,
	Patrick Bellasi, John Ettedgui, Srinivas Pandruvada,
	Morten Rasmussen

On Mon, Jun 12, 2017 at 5:44 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 10-06-17, 23:21, Joel Fernandes wrote:
>> On Sat, Jun 10, 2017 at 2:11 AM, Joel Fernandes <joelaf@google.com> wrote:
>> > On Fri, Jun 9, 2017 at 3:15 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> >> While reducing frequency if there are no frequencies available between
>> >> "current" and "next" calculated frequency, then the core will never
>> >> select the "next" frequency.
>> >>
>> >> For example, consider the possible range of frequencies as 900 MHz, 1
>> >> GHz, 1.1 GHz, and 1.2 GHz. If the current frequency is 1.1 GHz and the
>> >> next frequency (based on current utilization) is 1 GHz, then the
>> >> schedutil governor will try to set the average of these as the next
>> >> frequency (i.e. 1.05 GHz).
>> >>
>> >> Because we always try to find the lowest frequency greater than equal to
>> >> the target frequency, cpufreq_driver_resolve_freq() will end up
>> >> returning 1.1 GHz only. And we will not be able to reduce the frequency
>> >> eventually. The worst hit is the policy->min frequency as that will
>> >> never get selected after the frequency is increased once.
>> >
>> > But once utilization goes to 0, it will select the min frequency
>> > (because it selects lowest frequency >= target)?
>>
>> Never mind my comment about util 0, I see the problem you mention.
>> However I feel that this entire series adds complexity all to handle
>> the case of a false cache-miss which I think might not be that bad,
>> and the tradeoff with complexity/readability of the code kind of
>> negates the benefit. That's just my opinion about it fwiw.
>
> Right and that's why I said in the cover letter that we may want to revert the
> offending commit for the time being as the solutions provided here have too much
> dependency on the resolve_freq() callback.

So I've decided to revert that commit for 4.12.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-06-12 12:33 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-09 10:15 [PATCH 0/3] cpufreq: schedutil: Fix 4.12 regressions Viresh Kumar
2017-06-09 10:15 ` [PATCH 1/3] cpufreq: schedutil: Restore cached_raw_freq behavior Viresh Kumar
2017-06-09 10:15 ` [PATCH 2/3] cpufreq: schedutil: Fix selection algorithm while reducing frequency Viresh Kumar
2017-06-10  9:11   ` Joel Fernandes
2017-06-11  6:21     ` Joel Fernandes
2017-06-12  3:44       ` Viresh Kumar
2017-06-12 12:33         ` Rafael J. Wysocki
2017-06-09 10:15 ` [PATCH 3/3] cpufreq: intel_pstate: Provide resolve_freq() to fix regression Viresh Kumar
2017-06-10  9:26   ` Joel Fernandes
2017-06-12  3:41     ` Viresh Kumar
2017-06-09 12:24 ` [PATCH 0/3] cpufreq: schedutil: Fix 4.12 regressions Rafael J. Wysocki
2017-06-09 12:32   ` Viresh Kumar
2017-06-09 13:25     ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).