linux-kernel.vger.kernel.org archive mirror
* [PATCH V3 1/2] cpufreq: schedutil: Don't skip freq update when limits change
@ 2019-08-02  5:44 Viresh Kumar
  2019-08-02  5:44 ` [PATCH V3 2/2] cpufreq: intel_pstate: Implement ->resolve_freq() Viresh Kumar
  2019-08-02  9:11 ` [PATCH V3 1/2] cpufreq: schedutil: Don't skip freq update when limits change Rafael J. Wysocki
  0 siblings, 2 replies; 10+ messages in thread
From: Viresh Kumar @ 2019-08-02  5:44 UTC (permalink / raw)
  To: Rafael Wysocki, Viresh Kumar, Ingo Molnar, Peter Zijlstra
  Cc: linux-pm, Vincent Guittot, v4 . 18+, Doug Smythies, linux-kernel

To avoid reducing the frequency of a CPU prematurely, we skip reducing
the frequency if the CPU had been busy recently.

This should not be done when the limits of the policy are changed, for
example due to thermal throttling. We should always get the frequency
within the new limits as soon as possible.

Trying to fix this by using only one flag, i.e. need_freq_update, can
lead to a race condition where the flag gets cleared without forcing us
to change the frequency at least once. And so this patch introduces
another flag to avoid that race condition.

Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
Cc: v4.18+ <stable@vger.kernel.org> # v4.18+
Reported-by: Doug Smythies <doug.smythies@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
V2->V3:
- Updated commit log.

V1->V2:
- Fixed the race condition using a different flag.

@Doug: I haven't changed the code since you last tested these. Your
Tested-by tag can be useful while applying the patches. Thanks.

 kernel/sched/cpufreq_schedutil.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 636ca6f88c8e..2f382b0959e5 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -40,6 +40,7 @@ struct sugov_policy {
 	struct task_struct	*thread;
 	bool			work_in_progress;
 
+	bool			limits_changed;
 	bool			need_freq_update;
 };
 
@@ -89,8 +90,11 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
 	    !cpufreq_this_cpu_can_update(sg_policy->policy))
 		return false;
 
-	if (unlikely(sg_policy->need_freq_update))
+	if (unlikely(sg_policy->limits_changed)) {
+		sg_policy->limits_changed = false;
+		sg_policy->need_freq_update = true;
 		return true;
+	}
 
 	delta_ns = time - sg_policy->last_freq_update_time;
 
@@ -437,7 +441,7 @@ static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
 static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu, struct sugov_policy *sg_policy)
 {
 	if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_dl)
-		sg_policy->need_freq_update = true;
+		sg_policy->limits_changed = true;
 }
 
 static void sugov_update_single(struct update_util_data *hook, u64 time,
@@ -447,7 +451,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	unsigned long util, max;
 	unsigned int next_f;
-	bool busy;
+	bool busy = false;
 
 	sugov_iowait_boost(sg_cpu, time, flags);
 	sg_cpu->last_update = time;
@@ -457,7 +461,9 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	if (!sugov_should_update_freq(sg_policy, time))
 		return;
 
-	busy = sugov_cpu_is_busy(sg_cpu);
+	/* Limits may have changed, don't skip frequency update */
+	if (!sg_policy->need_freq_update)
+		busy = sugov_cpu_is_busy(sg_cpu);
 
 	util = sugov_get_util(sg_cpu);
 	max = sg_cpu->max;
@@ -831,6 +837,7 @@ static int sugov_start(struct cpufreq_policy *policy)
 	sg_policy->last_freq_update_time	= 0;
 	sg_policy->next_freq			= 0;
 	sg_policy->work_in_progress		= false;
+	sg_policy->limits_changed		= false;
 	sg_policy->need_freq_update		= false;
 	sg_policy->cached_raw_freq		= 0;
 
@@ -879,7 +886,7 @@ static void sugov_limits(struct cpufreq_policy *policy)
 		mutex_unlock(&sg_policy->work_lock);
 	}
 
-	sg_policy->need_freq_update = true;
+	sg_policy->limits_changed = true;
 }
 
 struct cpufreq_governor schedutil_gov = {
-- 
2.21.0.rc0.269.g1a574e7a288b
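The flag handoff described in the commit log can be modelled in userspace. This is only a minimal sketch under stated assumptions: struct model_policy and the model_*() helpers are hypothetical names that mirror the roles of sugov_limits() and sugov_should_update_freq(), and model_finish_update() assumes that the update path clears need_freq_update once it has acted on it, which this patch does not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the two flags; not the kernel structure. */
struct model_policy {
	bool limits_changed;	/* set by the limits path */
	bool need_freq_update;	/* consumed by the update path */
};

/* Plays the role of sugov_limits(): only record that the limits changed. */
static void model_limits(struct model_policy *p)
{
	assert(p);
	p->limits_changed = true;
}

/*
 * Plays the role of the check in sugov_should_update_freq(): hand the
 * request over to need_freq_update and force an update, regardless of
 * whether the rate limit has elapsed.
 */
static bool model_should_update(struct model_policy *p, bool rate_limit_elapsed)
{
	assert(p);
	if (p->limits_changed) {
		p->limits_changed = false;
		p->need_freq_update = true;
		return true;
	}
	return rate_limit_elapsed;
}

/* The busy heuristic is ignored while a forced update is pending. */
static bool model_busy(const struct model_policy *p, bool cpu_is_busy)
{
	assert(p);
	return !p->need_freq_update && cpu_is_busy;
}

/* Assumption: the update path clears the request once it has acted on it. */
static void model_finish_update(struct model_policy *p)
{
	assert(p);
	p->need_freq_update = false;
}
```

In this model a limits change that arrives between model_should_update() and model_finish_update() stays recorded in limits_changed, so the next update is still forced.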



* [PATCH V3 2/2] cpufreq: intel_pstate: Implement ->resolve_freq()
  2019-08-02  5:44 [PATCH V3 1/2] cpufreq: schedutil: Don't skip freq update when limits change Viresh Kumar
@ 2019-08-02  5:44 ` Viresh Kumar
  2019-08-02  9:17   ` Rafael J. Wysocki
  2019-08-02  9:11 ` [PATCH V3 1/2] cpufreq: schedutil: Don't skip freq update when limits change Rafael J. Wysocki
  1 sibling, 1 reply; 10+ messages in thread
From: Viresh Kumar @ 2019-08-02  5:44 UTC (permalink / raw)
  To: Rafael Wysocki, Srinivas Pandruvada, Len Brown, Viresh Kumar
  Cc: linux-pm, Vincent Guittot, v4 . 18+, Doug Smythies, linux-kernel

Intel pstate driver exposes min_perf_pct and max_perf_pct sysfs files,
which can be used to force a limit on the min/max P state of the driver.
Though these files eventually control the min/max frequencies that the
CPUs will run at, they don't make a change to policy->min/max values.

When the values of these files are changed (in passive mode of the
driver), it leads to calling ->limits() callback of the cpufreq
governors, like schedutil. On a call to it the governors shall
forcefully update the frequency to come within the limits. For getting
the value within limits, the schedutil governor calls
cpufreq_driver_resolve_freq(), which eventually tries to call
->resolve_freq() callback for this driver. Since the callback isn't
present, the schedutil governor fails to get the target freq within
limit and sometimes aborts the update believing that the frequency is
already set to the target value.

This patch implements the resolve_freq() callback, so the correct target
frequency can be returned by the driver and the schedutil governor gets
the frequency within limits immediately.

Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
Cc: v4.18+ <stable@vger.kernel.org> # v4.18+
Reported-by: Doug Smythies <doug.smythies@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
V3:
- This was earlier posted as a diff to an email reply and is getting
  sent for the first time only as a proper patch.

 drivers/cpufreq/intel_pstate.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index cc27d4c59dca..2d84361fbebc 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2314,6 +2314,18 @@ static int intel_cpufreq_target(struct cpufreq_policy *policy,
 	return 0;
 }
 
+static unsigned int intel_cpufreq_resolve_freq(struct cpufreq_policy *policy,
+					       unsigned int target_freq)
+{
+	struct cpudata *cpu = all_cpu_data[policy->cpu];
+	int target_pstate;
+
+	target_pstate = DIV_ROUND_UP(target_freq, cpu->pstate.scaling);
+	target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
+
+	return target_pstate * cpu->pstate.scaling;
+}
+
 static unsigned int intel_cpufreq_fast_switch(struct cpufreq_policy *policy,
 					      unsigned int target_freq)
 {
@@ -2350,6 +2362,7 @@ static struct cpufreq_driver intel_cpufreq = {
 	.verify		= intel_cpufreq_verify_policy,
 	.target		= intel_cpufreq_target,
 	.fast_switch	= intel_cpufreq_fast_switch,
+	.resolve_freq	= intel_cpufreq_resolve_freq,
 	.init		= intel_cpufreq_cpu_init,
 	.exit		= intel_pstate_cpu_exit,
 	.stop_cpu	= intel_cpufreq_stop_cpu,
-- 
2.21.0.rc0.269.g1a574e7a288b
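The arithmetic in intel_cpufreq_resolve_freq() can be tried out in userspace. A minimal sketch, assuming that intel_pstate_prepare_request() amounts to clamping the P-state between a minimum and a maximum; model_resolve_freq() and its min_pstate/max_pstate parameters are hypothetical stand-ins, not the driver's code.

```c
#include <assert.h>

/* Same rounding as the kernel macro of this name. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical model: round the target frequency up to a whole P-state,
 * clamp it to [min_pstate, max_pstate], and convert back to kHz.
 */
static unsigned int model_resolve_freq(unsigned int target_khz,
				       unsigned int scaling_khz,
				       unsigned int min_pstate,
				       unsigned int max_pstate)
{
	unsigned int pstate;

	assert(scaling_khz > 0 && min_pstate <= max_pstate);

	pstate = DIV_ROUND_UP(target_khz, scaling_khz);
	if (pstate < min_pstate)
		pstate = min_pstate;
	if (pstate > max_pstate)
		pstate = max_pstate;

	return pstate * scaling_khz;
}
```

With a scaling factor of 100000 kHz and P-states limited to 8..30, a target of 2450000 kHz resolves to 2500000 kHz, while a target of 3500000 kHz is pulled down to 3000000 kHz.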



* Re: [PATCH V3 1/2] cpufreq: schedutil: Don't skip freq update when limits change
  2019-08-02  5:44 [PATCH V3 1/2] cpufreq: schedutil: Don't skip freq update when limits change Viresh Kumar
  2019-08-02  5:44 ` [PATCH V3 2/2] cpufreq: intel_pstate: Implement ->resolve_freq() Viresh Kumar
@ 2019-08-02  9:11 ` Rafael J. Wysocki
  2019-08-03  0:00   ` Doug Smythies
  1 sibling, 1 reply; 10+ messages in thread
From: Rafael J. Wysocki @ 2019-08-02  9:11 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Ingo Molnar, Peter Zijlstra, Linux PM,
	Vincent Guittot, v4 . 18+,
	Doug Smythies, Linux Kernel Mailing List

On Fri, Aug 2, 2019 at 7:44 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> To avoid reducing the frequency of a CPU prematurely, we skip reducing
> the frequency if the CPU had been busy recently.
>
> This should not be done when the limits of the policy are changed, for
> example due to thermal throttling. We should always get the frequency
> within the new limits as soon as possible.
>
> Trying to fix this by using only one flag, i.e. need_freq_update, can
> lead to a race condition where the flag gets cleared without forcing us
> to change the frequency at least once. And so this patch introduces
> another flag to avoid that race condition.
>
> Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
> Cc: v4.18+ <stable@vger.kernel.org> # v4.18+
> Reported-by: Doug Smythies <doug.smythies@gmail.com>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
> V2->V3:
> - Updated commit log.
>
> V1->V2:
> - Fixed the race condition using a different flag.
>
> @Doug: I haven't changed the code since you last tested these. Your
> Tested-by tag can be useful while applying the patches. Thanks.
>
>  kernel/sched/cpufreq_schedutil.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 636ca6f88c8e..2f382b0959e5 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -40,6 +40,7 @@ struct sugov_policy {
>         struct task_struct      *thread;
>         bool                    work_in_progress;
>
> +       bool                    limits_changed;
>         bool                    need_freq_update;
>  };
>
> @@ -89,8 +90,11 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>             !cpufreq_this_cpu_can_update(sg_policy->policy))
>                 return false;
>
> -       if (unlikely(sg_policy->need_freq_update))
> +       if (unlikely(sg_policy->limits_changed)) {
> +               sg_policy->limits_changed = false;
> +               sg_policy->need_freq_update = true;
>                 return true;
> +       }
>
>         delta_ns = time - sg_policy->last_freq_update_time;
>
> @@ -437,7 +441,7 @@ static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
>  static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu, struct sugov_policy *sg_policy)
>  {
>         if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_dl)
> -               sg_policy->need_freq_update = true;
> +               sg_policy->limits_changed = true;
>  }
>
>  static void sugov_update_single(struct update_util_data *hook, u64 time,
> @@ -447,7 +451,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>         struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>         unsigned long util, max;
>         unsigned int next_f;
> -       bool busy;
> +       bool busy = false;

This shouldn't be necessary ->

>
>         sugov_iowait_boost(sg_cpu, time, flags);
>         sg_cpu->last_update = time;
> @@ -457,7 +461,9 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>         if (!sugov_should_update_freq(sg_policy, time))
>                 return;
>
> -       busy = sugov_cpu_is_busy(sg_cpu);
> +       /* Limits may have changed, don't skip frequency update */
> +       if (!sg_policy->need_freq_update)
> +               busy = sugov_cpu_is_busy(sg_cpu);

-> if this is rewritten as

busy = !sg_policy->need_freq_update && sugov_cpu_is_busy(sg_cpu);

which is simpler and avoids the extra branch.
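The two forms can be compared side by side in userspace. A minimal sketch: busy_branch(), busy_expr() and probe_busy() are hypothetical names, with probe_busy() standing in for sugov_cpu_is_busy() and counting its calls to show that neither form evaluates it while an update is being forced.

```c
#include <assert.h>
#include <stdbool.h>

static int probe_calls;	/* how often the stand-in busy check ran */

/* Hypothetical stand-in for sugov_cpu_is_busy(); always reports busy. */
static bool probe_busy(void)
{
	probe_calls++;
	return true;
}

/* Form used in the patch: default to false, check only if not forced. */
static bool busy_branch(bool need_freq_update, bool (*is_busy)(void))
{
	bool busy = false;

	assert(is_busy);
	if (!need_freq_update)
		busy = is_busy();
	return busy;
}

/* Suggested form: && short-circuits, so is_busy() is skipped when forced. */
static bool busy_expr(bool need_freq_update, bool (*is_busy)(void))
{
	assert(is_busy);
	return !need_freq_update && is_busy();
}
```

Both functions return the same value for every input, so the choice between them only affects readability and the need for the initializer.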


>
>         util = sugov_get_util(sg_cpu);
>         max = sg_cpu->max;
> @@ -831,6 +837,7 @@ static int sugov_start(struct cpufreq_policy *policy)
>         sg_policy->last_freq_update_time        = 0;
>         sg_policy->next_freq                    = 0;
>         sg_policy->work_in_progress             = false;
> +       sg_policy->limits_changed               = false;
>         sg_policy->need_freq_update             = false;
>         sg_policy->cached_raw_freq              = 0;
>
> @@ -879,7 +886,7 @@ static void sugov_limits(struct cpufreq_policy *policy)
>                 mutex_unlock(&sg_policy->work_lock);
>         }
>
> -       sg_policy->need_freq_update = true;
> +       sg_policy->limits_changed = true;
>  }
>
>  struct cpufreq_governor schedutil_gov = {
> --
> 2.21.0.rc0.269.g1a574e7a288b
>


* Re: [PATCH V3 2/2] cpufreq: intel_pstate: Implement ->resolve_freq()
  2019-08-02  5:44 ` [PATCH V3 2/2] cpufreq: intel_pstate: Implement ->resolve_freq() Viresh Kumar
@ 2019-08-02  9:17   ` Rafael J. Wysocki
  2019-08-02  9:28     ` Rafael J. Wysocki
  0 siblings, 1 reply; 10+ messages in thread
From: Rafael J. Wysocki @ 2019-08-02  9:17 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Srinivas Pandruvada, Len Brown, Linux PM,
	Vincent Guittot, v4 . 18+,
	Doug Smythies, Linux Kernel Mailing List

On Fri, Aug 2, 2019 at 7:44 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> Intel pstate driver exposes min_perf_pct and max_perf_pct sysfs files,
> which can be used to force a limit on the min/max P state of the driver.
> Though these files eventually control the min/max frequencies that the
> CPUs will run at, they don't make a change to policy->min/max values.

That's correct.

> When the values of these files are changed (in passive mode of the
> driver), it leads to calling ->limits() callback of the cpufreq
> governors, like schedutil. On a call to it the governors shall
> forcefully update the frequency to come within the limits.

OK, so the problem is that it is a bug to invoke the governor's ->limits()
callback without updating policy->min/max, because that's what
"limits" mean to the governors.

Fair enough.

> For getting the value within limits, the schedutil governor calls
> cpufreq_driver_resolve_freq(), which eventually tries to call
> ->resolve_freq() callback for this driver. Since the callback isn't
> present, the schedutil governor fails to get the target freq within
> limit and sometimes aborts the update believing that the frequency is
> already set to the target value.
>
> This patch implements the resolve_freq() callback, so the correct target
> frequency can be returned by the driver and the schedutil governor gets
> the frequency within limits immediately.

So the problem is that ->resolve_freq() adds overhead and it adds that
overhead even if the limits don't change.  It just sits there and computes
things every time even if that is completely redundant.

So no, this is not the right way to fix it IMO.


* Re: [PATCH V3 2/2] cpufreq: intel_pstate: Implement ->resolve_freq()
  2019-08-02  9:17   ` Rafael J. Wysocki
@ 2019-08-02  9:28     ` Rafael J. Wysocki
  2019-08-03 15:00       ` Doug Smythies
  2019-08-06  4:10       ` Viresh Kumar
  0 siblings, 2 replies; 10+ messages in thread
From: Rafael J. Wysocki @ 2019-08-02  9:28 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Srinivas Pandruvada, Len Brown, Linux PM, Vincent Guittot,
	v4 . 18+,
	Doug Smythies, Linux Kernel Mailing List

On Friday, August 2, 2019 11:17:55 AM CEST Rafael J. Wysocki wrote:
> On Fri, Aug 2, 2019 at 7:44 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > Intel pstate driver exposes min_perf_pct and max_perf_pct sysfs files,
> > which can be used to force a limit on the min/max P state of the driver.
> > Though these files eventually control the min/max frequencies that the
> > CPUs will run at, they don't make a change to policy->min/max values.
> 
> That's correct.
> 
> > When the values of these files are changed (in passive mode of the
> > driver), it leads to calling ->limits() callback of the cpufreq
> > governors, like schedutil. On a call to it the governors shall
> > forcefully update the frequency to come within the limits.
> 
> OK, so the problem is that it is a bug to invoke the governor's ->limits()
> callback without updating policy->min/max, because that's what
> "limits" mean to the governors.
> 
> Fair enough.

AFAICS this can be addressed by adding PM QoS freq limits requests of each CPU to
intel_pstate in the passive mode such that changing min_perf_pct or max_perf_pct
will cause these requests to be updated.





* RE: [PATCH V3 1/2] cpufreq: schedutil: Don't skip freq update when limits change
  2019-08-02  9:11 ` [PATCH V3 1/2] cpufreq: schedutil: Don't skip freq update when limits change Rafael J. Wysocki
@ 2019-08-03  0:00   ` Doug Smythies
  0 siblings, 0 replies; 10+ messages in thread
From: Doug Smythies @ 2019-08-03  0:00 UTC (permalink / raw)
  To: 'Rafael J. Wysocki', 'Viresh Kumar'
  Cc: 'Rafael Wysocki', 'Ingo Molnar',
	'Peter Zijlstra', 'Linux PM',
	'Vincent Guittot', 'v4 . 18+',
	'Doug Smythies', 'Linux Kernel Mailing List'

On 2019.08.02 02:12 Rafael J. Wysocki wrote:
> On Fri, Aug 2, 2019 at 7:44 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>
>> To avoid reducing the frequency of a CPU prematurely, we skip reducing
>> the frequency if the CPU had been busy recently.
>>
>> This should not be done when the limits of the policy are changed, for
>> example due to thermal throttling. We should always get the frequency
>> within the new limits as soon as possible.
>>
>> Trying to fix this by using only one flag, i.e. need_freq_update, can
>> lead to a race condition where the flag gets cleared without forcing us
>> to change the frequency at least once. And so this patch introduces
>> another flag to avoid that race condition.
>>
>> Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
>> Cc: v4.18+ <stable@vger.kernel.org> # v4.18+
>> Reported-by: Doug Smythies <doug.smythies@gmail.com>
>> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>> ---
>> V2->V3:
>> - Updated commit log.
>>
>> V1->V2:
>> - Fixed the race condition using a different flag.
>>
>> @Doug: I haven't changed the code since you last tested these. Your
>> Tested-by tag can be useful while applying the patches. Thanks.

Tested-by: Doug Smythies <dsmythies@telus.net>
For acpi-cpufreq/schedutil only (which we already know).

I tested including Rafael's suggested change.
I.E.

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 592ff72..ae3ec77 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -441,7 +441,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
        struct sugov_policy *sg_policy = sg_cpu->sg_policy;
        unsigned long util, max;
        unsigned int next_f;
-       bool busy = false;
+       bool busy;

        sugov_iowait_boost(sg_cpu, time, flags);
        sg_cpu->last_update = time;
@@ -452,8 +452,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
                return;

        /* Limits may have changed, don't skip frequency update */
-       if (!sg_policy->need_freq_update)
-               busy = sugov_cpu_is_busy(sg_cpu);
+       busy = !sg_policy->need_freq_update && sugov_cpu_is_busy(sg_cpu);

        util = sugov_get_util(sg_cpu);
        max = sg_cpu->max;

... Doug




* RE: [PATCH V3 2/2] cpufreq: intel_pstate: Implement ->resolve_freq()
  2019-08-02  9:28     ` Rafael J. Wysocki
@ 2019-08-03 15:00       ` Doug Smythies
  2019-08-05  8:35         ` Rafael J. Wysocki
  2019-08-06  4:10       ` Viresh Kumar
  1 sibling, 1 reply; 10+ messages in thread
From: Doug Smythies @ 2019-08-03 15:00 UTC (permalink / raw)
  To: 'Rafael J. Wysocki', 'Viresh Kumar'
  Cc: 'Srinivas Pandruvada', 'Len Brown',
	'Linux PM', 'Vincent Guittot', 'v4 . 18+',
	'Doug Smythies', 'Linux Kernel Mailing List'

On 2019.08.02 02:28 Rafael J. Wysocki wrote:
> On Friday, August 2, 2019 11:17:55 AM CEST Rafael J. Wysocki wrote:
>> On Fri, Aug 2, 2019 at 7:44 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>>
>>> Intel pstate driver exposes min_perf_pct and max_perf_pct sysfs files,
>>> which can be used to force a limit on the min/max P state of the driver.
>>> Though these files eventually control the min/max frequencies that the
>>> CPUs will run at, they don't make a change to policy->min/max values.
>> 
>> That's correct.
>> 
>>> When the values of these files are changed (in passive mode of the
>>> driver), it leads to calling ->limits() callback of the cpufreq
>>> governors, like schedutil. On a call to it the governors shall
>>> forcefully update the frequency to come within the limits.
>> 
>> OK, so the problem is that it is a bug to invoke the governor's ->limits()
>> callback without updating policy->min/max, because that's what
>> "limits" mean to the governors.
>> 
>> Fair enough.
>
> AFAICS this can be addressed by adding PM QoS freq limits requests of each CPU to
> intel_pstate in the passive mode such that changing min_perf_pct or max_perf_pct
> will cause these requests to be updated.

All governors for the intel_cpufreq (intel_pstate in passive mode) CPU frequency
scaling driver are broken with respect to this issue, not just the schedutil
governor. My initial escalation had been focused on acpi-cpufreq/schedutil
and intel_cpufreq/schedutil, as they were both broken, and both fixed by my initially
submitted reversion. What can I say, I missed that other intel_cpufreq governors
were also involved.

I tested all of them: conservative ondemand userspace powersave performance schedutil
Note that no other governor uses resolve_freq().

... Doug




* Re: [PATCH V3 2/2] cpufreq: intel_pstate: Implement ->resolve_freq()
  2019-08-03 15:00       ` Doug Smythies
@ 2019-08-05  8:35         ` Rafael J. Wysocki
  0 siblings, 0 replies; 10+ messages in thread
From: Rafael J. Wysocki @ 2019-08-05  8:35 UTC (permalink / raw)
  To: Doug Smythies
  Cc: Rafael J. Wysocki, Viresh Kumar, Srinivas Pandruvada, Len Brown,
	Linux PM, Vincent Guittot, v4 . 18+,
	Doug Smythies, Linux Kernel Mailing List

On Sat, Aug 3, 2019 at 5:00 PM Doug Smythies <dsmythies@telus.net> wrote:
>
> On 2019.08.02 02:28 Rafael J. Wysocki wrote:
> > On Friday, August 2, 2019 11:17:55 AM CEST Rafael J. Wysocki wrote:
> >> On Fri, Aug 2, 2019 at 7:44 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >>>
> >>> Intel pstate driver exposes min_perf_pct and max_perf_pct sysfs files,
> >>> which can be used to force a limit on the min/max P state of the driver.
> >>> Though these files eventually control the min/max frequencies that the
> >>> CPUs will run at, they don't make a change to policy->min/max values.
> >>
> >> That's correct.
> >>
> >>> When the values of these files are changed (in passive mode of the
> >>> driver), it leads to calling ->limits() callback of the cpufreq
> >>> governors, like schedutil. On a call to it the governors shall
> >>> forcefully update the frequency to come within the limits.
> >>
> >> OK, so the problem is that it is a bug to invoke the governor's ->limits()
> >> callback without updating policy->min/max, because that's what
> >> "limits" mean to the governors.
> >>
> >> Fair enough.
> >
> > AFAICS this can be addressed by adding PM QoS freq limits requests of each CPU to
> > intel_pstate in the passive mode such that changing min_perf_pct or max_perf_pct
> > will cause these requests to be updated.
>
> All governors for the intel_cpufreq (intel_pstate in passive mode) CPU frequency
> scaling driver are broken with respect to this issue, not just the schedutil
> governor.

Right.

My point is that changing min_perf_pct or max_perf_pct should
cause policy limits to be updated (which is not the case now) instead
of running special driver code on every frequency update just in case
the limits have changed in the meantime.
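The difference between the two approaches can be illustrated with a userspace model. This is only a sketch under stated assumptions: struct model_policy, model_set_max_perf_pct() and model_clamp() are hypothetical names, and the percentage-to-frequency conversion is a simplification rather than the driver's actual computation. The point is that the conversion runs once per limit change, while the per-update path only compares against cached limits.

```c
#include <assert.h>

/* Hypothetical cached limits, playing the role of policy->min/max. */
struct model_policy {
	unsigned int min_khz;
	unsigned int max_khz;
};

static unsigned int conversions;	/* percent-to-frequency conversions done */

/* Runs only when the limit is written, not on every frequency update. */
static void model_set_max_perf_pct(struct model_policy *p,
				   unsigned int turbo_khz, unsigned int pct)
{
	assert(p && pct <= 100);

	conversions++;
	p->max_khz = (unsigned int)((unsigned long long)turbo_khz * pct / 100);
	if (p->max_khz < p->min_khz)
		p->max_khz = p->min_khz;
}

/* Per-update path: a plain comparison against the cached limits. */
static unsigned int model_clamp(const struct model_policy *p,
				unsigned int target_khz)
{
	assert(p);

	if (target_khz < p->min_khz)
		return p->min_khz;
	if (target_khz > p->max_khz)
		return p->max_khz;
	return target_khz;
}
```

Any number of model_clamp() calls after a single model_set_max_perf_pct() leaves the conversion count at one.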


* Re: [PATCH V3 2/2] cpufreq: intel_pstate: Implement ->resolve_freq()
  2019-08-02  9:28     ` Rafael J. Wysocki
  2019-08-03 15:00       ` Doug Smythies
@ 2019-08-06  4:10       ` Viresh Kumar
  2019-08-06  8:05         ` Rafael J. Wysocki
  1 sibling, 1 reply; 10+ messages in thread
From: Viresh Kumar @ 2019-08-06  4:10 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Srinivas Pandruvada, Len Brown, Linux PM, Vincent Guittot,
	v4 . 18+,
	Doug Smythies, Linux Kernel Mailing List

On 02-08-19, 11:28, Rafael J. Wysocki wrote:
> On Friday, August 2, 2019 11:17:55 AM CEST Rafael J. Wysocki wrote:
> > On Fri, Aug 2, 2019 at 7:44 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > Intel pstate driver exposes min_perf_pct and max_perf_pct sysfs files,
> > > which can be used to force a limit on the min/max P state of the driver.
> > > Though these files eventually control the min/max frequencies that the
> > > CPUs will run at, they don't make a change to policy->min/max values.
> > 
> > That's correct.
> > 
> > > When the values of these files are changed (in passive mode of the
> > > driver), it leads to calling ->limits() callback of the cpufreq
> > > governors, like schedutil. On a call to it the governors shall
> > > forcefully update the frequency to come within the limits.
> > 
> > OK, so the problem is that it is a bug to invoke the governor's ->limits()
> > callback without updating policy->min/max, because that's what
> > "limits" mean to the governors.
> > 
> > Fair enough.
> 
> AFAICS this can be addressed by adding PM QoS freq limits requests of each CPU to
> intel_pstate in the passive mode such that changing min_perf_pct or max_perf_pct
> will cause these requests to be updated.

Right, that sounds like a good plan.

But that will never make it to the stable kernels as there will be a
long dependency of otherwise unrelated patches to get that done. My
initial thought was to get this patch merged as it is and then later
migrate to QoS, but since this patch doesn't fix ondemand and
conservative, this patch isn't good enough either.

Maybe we should add the regular notifier based solution first, mark it
for stable kernels, and then add the QoS specific solution?

-- 
viresh


* Re: [PATCH V3 2/2] cpufreq: intel_pstate: Implement ->resolve_freq()
  2019-08-06  4:10       ` Viresh Kumar
@ 2019-08-06  8:05         ` Rafael J. Wysocki
  0 siblings, 0 replies; 10+ messages in thread
From: Rafael J. Wysocki @ 2019-08-06  8:05 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Srinivas Pandruvada, Len Brown, Linux PM,
	Vincent Guittot, v4 . 18+,
	Doug Smythies, Linux Kernel Mailing List

On Tue, Aug 6, 2019 at 6:10 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 02-08-19, 11:28, Rafael J. Wysocki wrote:
> > On Friday, August 2, 2019 11:17:55 AM CEST Rafael J. Wysocki wrote:
> > > On Fri, Aug 2, 2019 at 7:44 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > >
> > > > Intel pstate driver exposes min_perf_pct and max_perf_pct sysfs files,
> > > > which can be used to force a limit on the min/max P state of the driver.
> > > > Though these files eventually control the min/max frequencies that the
> > > > CPUs will run at, they don't make a change to policy->min/max values.
> > >
> > > That's correct.
> > >
> > > > When the values of these files are changed (in passive mode of the
> > > > driver), it leads to calling ->limits() callback of the cpufreq
> > > > governors, like schedutil. On a call to it the governors shall
> > > > forcefully update the frequency to come within the limits.
> > >
> > > OK, so the problem is that it is a bug to invoke the governor's ->limits()
> > > callback without updating policy->min/max, because that's what
> > > "limits" mean to the governors.
> > >
> > > Fair enough.
> >
> > AFAICS this can be addressed by adding PM QoS freq limits requests of each CPU to
> > intel_pstate in the passive mode such that changing min_perf_pct or max_perf_pct
> > will cause these requests to be updated.
>
> Right, that sounds like a good plan.
>
> But that will never make it to the stable kernels as there will be a
> long dependency of otherwise unrelated patches to get that done. My
> initial thought was to get this patch merged as it is and then later
> migrate to QoS, but since this patch doesn't fix ondemand and
> conservative, this patch isn't good enough either.

Right.

> Maybe we should add the regular notifier based solution first, mark it
> for stable kernels, and then add the QoS specific solution?

I'm not sure if -stable kernels really need a fix here.

Let's just make sure that the mainline is OK and let's go straight for
the final approach.

