linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] cpufreq: intel_pstate: Handle powersave governor correctly in the passive mode with HWP
@ 2020-11-05 18:17 Rafael J. Wysocki
  2020-11-05 18:23 ` [PATCH 1/2] cpufreq: Introduce target min and max frequency hints Rafael J. Wysocki
  2020-11-05 18:25 ` [PATCH 2/2] cpufreq: intel_pstate: Take target_min and target_max into account Rafael J. Wysocki
  0 siblings, 2 replies; 8+ messages in thread
From: Rafael J. Wysocki @ 2020-11-05 18:17 UTC (permalink / raw)
  To: Linux PM
  Cc: Rafael J. Wysocki, Viresh Kumar, Srinivas Pandruvada, Zhang Rui, LKML

Hi,

Even after the changes made very recently, the handling of the powersave
governor is not exactly as expected when intel_pstate operates in the
"passive" mode with HWP enabled.

Namely, in that case HWP is not limited to the policy min frequency, but it
can scale the frequency up to the policy max limit and it cannot be constrained
currently, because the governor has no way to tell the driver how much room
there is for adjustments around the target frequency passed to it.

For this reason, patch [1/2] introduces new policy parameters, target_min and
target_max, that can be used by the governor to pass that information to the
driver and modifies the powersave and performance governors to use them.

Patch [2/2] modifies intel_pstate to take them into account so as to fix the
powersave governor issue, but they may be applicable for other purposes in the
future (e.g. if the driver is updated to pass the "desired" P-state to the HWP
logic instead of just setting the HWP floor to the target one, both the
powersave and performance governors will need target_min and target_max to
basically work as documented).
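
For illustration only, here is a minimal user-space sketch of the idea in
patch [1/2]: the governor records how much room the hardware has around the
target frequency.  The struct and the sketch_* function names are
hypothetical stand-ins, not kernel code; only the fields min, max,
target_min and target_max correspond to names used in the series.

```c
#include <assert.h>

/* Hypothetical stand-in for the struct cpufreq_policy fields used here. */
struct policy_sketch {
	unsigned int min;        /* policy min, in kHz */
	unsigned int max;        /* policy max, in kHz */
	unsigned int target_min; /* lowest frequency the hardware may pick, in kHz */
	unsigned int target_max; /* highest frequency the hardware may pick, in kHz */
};

/* Governor init: leave the whole hardware range open by default. */
static void sketch_init_hints(struct policy_sketch *p,
			      unsigned int hw_min, unsigned int hw_max)
{
	assert(hw_min <= hw_max);
	p->target_min = hw_min;
	p->target_max = hw_max;
}

/* powersave: pin both hints to the policy min and return the target. */
static unsigned int sketch_powersave_limits(struct policy_sketch *p)
{
	p->target_min = p->min;
	p->target_max = p->min;
	return p->min;
}

/* performance: pin both hints to the policy max and return the target. */
static unsigned int sketch_performance_limits(struct policy_sketch *p)
{
	p->target_min = p->max;
	p->target_max = p->max;
	return p->max;
}
```

With equal hints the driver can tell that the governor expects exactly one
frequency, rather than a floor that the hardware is free to exceed.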

Thanks!





* [PATCH 1/2] cpufreq: Introduce target min and max frequency hints
  2020-11-05 18:17 [PATCH 0/2] cpufreq: intel_pstate: Handle powersave governor correctly in the passive mode with HWP Rafael J. Wysocki
@ 2020-11-05 18:23 ` Rafael J. Wysocki
  2020-11-06  1:49   ` Doug Smythies
  2020-11-06 10:07   ` Viresh Kumar
  2020-11-05 18:25 ` [PATCH 2/2] cpufreq: intel_pstate: Take target_min and target_max into account Rafael J. Wysocki
  1 sibling, 2 replies; 8+ messages in thread
From: Rafael J. Wysocki @ 2020-11-05 18:23 UTC (permalink / raw)
  To: Linux PM
  Cc: Rafael J. Wysocki, Viresh Kumar, Srinivas Pandruvada, Zhang Rui, LKML

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Some cpufreq drivers, like intel_pstate (in the passive mode with
HWP enabled) or the CPPC driver, take the "target frequency" coming
from the governor as a hint to pass to the hardware rather than the
exact value to apply.  Then, the hardware may choose to run at
whatever performance point it regards as appropriate, given the
hint and some other data available to it.

Of course, the performance point chosen by the hardware should
stay within the policy min and max limits, but in some cases it may
be necessary to request the hardware to limit the range of
performance points to consider beyond that.

For example, if the powersave governor is in use, it attempts to
make the hardware run at the policy min frequency, but that may
not actually work if the hardware thinks that it has a reason to
run faster and the policy max limit is above the policy min.

In those cases, it is useful to pass additional information to the
driver to indicate that it should tell the hardware to consider a
narrower range of performance points, so add two new fields,
target_min and target_max, to struct cpufreq_policy for this purpose
and make the powersave and performance governors set them to indicate
that the CPU is expected to run exactly at the given frequency (the
policy min or max, respectively).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/cpufreq.c             |    3 +++
 drivers/cpufreq/cpufreq_performance.c |    4 ++++
 drivers/cpufreq/cpufreq_powersave.c   |    4 ++++
 include/linux/cpufreq.h               |   16 ++++++++++++++++
 4 files changed, 27 insertions(+)

Index: linux-pm/include/linux/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/cpufreq.h
+++ linux-pm/include/linux/cpufreq.h
@@ -63,6 +63,8 @@ struct cpufreq_policy {
 
 	unsigned int		min;    /* in kHz */
 	unsigned int		max;    /* in kHz */
+	unsigned int		target_min; /* in kHz */
+	unsigned int		target_max; /* in kHz */
 	unsigned int		cur;    /* in kHz, only needed if cpufreq
 					 * governors are used */
 	unsigned int		suspend_freq; /* freq to set during suspend */
Index: linux-pm/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq.c
+++ linux-pm/drivers/cpufreq/cpufreq.c
@@ -2272,6 +2272,9 @@ static int cpufreq_init_governor(struct
 
 	pr_debug("%s: for CPU %u\n", __func__, policy->cpu);
 
+	policy->target_min = policy->cpuinfo.min_freq;
+	policy->target_max = policy->cpuinfo.max_freq;
+
 	if (policy->governor->init) {
 		ret = policy->governor->init(policy);
 		if (ret) {
Index: linux-pm/drivers/cpufreq/cpufreq_performance.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq_performance.c
+++ linux-pm/drivers/cpufreq/cpufreq_performance.c
@@ -14,6 +14,10 @@
 static void cpufreq_gov_performance_limits(struct cpufreq_policy *policy)
 {
 	pr_debug("setting to %u kHz\n", policy->max);
+
+	policy->target_min = policy->max;
+	policy->target_max = policy->max;
+
 	__cpufreq_driver_target(policy, policy->max, CPUFREQ_RELATION_H);
 }
 
Index: linux-pm/drivers/cpufreq/cpufreq_powersave.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq_powersave.c
+++ linux-pm/drivers/cpufreq/cpufreq_powersave.c
@@ -14,6 +14,10 @@
 static void cpufreq_gov_powersave_limits(struct cpufreq_policy *policy)
 {
 	pr_debug("setting to %u kHz\n", policy->min);
+
+	policy->target_min = policy->min;
+	policy->target_max = policy->min;
+
 	__cpufreq_driver_target(policy, policy->min, CPUFREQ_RELATION_L);
 }
 





* [PATCH 2/2] cpufreq: intel_pstate: Take target_min and target_max into account
  2020-11-05 18:17 [PATCH 0/2] cpufreq: intel_pstate: Handle powersave governor correctly in the passive mode with HWP Rafael J. Wysocki
  2020-11-05 18:23 ` [PATCH 1/2] cpufreq: Introduce target min and max frequency hints Rafael J. Wysocki
@ 2020-11-05 18:25 ` Rafael J. Wysocki
  1 sibling, 0 replies; 8+ messages in thread
From: Rafael J. Wysocki @ 2020-11-05 18:25 UTC (permalink / raw)
  To: Linux PM
  Cc: Rafael J. Wysocki, Viresh Kumar, Srinivas Pandruvada, Zhang Rui, LKML

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the intel_pstate driver take the new target_min and target_max
cpufreq policy parameters into account when it operates in the passive
mode with HWP enabled, so as to fix the "powersave" governor behavior
in that case (currently, HWP is allowed to scale the performance all
the way up to the policy max limit when the "powersave" governor is
used, but it should be constrained to the policy min limit then).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/intel_pstate.c |   32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -2527,7 +2527,7 @@ static void intel_cpufreq_trace(struct c
 }
 
 static void intel_cpufreq_adjust_hwp(struct cpudata *cpu, u32 target_pstate,
-				     bool fast_switch)
+				     u32 target_max, bool fast_switch)
 {
 	u64 prev = READ_ONCE(cpu->hwp_req_cached), value = prev;
 
@@ -2539,7 +2539,7 @@ static void intel_cpufreq_adjust_hwp(str
 	 * field in it, so opportunistically update the max too if needed.
 	 */
 	value &= ~HWP_MAX_PERF(~0L);
-	value |= HWP_MAX_PERF(cpu->max_perf_ratio);
+	value |= HWP_MAX_PERF(target_max);
 
 	if (value == prev)
 		return;
@@ -2562,19 +2562,31 @@ static void intel_cpufreq_adjust_perf_ct
 			      pstate_funcs.get_val(cpu, target_pstate));
 }
 
-static int intel_cpufreq_update_pstate(struct cpudata *cpu, int target_pstate,
-				       bool fast_switch)
+static int intel_cpufreq_update_pstate(struct cpufreq_policy *policy,
+				       int target_pstate, bool fast_switch)
 {
+	struct cpudata *cpu = all_cpu_data[policy->cpu];
 	int old_pstate = cpu->pstate.current_pstate;
 
-	target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
 	if (hwp_active) {
-		intel_cpufreq_adjust_hwp(cpu, target_pstate, fast_switch);
-		cpu->pstate.current_pstate = target_pstate;
+		int min_pstate = max(cpu->pstate.min_pstate, cpu->min_perf_ratio);
+		int max_pstate = max(min_pstate, cpu->max_perf_ratio);
+		int target_min = DIV_ROUND_UP(policy->target_min,
+					      cpu->pstate.scaling);
+		int target_max = policy->target_max / cpu->pstate.scaling;
+
+		target_min = clamp_t(int, target_min, min_pstate, max_pstate);
+		target_max = clamp_t(int, target_max, min_pstate, max_pstate);
+
+		target_pstate = clamp_t(int, target_pstate, target_min, target_max);
+
+		intel_cpufreq_adjust_hwp(cpu, target_pstate, target_max, fast_switch);
 	} else if (target_pstate != old_pstate) {
+		target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
+
 		intel_cpufreq_adjust_perf_ctl(cpu, target_pstate, fast_switch);
-		cpu->pstate.current_pstate = target_pstate;
 	}
+	cpu->pstate.current_pstate = target_pstate;
 
 	intel_cpufreq_trace(cpu, fast_switch ? INTEL_PSTATE_TRACE_FAST_SWITCH :
 			    INTEL_PSTATE_TRACE_TARGET, old_pstate);
@@ -2609,7 +2621,7 @@ static int intel_cpufreq_target(struct c
 		break;
 	}
 
-	target_pstate = intel_cpufreq_update_pstate(cpu, target_pstate, false);
+	target_pstate = intel_cpufreq_update_pstate(policy, target_pstate, false);
 
 	freqs.new = target_pstate * cpu->pstate.scaling;
 
@@ -2628,7 +2640,7 @@ static unsigned int intel_cpufreq_fast_s
 
 	target_pstate = DIV_ROUND_UP(target_freq, cpu->pstate.scaling);
 
-	target_pstate = intel_cpufreq_update_pstate(cpu, target_pstate, true);
+	target_pstate = intel_cpufreq_update_pstate(policy, target_pstate, true);
 
 	return target_pstate * cpu->pstate.scaling;
 }
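
The clamping arithmetic in the HWP branch above can be tried out in user
space.  The following is a rough sketch under stated assumptions, not the
driver code: sketch_hwp_request(), clamp_int() and the result struct are
hypothetical names, and min_pstate/max_pstate are passed in directly instead
of being derived from struct cpudata.

```c
#include <assert.h>

/* Same ordering as the kernel's clamp(): raise to lo first, then cap at hi. */
static int clamp_int(int v, int lo, int hi)
{
	if (v < lo)
		v = lo;
	if (v > hi)
		v = hi;
	return v;
}

/* Hypothetical result: the P-state to request and the HWP max to program. */
struct hwp_request_sketch {
	int target_pstate;
	int max_pstate;
};

static struct hwp_request_sketch
sketch_hwp_request(unsigned int target_min_khz, unsigned int target_max_khz,
		   int target_pstate, int min_pstate, int max_pstate,
		   unsigned int scaling)
{
	struct hwp_request_sketch req;
	int tmin, tmax;

	assert(scaling > 0 && min_pstate <= max_pstate);

	/* Round the lower hint up and the upper hint down to whole P-states. */
	tmin = (int)((target_min_khz + scaling - 1) / scaling);
	tmax = (int)(target_max_khz / scaling);

	/* Keep both hints within the driver's own P-state limits. */
	tmin = clamp_int(tmin, min_pstate, max_pstate);
	tmax = clamp_int(tmax, min_pstate, max_pstate);

	req.target_pstate = clamp_int(target_pstate, tmin, tmax);
	req.max_pstate = tmax;
	return req;
}
```

As a usage example with made-up numbers (a scaling factor of 100000 kHz per
P-state and driver limits of 8..36): when both hints are 800000 kHz, as the
powersave governor would set them, the sketch yields a target of 8 and an
HWP max of 8, so the hardware has no room to scale up.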





* RE: [PATCH 1/2] cpufreq: Introduce target min and max frequency hints
  2020-11-05 18:23 ` [PATCH 1/2] cpufreq: Introduce target min and max frequency hints Rafael J. Wysocki
@ 2020-11-06  1:49   ` Doug Smythies
  2020-11-06 10:07   ` Viresh Kumar
  1 sibling, 0 replies; 8+ messages in thread
From: Doug Smythies @ 2020-11-06  1:49 UTC (permalink / raw)
  To: 'Rafael J. Wysocki', 'Linux PM'
  Cc: 'Rafael J. Wysocki', 'Viresh Kumar',
	'Srinivas Pandruvada', 'Zhang Rui',
	'LKML'

Hi Rafael:

Thank you for this patch set.

I can not get the patch to apply.
I was trying on top of 5.10-rc2, and have been unable to determine
what other patches might need to be applied first.

On 2020.11.05 10:24 Rafael J. Wysocki wrote:

...

> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpufreq/cpufreq.c             |    3 +++
>  drivers/cpufreq/cpufreq_performance.c |    4 ++++
>  drivers/cpufreq/cpufreq_powersave.c   |    4 ++++
>  include/linux/cpufreq.h               |   16 ++++++++++++++++

I do not understand why this part says to look for 16
differences, but I can only find 2.

>  4 files changed, 27 insertions(+)
> 
> Index: linux-pm/include/linux/cpufreq.h
> ===================================================================
> --- linux-pm.orig/include/linux/cpufreq.h
> +++ linux-pm/include/linux/cpufreq.h
> @@ -63,6 +63,8 @@ struct cpufreq_policy {
> 
>  	unsigned int		min;    /* in kHz */
>  	unsigned int		max;    /* in kHz */
> +	unsigned int		target_min; /* in kHz */
> +	unsigned int		target_max; /* in kHz */
>  	unsigned int		cur;    /* in kHz, only needed if cpufreq
>  					 * governors are used */
>  	unsigned int		suspend_freq; /* freq to set during suspend */
> Index: linux-pm/drivers/cpufreq/cpufreq.c

...

Anyway, I edited the patch, deleting the include/linux/cpufreq.h part,
then it applied, as did patch 2 of 2.
I edited include/linux/cpufreq.h manually.

Issues with the powersave governor reported in [1] and [2]
are fixed. Relevant part quoted and updated below:

> In early September Doug wrote:
>> powersave governor:
>> acpi-cpufreq: good
>> intel_cpufreq hwp: bad

Now good, with this patch set.

>> intel_cpufreq no hwp: good

...

> For the powersave governor, this is what we have now:
> 
> intel_cpufreq hwp == intel_pstate hwp
> intel_cpufreq no hwp == acpi-cpufreq == always minimum freq
> intel_pstate no hwp ~= acpi-cpufreq/ondemand

...

> My expectation was/is:
> 
> intel_cpufreq hwp == intel_cpufreq no hwp == acpi-cpufreq == always minimum freq

And this is what we now have, with this patch set.

> intel_pstate no hwp ~= acpi-cpufreq/ondemand
> intel_pstate hwp == Unique. Say, extremely coarse version of ondemand.

[1] https://marc.info/?l=linux-pm&m=159769839401767&w=2
[2] https://marc.info/?l=linux-pm&m=159943780220923&w=2

... Doug




* Re: [PATCH 1/2] cpufreq: Introduce target min and max frequency hints
  2020-11-05 18:23 ` [PATCH 1/2] cpufreq: Introduce target min and max frequency hints Rafael J. Wysocki
  2020-11-06  1:49   ` Doug Smythies
@ 2020-11-06 10:07   ` Viresh Kumar
  2020-11-06 17:02     ` Rafael J. Wysocki
  1 sibling, 1 reply; 8+ messages in thread
From: Viresh Kumar @ 2020-11-06 10:07 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Rafael J. Wysocki, Srinivas Pandruvada, Zhang Rui, LKML

On 05-11-20, 19:23, Rafael J. Wysocki wrote:
> Index: linux-pm/include/linux/cpufreq.h
> ===================================================================
> --- linux-pm.orig/include/linux/cpufreq.h
> +++ linux-pm/include/linux/cpufreq.h
> @@ -63,6 +63,8 @@ struct cpufreq_policy {
>  
>  	unsigned int		min;    /* in kHz */
>  	unsigned int		max;    /* in kHz */
> +	unsigned int		target_min; /* in kHz */
> +	unsigned int		target_max; /* in kHz */
>  	unsigned int		cur;    /* in kHz, only needed if cpufreq
>  					 * governors are used */
>  	unsigned int		suspend_freq; /* freq to set during suspend */

Rafael, honestly speaking I didn't like this patch very much. We need
to fix a very specific problem with the intel-pstate driver when it is
used with powersave/performance governor to make sure the hard limits
are enforced. And this is something which no one else may face as
well.

What about doing something like this instead in the intel_pstate
driver only to get this fixed ?

        if (!strcmp(policy->governor->name, "powersave") ||
            !strcmp(policy->governor->name, "performance"))
                hard-limit-to-be-enforced;

This would be a much simpler and contained approach IMHO.
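
To make the check above concrete, a minimal user-space sketch follows.  The
helper name is hypothetical, and what the driver does once the check
succeeds (the "hard-limit-to-be-enforced" step) is left open, as in the
suggestion itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true if the governor, identified by name only,
 * always requests one fixed frequency (the policy min or the policy max).
 */
static bool sketch_governor_wants_hard_limit(const char *governor_name)
{
	assert(governor_name != NULL);

	return !strcmp(governor_name, "powersave") ||
	       !strcmp(governor_name, "performance");
}
```

The driver would then have to keep this list of names in sync with the
governors by hand.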

-- 
viresh


* Re: [PATCH 1/2] cpufreq: Introduce target min and max frequency hints
  2020-11-06 10:07   ` Viresh Kumar
@ 2020-11-06 17:02     ` Rafael J. Wysocki
  2020-11-09  4:39       ` Viresh Kumar
  0 siblings, 1 reply; 8+ messages in thread
From: Rafael J. Wysocki @ 2020-11-06 17:02 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Linux PM, Rafael J. Wysocki,
	Srinivas Pandruvada, Zhang Rui, LKML

On Fri, Nov 6, 2020 at 11:07 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 05-11-20, 19:23, Rafael J. Wysocki wrote:
> > Index: linux-pm/include/linux/cpufreq.h
> > ===================================================================
> > --- linux-pm.orig/include/linux/cpufreq.h
> > +++ linux-pm/include/linux/cpufreq.h
> > @@ -63,6 +63,8 @@ struct cpufreq_policy {
> >
> >       unsigned int            min;    /* in kHz */
> >       unsigned int            max;    /* in kHz */
> > +     unsigned int            target_min; /* in kHz */
> > +     unsigned int            target_max; /* in kHz */
> >       unsigned int            cur;    /* in kHz, only needed if cpufreq
> >                                        * governors are used */
> >       unsigned int            suspend_freq; /* freq to set during suspend */
>
> Rafael, honestly speaking I didn't like this patch very much.

So what's the concern, specifically?

> We need to fix a very specific problem with the intel-pstate driver when it is
> used with powersave/performance governor to make sure the hard limits
> are enforced. And this is something which no one else may face as
> well.

Well, I predict that the CPPC driver will face this problem too at one point.

As well as any other driver which doesn't select OPPs directly for
that matter, at least to some extent (note that intel_pstate in the
"passive" mode without HWP has it too, but since there is no way to
enforce the target max in that case, it is not relevant).

> What about doing something like this instead in the intel_pstate
> driver only to get this fixed ?
>
>         if (!strcmp(policy->governor->name, "powersave") ||
>             !strcmp(policy->governor->name, "performance"))
>                 hard-limit-to-be-enforced;
>
> This would be a much simpler and contained approach IMHO.

I obviously prefer to do it the way I did in this series, because it
is more general and it is based on the governor telling the driver
what is needed instead of the driver trying to figure out what the
governor is and guessing what may be needed because of that.

But if you have a very specific technical concern regarding my
approach, I can do it the other way too.

Cheers!


* Re: [PATCH 1/2] cpufreq: Introduce target min and max frequency hints
  2020-11-06 17:02     ` Rafael J. Wysocki
@ 2020-11-09  4:39       ` Viresh Kumar
  2020-11-09 12:27         ` Rafael J. Wysocki
  0 siblings, 1 reply; 8+ messages in thread
From: Viresh Kumar @ 2020-11-09  4:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, Srinivas Pandruvada, Zhang Rui, LKML

On 06-11-20, 18:02, Rafael J. Wysocki wrote:
> On Fri, Nov 6, 2020 at 11:07 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 05-11-20, 19:23, Rafael J. Wysocki wrote:
> > > Index: linux-pm/include/linux/cpufreq.h
> > > ===================================================================
> > > --- linux-pm.orig/include/linux/cpufreq.h
> > > +++ linux-pm/include/linux/cpufreq.h
> > > @@ -63,6 +63,8 @@ struct cpufreq_policy {
> > >
> > >       unsigned int            min;    /* in kHz */
> > >       unsigned int            max;    /* in kHz */
> > > +     unsigned int            target_min; /* in kHz */
> > > +     unsigned int            target_max; /* in kHz */
> > >       unsigned int            cur;    /* in kHz, only needed if cpufreq
> > >                                        * governors are used */
> > >       unsigned int            suspend_freq; /* freq to set during suspend */
> >
> > Rafael, honestly speaking I didn't like this patch very much.
> 
> So what's the concern, specifically?
> 
> > We need to fix a very specific problem with the intel-pstate driver when it is
> > used with powersave/performance governor to make sure the hard limits
> > are enforced. And this is something which no one else may face as
> > well.
> 
> Well, I predict that the CPPC driver will face this problem too at one point.
> 
> As well as any other driver which doesn't select OPPs directly for
> that matter, at least to some extent (note that intel_pstate in the
> "passive" mode without HWP has it too, but since there is no way to
> enforce the target max in that case, it is not relevant).
> 
> > What about doing something like this instead in the intel_pstate
> > driver only to get this fixed ?
> >
> >         if (!strcmp(policy->governor->name, "powersave") ||
> >             !strcmp(policy->governor->name, "performance"))
> >                 hard-limit-to-be-enforced;
> >
> > This would be a much simpler and contained approach IMHO.
> 
> I obviously prefer to do it the way I did in this series, because it
> is more general and it is based on the governor telling the driver
> what is needed instead of the driver trying to figure out what the
> governor is and guessing what may be needed because of that.
> 
> But if you have a very specific technical concern regarding my
> approach, I can do it the other way too.

I was concerned about adding those fields in the policy structure, but
I get that you want to do it in a more generic way.

What about adding a field named "fixed" (or something else) in the
governor's structure which tells us that the frequency is fixed and
must be honored by the driver?

-- 
viresh


* Re: [PATCH 1/2] cpufreq: Introduce target min and max frequency hints
  2020-11-09  4:39       ` Viresh Kumar
@ 2020-11-09 12:27         ` Rafael J. Wysocki
  0 siblings, 0 replies; 8+ messages in thread
From: Rafael J. Wysocki @ 2020-11-09 12:27 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM,
	Srinivas Pandruvada, Zhang Rui, LKML

On Mon, Nov 9, 2020 at 5:39 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 06-11-20, 18:02, Rafael J. Wysocki wrote:
> > On Fri, Nov 6, 2020 at 11:07 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 05-11-20, 19:23, Rafael J. Wysocki wrote:
> > > > Index: linux-pm/include/linux/cpufreq.h
> > > > ===================================================================
> > > > --- linux-pm.orig/include/linux/cpufreq.h
> > > > +++ linux-pm/include/linux/cpufreq.h
> > > > @@ -63,6 +63,8 @@ struct cpufreq_policy {
> > > >
> > > >       unsigned int            min;    /* in kHz */
> > > >       unsigned int            max;    /* in kHz */
> > > > +     unsigned int            target_min; /* in kHz */
> > > > +     unsigned int            target_max; /* in kHz */
> > > >       unsigned int            cur;    /* in kHz, only needed if cpufreq
> > > >                                        * governors are used */
> > > >       unsigned int            suspend_freq; /* freq to set during suspend */
> > >
> > > Rafael, honestly speaking I didn't like this patch very much.
> >
> > So what's the concern, specifically?
> >
> > > We need to fix a very specific problem with the intel-pstate driver when it is
> > > used with powersave/performance governor to make sure the hard limits
> > > are enforced. And this is something which no one else may face as
> > > well.
> >
> > Well, I predict that the CPPC driver will face this problem too at one point.
> >
> > As well as any other driver which doesn't select OPPs directly for
> > that matter, at least to some extent (note that intel_pstate in the
> > "passive" mode without HWP has it too, but since there is no way to
> > enforce the target max in that case, it is not relevant).
> >
> > > What about doing something like this instead in the intel_pstate
> > > driver only to get this fixed ?
> > >
> > >         if (!strcmp(policy->governor->name, "powersave") ||
> > >             !strcmp(policy->governor->name, "performance"))
> > >                 hard-limit-to-be-enforced;
> > >
> > > This would be a much simpler and contained approach IMHO.
> >
> > I obviously prefer to do it the way I did in this series, because it
> > is more general and it is based on the governor telling the driver
> > what is needed instead of the driver trying to figure out what the
> > governor is and guessing what may be needed because of that.
> >
> > But if you have a very specific technical concern regarding my
> > approach, I can do it the other way too.
>
> I was concerned about adding those fields in the policy structure, but
> I get that you want to do it in a more generic way.
>
> What about adding a field named "fixed" (or something else) in the
> governor's structure which tells us that the frequency is fixed and
> must be honored by the driver?

That would work for powersave/performance and it would suffice for the
time being, so let me try to implement that.

Still, there is a more general problem related to that which is how to
prevent the perf control in the hardware from going beyond certain
limits, possibly narrower than the policy min and max.

For example, the kernel may need to reserve some capacity for deadline
tasks or similar, or when there is a min utilization clamp in place,
and it would be good to have a way to let the HW know that it should
not reduce the available capacity below a certain boundary, even
though that may appear to be the right thing to do to it. [This is
kind of addressed by intel_pstate by setting the HWP floor to the
target frequency requested by the governor, but that is suboptimal,
because it generally causes too much capacity to be reserved which
costs energy.]

Analogously, the kernel may not want the HW to increase capacity too
much when it knows that doing so would not increase the amount of work
done or when the work being done is not urgent (like when there is a
max utilization clamp in place). [This last issue is particularly
visible in some GPU-related workloads where the processor sees
conditions for ramping up a "one-core turbo" frequency very high, but
this is a mistake, because it doesn't cause work to be done any
faster, since the task doing the work is in fact periodic and it does
the same amount of work in every period regardless of how fast the CPU
doing it runs.]

So while the powersave/performance case can be addressed in a simpler
way, the need for a more general approach is still there.
