[1/2] cpufreq: intel_pstate: Avoid missing HWP max updates in passive mode
diff mbox series

Message ID 76352140.UXiy1LajID@kreacher
State New, archived
Headers show
Series
  • cpufreq: intel_pstate: Avoid missing HWP max limit updates with powersave governor
Related show

Commit Message

Rafael J. Wysocki Oct. 22, 2020, 11:57 a.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

If the cpufreq policy max limit is changed when intel_pstate operates
in the passive mode with HWP enabled and the "powersave" governor is
used on top of it, the HWP max limit is not updated as appropriate.

Namely, in the "powersave" governor case, the target P-state
is always equal to the policy min limit, so if the latter does
not change, the "target_freq == policy->cur" check in
__cpufreq_driver_target() is "true" and the
"target_pstate != old_pstate" check in intel_cpufreq_update_pstate()
is "false", so intel_cpufreq_adjust_hwp() is not invoked to update
the HWP Request MSR and the HWP max limit is not updated as a result.

To prevent that from occurring, modify __cpufreq_driver_target()
to do the "target_freq == policy->cur" check only in the frequency
table case and change intel_cpufreq_update_pstate() to do the
"target_pstate != old_pstate" check only in the non-HWP case and
let intel_cpufreq_adjust_hwp() always run in the HWP case (it will
update HWP Request only if the cached value of the register is
different from the new one including the limits, so if neither the
target P-state value nor the max limit changes, the register write
will still be avoided).

Fixes: f6ebbcf08f37 ("cpufreq: intel_pstate: Implement passive mode with HWP enabled")
Reported-by: Zhang Rui <rui.zhang@intel.com>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/cpufreq.c      |    6 +++---
 drivers/cpufreq/intel_pstate.c |   12 +++++-------
 2 files changed, 8 insertions(+), 10 deletions(-)

Comments

Viresh Kumar Oct. 23, 2020, 6:10 a.m. UTC | #1
On 22-10-20, 13:57, Rafael J. Wysocki wrote:
> Index: linux-pm/drivers/cpufreq/cpufreq.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/cpufreq.c
> +++ linux-pm/drivers/cpufreq/cpufreq.c
> @@ -2182,6 +2182,9 @@ int __cpufreq_driver_target(struct cpufr
>  	pr_debug("target for CPU %u: %u kHz, relation %u, requested %u kHz\n",
>  		 policy->cpu, target_freq, relation, old_target_freq);
>  
> +	if (cpufreq_driver->target)
> +		return cpufreq_driver->target(policy, target_freq, relation);
> +
>  	/*
>  	 * This might look like a redundant call as we are checking it again
>  	 * after finding index. But it is left intentionally for cases where
> @@ -2194,9 +2197,6 @@ int __cpufreq_driver_target(struct cpufr
>  	/* Save last value to restore later on errors */
>  	policy->restore_freq = policy->cur;
>  
> -	if (cpufreq_driver->target)
> -		return cpufreq_driver->target(policy, target_freq, relation);
> -
>  	if (!cpufreq_driver->target_index)
>  		return -EINVAL;

From what I understood, you want your driver to get notified about
policy->min/max changes and right now they are making it work from
within the target() callback. Your commit log talks about policy->max
and powersave combination, I think the same will be true in case of
policy->min and performance ? And also with any other governor (like
schedutil) if the target_freq doesn't change for a while.

And IMHO, this change is more like a band-aid which is going to remove
the check of target != cur for all target() type drivers (which aren't
many) and it feels like a penalty on them (which is also there for
intel-cpufreq without hwp), and that we will get into the same problem
for target_index() drivers as well if they want to do something
similar in future, i.e. skip checking for same-freq.

Maybe adding a new flag for the cpufreq-driver for force-updates would
be a better solution ? Which will make this very much driver
dependent.
Rafael J. Wysocki Oct. 23, 2020, 11:57 a.m. UTC | #2
On Fri, Oct 23, 2020 at 8:10 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 22-10-20, 13:57, Rafael J. Wysocki wrote:
> > Index: linux-pm/drivers/cpufreq/cpufreq.c
> > ===================================================================
> > --- linux-pm.orig/drivers/cpufreq/cpufreq.c
> > +++ linux-pm/drivers/cpufreq/cpufreq.c
> > @@ -2182,6 +2182,9 @@ int __cpufreq_driver_target(struct cpufr
> >       pr_debug("target for CPU %u: %u kHz, relation %u, requested %u kHz\n",
> >                policy->cpu, target_freq, relation, old_target_freq);
> >
> > +     if (cpufreq_driver->target)
> > +             return cpufreq_driver->target(policy, target_freq, relation);
> > +
> >       /*
> >        * This might look like a redundant call as we are checking it again
> >        * after finding index. But it is left intentionally for cases where
> > @@ -2194,9 +2197,6 @@ int __cpufreq_driver_target(struct cpufr
> >       /* Save last value to restore later on errors */
> >       policy->restore_freq = policy->cur;
> >
> > -     if (cpufreq_driver->target)
> > -             return cpufreq_driver->target(policy, target_freq, relation);
> > -
> >       if (!cpufreq_driver->target_index)
> >               return -EINVAL;
>
> From what I understood, you want your driver to get notified about
> policy->min/max changes and right now they are making it work from
> within the target() callback.

Not exactly.

The driver needs to update some internal upper and lower boundary
values along with the target freq and skipping the update when target
freq doesn't change prevents it from doing that.

The policy min and max changes are communicated to the driver via the
governor ->limits() callback and that can only call
__cpufreq_driver_target() then or defer the update to the next
->fast_switch() invocation.  Either way, the driver needs to have a
chance to carry out the full update even if the target frequency
doesn't change, but the policy min or max limits may have changed.

> Your commit log talks about policy->max and powersave combination,

Yes, it does, because that's a very clearly visible symptom.

> I think the same will be true in case of policy->min and performance ?

It might in theory, but it is not in practice, because the HWP min is
set to the target freq (and the target freq is already clamped between
the policy min and max).

But generally speaking you are right, this would be a problem for any
driver having to update some internal upper and lower boundary
settings along with the target freq.

> And also with any other governor (like schedutil) if the target_freq doesn't change for a while.

Well, yes.

A change of one of the limits that doesn't cause the target to change
may be missed in general.

> And IMHO, this change is more like a band-aid which is going to remove
> the check of target != cur for all target() type drivers (which aren't
> many) and it feels like a penalty on them (which is also there for
> intel-cpufreq without hwp), and that we will get into the same problem
> for target_index() drivers as well if they want to do something
> similar in future, i.e. skip checking for same-freq.
>
> Maybe adding a new flag for the cpufreq-driver for force-updates would
> be a better solution ? Which will make this very much driver
> dependent.

Fair enough, I'll add a driver flag for that.

Also I split the patch into the core part and the driver-specific part
for clarity.

Thanks!

Patch
diff mbox series

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -2550,14 +2550,12 @@  static int intel_cpufreq_update_pstate(s
 	int old_pstate = cpu->pstate.current_pstate;
 
 	target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
-	if (target_pstate != old_pstate) {
+	if (hwp_active) {
+		intel_cpufreq_adjust_hwp(cpu, target_pstate, fast_switch);
+		cpu->pstate.current_pstate = target_pstate;
+	} else if (target_pstate != old_pstate) {
+		intel_cpufreq_adjust_perf_ctl(cpu, target_pstate, fast_switch);
 		cpu->pstate.current_pstate = target_pstate;
-		if (hwp_active)
-			intel_cpufreq_adjust_hwp(cpu, target_pstate,
-						 fast_switch);
-		else
-			intel_cpufreq_adjust_perf_ctl(cpu, target_pstate,
-						      fast_switch);
 	}
 
 	intel_cpufreq_trace(cpu, fast_switch ? INTEL_PSTATE_TRACE_FAST_SWITCH :
Index: linux-pm/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq.c
+++ linux-pm/drivers/cpufreq/cpufreq.c
@@ -2182,6 +2182,9 @@  int __cpufreq_driver_target(struct cpufr
 	pr_debug("target for CPU %u: %u kHz, relation %u, requested %u kHz\n",
 		 policy->cpu, target_freq, relation, old_target_freq);
 
+	if (cpufreq_driver->target)
+		return cpufreq_driver->target(policy, target_freq, relation);
+
 	/*
 	 * This might look like a redundant call as we are checking it again
 	 * after finding index. But it is left intentionally for cases where
@@ -2194,9 +2197,6 @@  int __cpufreq_driver_target(struct cpufr
 	/* Save last value to restore later on errors */
 	policy->restore_freq = policy->cur;
 
-	if (cpufreq_driver->target)
-		return cpufreq_driver->target(policy, target_freq, relation);
-
 	if (!cpufreq_driver->target_index)
 		return -EINVAL;