linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Doug Smythies" <dsmythies@telus.net>
To: "'Viresh Kumar'" <viresh.kumar@linaro.org>
Cc: <linux-pm@vger.kernel.org>,
	"'Vincent Guittot'" <vincent.guittot@linaro.org>,
	<linux-kernel@vger.kernel.org>,
	"'Rafael Wysocki'" <rjw@rjwysocki.net>,
	"'Srinivas Pandruvada'" <srinivas.pandruvada@linux.intel.com>,
	"'Len Brown'" <lenb@kernel.org>
Subject: RE: [PATCH V4 2/2] cpufreq: intel_pstate: Implement QoS supported freq constraints
Date: Thu, 8 Aug 2019 09:25:34 -0700	[thread overview]
Message-ID: <000601d54e05$e93d0130$bbb70390$@net> (raw)
In-Reply-To: <e789eceae3f32a66fff923daeb85b33b88f21fe1.1565161495.git.viresh.kumar@linaro.org>

On 2019.08.07 00:06 Viresh Kumar wrote:

Thanks for your work on this.

> Intel pstate driver exposes min_perf_pct and max_perf_pct sysfs files,
> which can be used to force a limit on the min/max P state of the driver.
> Though these files eventually control the min/max frequencies that the
> CPUs will run at, they don't make a change to policy->min/max values.
>
> When the values of these files are changed (in passive mode of the
> driver), it leads to calling ->limits() callback of the cpufreq
> governors, like schedutil. On a call to it the governors shall
> forcefully update the frequency to come within the limits. Since the
> limits, i.e.  policy->min/max, aren't updated by the driver, the
> governors fails to get the target freq within limit and sometimes aborts
> the update believing that the frequency is already set to the target
> value.
>
> This patch implements the QoS supported frequency constraints to update
> policy->min/max values whenever min_perf_pct or max_perf_pct files are
> updated. This is only done for the passive mode as of now, as the driver
> is already working fine in active mode.
>
> Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
> Reported-by: Doug Smythies <dsmythies@telus.net>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

Tested by: Doug Smythies <dsmythies@telus.net>
Thermald seems to now be working O.K. for all the governors.

I do note that if one sets
/sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
It seems to override subsequent attempts via
/sys/devices/system/cpu/intel_pstate/max_perf_pct.
Myself, I find this confusing.

So the question becomes which one is the "master"?

Example:

# for file in /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq; do echo "2200000" > $file; done
# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
2200000
2200000
2200000
2200000
2200000
2200000
2200000
2200000
root@s15:/home/doug/temp# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
... (Note: 50% = 1900000)
root@s15:/home/doug/temp# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
1900000
1900000
1900000
1900000
1900000
1900000
1900000
1900000
root@s15:/home/doug/temp# echo 100 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
... (Note: 50% = 3800000, and my expectation is 3.8 GHz below)
root@s15:/home/doug/temp# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
2200000
2200000
2200000
2200000
2200000
2200000
2200000
2200000

Similarly for the minimum side of things:

root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpufreq/policy*/scaling_min_freq; do echo "3200000" > $file; done
root@s15:/home/doug/temp# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_min_freq
3200000
3200000
3200000
3200000
3200000
3200000
3200000
3200000
root@s15:/home/doug/temp# echo 42 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
root@s15:/home/doug/temp# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
42   ... (note 42% = 1600000 = processor minimum, and that is my expectation below.)
root@s15:/home/doug/temp# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_min_freq
3200000
3200000
3200000
3200000
3200000
3200000
3200000
3200000

I thought these minimum anomalies would cause problems for thermald, but
for whatever reason, it seems to work properly.

> ---
> V3->V4:
> - Reimplemented the solution using QoS constraints instead of
>   resolve_freq() callback.
>
> drivers/cpufreq/intel_pstate.c | 120 +++++++++++++++++++++++++++++++--
> 1 file changed, 116 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index cc27d4c59dca..e9fbd6c36822 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -24,6 +24,7 @@
>  #include <linux/fs.h>
>  #include <linux/acpi.h>
>  #include <linux/vmalloc.h>
> +#include <linux/pm_qos.h>
>  #include <trace/events/power.h>
>  
>  #include <asm/div64.h>
> @@ -1085,6 +1086,47 @@ static ssize_t store_no_turbo(struct kobject *a, struct kobj_attribute *b,
>  	return count;
>  }
>  
> +static struct cpufreq_driver intel_pstate;
> +
> +static void update_qos_request(enum dev_pm_qos_req_type type)
> +{
> +	int max_state, turbo_max, freq, i, perf_pct;
> +	struct dev_pm_qos_request *req;
> +	struct cpufreq_policy *policy;
> +
> +	for_each_possible_cpu(i) {
> +		struct cpudata *cpu = all_cpu_data[i];
> +
> +		policy = cpufreq_cpu_get(i);
> +		if (!policy)
> +			continue;
> +
> +		req = policy->driver_data;
> +		cpufreq_cpu_put(policy);
> +
> +		if (!req)
> +			continue;
> +
> +		if (hwp_active)
> +			intel_pstate_get_hwp_max(i, &turbo_max, &max_state);
> +		else
> +			turbo_max = cpu->pstate.turbo_pstate;
> +
> +		if (type == DEV_PM_QOS_MIN_FREQUENCY) {

Is it O.K. to assume if the passed op code is
not DEV_PM_QOS_MIN_FREQUENCY
then it must have been
DEV_PM_QOS_MAX_FREQUENCY
?

It is within this patch, but what about in future?

> +			perf_pct = global.min_perf_pct;
> +		} else {
> +			req++;
> +			perf_pct = global.max_perf_pct;
> +		}
> +
> +		freq = DIV_ROUND_UP(turbo_max * perf_pct, 100);
> +		freq *= cpu->pstate.scaling;
> +
> +		if (dev_pm_qos_update_request(req, freq))
> +			pr_warn("Failed to update freq constraint: CPU%d\n", i);

I get many of these messages (4520 so far, always in groups of 8 (I have 8 CPUs)),
and have yet to figure out exactly why. It seems to actually be working fine.

> +	}
> +}
> +
>  static ssize_t store_max_perf_pct(struct kobject *a, struct kobj_attribute *b,
>  				  const char *buf, size_t count)
>  {
> @@ -1108,7 +1150,10 @@ static ssize_t store_max_perf_pct(struct kobject *a, struct kobj_attribute *b,
>  
>  	mutex_unlock(&intel_pstate_limits_lock);
>  
> -	intel_pstate_update_policies();
> +	if (intel_pstate_driver == &intel_pstate)
> +		intel_pstate_update_policies();
> +	else
> +		update_qos_request(DEV_PM_QOS_MAX_FREQUENCY);
>  
>  	mutex_unlock(&intel_pstate_driver_lock);
>  
> @@ -1139,7 +1184,10 @@ static ssize_t store_min_perf_pct(struct kobject *a, struct kobj_attribute *b,
>  
>  	mutex_unlock(&intel_pstate_limits_lock);
>  
> -	intel_pstate_update_policies();
> +	if (intel_pstate_driver == &intel_pstate)
> +		intel_pstate_update_policies();
> +	else
> +		update_qos_request(DEV_PM_QOS_MIN_FREQUENCY);
>  
>  	mutex_unlock(&intel_pstate_driver_lock);
>  
> @@ -2332,8 +2380,16 @@ static unsigned int intel_cpufreq_fast_switch(struct cpufreq_policy *policy,
>  
>  static int intel_cpufreq_cpu_init(struct cpufreq_policy *policy)
>  {
> -	int ret = __intel_pstate_cpu_init(policy);
> +	int max_state, turbo_max, min_freq, max_freq, ret;
> +	struct dev_pm_qos_request *req;
> +	struct cpudata *cpu;
> +	struct device *dev;
> +
> +	dev = get_cpu_device(policy->cpu);
> +	if (!dev)
> +		return -ENODEV;
>  
> +	ret = __intel_pstate_cpu_init(policy);
>  	if (ret)
>  		return ret;
>  
> @@ -2342,7 +2398,63 @@ static int intel_cpufreq_cpu_init(struct cpufreq_policy *policy)
>  	/* This reflects the intel_pstate_get_cpu_pstates() setting. */
>  	policy->cur = policy->cpuinfo.min_freq;
>  
> +	req = kcalloc(2, sizeof(*req), GFP_KERNEL);
> +	if (!req) {
> +		ret = -ENOMEM;
> +		goto pstate_exit;
> +	}
> +
> +	cpu = all_cpu_data[policy->cpu];
> +
> +	if (hwp_active)
> +		intel_pstate_get_hwp_max(policy->cpu, &turbo_max, &max_state);
> +	else
> +		turbo_max = cpu->pstate.turbo_pstate;
> +
> +	min_freq = DIV_ROUND_UP(turbo_max * global.min_perf_pct, 100);
> +	min_freq *= cpu->pstate.scaling;
> +	max_freq = DIV_ROUND_UP(turbo_max * global.max_perf_pct, 100);
> +	max_freq *= cpu->pstate.scaling;
> +
> +	ret = dev_pm_qos_add_request(dev, req, DEV_PM_QOS_MIN_FREQUENCY,
> +				     min_freq);
> +	if (ret < 0) {
> +		dev_err(dev, "Failed to add min-freq constraint (%d)\n", ret);
> +		goto free_req;
> +	}
> +
> +	ret = dev_pm_qos_add_request(dev, req + 1, DEV_PM_QOS_MAX_FREQUENCY,
> +				     max_freq);
> +	if (ret < 0) {
> +		dev_err(dev, "Failed to add max-freq constraint (%d)\n", ret);
> +		goto remove_min_req;
> +	}
> +
> +	policy->driver_data = req;
> +
>  	return 0;
> +
> +remove_min_req:
> +	dev_pm_qos_remove_request(req);
> +free_req:
> +	kfree(req);
> +pstate_exit:
> +	intel_pstate_exit_perf_limits(policy);
> +
> +	return ret;
> +}
> +
> +static int intel_cpufreq_cpu_exit(struct cpufreq_policy *policy)
> +{
> +	struct dev_pm_qos_request *req;
> +
> +	req = policy->driver_data;
> +
> +	dev_pm_qos_remove_request(req + 1);
> +	dev_pm_qos_remove_request(req);
> +	kfree(req);
> +
> +	return intel_pstate_cpu_exit(policy);
>  }
>  
>  static struct cpufreq_driver intel_cpufreq = {
> @@ -2351,7 +2463,7 @@ static struct cpufreq_driver intel_cpufreq = {
>  	.target		= intel_cpufreq_target,
>  	.fast_switch	= intel_cpufreq_fast_switch,
>  	.init		= intel_cpufreq_cpu_init,
> -	.exit		= intel_pstate_cpu_exit,
> +	.exit		= intel_cpufreq_cpu_exit,
>  	.stop_cpu	= intel_cpufreq_stop_cpu,
>  	.update_limits	= intel_pstate_update_limits,
>  	.name		= "intel_cpufreq",
> -- 
> 2.21.0.rc0.269.g1a574e7a288b



  reply	other threads:[~2019-08-08 16:25 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-07  7:06 [PATCH V4 1/2] cpufreq: schedutil: Don't skip freq update when limits change Viresh Kumar
2019-08-07  7:06 ` [PATCH V4 2/2] cpufreq: intel_pstate: Implement QoS supported freq constraints Viresh Kumar
2019-08-08 16:25   ` Doug Smythies [this message]
2019-08-08 16:46     ` Rafael J. Wysocki
2019-08-09  2:16     ` Viresh Kumar
2019-08-09  6:35       ` Doug Smythies
2019-08-09  6:51         ` Viresh Kumar
2019-08-09  2:22   ` [PATCH V5 " Viresh Kumar
2019-08-09  5:48     ` Doug Smythies
2019-08-26  9:18     ` Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='000601d54e05$e93d0130$bbb70390$@net' \
    --to=dsmythies@telus.net \
    --cc=lenb@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=srinivas.pandruvada@linux.intel.com \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).