linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V4 1/2] cpufreq: schedutil: Don't skip freq update when limits change
@ 2019-08-07  7:06 Viresh Kumar
  2019-08-07  7:06 ` [PATCH V4 2/2] cpufreq: intel_pstate: Implement QoS supported freq constraints Viresh Kumar
  0 siblings, 1 reply; 10+ messages in thread
From: Viresh Kumar @ 2019-08-07  7:06 UTC (permalink / raw)
  To: Rafael Wysocki, Viresh Kumar, Ingo Molnar, Peter Zijlstra
  Cc: linux-pm, Vincent Guittot, v4 . 18+, Doug Smythies, linux-kernel

To avoid reducing the frequency of a CPU prematurely, we skip reducing
the frequency if the CPU had been busy recently.

This should not be done when the limits of the policy are changed, for
example due to thermal throttling. We should always get the frequency
within the new limits as soon as possible.

Trying to fix this by using only one flag, i.e. need_freq_update, can
lead to a race condition where the flag gets cleared without forcing us
to change the frequency at least once. And so this patch introduces
another flag to avoid that race condition.

Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
Cc: v4.18+ <stable@vger.kernel.org> # v4.18+
Reported-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
V3->V4:
- Rewrite "if" block to avoid setting variable to false at
  initialization.
- Added Tested-by from Doug.

 kernel/sched/cpufreq_schedutil.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 636ca6f88c8e..867b4bb6d4be 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -40,6 +40,7 @@ struct sugov_policy {
 	struct task_struct	*thread;
 	bool			work_in_progress;
 
+	bool			limits_changed;
 	bool			need_freq_update;
 };
 
@@ -89,8 +90,11 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
 	    !cpufreq_this_cpu_can_update(sg_policy->policy))
 		return false;
 
-	if (unlikely(sg_policy->need_freq_update))
+	if (unlikely(sg_policy->limits_changed)) {
+		sg_policy->limits_changed = false;
+		sg_policy->need_freq_update = true;
 		return true;
+	}
 
 	delta_ns = time - sg_policy->last_freq_update_time;
 
@@ -437,7 +441,7 @@ static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
 static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu, struct sugov_policy *sg_policy)
 {
 	if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_dl)
-		sg_policy->need_freq_update = true;
+		sg_policy->limits_changed = true;
 }
 
 static void sugov_update_single(struct update_util_data *hook, u64 time,
@@ -457,7 +461,8 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	if (!sugov_should_update_freq(sg_policy, time))
 		return;
 
-	busy = sugov_cpu_is_busy(sg_cpu);
+	/* Limits may have changed, don't skip frequency update */
+	busy = !sg_policy->need_freq_update && sugov_cpu_is_busy(sg_cpu);
 
 	util = sugov_get_util(sg_cpu);
 	max = sg_cpu->max;
@@ -831,6 +836,7 @@ static int sugov_start(struct cpufreq_policy *policy)
 	sg_policy->last_freq_update_time	= 0;
 	sg_policy->next_freq			= 0;
 	sg_policy->work_in_progress		= false;
+	sg_policy->limits_changed		= false;
 	sg_policy->need_freq_update		= false;
 	sg_policy->cached_raw_freq		= 0;
 
@@ -879,7 +885,7 @@ static void sugov_limits(struct cpufreq_policy *policy)
 		mutex_unlock(&sg_policy->work_lock);
 	}
 
-	sg_policy->need_freq_update = true;
+	sg_policy->limits_changed = true;
 }
 
 struct cpufreq_governor schedutil_gov = {
-- 
2.21.0.rc0.269.g1a574e7a288b


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH V4 2/2] cpufreq: intel_pstate: Implement QoS supported freq constraints
  2019-08-07  7:06 [PATCH V4 1/2] cpufreq: schedutil: Don't skip freq update when limits change Viresh Kumar
@ 2019-08-07  7:06 ` Viresh Kumar
  2019-08-08 16:25   ` Doug Smythies
  2019-08-09  2:22   ` [PATCH V5 " Viresh Kumar
  0 siblings, 2 replies; 10+ messages in thread
From: Viresh Kumar @ 2019-08-07  7:06 UTC (permalink / raw)
  To: Rafael Wysocki, Srinivas Pandruvada, Len Brown, Viresh Kumar
  Cc: linux-pm, Vincent Guittot, Doug Smythies, linux-kernel

Intel pstate driver exposes min_perf_pct and max_perf_pct sysfs files,
which can be used to force a limit on the min/max P state of the driver.
Though these files eventually control the min/max frequencies that the
CPUs will run at, they don't make a change to policy->min/max values.

When the values of these files are changed (in passive mode of the
driver), it leads to calling ->limits() callback of the cpufreq
governors, like schedutil. On a call to it the governors shall
forcefully update the frequency to come within the limits. Since the
limits, i.e.  policy->min/max, aren't updated by the driver, the
governors fails to get the target freq within limit and sometimes aborts
the update believing that the frequency is already set to the target
value.

This patch implements the QoS supported frequency constraints to update
policy->min/max values whenever min_perf_pct or max_perf_pct files are
updated. This is only done for the passive mode as of now, as the driver
is already working fine in active mode.

Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
V3->V4:
- Reimplemented the solution using QoS constraints instead of
  resolve_freq() callback.

 drivers/cpufreq/intel_pstate.c | 120 +++++++++++++++++++++++++++++++--
 1 file changed, 116 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index cc27d4c59dca..e9fbd6c36822 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -24,6 +24,7 @@
 #include <linux/fs.h>
 #include <linux/acpi.h>
 #include <linux/vmalloc.h>
+#include <linux/pm_qos.h>
 #include <trace/events/power.h>
 
 #include <asm/div64.h>
@@ -1085,6 +1086,47 @@ static ssize_t store_no_turbo(struct kobject *a, struct kobj_attribute *b,
 	return count;
 }
 
+static struct cpufreq_driver intel_pstate;
+
+static void update_qos_request(enum dev_pm_qos_req_type type)
+{
+	int max_state, turbo_max, freq, i, perf_pct;
+	struct dev_pm_qos_request *req;
+	struct cpufreq_policy *policy;
+
+	for_each_possible_cpu(i) {
+		struct cpudata *cpu = all_cpu_data[i];
+
+		policy = cpufreq_cpu_get(i);
+		if (!policy)
+			continue;
+
+		req = policy->driver_data;
+		cpufreq_cpu_put(policy);
+
+		if (!req)
+			continue;
+
+		if (hwp_active)
+			intel_pstate_get_hwp_max(i, &turbo_max, &max_state);
+		else
+			turbo_max = cpu->pstate.turbo_pstate;
+
+		if (type == DEV_PM_QOS_MIN_FREQUENCY) {
+			perf_pct = global.min_perf_pct;
+		} else {
+			req++;
+			perf_pct = global.max_perf_pct;
+		}
+
+		freq = DIV_ROUND_UP(turbo_max * perf_pct, 100);
+		freq *= cpu->pstate.scaling;
+
+		if (dev_pm_qos_update_request(req, freq))
+			pr_warn("Failed to update freq constraint: CPU%d\n", i);
+	}
+}
+
 static ssize_t store_max_perf_pct(struct kobject *a, struct kobj_attribute *b,
 				  const char *buf, size_t count)
 {
@@ -1108,7 +1150,10 @@ static ssize_t store_max_perf_pct(struct kobject *a, struct kobj_attribute *b,
 
 	mutex_unlock(&intel_pstate_limits_lock);
 
-	intel_pstate_update_policies();
+	if (intel_pstate_driver == &intel_pstate)
+		intel_pstate_update_policies();
+	else
+		update_qos_request(DEV_PM_QOS_MAX_FREQUENCY);
 
 	mutex_unlock(&intel_pstate_driver_lock);
 
@@ -1139,7 +1184,10 @@ static ssize_t store_min_perf_pct(struct kobject *a, struct kobj_attribute *b,
 
 	mutex_unlock(&intel_pstate_limits_lock);
 
-	intel_pstate_update_policies();
+	if (intel_pstate_driver == &intel_pstate)
+		intel_pstate_update_policies();
+	else
+		update_qos_request(DEV_PM_QOS_MIN_FREQUENCY);
 
 	mutex_unlock(&intel_pstate_driver_lock);
 
@@ -2332,8 +2380,16 @@ static unsigned int intel_cpufreq_fast_switch(struct cpufreq_policy *policy,
 
 static int intel_cpufreq_cpu_init(struct cpufreq_policy *policy)
 {
-	int ret = __intel_pstate_cpu_init(policy);
+	int max_state, turbo_max, min_freq, max_freq, ret;
+	struct dev_pm_qos_request *req;
+	struct cpudata *cpu;
+	struct device *dev;
+
+	dev = get_cpu_device(policy->cpu);
+	if (!dev)
+		return -ENODEV;
 
+	ret = __intel_pstate_cpu_init(policy);
 	if (ret)
 		return ret;
 
@@ -2342,7 +2398,63 @@ static int intel_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	/* This reflects the intel_pstate_get_cpu_pstates() setting. */
 	policy->cur = policy->cpuinfo.min_freq;
 
+	req = kcalloc(2, sizeof(*req), GFP_KERNEL);
+	if (!req) {
+		ret = -ENOMEM;
+		goto pstate_exit;
+	}
+
+	cpu = all_cpu_data[policy->cpu];
+
+	if (hwp_active)
+		intel_pstate_get_hwp_max(policy->cpu, &turbo_max, &max_state);
+	else
+		turbo_max = cpu->pstate.turbo_pstate;
+
+	min_freq = DIV_ROUND_UP(turbo_max * global.min_perf_pct, 100);
+	min_freq *= cpu->pstate.scaling;
+	max_freq = DIV_ROUND_UP(turbo_max * global.max_perf_pct, 100);
+	max_freq *= cpu->pstate.scaling;
+
+	ret = dev_pm_qos_add_request(dev, req, DEV_PM_QOS_MIN_FREQUENCY,
+				     min_freq);
+	if (ret < 0) {
+		dev_err(dev, "Failed to add min-freq constraint (%d)\n", ret);
+		goto free_req;
+	}
+
+	ret = dev_pm_qos_add_request(dev, req + 1, DEV_PM_QOS_MAX_FREQUENCY,
+				     max_freq);
+	if (ret < 0) {
+		dev_err(dev, "Failed to add max-freq constraint (%d)\n", ret);
+		goto remove_min_req;
+	}
+
+	policy->driver_data = req;
+
 	return 0;
+
+remove_min_req:
+	dev_pm_qos_remove_request(req);
+free_req:
+	kfree(req);
+pstate_exit:
+	intel_pstate_exit_perf_limits(policy);
+
+	return ret;
+}
+
+static int intel_cpufreq_cpu_exit(struct cpufreq_policy *policy)
+{
+	struct dev_pm_qos_request *req;
+
+	req = policy->driver_data;
+
+	dev_pm_qos_remove_request(req + 1);
+	dev_pm_qos_remove_request(req);
+	kfree(req);
+
+	return intel_pstate_cpu_exit(policy);
 }
 
 static struct cpufreq_driver intel_cpufreq = {
@@ -2351,7 +2463,7 @@ static struct cpufreq_driver intel_cpufreq = {
 	.target		= intel_cpufreq_target,
 	.fast_switch	= intel_cpufreq_fast_switch,
 	.init		= intel_cpufreq_cpu_init,
-	.exit		= intel_pstate_cpu_exit,
+	.exit		= intel_cpufreq_cpu_exit,
 	.stop_cpu	= intel_cpufreq_stop_cpu,
 	.update_limits	= intel_pstate_update_limits,
 	.name		= "intel_cpufreq",
-- 
2.21.0.rc0.269.g1a574e7a288b


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* RE: [PATCH V4 2/2] cpufreq: intel_pstate: Implement QoS supported freq constraints
  2019-08-07  7:06 ` [PATCH V4 2/2] cpufreq: intel_pstate: Implement QoS supported freq constraints Viresh Kumar
@ 2019-08-08 16:25   ` Doug Smythies
  2019-08-08 16:46     ` Rafael J. Wysocki
  2019-08-09  2:16     ` Viresh Kumar
  2019-08-09  2:22   ` [PATCH V5 " Viresh Kumar
  1 sibling, 2 replies; 10+ messages in thread
From: Doug Smythies @ 2019-08-08 16:25 UTC (permalink / raw)
  To: 'Viresh Kumar'
  Cc: linux-pm, 'Vincent Guittot',
	linux-kernel, 'Rafael Wysocki',
	'Srinivas Pandruvada', 'Len Brown'

On 2019.08.07 00:06 Viresh Kumar wrote:

Thanks for your work on this.

> Intel pstate driver exposes min_perf_pct and max_perf_pct sysfs files,
> which can be used to force a limit on the min/max P state of the driver.
> Though these files eventually control the min/max frequencies that the
> CPUs will run at, they don't make a change to policy->min/max values.
>
> When the values of these files are changed (in passive mode of the
> driver), it leads to calling ->limits() callback of the cpufreq
> governors, like schedutil. On a call to it the governors shall
> forcefully update the frequency to come within the limits. Since the
> limits, i.e.  policy->min/max, aren't updated by the driver, the
> governors fails to get the target freq within limit and sometimes aborts
> the update believing that the frequency is already set to the target
> value.
>
> This patch implements the QoS supported frequency constraints to update
> policy->min/max values whenever min_perf_pct or max_perf_pct files are
> updated. This is only done for the passive mode as of now, as the driver
> is already working fine in active mode.
>
> Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
> Reported-by: Doug Smythies <dsmythies@telus.net>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

Tested by: Doug Smythies <dsmythies@telus.net>
Thermald seems to now be working O.K. for all the governors.

I do note that if one sets
/sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
It seems to override subsequent attempts via
/sys/devices/system/cpu/intel_pstate/max_perf_pct.
Myself, I find this confusing.

So the question becomes which one is the "master"?

Example:

# for file in /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq; do echo "2200000" > $file; done
# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
2200000
2200000
2200000
2200000
2200000
2200000
2200000
2200000
root@s15:/home/doug/temp# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
... (Note: 50% = 1900000)
root@s15:/home/doug/temp# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
1900000
1900000
1900000
1900000
1900000
1900000
1900000
1900000
root@s15:/home/doug/temp# echo 100 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
... (Note: 50% = 3800000, and my expectation is 3.8 GHz below)
root@s15:/home/doug/temp# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
2200000
2200000
2200000
2200000
2200000
2200000
2200000
2200000

Similarly for the minimum side of things:

root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpufreq/policy*/scaling_min_freq; do echo "3200000" > $file; done
root@s15:/home/doug/temp# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_min_freq
3200000
3200000
3200000
3200000
3200000
3200000
3200000
3200000
root@s15:/home/doug/temp# echo 42 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
root@s15:/home/doug/temp# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
42   ... (note 42% = 1600000 = processor minimum, and that is my expectation below.)
root@s15:/home/doug/temp# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_min_freq
3200000
3200000
3200000
3200000
3200000
3200000
3200000
3200000

I thought these minimum anomalies would cause problems for thermald, but
for whatever reason, it seems to work properly.

> ---
> V3->V4:
> - Reimplemented the solution using QoS constraints instead of
>   resolve_freq() callback.
>
> drivers/cpufreq/intel_pstate.c | 120 +++++++++++++++++++++++++++++++--
> 1 file changed, 116 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index cc27d4c59dca..e9fbd6c36822 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -24,6 +24,7 @@
>  #include <linux/fs.h>
>  #include <linux/acpi.h>
>  #include <linux/vmalloc.h>
> +#include <linux/pm_qos.h>
>  #include <trace/events/power.h>
>  
>  #include <asm/div64.h>
> @@ -1085,6 +1086,47 @@ static ssize_t store_no_turbo(struct kobject *a, struct kobj_attribute *b,
>  	return count;
>  }
>  
> +static struct cpufreq_driver intel_pstate;
> +
> +static void update_qos_request(enum dev_pm_qos_req_type type)
> +{
> +	int max_state, turbo_max, freq, i, perf_pct;
> +	struct dev_pm_qos_request *req;
> +	struct cpufreq_policy *policy;
> +
> +	for_each_possible_cpu(i) {
> +		struct cpudata *cpu = all_cpu_data[i];
> +
> +		policy = cpufreq_cpu_get(i);
> +		if (!policy)
> +			continue;
> +
> +		req = policy->driver_data;
> +		cpufreq_cpu_put(policy);
> +
> +		if (!req)
> +			continue;
> +
> +		if (hwp_active)
> +			intel_pstate_get_hwp_max(i, &turbo_max, &max_state);
> +		else
> +			turbo_max = cpu->pstate.turbo_pstate;
> +
> +		if (type == DEV_PM_QOS_MIN_FREQUENCY) {

Is it O.K. to assume if the passed op code is
not DEV_PM_QOS_MIN_FREQUENCY
then it must have been
DEV_PM_QOS_MAX_FREQUENCY
?

It is within this patch, but what about in future?

> +			perf_pct = global.min_perf_pct;
> +		} else {
> +			req++;
> +			perf_pct = global.max_perf_pct;
> +		}
> +
> +		freq = DIV_ROUND_UP(turbo_max * perf_pct, 100);
> +		freq *= cpu->pstate.scaling;
> +
> +		if (dev_pm_qos_update_request(req, freq))
> +			pr_warn("Failed to update freq constraint: CPU%d\n", i);

I get many of these messages (4520 so far, always in groups of 8 (I have 8 CPUs)),
and have yet to figure out exactly why. It seems to actually be working fine.

> +	}
> +}
> +
>  static ssize_t store_max_perf_pct(struct kobject *a, struct kobj_attribute *b,
>  				  const char *buf, size_t count)
>  {
> @@ -1108,7 +1150,10 @@ static ssize_t store_max_perf_pct(struct kobject *a, struct kobj_attribute *b,
>  
>  	mutex_unlock(&intel_pstate_limits_lock);
>  
> -	intel_pstate_update_policies();
> +	if (intel_pstate_driver == &intel_pstate)
> +		intel_pstate_update_policies();
> +	else
> +		update_qos_request(DEV_PM_QOS_MAX_FREQUENCY);
>  
>  	mutex_unlock(&intel_pstate_driver_lock);
>  
> @@ -1139,7 +1184,10 @@ static ssize_t store_min_perf_pct(struct kobject *a, struct kobj_attribute *b,
>  
>  	mutex_unlock(&intel_pstate_limits_lock);
>  
> -	intel_pstate_update_policies();
> +	if (intel_pstate_driver == &intel_pstate)
> +		intel_pstate_update_policies();
> +	else
> +		update_qos_request(DEV_PM_QOS_MIN_FREQUENCY);
>  
>  	mutex_unlock(&intel_pstate_driver_lock);
>  
> @@ -2332,8 +2380,16 @@ static unsigned int intel_cpufreq_fast_switch(struct cpufreq_policy *policy,
>  
>  static int intel_cpufreq_cpu_init(struct cpufreq_policy *policy)
>  {
> -	int ret = __intel_pstate_cpu_init(policy);
> +	int max_state, turbo_max, min_freq, max_freq, ret;
> +	struct dev_pm_qos_request *req;
> +	struct cpudata *cpu;
> +	struct device *dev;
> +
> +	dev = get_cpu_device(policy->cpu);
> +	if (!dev)
> +		return -ENODEV;
>  
> +	ret = __intel_pstate_cpu_init(policy);
>  	if (ret)
>  		return ret;
>  
> @@ -2342,7 +2398,63 @@ static int intel_cpufreq_cpu_init(struct cpufreq_policy *policy)
>  	/* This reflects the intel_pstate_get_cpu_pstates() setting. */
>  	policy->cur = policy->cpuinfo.min_freq;
>  
> +	req = kcalloc(2, sizeof(*req), GFP_KERNEL);
> +	if (!req) {
> +		ret = -ENOMEM;
> +		goto pstate_exit;
> +	}
> +
> +	cpu = all_cpu_data[policy->cpu];
> +
> +	if (hwp_active)
> +		intel_pstate_get_hwp_max(policy->cpu, &turbo_max, &max_state);
> +	else
> +		turbo_max = cpu->pstate.turbo_pstate;
> +
> +	min_freq = DIV_ROUND_UP(turbo_max * global.min_perf_pct, 100);
> +	min_freq *= cpu->pstate.scaling;
> +	max_freq = DIV_ROUND_UP(turbo_max * global.max_perf_pct, 100);
> +	max_freq *= cpu->pstate.scaling;
> +
> +	ret = dev_pm_qos_add_request(dev, req, DEV_PM_QOS_MIN_FREQUENCY,
> +				     min_freq);
> +	if (ret < 0) {
> +		dev_err(dev, "Failed to add min-freq constraint (%d)\n", ret);
> +		goto free_req;
> +	}
> +
> +	ret = dev_pm_qos_add_request(dev, req + 1, DEV_PM_QOS_MAX_FREQUENCY,
> +				     max_freq);
> +	if (ret < 0) {
> +		dev_err(dev, "Failed to add max-freq constraint (%d)\n", ret);
> +		goto remove_min_req;
> +	}
> +
> +	policy->driver_data = req;
> +
>  	return 0;
> +
> +remove_min_req:
> +	dev_pm_qos_remove_request(req);
> +free_req:
> +	kfree(req);
> +pstate_exit:
> +	intel_pstate_exit_perf_limits(policy);
> +
> +	return ret;
> +}
> +
> +static int intel_cpufreq_cpu_exit(struct cpufreq_policy *policy)
> +{
> +	struct dev_pm_qos_request *req;
> +
> +	req = policy->driver_data;
> +
> +	dev_pm_qos_remove_request(req + 1);
> +	dev_pm_qos_remove_request(req);
> +	kfree(req);
> +
> +	return intel_pstate_cpu_exit(policy);
>  }
>  
>  static struct cpufreq_driver intel_cpufreq = {
> @@ -2351,7 +2463,7 @@ static struct cpufreq_driver intel_cpufreq = {
>  	.target		= intel_cpufreq_target,
>  	.fast_switch	= intel_cpufreq_fast_switch,
>  	.init		= intel_cpufreq_cpu_init,
> -	.exit		= intel_pstate_cpu_exit,
> +	.exit		= intel_cpufreq_cpu_exit,
>  	.stop_cpu	= intel_cpufreq_stop_cpu,
>  	.update_limits	= intel_pstate_update_limits,
>  	.name		= "intel_cpufreq",
> -- 
> 2.21.0.rc0.269.g1a574e7a288b



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V4 2/2] cpufreq: intel_pstate: Implement QoS supported freq constraints
  2019-08-08 16:25   ` Doug Smythies
@ 2019-08-08 16:46     ` Rafael J. Wysocki
  2019-08-09  2:16     ` Viresh Kumar
  1 sibling, 0 replies; 10+ messages in thread
From: Rafael J. Wysocki @ 2019-08-08 16:46 UTC (permalink / raw)
  To: Doug Smythies
  Cc: Viresh Kumar, Linux PM, Vincent Guittot,
	Linux Kernel Mailing List, Rafael Wysocki, Srinivas Pandruvada,
	Len Brown

, On Thu, Aug 8, 2019 at 6:25 PM Doug Smythies <dsmythies@telus.net> wrote:
>
> On 2019.08.07 00:06 Viresh Kumar wrote:
>
> Thanks for your work on this.
>
> > Intel pstate driver exposes min_perf_pct and max_perf_pct sysfs files,
> > which can be used to force a limit on the min/max P state of the driver.
> > Though these files eventually control the min/max frequencies that the
> > CPUs will run at, they don't make a change to policy->min/max values.
> >
> > When the values of these files are changed (in passive mode of the
> > driver), it leads to calling ->limits() callback of the cpufreq
> > governors, like schedutil. On a call to it the governors shall
> > forcefully update the frequency to come within the limits. Since the
> > limits, i.e.  policy->min/max, aren't updated by the driver, the
> > governors fails to get the target freq within limit and sometimes aborts
> > the update believing that the frequency is already set to the target
> > value.
> >
> > This patch implements the QoS supported frequency constraints to update
> > policy->min/max values whenever min_perf_pct or max_perf_pct files are
> > updated. This is only done for the passive mode as of now, as the driver
> > is already working fine in active mode.
> >
> > Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
> > Reported-by: Doug Smythies <dsmythies@telus.net>
> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>
> Tested by: Doug Smythies <dsmythies@telus.net>
> Thermald seems to now be working O.K. for all the governors.
>
> I do note that if one sets
> /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
> It seems to override subsequent attempts via
> /sys/devices/system/cpu/intel_pstate/max_perf_pct.
> Myself, I find this confusing.
>
> So the question becomes which one is the "master"?

For the min freq limit, the max of (scaling_min_freq, re-scaled
min_perf_pct) should be used.

For the max freq limit, the min of (scaling_max_freq, re-scaled
max_perf_pct) should be used.

Overall, the setting that "wins" is the one that causes the set of
available frequencies to be narrower.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V4 2/2] cpufreq: intel_pstate: Implement QoS supported freq constraints
  2019-08-08 16:25   ` Doug Smythies
  2019-08-08 16:46     ` Rafael J. Wysocki
@ 2019-08-09  2:16     ` Viresh Kumar
  2019-08-09  6:35       ` Doug Smythies
  1 sibling, 1 reply; 10+ messages in thread
From: Viresh Kumar @ 2019-08-09  2:16 UTC (permalink / raw)
  To: Doug Smythies
  Cc: linux-pm, 'Vincent Guittot',
	linux-kernel, 'Rafael Wysocki',
	'Srinivas Pandruvada', 'Len Brown'

On 08-08-19, 09:25, Doug Smythies wrote:
> On 2019.08.07 00:06 Viresh Kumar wrote:
> Tested by: Doug Smythies <dsmythies@telus.net>
> Thermald seems to now be working O.K. for all the governors.

Thanks for testing Doug.

> I do note that if one sets
> /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
> It seems to override subsequent attempts via
> /sys/devices/system/cpu/intel_pstate/max_perf_pct.
> Myself, I find this confusing.
> 
> So the question becomes which one is the "master"?

No one is master, cpufreq takes all the requests for frequency
constraints and tries to set the value based on aggregation of all. So
for max frequency, the lowest value wins and is shown up in sysfs.

So, everything looks okay to me.

> > +static void update_qos_request(enum dev_pm_qos_req_type type)
> > +{
> > +	int max_state, turbo_max, freq, i, perf_pct;
> > +	struct dev_pm_qos_request *req;
> > +	struct cpufreq_policy *policy;
> > +
> > +	for_each_possible_cpu(i) {
> > +		struct cpudata *cpu = all_cpu_data[i];
> > +
> > +		policy = cpufreq_cpu_get(i);
> > +		if (!policy)
> > +			continue;
> > +
> > +		req = policy->driver_data;
> > +		cpufreq_cpu_put(policy);
> > +
> > +		if (!req)
> > +			continue;
> > +
> > +		if (hwp_active)
> > +			intel_pstate_get_hwp_max(i, &turbo_max, &max_state);
> > +		else
> > +			turbo_max = cpu->pstate.turbo_pstate;
> > +
> > +		if (type == DEV_PM_QOS_MIN_FREQUENCY) {
> 
> Is it O.K. to assume if the passed op code is
> not DEV_PM_QOS_MIN_FREQUENCY
> then it must have been
> DEV_PM_QOS_MAX_FREQUENCY
> ?
> 
> It is within this patch, but what about in future?

Yes, because it is called locally there is no need to add another if
statement here. And reviews should catch it in future and I don't
expect it to change much anyway.

> > +			perf_pct = global.min_perf_pct;
> > +		} else {
> > +			req++;
> > +			perf_pct = global.max_perf_pct;
> > +		}
> > +
> > +		freq = DIV_ROUND_UP(turbo_max * perf_pct, 100);
> > +		freq *= cpu->pstate.scaling;
> > +
> > +		if (dev_pm_qos_update_request(req, freq))
> > +			pr_warn("Failed to update freq constraint: CPU%d\n", i);
> 
> I get many of these messages (4520 so far, always in groups of 8 (I have 8 CPUs)),
> and have yet to figure out exactly why. It seems to actually be working fine.

Because of something I missed. dev_pm_qos_update_request() can return
1, when the constraint value gets changed. Will fix this patch.

-- 
viresh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH V5 2/2] cpufreq: intel_pstate: Implement QoS supported freq constraints
  2019-08-07  7:06 ` [PATCH V4 2/2] cpufreq: intel_pstate: Implement QoS supported freq constraints Viresh Kumar
  2019-08-08 16:25   ` Doug Smythies
@ 2019-08-09  2:22   ` Viresh Kumar
  2019-08-09  5:48     ` Doug Smythies
  2019-08-26  9:18     ` Rafael J. Wysocki
  1 sibling, 2 replies; 10+ messages in thread
From: Viresh Kumar @ 2019-08-09  2:22 UTC (permalink / raw)
  To: Rafael Wysocki, Srinivas Pandruvada, Len Brown, Viresh Kumar
  Cc: linux-pm, Vincent Guittot, Doug Smythies, linux-kernel

Intel pstate driver exposes min_perf_pct and max_perf_pct sysfs files,
which can be used to force a limit on the min/max P state of the driver.
Though these files eventually control the min/max frequencies that the
CPUs will run at, they don't make a change to policy->min/max values.

When the values of these files are changed (in passive mode of the
driver), it leads to calling ->limits() callback of the cpufreq
governors, like schedutil. On a call to it the governors shall
forcefully update the frequency to come within the limits. Since the
limits, i.e.  policy->min/max, aren't updated by the driver, the
governors fails to get the target freq within limit and sometimes aborts
the update believing that the frequency is already set to the target
value.

This patch implements the QoS supported frequency constraints to update
policy->min/max values whenever min_perf_pct or max_perf_pct files are
updated. This is only done for the passive mode as of now, as the driver
is already working fine in active mode.

Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
V4->V5:
- dev_pm_qos_update_request() can return 1 in case of success, handle
  that.

 drivers/cpufreq/intel_pstate.c | 120 +++++++++++++++++++++++++++++++--
 1 file changed, 116 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index cc27d4c59dca..32f27563613b 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -24,6 +24,7 @@
 #include <linux/fs.h>
 #include <linux/acpi.h>
 #include <linux/vmalloc.h>
+#include <linux/pm_qos.h>
 #include <trace/events/power.h>
 
 #include <asm/div64.h>
@@ -1085,6 +1086,47 @@ static ssize_t store_no_turbo(struct kobject *a, struct kobj_attribute *b,
 	return count;
 }
 
+static struct cpufreq_driver intel_pstate;
+
+static void update_qos_request(enum dev_pm_qos_req_type type)
+{
+	int max_state, turbo_max, freq, i, perf_pct;
+	struct dev_pm_qos_request *req;
+	struct cpufreq_policy *policy;
+
+	for_each_possible_cpu(i) {
+		struct cpudata *cpu = all_cpu_data[i];
+
+		policy = cpufreq_cpu_get(i);
+		if (!policy)
+			continue;
+
+		req = policy->driver_data;
+		cpufreq_cpu_put(policy);
+
+		if (!req)
+			continue;
+
+		if (hwp_active)
+			intel_pstate_get_hwp_max(i, &turbo_max, &max_state);
+		else
+			turbo_max = cpu->pstate.turbo_pstate;
+
+		if (type == DEV_PM_QOS_MIN_FREQUENCY) {
+			perf_pct = global.min_perf_pct;
+		} else {
+			req++;
+			perf_pct = global.max_perf_pct;
+		}
+
+		freq = DIV_ROUND_UP(turbo_max * perf_pct, 100);
+		freq *= cpu->pstate.scaling;
+
+		if (dev_pm_qos_update_request(req, freq) < 0)
+			pr_warn("Failed to update freq constraint: CPU%d\n", i);
+	}
+}
+
 static ssize_t store_max_perf_pct(struct kobject *a, struct kobj_attribute *b,
 				  const char *buf, size_t count)
 {
@@ -1108,7 +1150,10 @@ static ssize_t store_max_perf_pct(struct kobject *a, struct kobj_attribute *b,
 
 	mutex_unlock(&intel_pstate_limits_lock);
 
-	intel_pstate_update_policies();
+	if (intel_pstate_driver == &intel_pstate)
+		intel_pstate_update_policies();
+	else
+		update_qos_request(DEV_PM_QOS_MAX_FREQUENCY);
 
 	mutex_unlock(&intel_pstate_driver_lock);
 
@@ -1139,7 +1184,10 @@ static ssize_t store_min_perf_pct(struct kobject *a, struct kobj_attribute *b,
 
 	mutex_unlock(&intel_pstate_limits_lock);
 
-	intel_pstate_update_policies();
+	if (intel_pstate_driver == &intel_pstate)
+		intel_pstate_update_policies();
+	else
+		update_qos_request(DEV_PM_QOS_MIN_FREQUENCY);
 
 	mutex_unlock(&intel_pstate_driver_lock);
 
@@ -2332,8 +2380,16 @@ static unsigned int intel_cpufreq_fast_switch(struct cpufreq_policy *policy,
 
 static int intel_cpufreq_cpu_init(struct cpufreq_policy *policy)
 {
-	int ret = __intel_pstate_cpu_init(policy);
+	int max_state, turbo_max, min_freq, max_freq, ret;
+	struct dev_pm_qos_request *req;
+	struct cpudata *cpu;
+	struct device *dev;
+
+	dev = get_cpu_device(policy->cpu);
+	if (!dev)
+		return -ENODEV;
 
+	ret = __intel_pstate_cpu_init(policy);
 	if (ret)
 		return ret;
 
@@ -2342,7 +2398,63 @@ static int intel_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	/* This reflects the intel_pstate_get_cpu_pstates() setting. */
 	policy->cur = policy->cpuinfo.min_freq;
 
+	req = kcalloc(2, sizeof(*req), GFP_KERNEL);
+	if (!req) {
+		ret = -ENOMEM;
+		goto pstate_exit;
+	}
+
+	cpu = all_cpu_data[policy->cpu];
+
+	if (hwp_active)
+		intel_pstate_get_hwp_max(policy->cpu, &turbo_max, &max_state);
+	else
+		turbo_max = cpu->pstate.turbo_pstate;
+
+	min_freq = DIV_ROUND_UP(turbo_max * global.min_perf_pct, 100);
+	min_freq *= cpu->pstate.scaling;
+	max_freq = DIV_ROUND_UP(turbo_max * global.max_perf_pct, 100);
+	max_freq *= cpu->pstate.scaling;
+
+	ret = dev_pm_qos_add_request(dev, req, DEV_PM_QOS_MIN_FREQUENCY,
+				     min_freq);
+	if (ret < 0) {
+		dev_err(dev, "Failed to add min-freq constraint (%d)\n", ret);
+		goto free_req;
+	}
+
+	ret = dev_pm_qos_add_request(dev, req + 1, DEV_PM_QOS_MAX_FREQUENCY,
+				     max_freq);
+	if (ret < 0) {
+		dev_err(dev, "Failed to add max-freq constraint (%d)\n", ret);
+		goto remove_min_req;
+	}
+
+	policy->driver_data = req;
+
 	return 0;
+
+remove_min_req:
+	dev_pm_qos_remove_request(req);
+free_req:
+	kfree(req);
+pstate_exit:
+	intel_pstate_exit_perf_limits(policy);
+
+	return ret;
+}
+
+static int intel_cpufreq_cpu_exit(struct cpufreq_policy *policy)
+{
+	struct dev_pm_qos_request *req;
+
+	req = policy->driver_data;
+
+	dev_pm_qos_remove_request(req + 1);
+	dev_pm_qos_remove_request(req);
+	kfree(req);
+
+	return intel_pstate_cpu_exit(policy);
 }
 
 static struct cpufreq_driver intel_cpufreq = {
@@ -2351,7 +2463,7 @@ static struct cpufreq_driver intel_cpufreq = {
 	.target		= intel_cpufreq_target,
 	.fast_switch	= intel_cpufreq_fast_switch,
 	.init		= intel_cpufreq_cpu_init,
-	.exit		= intel_pstate_cpu_exit,
+	.exit		= intel_cpufreq_cpu_exit,
 	.stop_cpu	= intel_cpufreq_stop_cpu,
 	.update_limits	= intel_pstate_update_limits,
 	.name		= "intel_cpufreq",
-- 
2.21.0.rc0.269.g1a574e7a288b


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* RE: [PATCH V5 2/2] cpufreq: intel_pstate: Implement QoS supported freq constraints
  2019-08-09  2:22   ` [PATCH V5 " Viresh Kumar
@ 2019-08-09  5:48     ` Doug Smythies
  2019-08-26  9:18     ` Rafael J. Wysocki
  1 sibling, 0 replies; 10+ messages in thread
From: Doug Smythies @ 2019-08-09  5:48 UTC (permalink / raw)
  To: 'Viresh Kumar'
  Cc: linux-pm, 'Vincent Guittot',
	linux-kernel, 'Rafael Wysocki',
	'Srinivas Pandruvada', 'Len Brown'

On 2019.08.08 19:23 Viresh Kumar wrote:
> ---
> V4->V5:
> - dev_pm_qos_update_request() can return 1 in case of success, handle
>   that.

O.K. thanks,
That fixes the "Fail" messages I was getting with V4.

... Doug



^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH V4 2/2] cpufreq: intel_pstate: Implement QoS supported freq constraints
  2019-08-09  2:16     ` Viresh Kumar
@ 2019-08-09  6:35       ` Doug Smythies
  2019-08-09  6:51         ` Viresh Kumar
  0 siblings, 1 reply; 10+ messages in thread
From: Doug Smythies @ 2019-08-09  6:35 UTC (permalink / raw)
  To: 'Viresh Kumar'
  Cc: linux-pm, 'Vincent Guittot',
	linux-kernel, 'Rafael Wysocki',
	'Srinivas Pandruvada', 'Len Brown'

On 2019.08.08 19:16 Viresh Kumar wrote:
> On 08-08-19, 09:25, Doug Smythies wrote:
>> On 2019.08.07 00:06 Viresh Kumar wrote:
>> Tested by: Doug Smythies <dsmythies@telus.net>
>> Thermald seems to now be working O.K. for all the governors.
>
> Thanks for testing Doug.
> 
>> I do note that if one sets
>> /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
>> It seems to override subsequent attempts via
>> /sys/devices/system/cpu/intel_pstate/max_perf_pct.
>> Myself, I find this confusing.
>> 
>> So the question becomes which one is the "master"?
>
> No one is master, cpufreq takes all the requests for frequency
> constraints and tries to set the value based on aggregation of all. So
> for max frequency, the lowest value wins and is shown up in sysfs.
>
> So, everything looks okay to me.

O.K. While I understand the explanations, I still struggle with
this scenario:
 
doug@s15:~/temp$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
50    <<< Note: 50% = 1.9 GHz in my system)
doug@s15:~/temp$ grep . /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
/sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq:1900000
/sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq:1900000
/sys/devices/system/cpu/cpufreq/policy2/scaling_max_freq:1900000
/sys/devices/system/cpu/cpufreq/policy3/scaling_max_freq:1900000
/sys/devices/system/cpu/cpufreq/policy4/scaling_max_freq:1900000
/sys/devices/system/cpu/cpufreq/policy5/scaling_max_freq:1900000
/sys/devices/system/cpu/cpufreq/policy6/scaling_max_freq:1900000
/sys/devices/system/cpu/cpufreq/policy7/scaling_max_freq:1900000

At this point I am not certain what I'll get if I try to
set max_perf_pct to 100%, nor do I know how to find out
with a user command.

So, I'll try it:

doug@s15:~/temp$ echo 100 | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
100
doug@s15:~/temp$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
100  <<< Note: 100% = 3.8 GHz in my system)
doug@s15:~/temp$ grep . /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
/sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq:2200000
/sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq:2200000
/sys/devices/system/cpu/cpufreq/policy2/scaling_max_freq:2200000
/sys/devices/system/cpu/cpufreq/policy3/scaling_max_freq:2200000
/sys/devices/system/cpu/cpufreq/policy4/scaling_max_freq:2200000
/sys/devices/system/cpu/cpufreq/policy5/scaling_max_freq:2200000
/sys/devices/system/cpu/cpufreq/policy6/scaling_max_freq:2200000
/sys/devices/system/cpu/cpufreq/policy7/scaling_max_freq:2200000

I guess I had set it sometime earlier, forgot, and then didn't
get 3.8 Ghz as I had expected via max_perf_pct.

... Doug



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V4 2/2] cpufreq: intel_pstate: Implement QoS supported freq constraints
  2019-08-09  6:35       ` Doug Smythies
@ 2019-08-09  6:51         ` Viresh Kumar
  0 siblings, 0 replies; 10+ messages in thread
From: Viresh Kumar @ 2019-08-09  6:51 UTC (permalink / raw)
  To: Doug Smythies
  Cc: linux-pm, 'Vincent Guittot',
	linux-kernel, 'Rafael Wysocki',
	'Srinivas Pandruvada', 'Len Brown'

On 08-08-19, 23:35, Doug Smythies wrote:
> O.K. While I understand the explanations, I still struggle with
> this scenario:
>  
> doug@s15:~/temp$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
> 50    <<< Note: 50% = 1.9 GHz in my system)
> doug@s15:~/temp$ grep . /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
> /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq:1900000
> /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq:1900000
> /sys/devices/system/cpu/cpufreq/policy2/scaling_max_freq:1900000
> /sys/devices/system/cpu/cpufreq/policy3/scaling_max_freq:1900000
> /sys/devices/system/cpu/cpufreq/policy4/scaling_max_freq:1900000
> /sys/devices/system/cpu/cpufreq/policy5/scaling_max_freq:1900000
> /sys/devices/system/cpu/cpufreq/policy6/scaling_max_freq:1900000
> /sys/devices/system/cpu/cpufreq/policy7/scaling_max_freq:1900000
> 
> At this point I am not certain what I'll get if I try to
> set max_perf_pct to 100%, nor do I know how to find out
> with a user command.
> 
> So, I'll try it:
> 
> doug@s15:~/temp$ echo 100 | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
> 100
> doug@s15:~/temp$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
> 100  <<< Note: 100% = 3.8 GHz in my system)
> doug@s15:~/temp$ grep . /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
> /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq:2200000
> /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq:2200000
> /sys/devices/system/cpu/cpufreq/policy2/scaling_max_freq:2200000
> /sys/devices/system/cpu/cpufreq/policy3/scaling_max_freq:2200000
> /sys/devices/system/cpu/cpufreq/policy4/scaling_max_freq:2200000
> /sys/devices/system/cpu/cpufreq/policy5/scaling_max_freq:2200000
> /sys/devices/system/cpu/cpufreq/policy6/scaling_max_freq:2200000
> /sys/devices/system/cpu/cpufreq/policy7/scaling_max_freq:2200000
> 
> I guess I had set it sometime earlier, forgot, and then didn't
> get 3.8 Ghz as I had expected via max_perf_pct.

Right, so the problem here (since ages) is that there is no file in
sysfs to show the value earlier set by the user to
scaling_max/min_freq. Rather when you try to read those files, cpufreq
core gives you the current value of policy->min/max, which can come
from so many factors, like userspace, thermal, max_perf_pct, etc.

You saw 2200000 here because you must have set that value to
scaling_max_freq earlier, else maybe thermal or other stuff
constrained that value for you.

Though the cpufreq core stores the value set by user and uses it to
recalculate policy->max using it. To get rid of your constraint, set
scaling_max_freq to the highest frequency and you will be good.

-- 
viresh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V5 2/2] cpufreq: intel_pstate: Implement QoS supported freq constraints
  2019-08-09  2:22   ` [PATCH V5 " Viresh Kumar
  2019-08-09  5:48     ` Doug Smythies
@ 2019-08-26  9:18     ` Rafael J. Wysocki
  1 sibling, 0 replies; 10+ messages in thread
From: Rafael J. Wysocki @ 2019-08-26  9:18 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Srinivas Pandruvada, Len Brown, linux-pm, Vincent Guittot,
	Doug Smythies, linux-kernel

On Friday, August 9, 2019 4:22:49 AM CEST Viresh Kumar wrote:
> Intel pstate driver exposes min_perf_pct and max_perf_pct sysfs files,
> which can be used to force a limit on the min/max P state of the driver.
> Though these files eventually control the min/max frequencies that the
> CPUs will run at, they don't make a change to policy->min/max values.
> 
> When the values of these files are changed (in passive mode of the
> driver), it leads to calling ->limits() callback of the cpufreq
> governors, like schedutil. On a call to it the governors shall
> forcefully update the frequency to come within the limits. Since the
> limits, i.e.  policy->min/max, aren't updated by the driver, the
> governors fails to get the target freq within limit and sometimes aborts
> the update believing that the frequency is already set to the target
> value.
> 
> This patch implements the QoS supported frequency constraints to update
> policy->min/max values whenever min_perf_pct or max_perf_pct files are
> updated. This is only done for the passive mode as of now, as the driver
> is already working fine in active mode.
> 
> Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
> Reported-by: Doug Smythies <dsmythies@telus.net>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
> V4->V5:
> - dev_pm_qos_update_request() can return 1 in case of success, handle
>   that.
> 
>  drivers/cpufreq/intel_pstate.c | 120 +++++++++++++++++++++++++++++++--
>  1 file changed, 116 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index cc27d4c59dca..32f27563613b 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -24,6 +24,7 @@
>  #include <linux/fs.h>
>  #include <linux/acpi.h>
>  #include <linux/vmalloc.h>
> +#include <linux/pm_qos.h>
>  #include <trace/events/power.h>
>  
>  #include <asm/div64.h>
> @@ -1085,6 +1086,47 @@ static ssize_t store_no_turbo(struct kobject *a, struct kobj_attribute *b,
>  	return count;
>  }
>  
> +static struct cpufreq_driver intel_pstate;
> +
> +static void update_qos_request(enum dev_pm_qos_req_type type)
> +{
> +	int max_state, turbo_max, freq, i, perf_pct;
> +	struct dev_pm_qos_request *req;
> +	struct cpufreq_policy *policy;
> +
> +	for_each_possible_cpu(i) {
> +		struct cpudata *cpu = all_cpu_data[i];
> +
> +		policy = cpufreq_cpu_get(i);
> +		if (!policy)
> +			continue;
> +
> +		req = policy->driver_data;
> +		cpufreq_cpu_put(policy);
> +
> +		if (!req)
> +			continue;
> +
> +		if (hwp_active)
> +			intel_pstate_get_hwp_max(i, &turbo_max, &max_state);
> +		else
> +			turbo_max = cpu->pstate.turbo_pstate;
> +
> +		if (type == DEV_PM_QOS_MIN_FREQUENCY) {
> +			perf_pct = global.min_perf_pct;
> +		} else {
> +			req++;
> +			perf_pct = global.max_perf_pct;
> +		}
> +
> +		freq = DIV_ROUND_UP(turbo_max * perf_pct, 100);
> +		freq *= cpu->pstate.scaling;
> +
> +		if (dev_pm_qos_update_request(req, freq) < 0)
> +			pr_warn("Failed to update freq constraint: CPU%d\n", i);
> +	}
> +}
> +
>  static ssize_t store_max_perf_pct(struct kobject *a, struct kobj_attribute *b,
>  				  const char *buf, size_t count)
>  {
> @@ -1108,7 +1150,10 @@ static ssize_t store_max_perf_pct(struct kobject *a, struct kobj_attribute *b,
>  
>  	mutex_unlock(&intel_pstate_limits_lock);
>  
> -	intel_pstate_update_policies();
> +	if (intel_pstate_driver == &intel_pstate)
> +		intel_pstate_update_policies();
> +	else
> +		update_qos_request(DEV_PM_QOS_MAX_FREQUENCY);
>  
>  	mutex_unlock(&intel_pstate_driver_lock);
>  
> @@ -1139,7 +1184,10 @@ static ssize_t store_min_perf_pct(struct kobject *a, struct kobj_attribute *b,
>  
>  	mutex_unlock(&intel_pstate_limits_lock);
>  
> -	intel_pstate_update_policies();
> +	if (intel_pstate_driver == &intel_pstate)
> +		intel_pstate_update_policies();
> +	else
> +		update_qos_request(DEV_PM_QOS_MIN_FREQUENCY);
>  
>  	mutex_unlock(&intel_pstate_driver_lock);
>  
> @@ -2332,8 +2380,16 @@ static unsigned int intel_cpufreq_fast_switch(struct cpufreq_policy *policy,
>  
>  static int intel_cpufreq_cpu_init(struct cpufreq_policy *policy)
>  {
> -	int ret = __intel_pstate_cpu_init(policy);
> +	int max_state, turbo_max, min_freq, max_freq, ret;
> +	struct dev_pm_qos_request *req;
> +	struct cpudata *cpu;
> +	struct device *dev;
> +
> +	dev = get_cpu_device(policy->cpu);
> +	if (!dev)
> +		return -ENODEV;
>  
> +	ret = __intel_pstate_cpu_init(policy);
>  	if (ret)
>  		return ret;
>  
> @@ -2342,7 +2398,63 @@ static int intel_cpufreq_cpu_init(struct cpufreq_policy *policy)
>  	/* This reflects the intel_pstate_get_cpu_pstates() setting. */
>  	policy->cur = policy->cpuinfo.min_freq;
>  
> +	req = kcalloc(2, sizeof(*req), GFP_KERNEL);
> +	if (!req) {
> +		ret = -ENOMEM;
> +		goto pstate_exit;
> +	}
> +
> +	cpu = all_cpu_data[policy->cpu];
> +
> +	if (hwp_active)
> +		intel_pstate_get_hwp_max(policy->cpu, &turbo_max, &max_state);
> +	else
> +		turbo_max = cpu->pstate.turbo_pstate;
> +
> +	min_freq = DIV_ROUND_UP(turbo_max * global.min_perf_pct, 100);
> +	min_freq *= cpu->pstate.scaling;
> +	max_freq = DIV_ROUND_UP(turbo_max * global.max_perf_pct, 100);
> +	max_freq *= cpu->pstate.scaling;
> +
> +	ret = dev_pm_qos_add_request(dev, req, DEV_PM_QOS_MIN_FREQUENCY,
> +				     min_freq);
> +	if (ret < 0) {
> +		dev_err(dev, "Failed to add min-freq constraint (%d)\n", ret);
> +		goto free_req;
> +	}
> +
> +	ret = dev_pm_qos_add_request(dev, req + 1, DEV_PM_QOS_MAX_FREQUENCY,
> +				     max_freq);
> +	if (ret < 0) {
> +		dev_err(dev, "Failed to add max-freq constraint (%d)\n", ret);
> +		goto remove_min_req;
> +	}
> +
> +	policy->driver_data = req;
> +
>  	return 0;
> +
> +remove_min_req:
> +	dev_pm_qos_remove_request(req);
> +free_req:
> +	kfree(req);
> +pstate_exit:
> +	intel_pstate_exit_perf_limits(policy);
> +
> +	return ret;
> +}
> +
> +static int intel_cpufreq_cpu_exit(struct cpufreq_policy *policy)
> +{
> +	struct dev_pm_qos_request *req;
> +
> +	req = policy->driver_data;
> +
> +	dev_pm_qos_remove_request(req + 1);
> +	dev_pm_qos_remove_request(req);
> +	kfree(req);
> +
> +	return intel_pstate_cpu_exit(policy);
>  }
>  
>  static struct cpufreq_driver intel_cpufreq = {
> @@ -2351,7 +2463,7 @@ static struct cpufreq_driver intel_cpufreq = {
>  	.target		= intel_cpufreq_target,
>  	.fast_switch	= intel_cpufreq_fast_switch,
>  	.init		= intel_cpufreq_cpu_init,
> -	.exit		= intel_pstate_cpu_exit,
> +	.exit		= intel_cpufreq_cpu_exit,
>  	.stop_cpu	= intel_cpufreq_stop_cpu,
>  	.update_limits	= intel_pstate_update_limits,
>  	.name		= "intel_cpufreq",
> 

Applied, thanks!





^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-08-26  9:18 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-07  7:06 [PATCH V4 1/2] cpufreq: schedutil: Don't skip freq update when limits change Viresh Kumar
2019-08-07  7:06 ` [PATCH V4 2/2] cpufreq: intel_pstate: Implement QoS supported freq constraints Viresh Kumar
2019-08-08 16:25   ` Doug Smythies
2019-08-08 16:46     ` Rafael J. Wysocki
2019-08-09  2:16     ` Viresh Kumar
2019-08-09  6:35       ` Doug Smythies
2019-08-09  6:51         ` Viresh Kumar
2019-08-09  2:22   ` [PATCH V5 " Viresh Kumar
2019-08-09  5:48     ` Doug Smythies
2019-08-26  9:18     ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).