* [PATCH 0/2] cpufreq: schedutil: Fix and optimization
@ 2017-03-19 13:21 Rafael J. Wysocki
2017-03-19 13:30 ` [PATCH 1/2] cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start() Rafael J. Wysocki
2017-03-19 13:34 ` [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs Rafael J. Wysocki
0 siblings, 2 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-19 13:21 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
Juri Lelli, Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
Hi,
Patch [1/2] in this series fixes an initialization issue in schedutil.
Patch [2/2] is an optimization, but it might also be regarded as a fix, or at
least as a workaround for a problem.
Namely, while playing with the passive mode in intel_pstate lately, I noticed
that schedutil evidently underestimated CPU utilization at high loads. I guess
this has been known for a while, but it is the first time I have seen it so
clearly (by looking at the values actually written to the P-state request
register rather than at frequencies).
After some investigation I can quite confidently attribute that to a PELT
metric issue, as everything else has been ruled out with high confidence;
hence the patch. The fact that it works around the problem adds even more
confidence to the observation that PELT underestimates CPU utilization at
least sometimes.
Thanks,
Rafael
* [PATCH 1/2] cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start()
2017-03-19 13:21 [PATCH 0/2] cpufreq: schedutil: Fix and optimization Rafael J. Wysocki
@ 2017-03-19 13:30 ` Rafael J. Wysocki
2017-03-20 3:28 ` Viresh Kumar
2017-03-19 13:34 ` [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs Rafael J. Wysocki
1 sibling, 1 reply; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-19 13:30 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
Juri Lelli, Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
sugov_start() only initializes struct sugov_cpu per-CPU structures
for shared policies, but it should do that for single-CPU policies too.
In particular, this makes the IO-wait boost mechanism work when
cpufreq policies correspond to individual CPUs.
Fixes: 21ca6d2c52f8 (cpufreq: schedutil: Add iowait boosting)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
kernel/sched/cpufreq_schedutil.c | 20 +++++++-------------
1 file changed, 7 insertions(+), 13 deletions(-)
Index: linux-pm/kernel/sched/cpufreq_schedutil.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
+++ linux-pm/kernel/sched/cpufreq_schedutil.c
@@ -577,20 +577,14 @@ static int sugov_start(struct cpufreq_po
for_each_cpu(cpu, policy->cpus) {
struct sugov_cpu *sg_cpu = &per_cpu(sugov_cpu, cpu);
+ memset(sg_cpu, 0, sizeof(*sg_cpu));
sg_cpu->sg_policy = sg_policy;
- if (policy_is_shared(policy)) {
- sg_cpu->util = 0;
- sg_cpu->max = 0;
- sg_cpu->flags = SCHED_CPUFREQ_RT;
- sg_cpu->last_update = 0;
- sg_cpu->iowait_boost = 0;
- sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
- cpufreq_add_update_util_hook(cpu, &sg_cpu->update_util,
- sugov_update_shared);
- } else {
- cpufreq_add_update_util_hook(cpu, &sg_cpu->update_util,
- sugov_update_single);
- }
+ sg_cpu->flags = SCHED_CPUFREQ_RT;
+ sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
+ cpufreq_add_update_util_hook(cpu, &sg_cpu->update_util,
+ policy_is_shared(policy) ?
+ sugov_update_shared :
+ sugov_update_single);
}
return 0;
}
* [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-19 13:21 [PATCH 0/2] cpufreq: schedutil: Fix and optimization Rafael J. Wysocki
2017-03-19 13:30 ` [PATCH 1/2] cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start() Rafael J. Wysocki
@ 2017-03-19 13:34 ` Rafael J. Wysocki
2017-03-19 21:24 ` Rafael J. Wysocki
` (3 more replies)
1 sibling, 4 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-19 13:34 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
Juri Lelli, Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The PELT metric used by the schedutil governor underestimates the
CPU utilization in some cases. The reason for that may be time spent
in interrupt handlers and the like, which is not accounted for by PELT.
That can be easily demonstrated by running kernel compilation on
a Sandy Bridge Intel processor, running turbostat in parallel with
it and looking at the values written to the MSR_IA32_PERF_CTL
register. Namely, the expected result would be that when all CPUs
were 100% busy, all of them would be requested to run in the maximum
P-state, but observation shows that this clearly isn't the case.
The CPUs run in the maximum P-state for a while and then are
requested to run slower and go back to the maximum P-state after
a while again. That causes the actual frequency of the processor to
visibly oscillate below the sustainable maximum in a jittery fashion
which clearly is not desirable.
To work around this issue, use the observation that, from the
schedutil governor's perspective, CPUs that are never idle should
always run at the maximum frequency, and make that happen.
To that end, add a counter of idle calls to struct sugov_cpu and
modify cpuidle_idle_call() to increment that counter every time it
is about to put the given CPU into an idle state. Next, make the
schedutil governor look at that counter for the current CPU every
time it is about to start heavy computations. If the counter
has not changed for over SUGOV_BUSY_THRESHOLD time (equal to 50 ms),
the CPU has not been idle for at least that long and the governor
will choose the maximum frequency for it without looking at the PELT
metric at all.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
include/linux/sched/cpufreq.h | 6 ++++++
kernel/sched/cpufreq_schedutil.c | 38 ++++++++++++++++++++++++++++++++++++--
kernel/sched/idle.c | 3 +++
3 files changed, 45 insertions(+), 2 deletions(-)
Index: linux-pm/kernel/sched/cpufreq_schedutil.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
+++ linux-pm/kernel/sched/cpufreq_schedutil.c
@@ -20,6 +20,7 @@
#include "sched.h"
#define SUGOV_KTHREAD_PRIORITY 50
+#define SUGOV_BUSY_THRESHOLD (50 * NSEC_PER_MSEC)
struct sugov_tunables {
struct gov_attr_set attr_set;
@@ -55,6 +56,9 @@ struct sugov_cpu {
unsigned long iowait_boost;
unsigned long iowait_boost_max;
+ unsigned long idle_calls;
+ unsigned long saved_idle_calls;
+ u64 busy_start;
u64 last_update;
/* The fields below are only needed when sharing a policy. */
@@ -192,6 +196,34 @@ static void sugov_iowait_boost(struct su
sg_cpu->iowait_boost >>= 1;
}
+void cpufreq_schedutil_idle_call(void)
+{
+ struct sugov_cpu *sg_cpu = this_cpu_ptr(&sugov_cpu);
+
+ sg_cpu->idle_calls++;
+}
+
+static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
+{
+ if (sg_cpu->idle_calls != sg_cpu->saved_idle_calls) {
+ sg_cpu->busy_start = 0;
+ return false;
+ }
+
+ if (!sg_cpu->busy_start) {
+ sg_cpu->busy_start = sg_cpu->last_update;
+ return false;
+ }
+
+ return sg_cpu->last_update - sg_cpu->busy_start > SUGOV_BUSY_THRESHOLD;
+}
+
+static void sugov_save_idle_calls(struct sugov_cpu *sg_cpu)
+{
+ if (!sg_cpu->busy_start)
+ sg_cpu->saved_idle_calls = sg_cpu->idle_calls;
+}
+
static void sugov_update_single(struct update_util_data *hook, u64 time,
unsigned int flags)
{
@@ -207,7 +239,7 @@ static void sugov_update_single(struct u
if (!sugov_should_update_freq(sg_policy, time))
return;
- if (flags & SCHED_CPUFREQ_RT_DL) {
+ if ((flags & SCHED_CPUFREQ_RT_DL) || sugov_cpu_is_busy(sg_cpu)) {
next_f = policy->cpuinfo.max_freq;
} else {
sugov_get_util(&util, &max);
@@ -215,6 +247,7 @@ static void sugov_update_single(struct u
next_f = get_next_freq(sg_policy, util, max);
}
sugov_update_commit(sg_policy, time, next_f);
+ sugov_save_idle_calls(sg_cpu);
}
static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
@@ -278,12 +311,13 @@ static void sugov_update_shared(struct u
sg_cpu->last_update = time;
if (sugov_should_update_freq(sg_policy, time)) {
- if (flags & SCHED_CPUFREQ_RT_DL)
+ if ((flags & SCHED_CPUFREQ_RT_DL) || sugov_cpu_is_busy(sg_cpu))
next_f = sg_policy->policy->cpuinfo.max_freq;
else
next_f = sugov_next_freq_shared(sg_cpu);
sugov_update_commit(sg_policy, time, next_f);
+ sugov_save_idle_calls(sg_cpu);
}
raw_spin_unlock(&sg_policy->update_lock);
Index: linux-pm/kernel/sched/idle.c
===================================================================
--- linux-pm.orig/kernel/sched/idle.c
+++ linux-pm/kernel/sched/idle.c
@@ -151,6 +151,9 @@ static void cpuidle_idle_call(void)
*/
rcu_idle_enter();
+ /* Notify the schedutil cpufreq governor that we are entering idle. */
+ cpufreq_schedutil_idle_call();
+
if (cpuidle_not_available(drv, dev)) {
default_idle_call();
goto exit_idle;
Index: linux-pm/include/linux/sched/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/sched/cpufreq.h
+++ linux-pm/include/linux/sched/cpufreq.h
@@ -24,4 +24,10 @@ void cpufreq_add_update_util_hook(int cp
void cpufreq_remove_update_util_hook(int cpu);
#endif /* CONFIG_CPU_FREQ */
+#ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL
+extern void cpufreq_schedutil_idle_call(void);
+#else
+static inline void cpufreq_schedutil_idle_call(void) {}
+#endif /* CONFIG_CPU_FREQ_GOV_SCHEDUTIL */
+
#endif /* _LINUX_SCHED_CPUFREQ_H */
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-19 13:34 ` [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs Rafael J. Wysocki
@ 2017-03-19 21:24 ` Rafael J. Wysocki
2017-03-19 21:42 ` Rafael J. Wysocki
2017-03-20 10:38 ` Peter Zijlstra
2017-03-20 3:57 ` Viresh Kumar
` (2 subsequent siblings)
3 siblings, 2 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-19 21:24 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
Juri Lelli, Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
[-- Attachment #1: Type: text/plain, Size: 3957 bytes --]
On Sunday, March 19, 2017 02:34:32 PM Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The PELT metric used by the schedutil governor underestimates the
> CPU utilization in some cases. The reason for that may be time spent
> in interrupt handlers and the like, which is not accounted for by PELT.
>
> That can be easily demonstrated by running kernel compilation on
> a Sandy Bridge Intel processor, running turbostat in parallel with
> it and looking at the values written to the MSR_IA32_PERF_CTL
> register. Namely, the expected result would be that when all CPUs
> were 100% busy, all of them would be requested to run in the maximum
> P-state, but observation shows that this clearly isn't the case.
> The CPUs run in the maximum P-state for a while and then are
> requested to run slower and go back to the maximum P-state after
> a while again. That causes the actual frequency of the processor to
> visibly oscillate below the sustainable maximum in a jittery fashion
> which clearly is not desirable.
In case you are wondering about the actual numbers, attached are two turbostat
log files from two runs of the same workload, without (before.txt) and with (after.txt)
the patch applied.
The workload is essentially "make -j 5" in the kernel source tree and the
machine has an SSD storage and a quad-core Intel Sandy Bridge processor.
The P-states available for each core are between 8 and 31 (0x1f) corresponding
to 800 MHz and 3.1 GHz, respectively. All cores can run sustainably at 2.9 GHz
at the same time, although that is not a guaranteed sustainable frequency
(it may be dropped occasionally for thermal reasons, for example).
The interesting columns are Bzy_MHz (and specifically the rows with "-" under
CPU that correspond to the entire processor), which is the average frequency
between iterations based on the numbers read from feedback registers, and
the rightmost one, which contains the values written to the P-state request
register (the 3rd and 4th hex digits from the right represent the requested P-state).
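As an aside, decoding that field from a raw MSR 0x199 value is straightforward;
here is a minimal sketch (requested_pstate() is just an illustrative helper,
not anything in turbostat or the kernel):

```python
def requested_pstate(msr_value):
    """Extract the requested P-state from a raw MSR_IA32_PERF_CTL
    (0x199) value: the target P-state occupies bits 15:8, i.e. the
    3rd and 4th hex digits from the right."""
    return (msr_value >> 8) & 0xFF

# On this machine: 0x00001f00 requests P-state 31 (0x1f, 3.1 GHz)
# and 0x00000800 requests P-state 8 (800 MHz).
```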
The turbostat data collection ran every 2 seconds and I looked at the last 30
iterations in each case corresponding to about 1 minute of the workload run
during which all of the cores were around 100% busy.
Now, if you look at after.txt (the run with the patch applied), you'll notice that
during those last 30 iterations P-state 31 (0x1f) had been requested on all
cores pretty much 100% of the time (meaning: as expected in that case) and
the average processor frequency (computed by taking the average from
all of the 30 "-" rows) was 2899.33 MHz (apparently, the hardware decided to
drop it from 2.9 GHz occasionally).
In the before.txt case (without the patch) the average frequency over the last
30 iterations was 2896.90 MHz, which is about 0.08% slower than with the patch
applied (on average). That is a small but measurable difference on its own, but it
would have been much worse if the processor had not coordinated P-states in
hardware (such that if any core requested 31, the processor would pick that
one or close to it for the entire package regardless of the requests from the
other cores).
Namely, if you look at the P-states requested for the different cores (during
the last 30 iterations of the before.txt run), which essentially are the values
the governor decided should be used, their average is 27.25 (almost 4 bins lower
than the maximum) and the standard deviation is 6, so it is not like they are a
little off occasionally. At least some of them are way off most of the time.
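Those numbers can be reproduced mechanically from the logs; the following
rough sketch assumes the six-column turbostat layout shown in the attachments
(pstate_stats() is a made-up helper name):

```python
import statistics

def pstate_stats(turbostat_log):
    """Return the mean and (population) standard deviation of the
    per-core requested P-states in a turbostat log, skipping the
    header lines and the "-" package-summary rows."""
    pstates = []
    for line in turbostat_log.splitlines():
        fields = line.split()
        # Per-core rows: CPU, Avg_MHz, Busy%, Bzy_MHz, TSC_MHz, MSR value
        if len(fields) == 6 and fields[0].isdigit():
            # The requested P-state is in bits 15:8 of MSR 0x199.
            pstates.append((int(fields[5], 16) >> 8) & 0xFF)
    return statistics.mean(pstates), statistics.pstdev(pstates)
```

Running something like this over the last 30 iterations of before.txt should
give roughly the 27.25 average and the standard deviation of 6 quoted above.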
Honestly, if the processor had been capable of doing per-core P-states, that
would have been a disaster and there are customers who wouldn't look at
schedutil again after being confronted with these numbers.
So this is rather serious.
BTW, both intel_pstate in the active mode and ondemand request 0x1f on
all cores for that workload, like in the after.txt case.
Thanks,
Rafael
[-- Attachment #2: before.txt --]
[-- Type: text/plain, Size: 15917 bytes --]
CPUID(7): No-SGX
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 93 11.26 826 2494 0x00000000
0 61 7.56 808 2494 0x00000800
2 90 10.87 832 2494 0x00000800
1 130 15.79 822 2494 0x00000800
3 90 10.83 837 2494 0x00000800
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 54 6.63 811 2494 0x00000000
0 46 5.68 816 2494 0x00000800
2 55 6.87 804 2494 0x00000800
1 79 9.77 806 2494 0x00000800
3 35 4.20 828 2494 0x00000800
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 305 36.58 835 2494 0x00000000
0 332 39.69 838 2494 0x00000800
2 307 36.95 833 2494 0x00000800
1 248 29.82 832 2494 0x00000800
3 333 39.87 836 2494 0x00000c00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 357 38.08 940 2495 0x00000000
0 424 36.72 1158 2495 0x00000800
2 312 36.27 862 2495 0x00000800
1 407 46.72 874 2494 0x00000800
3 284 32.62 874 2494 0x00000800
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2017 92.64 2183 2494 0x00000000
0 2041 94.65 2162 2494 0x00000800
2 2032 93.65 2175 2494 0x00001200
1 2024 92.33 2197 2494 0x00001f00
3 1972 89.92 2198 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2455 99.21 2480 2494 0x00000000
0 2450 98.95 2482 2494 0x00000900
2 2457 99.29 2480 2494 0x00000900
1 2453 99.11 2481 2494 0x00001000
3 2461 99.49 2479 2494 0x00000d00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2308 99.14 2333 2494 0x00000000
0 2309 99.04 2337 2494 0x00001f00
2 2307 99.12 2333 2494 0x00001200
1 2305 99.03 2333 2494 0x00001600
3 2310 99.39 2329 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2613 99.22 2639 2494 0x00000000
0 2614 99.32 2638 2494 0x00000800
2 2589 98.19 2643 2494 0x00001200
1 2623 99.60 2640 2494 0x00001800
3 2624 99.76 2637 2494 0x00000a00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2687 98.70 2729 2494 0x00000000
0 2700 99.24 2727 2494 0x00000800
2 2707 99.48 2728 2494 0x00001f00
1 2643 97.09 2728 2494 0x00001c00
3 2699 99.01 2733 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 99.98 2900 2494 0x00000000
0 2892 99.96 2900 2494 0x00001f00
2 2892 99.96 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001b00
2 2893 100.00 2900 2494 0x00001800
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00000800
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001a00
2 2893 100.00 2900 2494 0x00000d00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001900
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2886 100.00 2893 2494 0x00000000
0 2886 100.00 2893 2494 0x00001f00
2 2886 100.00 2893 2494 0x00001f00
1 2886 100.00 2893 2494 0x00001900
3 2886 100.00 2893 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001b00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001c00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001900
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2890 99.99 2897 2494 0x00000000
0 2890 100.00 2897 2494 0x00001f00
2 2890 100.00 2897 2494 0x00001800
1 2890 100.00 2897 2494 0x00000e00
3 2889 99.97 2897 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001600
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2892 100.00 2898 2494 0x00000000
0 2892 100.00 2898 2494 0x00001f00
2 2892 100.00 2898 2494 0x00001f00
1 2892 100.00 2898 2494 0x00000a00
3 2892 100.00 2898 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2890 100.00 2897 2494 0x00000000
0 2890 100.00 2897 2494 0x00001f00
2 2890 100.00 2897 2494 0x00001f00
1 2890 100.00 2897 2494 0x00001b00
3 2890 100.00 2897 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2870 99.94 2878 2494 0x00000000
0 2872 100.00 2878 2494 0x00001f00
2 2867 99.84 2878 2494 0x00001f00
1 2872 100.00 2878 2494 0x00001f00
3 2870 99.92 2879 2494 0x00001200
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2892 100.00 2898 2494 0x00000000
0 2892 100.00 2898 2494 0x00000f00
2 2892 100.00 2898 2494 0x00001b00
1 2892 100.00 2898 2494 0x00001f00
3 2892 100.00 2898 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2891 99.94 2899 2494 0x00000000
0 2893 100.00 2899 2494 0x00001f00
2 2887 99.78 2899 2494 0x00001f00
1 2893 100.00 2899 2494 0x00001f00
3 2893 100.00 2899 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001d00
2 2893 100.00 2900 2494 0x00001c00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2889 100.00 2895 2494 0x00000000
0 2889 100.00 2895 2494 0x00001f00
2 2889 100.00 2895 2494 0x00001600
1 2889 100.00 2895 2494 0x00001c00
3 2889 100.00 2895 2494 0x00000b00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001800
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001600
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00000800
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 99.99 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001200
3 2892 99.96 2900 2494 0x00001300
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2891 99.99 2898 2494 0x00000000
0 2892 100.00 2898 2494 0x00001f00
2 2891 99.97 2898 2494 0x00001f00
1 2892 100.00 2898 2494 0x00001f00
3 2892 100.00 2898 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2892 99.99 2899 2494 0x00000000
0 2892 100.00 2899 2494 0x00001f00
2 2892 100.00 2899 2494 0x00001100
1 2891 99.96 2899 2494 0x00001f00
3 2892 100.00 2899 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2886 100.00 2893 2494 0x00000000
0 2886 100.00 2893 2494 0x00000f00
2 2886 100.00 2893 2494 0x00000900
1 2886 100.00 2893 2494 0x00001300
3 2886 100.00 2893 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2892 100.00 2898 2495 0x00000000
0 2892 100.00 2898 2495 0x00001f00
2 2892 100.00 2898 2495 0x00001700
1 2892 100.00 2898 2494 0x00001f00
3 2892 100.00 2898 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001700
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00000d00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2892 100.00 2899 2494 0x00000000
0 2892 99.98 2899 2494 0x00001f00
2 2892 100.00 2899 2494 0x00001f00
1 2892 100.00 2899 2494 0x00001f00
3 2892 100.00 2899 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2879 100.00 2885 2494 0x00000000
0 2879 100.00 2885 2494 0x00001500
2 2879 100.00 2885 2494 0x00000900
1 2878 99.99 2885 2494 0x00001f00
3 2879 100.00 2885 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2883 100.00 2889 2494 0x00000000
0 2883 100.00 2889 2494 0x00001f00
2 2883 100.00 2889 2494 0x00001f00
1 2883 100.00 2889 2494 0x00001f00
3 2883 100.00 2889 2494 0x00001600
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001400
1 2893 100.00 2900 2494 0x00001b00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001100
2 2893 100.00 2900 2494 0x00001000
1 2893 100.00 2900 2494 0x00001f00
3 2893 99.98 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001300
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001500
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2899 2494 0x00000000
0 2893 100.00 2899 2494 0x00001f00
2 2893 100.00 2899 2494 0x00001c00
1 2893 100.00 2899 2494 0x00001f00
3 2893 100.00 2899 2494 0x00001c00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001b00
3 2893 100.00 2900 2494 0x00001d00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2879 100.00 2886 2494 0x00000000
0 2879 100.00 2886 2494 0x00001200
2 2879 100.00 2886 2494 0x00001f00
1 2879 100.00 2886 2494 0x00001f00
3 2879 100.00 2886 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 99.99 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2892 99.98 2900 2494 0x00001300
3 2893 100.00 2900 2494 0x00001000
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2892 100.00 2898 2495 0x00000000
0 2892 100.00 2898 2495 0x00001f00
2 2892 100.00 2898 2495 0x00000800
1 2892 100.00 2898 2495 0x00001b00
3 2892 100.00 2898 2494 0x00001600
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 99.99 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001b00
1 2892 99.96 2900 2494 0x00001f00
3 2893 99.99 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2891 99.99 2898 2494 0x00000000
0 2892 100.00 2898 2494 0x00001a00
2 2892 100.00 2898 2494 0x00001f00
1 2892 100.00 2898 2494 0x00001f00
3 2890 99.95 2898 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001600
2 2893 100.00 2900 2494 0x00001100
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2892 100.00 2898 2494 0x00000000
0 2892 100.00 2898 2494 0x00001f00
2 2892 100.00 2898 2494 0x00001f00
1 2892 100.00 2898 2494 0x00001f00
3 2892 100.00 2898 2494 0x00001c00
[-- Attachment #3: after.txt --]
[-- Type: text/plain, Size: 18461 bytes --]
CPUID(7): No-SGX
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 171 20.00 854 2496 0x00000000
0 182 21.23 858 2496 0x00000800
2 170 19.93 855 2496 0x00000800
1 173 20.54 845 2496 0x00000800
3 157 18.32 858 2494 0x00000800
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 222 25.57 872 2494 0x00000000
0 210 23.66 889 2494 0x00000800
2 200 23.20 865 2494 0x00000800
1 189 21.33 886 2494 0x00000800
3 291 34.09 856 2494 0x00000800
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 132 16.37 808 2494 0x00000000
0 131 16.19 811 2494 0x00000800
2 215 26.87 801 2494 0x00000800
1 107 13.33 803 2494 0x00000800
3 76 9.08 834 2494 0x00000800
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 503 47.44 1063 2494 0x00000000
0 510 48.60 1051 2494 0x00000a00
2 482 45.17 1070 2494 0x00000800
1 512 48.94 1048 2494 0x00000a00
3 508 47.06 1082 2494 0x00000800
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 447 40.48 1108 2494 0x00000000
0 444 39.85 1117 2494 0x00000800
2 422 38.66 1093 2494 0x00000a00
1 440 38.68 1140 2494 0x00000800
3 484 44.75 1084 2494 0x00000a00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 405 34.24 1185 2494 0x00000000
0 422 33.63 1258 2494 0x00000800
2 435 35.97 1211 2494 0x00000800
1 296 26.52 1118 2494 0x00000800
3 468 40.85 1147 2494 0x00000800
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 155 19.07 813 2494 0x00000000
0 161 19.92 809 2494 0x00000800
2 156 19.31 810 2494 0x00000800
1 119 14.79 809 2494 0x00000800
3 183 22.26 823 2494 0x00000800
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 632 36.32 1744 2494 0x00000000
0 1237 58.35 2125 2494 0x00001f00
2 503 31.16 1617 2494 0x00000800
1 396 28.29 1403 2494 0x00000800
3 392 27.50 1430 2494 0x00000800
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 799 36.92 2168 2494 0x00000000
0 1935 77.42 2505 2494 0x00000800
2 386 22.32 1733 2494 0x00000800
1 474 25.39 1870 2494 0x00000800
3 400 22.53 1778 2494 0x00000800
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 1856 82.07 2267 2494 0x00000000
0 1850 82.38 2251 2494 0x00000900
2 1836 81.44 2260 2494 0x00001500
1 1850 80.66 2299 2494 0x00001700
3 1889 83.80 2259 2494 0x00000a00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2601 95.10 2741 2494 0x00000000
0 2625 96.00 2740 2494 0x00001200
2 2573 93.94 2746 2494 0x00001000
1 2635 96.58 2734 2494 0x00000c00
3 2570 93.87 2744 2494 0x00000b00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2488 92.91 2684 2494 0x00000000
0 2475 92.17 2692 2494 0x00000b00
2 2525 94.49 2678 2494 0x00001c00
1 2506 93.54 2685 2494 0x00001f00
3 2446 91.43 2682 2494 0x00000800
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2508 94.52 2659 2494 0x00000000
0 2551 96.11 2661 2494 0x00001200
2 2525 95.10 2661 2494 0x00001f00
1 2531 95.48 2657 2494 0x00001f00
3 2423 91.40 2657 2494 0x00001400
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2616 94.99 2760 2494 0x00000000
0 2577 93.80 2754 2494 0x00001f00
2 2653 96.61 2753 2494 0x00001f00
1 2606 94.38 2767 2494 0x00001f00
3 2626 95.17 2765 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2851 98.63 2897 2494 0x00000000
0 2823 97.66 2897 2494 0x00000f00
2 2854 98.71 2898 2494 0x00001600
1 2881 99.69 2897 2494 0x00001f00
3 2845 98.44 2897 2494 0x00001900
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2871 99.23 2900 2494 0x00000000
0 2884 99.68 2900 2494 0x00001900
2 2868 99.11 2900 2494 0x00001f00
1 2875 99.35 2900 2494 0x00001f00
3 2858 98.78 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2892 99.96 2900 2494 0x00000000
0 2893 99.97 2900 2494 0x00001f00
2 2892 99.93 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2892 99.94 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 99.99 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 99.97 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2892 100.00 2898 2494 0x00000000
0 2892 100.00 2898 2494 0x00001f00
2 2892 100.00 2898 2494 0x00001f00
1 2892 100.00 2898 2494 0x00001f00
3 2892 100.00 2898 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 99.99 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 99.97 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2892 100.00 2898 2494 0x00000000
0 2892 100.00 2898 2494 0x00001f00
2 2892 100.00 2898 2494 0x00001f00
1 2892 100.00 2898 2494 0x00001f00
3 2892 100.00 2898 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 99.99 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 99.98 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2892 100.00 2898 2495 0x00000000
0 2892 100.00 2898 2495 0x00001f00
2 2892 100.00 2898 2495 0x00001f00
1 2892 100.00 2898 2495 0x00001f00
3 2892 100.00 2898 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2892 100.00 2898 2494 0x00000000
0 2892 100.00 2898 2494 0x00001f00
2 2892 100.00 2898 2494 0x00001f00
1 2892 100.00 2898 2494 0x00001f00
3 2892 100.00 2898 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 99.99 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2892 99.94 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2891 99.99 2898 2494 0x00000000
0 2892 100.00 2898 2494 0x00001f00
2 2892 100.00 2898 2494 0x00001f00
1 2890 99.95 2898 2494 0x00001f00
3 2892 100.00 2898 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 99.99 2900 2494 0x00000000
0 2892 99.96 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 99.98 2900 2494 0x00000000
0 2892 99.97 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2892 99.95 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 99.99 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 99.98 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2894 100.00 2900 2494 0x00000000
0 2894 100.00 2900 2495 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2890 100.00 2897 2494 0x00000000
0 2890 100.00 2897 2494 0x00001f00
2 2890 100.00 2897 2494 0x00001f00
1 2890 100.00 2897 2494 0x00001f00
3 2890 100.00 2897 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 99.99 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2892 99.96 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2888 99.99 2895 2494 0x00000000
0 2888 100.00 2895 2494 0x00001f00
2 2888 100.00 2895 2494 0x00001f00
1 2888 100.00 2895 2494 0x00001f00
3 2888 99.98 2895 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2892 100.00 2898 2494 0x00000000
0 2892 100.00 2898 2494 0x00001f00
2 2892 100.00 2898 2494 0x00001f00
1 2892 100.00 2898 2494 0x00001f00
3 2892 100.00 2898 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2892 100.00 2898 2494 0x00000000
0 2892 100.00 2898 2494 0x00001f00
2 2892 100.00 2898 2494 0x00001f00
1 2892 100.00 2898 2494 0x00001f00
3 2892 100.00 2898 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2892 100.00 2898 2494 0x00000000
0 2892 100.00 2898 2494 0x00001f00
2 2892 100.00 2898 2494 0x00001f00
1 2892 100.00 2898 2494 0x00001f00
3 2892 100.00 2898 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 99.99 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2892 99.96 2900 2494 0x00000000
0 2889 99.84 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2893 100.00 2900 2494 0x00000000
0 2893 100.00 2900 2494 0x00001f00
2 2893 100.00 2900 2494 0x00001f00
1 2893 100.00 2900 2494 0x00001f00
3 2893 100.00 2900 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2886 99.81 2898 2494 0x00000000
0 2892 100.00 2898 2494 0x00001f00
2 2890 99.95 2898 2494 0x00001f00
1 2872 99.30 2898 2494 0x00001f00
3 2891 99.98 2898 2494 0x00001f00
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x199
- 2889 99.90 2898 2494 0x00000000
0 2891 99.98 2898 2494 0x00001f00
2 2892 100.00 2898 2494 0x00001f00
1 2880 99.61 2898 2494 0x00001f00
3 2892 100.00 2898 2494 0x00001f00
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-19 21:24 ` Rafael J. Wysocki
@ 2017-03-19 21:42 ` Rafael J. Wysocki
2017-03-20 10:38 ` Peter Zijlstra
1 sibling, 0 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-19 21:42 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
Juri Lelli, Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Sunday, March 19, 2017 10:24:24 PM Rafael J. Wysocki wrote:
> On Sunday, March 19, 2017 02:34:32 PM Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The PELT metric used by the schedutil governor underestimates the
> > CPU utilization in some cases. The reason for that may be time spent
> > in interrupt handlers and similar which is not accounted for by PELT.
> >
> > That can be easily demonstrated by running kernel compilation on
> > a Sandy Bridge Intel processor, running turbostat in parallel with
> > it and looking at the values written to the MSR_IA32_PERF_CTL
> > register. Namely, the expected result would be that when all CPUs
> > were 100% busy, all of them would be requested to run in the maximum
> > P-state, but observation shows that this clearly isn't the case.
> > The CPUs run in the maximum P-state for a while and then are
> > requested to run slower and go back to the maximum P-state after
> > a while again. That causes the actual frequency of the processor to
> > visibly oscillate below the sustainable maximum in a jittery fashion
> > which clearly is not desirable.
>
> In case you are wondering about the actual numbers, attached are two turbostat
> log files from two runs of the same workload, without (before.txt) and with (after.txt)
> the patch applied.
>
> The workload is essentially "make -j 5" in the kernel source tree and the
> machine has SSD storage and a quad-core Intel Sandy Bridge processor.
> The P-states available for each core are between 8 and 31 (0x1f) corresponding
> to 800 MHz and 3.1 GHz, respectively. All cores can run sustainably at 2.9 GHz
> at the same time, although that is not a guaranteed sustainable frequency
> (it may be dropped occasionally for thermal reasons, for example).
>
> The interesting columns are Bzy_MHz (and specifically the rows with "-" under
> CPU that correspond to the entire processor), which is the average frequency
> between iterations based on the numbers read from the feedback registers, and
> the rightmost one, which holds the values written to the P-state request
> register (the 3rd and 4th hex digits from the right represent the requested P-state).
>
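For anyone decoding the rightmost column by hand, here is a tiny sketch of the mapping (assuming the bits-15:8 request field and the 100 MHz-per-P-state scaling of this particular machine, as stated above; names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Decode a value logged from the P-state request register (MSR 0x199).
 * The requested P-state sits in bits 15:8, i.e. the 3rd and 4th hex
 * digits from the right; on this machine one P-state step is 100 MHz
 * (P-states 8..31 correspond to 800 MHz..3.1 GHz). */
static unsigned int requested_pstate(uint64_t msr)
{
	return (msr >> 8) & 0xff;
}

static unsigned int requested_mhz(uint64_t msr)
{
	return requested_pstate(msr) * 100;
}
```

So the 0x00001f00 values in the logs above are requests for P-state 0x1f (31), i.e. the 3.1 GHz maximum.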
> The turbostat data collection ran every 2 seconds and I looked at the last 30
> iterations in each case corresponding to about 1 minute of the workload run
> during which all of the cores were around 100% busy.
>
> Now, if you look at after.txt (the run with the patch applied), you'll notice that
> during those last 30 iterations P-state 31 (0x1f) had been requested on all
> cores pretty much 100% of the time (meaning: as expected in that case) and
> the average processor frequency (computed by taking the average from
> all of the 30 "-" rows) was 2899.33 MHz (apparently, the hardware decided to
> drop it from 2.9 GHz occasionally).
>
> In the before.txt case (without the patch) the average frequency over the last
> 30 iterations was 2896.90 MHz which is about 0.8% slower than with the patch
> applied (on the average).
0.08% of course, sorry. Still visible, though. :-)
Thanks,
Rafael
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [PATCH 1/2] cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start()
2017-03-19 13:30 ` [PATCH 1/2] cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start() Rafael J. Wysocki
@ 2017-03-20 3:28 ` Viresh Kumar
2017-03-20 12:36 ` Rafael J. Wysocki
0 siblings, 1 reply; 71+ messages in thread
From: Viresh Kumar @ 2017-03-20 3:28 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux PM, LKML, Peter Zijlstra, Srinivas Pandruvada, Juri Lelli,
Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 19-03-17, 14:30, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> sugov_start() only initializes struct sugov_cpu per-CPU structures
> for shared policies, but it should do that for single-CPU policies too.
>
> That in particular makes the IO-wait boost mechanism work in the
> cases when cpufreq policies correspond to individual CPUs.
>
> Fixes: 21ca6d2c52f8 (cpufreq: schedutil: Add iowait boosting)
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
--
viresh
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-19 13:34 ` [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs Rafael J. Wysocki
2017-03-19 21:24 ` Rafael J. Wysocki
@ 2017-03-20 3:57 ` Viresh Kumar
2017-03-20 8:26 ` Vincent Guittot
2017-03-20 12:48 ` Rafael J. Wysocki
2017-03-20 10:36 ` Peter Zijlstra
2017-03-20 21:46 ` [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of " Rafael J. Wysocki
3 siblings, 2 replies; 71+ messages in thread
From: Viresh Kumar @ 2017-03-20 3:57 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux PM, LKML, Peter Zijlstra, Srinivas Pandruvada, Juri Lelli,
Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 19-03-17, 14:34, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The PELT metric used by the schedutil governor underestimates the
> CPU utilization in some cases. The reason for that may be time spent
> in interrupt handlers and similar which is not accounted for by PELT.
>
> That can be easily demonstrated by running kernel compilation on
> a Sandy Bridge Intel processor, running turbostat in parallel with
> it and looking at the values written to the MSR_IA32_PERF_CTL
> register. Namely, the expected result would be that when all CPUs
> were 100% busy, all of them would be requested to run in the maximum
> P-state, but observation shows that this clearly isn't the case.
> The CPUs run in the maximum P-state for a while and then are
> requested to run slower and go back to the maximum P-state after
> a while again. That causes the actual frequency of the processor to
> visibly oscillate below the sustainable maximum in a jittery fashion
> which clearly is not desirable.
>
> To work around this issue use the observation that, from the
> schedutil governor's perspective, CPUs that are never idle should
> always run at the maximum frequency and make that happen.
>
> To that end, add a counter of idle calls to struct sugov_cpu and
> modify cpuidle_idle_call() to increment that counter every time it
> is about to put the given CPU into an idle state. Next, make the
> schedutil governor look at that counter for the current CPU every
> time before it is about to start heavy computations. If the counter
> has not changed for over SUGOV_BUSY_THRESHOLD time (equal to 50 ms),
> the CPU has not been idle for at least that long and the governor
> will choose the maximum frequency for it without looking at the PELT
> metric at all.
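The busy-window heuristic described in the changelog above can be modelled in user space; a minimal sketch with simplified types (mirroring the helpers in the diff, not the kernel code itself):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BUSY_THRESHOLD_NS (50ULL * 1000 * 1000)	/* SUGOV_BUSY_THRESHOLD: 50 ms */

struct model_cpu {
	unsigned long idle_calls;	/* incremented on every idle entry */
	unsigned long saved_idle_calls;	/* snapshot from a previous update */
	uint64_t busy_start;		/* 0 when no busy window is open */
	uint64_t last_update;		/* time of the current governor update */
};

static bool cpu_is_busy(struct model_cpu *c)
{
	/* Any idle entry since the last snapshot cancels the busy window. */
	if (c->idle_calls != c->saved_idle_calls) {
		c->busy_start = 0;
		return false;
	}
	/* First update with no new idle calls: open the window. */
	if (!c->busy_start) {
		c->busy_start = c->last_update;
		return false;
	}
	/* Busy only after more than 50 ms without a single idle entry. */
	return c->last_update - c->busy_start > BUSY_THRESHOLD_NS;
}
```

Note that this models only the window accounting; the saved_idle_calls snapshot is taken separately, after the frequency has been committed, which is one of the points questioned in the review below.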
Looks like we are fixing a PELT problem with a schedutil Hack :)
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> include/linux/sched/cpufreq.h | 6 ++++++
> kernel/sched/cpufreq_schedutil.c | 38 ++++++++++++++++++++++++++++++++++++--
> kernel/sched/idle.c | 3 +++
> 3 files changed, 45 insertions(+), 2 deletions(-)
>
> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> @@ -20,6 +20,7 @@
> #include "sched.h"
>
> #define SUGOV_KTHREAD_PRIORITY 50
> +#define SUGOV_BUSY_THRESHOLD (50 * NSEC_PER_MSEC)
>
> struct sugov_tunables {
> struct gov_attr_set attr_set;
> @@ -55,6 +56,9 @@ struct sugov_cpu {
>
> unsigned long iowait_boost;
> unsigned long iowait_boost_max;
> + unsigned long idle_calls;
> + unsigned long saved_idle_calls;
> + u64 busy_start;
> u64 last_update;
>
> /* The fields below are only needed when sharing a policy. */
> @@ -192,6 +196,34 @@ static void sugov_iowait_boost(struct su
> sg_cpu->iowait_boost >>= 1;
> }
>
> +void cpufreq_schedutil_idle_call(void)
> +{
> + struct sugov_cpu *sg_cpu = this_cpu_ptr(&sugov_cpu);
> +
> + sg_cpu->idle_calls++;
> +}
> +
> +static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
> +{
> + if (sg_cpu->idle_calls != sg_cpu->saved_idle_calls) {
> + sg_cpu->busy_start = 0;
> + return false;
> + }
> +
> + if (!sg_cpu->busy_start) {
> + sg_cpu->busy_start = sg_cpu->last_update;
> + return false;
> + }
> +
> + return sg_cpu->last_update - sg_cpu->busy_start > SUGOV_BUSY_THRESHOLD;
> +}
> +
> +static void sugov_save_idle_calls(struct sugov_cpu *sg_cpu)
> +{
> + if (!sg_cpu->busy_start)
> + sg_cpu->saved_idle_calls = sg_cpu->idle_calls;
Why aren't we doing this in sugov_cpu_is_busy() itself? And isn't it possible
for idle_calls to get incremented by this time?
> +}
> +
> static void sugov_update_single(struct update_util_data *hook, u64 time,
> unsigned int flags)
> {
> @@ -207,7 +239,7 @@ static void sugov_update_single(struct u
> if (!sugov_should_update_freq(sg_policy, time))
> return;
>
> - if (flags & SCHED_CPUFREQ_RT_DL) {
> + if ((flags & SCHED_CPUFREQ_RT_DL) || sugov_cpu_is_busy(sg_cpu)) {
> next_f = policy->cpuinfo.max_freq;
> } else {
> sugov_get_util(&util, &max);
> @@ -215,6 +247,7 @@ static void sugov_update_single(struct u
> next_f = get_next_freq(sg_policy, util, max);
> }
> sugov_update_commit(sg_policy, time, next_f);
> + sugov_save_idle_calls(sg_cpu);
> }
>
> static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
> @@ -278,12 +311,13 @@ static void sugov_update_shared(struct u
> sg_cpu->last_update = time;
>
> if (sugov_should_update_freq(sg_policy, time)) {
> - if (flags & SCHED_CPUFREQ_RT_DL)
> + if ((flags & SCHED_CPUFREQ_RT_DL) || sugov_cpu_is_busy(sg_cpu))
What about others CPUs in this policy?
> next_f = sg_policy->policy->cpuinfo.max_freq;
> else
> next_f = sugov_next_freq_shared(sg_cpu);
>
> sugov_update_commit(sg_policy, time, next_f);
> + sugov_save_idle_calls(sg_cpu);
> }
>
> raw_spin_unlock(&sg_policy->update_lock);
--
viresh
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-20 3:57 ` Viresh Kumar
@ 2017-03-20 8:26 ` Vincent Guittot
2017-03-20 12:34 ` Patrick Bellasi
2017-03-20 12:59 ` Rafael J. Wysocki
2017-03-20 12:48 ` Rafael J. Wysocki
1 sibling, 2 replies; 71+ messages in thread
From: Vincent Guittot @ 2017-03-20 8:26 UTC (permalink / raw)
To: Viresh Kumar
Cc: Rafael J. Wysocki, Linux PM, LKML, Peter Zijlstra,
Srinivas Pandruvada, Juri Lelli, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 20 March 2017 at 04:57, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 19-03-17, 14:34, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> The PELT metric used by the schedutil governor underestimates the
>> CPU utilization in some cases. The reason for that may be time spent
>> in interrupt handlers and similar which is not accounted for by PELT.
Are you sure of the root cause described above (time stolen by the irq
handler), or is it just a hypothesis? It would be good to be sure of
the root cause.
Furthermore, IIRC the time spent in irq context is also accounted as
run time for the running CFS task, but not as RT or deadline task running
time.
So I'm not really aligned with the description of your problem ("the
PELT metric underestimates the load of the CPU"). PELT only tracks CFS
task utilization, not whole-CPU utilization, and according to your
description of the problem (time stolen by irq), your problem doesn't
come from an underestimation of CFS tasks but from time spent in
something else that is not accounted for in the value used by schedutil.
It would be good to be sure of what is running during this unaccounted
time and to find a way to account for it.
>>
>> That can be easily demonstrated by running kernel compilation on
>> a Sandy Bridge Intel processor, running turbostat in parallel with
>> it and looking at the values written to the MSR_IA32_PERF_CTL
>> register. Namely, the expected result would be that when all CPUs
>> were 100% busy, all of them would be requested to run in the maximum
>> P-state, but observation shows that this clearly isn't the case.
>> The CPUs run in the maximum P-state for a while and then are
>> requested to run slower and go back to the maximum P-state after
>> a while again. That causes the actual frequency of the processor to
>> visibly oscillate below the sustainable maximum in a jittery fashion
>> which clearly is not desirable.
>>
>> To work around this issue use the observation that, from the
>> schedutil governor's perspective, CPUs that are never idle should
>> always run at the maximum frequency and make that happen.
>>
>> To that end, add a counter of idle calls to struct sugov_cpu and
>> modify cpuidle_idle_call() to increment that counter every time it
>> is about to put the given CPU into an idle state. Next, make the
>> schedutil governor look at that counter for the current CPU every
>> time before it is about to start heavy computations. If the counter
>> has not changed for over SUGOV_BUSY_THRESHOLD time (equal to 50 ms),
>> the CPU has not been idle for at least that long and the governor
>> will choose the maximum frequency for it without looking at the PELT
>> metric at all.
>
> Looks like we are fixing a PELT problem with a schedutil Hack :)
I would not say that it's a PELT problem (at least based on the current
description) but more that we don't track all of the activities of the
CPU.
>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> ---
>> include/linux/sched/cpufreq.h | 6 ++++++
>> kernel/sched/cpufreq_schedutil.c | 38 ++++++++++++++++++++++++++++++++++++--
>> kernel/sched/idle.c | 3 +++
>> 3 files changed, 45 insertions(+), 2 deletions(-)
>>
>> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
>> ===================================================================
>> --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
>> +++ linux-pm/kernel/sched/cpufreq_schedutil.c
>> @@ -20,6 +20,7 @@
>> #include "sched.h"
>>
>> #define SUGOV_KTHREAD_PRIORITY 50
>> +#define SUGOV_BUSY_THRESHOLD (50 * NSEC_PER_MSEC)
>>
>> struct sugov_tunables {
>> struct gov_attr_set attr_set;
>> @@ -55,6 +56,9 @@ struct sugov_cpu {
>>
>> unsigned long iowait_boost;
>> unsigned long iowait_boost_max;
>> + unsigned long idle_calls;
>> + unsigned long saved_idle_calls;
>> + u64 busy_start;
>> u64 last_update;
>>
>> /* The fields below are only needed when sharing a policy. */
>> @@ -192,6 +196,34 @@ static void sugov_iowait_boost(struct su
>> sg_cpu->iowait_boost >>= 1;
>> }
>>
>> +void cpufreq_schedutil_idle_call(void)
>> +{
>> + struct sugov_cpu *sg_cpu = this_cpu_ptr(&sugov_cpu);
>> +
>> + sg_cpu->idle_calls++;
>> +}
>> +
>> +static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
>> +{
>> + if (sg_cpu->idle_calls != sg_cpu->saved_idle_calls) {
>> + sg_cpu->busy_start = 0;
>> + return false;
>> + }
>> +
>> + if (!sg_cpu->busy_start) {
>> + sg_cpu->busy_start = sg_cpu->last_update;
>> + return false;
>> + }
>> +
>> + return sg_cpu->last_update - sg_cpu->busy_start > SUGOV_BUSY_THRESHOLD;
>> +}
>> +
>> +static void sugov_save_idle_calls(struct sugov_cpu *sg_cpu)
>> +{
>> + if (!sg_cpu->busy_start)
>> + sg_cpu->saved_idle_calls = sg_cpu->idle_calls;
>
> Why aren't we doing this in sugov_cpu_is_busy() itself? And isn't it possible
> for idle_calls to get incremented by this time?
>
>> +}
>> +
>> static void sugov_update_single(struct update_util_data *hook, u64 time,
>> unsigned int flags)
>> {
>> @@ -207,7 +239,7 @@ static void sugov_update_single(struct u
>> if (!sugov_should_update_freq(sg_policy, time))
>> return;
>>
>> - if (flags & SCHED_CPUFREQ_RT_DL) {
>> + if ((flags & SCHED_CPUFREQ_RT_DL) || sugov_cpu_is_busy(sg_cpu)) {
>> next_f = policy->cpuinfo.max_freq;
>> } else {
>> sugov_get_util(&util, &max);
>> @@ -215,6 +247,7 @@ static void sugov_update_single(struct u
>> next_f = get_next_freq(sg_policy, util, max);
>> }
>> sugov_update_commit(sg_policy, time, next_f);
>> + sugov_save_idle_calls(sg_cpu);
>> }
>>
>> static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
>> @@ -278,12 +311,13 @@ static void sugov_update_shared(struct u
>> sg_cpu->last_update = time;
>>
>> if (sugov_should_update_freq(sg_policy, time)) {
>> - if (flags & SCHED_CPUFREQ_RT_DL)
>> + if ((flags & SCHED_CPUFREQ_RT_DL) || sugov_cpu_is_busy(sg_cpu))
>
> What about others CPUs in this policy?
>
>> next_f = sg_policy->policy->cpuinfo.max_freq;
>> else
>> next_f = sugov_next_freq_shared(sg_cpu);
>>
>> sugov_update_commit(sg_policy, time, next_f);
>> + sugov_save_idle_calls(sg_cpu);
>> }
>>
>> raw_spin_unlock(&sg_policy->update_lock);
>
> --
> viresh
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-19 13:34 ` [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs Rafael J. Wysocki
2017-03-19 21:24 ` Rafael J. Wysocki
2017-03-20 3:57 ` Viresh Kumar
@ 2017-03-20 10:36 ` Peter Zijlstra
2017-03-20 12:35 ` Rafael J. Wysocki
2017-03-20 21:46 ` [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of " Rafael J. Wysocki
3 siblings, 1 reply; 71+ messages in thread
From: Peter Zijlstra @ 2017-03-20 10:36 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux PM, LKML, Srinivas Pandruvada, Viresh Kumar, Juri Lelli,
Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Sun, Mar 19, 2017 at 02:34:32PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The PELT metric used by the schedutil governor underestimates the
> CPU utilization in some cases. The reason for that may be time spent
> in interrupt handlers and similar which is not accounted for by PELT.
>
> That can be easily demonstrated by running kernel compilation on
> a Sandy Bridge Intel processor, running turbostat in parallel with
> it and looking at the values written to the MSR_IA32_PERF_CTL
> register. Namely, the expected result would be that when all CPUs
> were 100% busy, all of them would be requested to run in the maximum
> P-state, but observation shows that this clearly isn't the case.
> The CPUs run in the maximum P-state for a while and then are
> requested to run slower and go back to the maximum P-state after
> a while again. That causes the actual frequency of the processor to
> visibly oscillate below the sustainable maximum in a jittery fashion
> which clearly is not desirable.
>
> To work around this issue use the observation that, from the
> schedutil governor's perspective, CPUs that are never idle should
> always run at the maximum frequency and make that happen.
>
> To that end, add a counter of idle calls to struct sugov_cpu and
> modify cpuidle_idle_call() to increment that counter every time it
> is about to put the given CPU into an idle state. Next, make the
> schedutil governor look at that counter for the current CPU every
> time before it is about to start heavy computations. If the counter
> has not changed for over SUGOV_BUSY_THRESHOLD time (equal to 50 ms),
> the CPU has not been idle for at least that long and the governor
> will choose the maximum frequency for it without looking at the PELT
> metric at all.
Why the time limit?
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-19 21:24 ` Rafael J. Wysocki
2017-03-19 21:42 ` Rafael J. Wysocki
@ 2017-03-20 10:38 ` Peter Zijlstra
2017-03-20 12:31 ` Rafael J. Wysocki
1 sibling, 1 reply; 71+ messages in thread
From: Peter Zijlstra @ 2017-03-20 10:38 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux PM, LKML, Srinivas Pandruvada, Viresh Kumar, Juri Lelli,
Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Sun, Mar 19, 2017 at 10:24:24PM +0100, Rafael J. Wysocki wrote:
> Honestly, if the processor had been capable of doing per-core P-states, that
> would have been a disaster and there are customers who wouldn't look at
> schedutil again after being confronted with these numbers.
This, I feel, is a bit overstated. We have bugs, we fix them.
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-20 10:38 ` Peter Zijlstra
@ 2017-03-20 12:31 ` Rafael J. Wysocki
0 siblings, 0 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-20 12:31 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Linux PM, LKML, Srinivas Pandruvada, Viresh Kumar, Juri Lelli,
Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Monday, March 20, 2017 11:38:15 AM Peter Zijlstra wrote:
> On Sun, Mar 19, 2017 at 10:24:24PM +0100, Rafael J. Wysocki wrote:
>
> > Honestly, if the processor had been capable of doing per-core P-states, that
> > would have been a disaster and there are customers who wouldn't look at
> > schedutil again after being confronted with these numbers.
>
> This, I feel, is a bit overstated.
Maybe a bit, but I'm not really sure ...
> We have bugs, we fix them.
Of course. :-)
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-20 8:26 ` Vincent Guittot
@ 2017-03-20 12:34 ` Patrick Bellasi
2017-03-22 23:56 ` Joel Fernandes
2017-03-20 12:59 ` Rafael J. Wysocki
1 sibling, 1 reply; 71+ messages in thread
From: Patrick Bellasi @ 2017-03-20 12:34 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Viresh Kumar, Vincent Guittot, Linux PM, LKML, Peter Zijlstra,
Srinivas Pandruvada, Juri Lelli, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 20-Mar 09:26, Vincent Guittot wrote:
> On 20 March 2017 at 04:57, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > On 19-03-17, 14:34, Rafael J. Wysocki wrote:
> >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>
> >> The PELT metric used by the schedutil governor underestimates the
> >> CPU utilization in some cases. The reason for that may be time spent
> >> in interrupt handlers and similar which is not accounted for by PELT.
>
> Are you sure of the root cause described above (time stolen by the irq
> handler), or is it just a hypothesis? It would be good to be sure of
> the root cause.
> Furthermore, IIRC the time spent in irq context is also accounted as
> run time for the running CFS task, but not as RT or deadline task running
> time.
As long as the IRQ processing does not generate a context switch,
which happens (eventually) if the top half schedules some deferred
work to be executed by a bottom half.
Thus, I too would say that all of the top-half time is accounted for in
PELT, since the current task is still RUNNABLE/RUNNING.
> So I'm not really aligned with the description of your problem ("the
> PELT metric underestimates the load of the CPU"). PELT only tracks CFS
> task utilization, not whole-CPU utilization, and according to your
> description of the problem (time stolen by irq), your problem doesn't
> come from an underestimation of CFS tasks but from time spent in
> something else that is not accounted for in the value used by schedutil.
Quite likely. Indeed, it can really be that the CFS task is preempted
because of some RT activity generated by the IRQ handler.
More in general, I've also noticed many suboptimal freq switches when
RT tasks interleave with CFS ones, because of:
- relatively long down _and up_ throttling times
- the way schedutil's flags are tracked and updated
- the callsites from where we call schedutil updates
For example, it can really happen that we are running at the highest
OPP because of some RT activity. Then we switch back to a relatively
low-utilization CFS workload and then:
1. a tick happens, which produces a frequency drop
2. right after that, a new IRQ starts an RT task
Well, because of schedutil's throttling mechanism we can end up
running at the reduced OPP for quite a long time before (eventually)
going back to the higher OPP... if and only if we are lucky enough to
get a new RT task starting outside of a throttling window.
I guess that if we have quite a lot of IRQ->RT activations on a CPU
with a relatively low CFS utilization, this unwanted frequency hopping
is quite likely.
> It would be good to be sure of what is running during this unaccounted
> time and to find a way to account for it.
Do we have any numbers to characterize the frequency of IRQ/RT
activations?
We should also notice that, while CFS tasks are preempted by RT ones,
the PELT signal decays. This can contribute to lowering the utilization
demand from the CFS class even further.
> >> That can be easily demonstrated by running kernel compilation on
> >> a Sandy Bridge Intel processor, running turbostat in parallel with
> >> it and looking at the values written to the MSR_IA32_PERF_CTL
> >> register. Namely, the expected result would be that when all CPUs
> >> were 100% busy, all of them would be requested to run in the maximum
> >> P-state, but observation shows that this clearly isn't the case.
Can you share a trace with at least sched_switch events enabled?
That can help identify which workloads are there and how they
interact with schedutil.
> >> The CPUs run in the maximum P-state for a while and then are
> >> requested to run slower and go back to the maximum P-state after
> >> a while again. That causes the actual frequency of the processor to
That's possibly exactly the pattern I've described above.
> >> visibly oscillate below the sustainable maximum in a jittery fashion
> >> which clearly is not desirable.
> >>
> >> To work around this issue use the observation that, from the
> >> schedutil governor's perspective, CPUs that are never idle should
> >> always run at the maximum frequency and make that happen.
Perhaps some of the patches I've posted in this series:
https://lkml.org/lkml/2017/3/2/385
can help on that front.
The pattern you report seems to match quite closely the case "b"
described in the cover letter. I've also shared a notebook:
https://gist.github.com/derkling/d6a21b459a18091b2b058668a550010d
which clearly shows that behavior happening in the first plot,
cell named Out[15].
Did you have a chance to have a look at them?
It would be nice to know if they can provide some benefit for your
use case.
> >> To that end, add a counter of idle calls to struct sugov_cpu and
> >> modify cpuidle_idle_call() to increment that counter every time it
> >> is about to put the given CPU into an idle state. Next, make the
> >> schedutil governor look at that counter for the current CPU every
> >> time before it is about to start heavy computations. If the counter
> >> has not changed for over SUGOV_BUSY_THRESHOLD time (equal to 50 ms),
> >> the CPU has not been idle for at least that long and the governor
> >> will choose the maximum frequency for it without looking at the PELT
> >> metric at all.
> >
> > Looks like we are fixing a PELT problem with a schedutil Hack :)
>
> I would not say that it's a PELT problem (at least based on current
> description) but more that we don't track all
> activities of CPU
Agree... with both comments.
> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> ---
> >> include/linux/sched/cpufreq.h | 6 ++++++
> >> kernel/sched/cpufreq_schedutil.c | 38 ++++++++++++++++++++++++++++++++++++--
> >> kernel/sched/idle.c | 3 +++
> >> 3 files changed, 45 insertions(+), 2 deletions(-)
> >>
> >> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> >> ===================================================================
> >> --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> >> +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> >> @@ -20,6 +20,7 @@
> >> #include "sched.h"
> >>
> >> #define SUGOV_KTHREAD_PRIORITY 50
> >> +#define SUGOV_BUSY_THRESHOLD (50 * NSEC_PER_MSEC)
> >>
> >> struct sugov_tunables {
> >> struct gov_attr_set attr_set;
> >> @@ -55,6 +56,9 @@ struct sugov_cpu {
> >>
> >> unsigned long iowait_boost;
> >> unsigned long iowait_boost_max;
> >> + unsigned long idle_calls;
> >> + unsigned long saved_idle_calls;
> >> + u64 busy_start;
> >> u64 last_update;
> >>
> >> /* The fields below are only needed when sharing a policy. */
> >> @@ -192,6 +196,34 @@ static void sugov_iowait_boost(struct su
> >> sg_cpu->iowait_boost >>= 1;
> >> }
> >>
> >> +void cpufreq_schedutil_idle_call(void)
> >> +{
> >> + struct sugov_cpu *sg_cpu = this_cpu_ptr(&sugov_cpu);
> >> +
> >> + sg_cpu->idle_calls++;
> >> +}
> >> +
> >> +static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
> >> +{
> >> + if (sg_cpu->idle_calls != sg_cpu->saved_idle_calls) {
> >> + sg_cpu->busy_start = 0;
> >> + return false;
> >> + }
> >> +
> >> + if (!sg_cpu->busy_start) {
> >> + sg_cpu->busy_start = sg_cpu->last_update;
> >> + return false;
> >> + }
> >> +
> >> + return sg_cpu->last_update - sg_cpu->busy_start > SUGOV_BUSY_THRESHOLD;
> >> +}
> >> +
> >> +static void sugov_save_idle_calls(struct sugov_cpu *sg_cpu)
> >> +{
> >> + if (!sg_cpu->busy_start)
> >> + sg_cpu->saved_idle_calls = sg_cpu->idle_calls;
> >
> > Why aren't we doing this in sugov_cpu_is_busy() itself ? And isn't it possible
> > for idle_calls to get incremented by this time?
> >
> >> +}
> >> +
> >> static void sugov_update_single(struct update_util_data *hook, u64 time,
> >> unsigned int flags)
> >> {
> >> @@ -207,7 +239,7 @@ static void sugov_update_single(struct u
> >> if (!sugov_should_update_freq(sg_policy, time))
> >> return;
> >>
> >> - if (flags & SCHED_CPUFREQ_RT_DL) {
> >> + if ((flags & SCHED_CPUFREQ_RT_DL) || sugov_cpu_is_busy(sg_cpu)) {
> >> next_f = policy->cpuinfo.max_freq;
> >> } else {
> >> sugov_get_util(&util, &max);
> >> @@ -215,6 +247,7 @@ static void sugov_update_single(struct u
> >> next_f = get_next_freq(sg_policy, util, max);
> >> }
> >> sugov_update_commit(sg_policy, time, next_f);
> >> + sugov_save_idle_calls(sg_cpu);
> >> }
> >>
> >> static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
> >> @@ -278,12 +311,13 @@ static void sugov_update_shared(struct u
> >> sg_cpu->last_update = time;
> >>
> >> if (sugov_should_update_freq(sg_policy, time)) {
> >> - if (flags & SCHED_CPUFREQ_RT_DL)
> >> + if ((flags & SCHED_CPUFREQ_RT_DL) || sugov_cpu_is_busy(sg_cpu))
> >
> > What about others CPUs in this policy?
> >
> >> next_f = sg_policy->policy->cpuinfo.max_freq;
> >> else
> >> next_f = sugov_next_freq_shared(sg_cpu);
> >>
> >> sugov_update_commit(sg_policy, time, next_f);
> >> + sugov_save_idle_calls(sg_cpu);
> >> }
> >>
> >> raw_spin_unlock(&sg_policy->update_lock);
> >
> > --
> > viresh
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-20 10:36 ` Peter Zijlstra
@ 2017-03-20 12:35 ` Rafael J. Wysocki
2017-03-20 12:50 ` Peter Zijlstra
0 siblings, 1 reply; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-20 12:35 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Linux PM, LKML, Srinivas Pandruvada, Viresh Kumar, Juri Lelli,
Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Monday, March 20, 2017 11:36:45 AM Peter Zijlstra wrote:
> On Sun, Mar 19, 2017 at 02:34:32PM +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The PELT metric used by the schedutil governor underestimates the
> > CPU utilization in some cases. The reason for that may be time spent
> > in interrupt handlers and similar which is not accounted for by PELT.
> >
> > That can be easily demonstrated by running kernel compilation on
> > a Sandy Bridge Intel processor, running turbostat in parallel with
> > it and looking at the values written to the MSR_IA32_PERF_CTL
> > register. Namely, the expected result would be that when all CPUs
> > were 100% busy, all of them would be requested to run in the maximum
> > P-state, but observation shows that this clearly isn't the case.
> > The CPUs run in the maximum P-state for a while and then are
> > requested to run slower and go back to the maximum P-state after
> > a while again. That causes the actual frequency of the processor to
> > visibly oscillate below the sustainable maximum in a jittery fashion
> > which clearly is not desirable.
> >
> > To work around this issue use the observation that, from the
> > schedutil governor's perspective, CPUs that are never idle should
> > always run at the maximum frequency and make that happen.
> >
> > To that end, add a counter of idle calls to struct sugov_cpu and
> > modify cpuidle_idle_call() to increment that counter every time it
> > is about to put the given CPU into an idle state. Next, make the
> > schedutil governor look at that counter for the current CPU every
> > time before it is about to start heavy computations. If the counter
> > has not changed for over SUGOV_BUSY_THRESHOLD time (equal to 50 ms),
> > the CPU has not been idle for at least that long and the governor
> > will choose the maximum frequency for it without looking at the PELT
> > metric at all.
>
> Why the time limit?
One iteration appeared to be a bit too aggressive, but honestly I think
I need to check again if this thing is regarded as viable at all.
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [PATCH 1/2] cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start()
2017-03-20 3:28 ` Viresh Kumar
@ 2017-03-20 12:36 ` Rafael J. Wysocki
0 siblings, 0 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-20 12:36 UTC (permalink / raw)
To: Viresh Kumar
Cc: Linux PM, LKML, Peter Zijlstra, Srinivas Pandruvada, Juri Lelli,
Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Monday, March 20, 2017 08:58:40 AM Viresh Kumar wrote:
> On 19-03-17, 14:30, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > sugov_start() only initializes struct sugov_cpu per-CPU structures
> > for shared policies, but it should do that for single-CPU policies too.
> >
> > That in particular makes the IO-wait boost mechanism work in the
> > cases when cpufreq policies correspond to individual CPUs.
> >
> > Fixes: 21ca6d2c52f8 (cpufreq: schedutil: Add iowait boosting)
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
"stable" checks Fixes: tags too, but yes.
> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Thanks,
Rafael
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-20 3:57 ` Viresh Kumar
2017-03-20 8:26 ` Vincent Guittot
@ 2017-03-20 12:48 ` Rafael J. Wysocki
1 sibling, 0 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-20 12:48 UTC (permalink / raw)
To: Viresh Kumar
Cc: Linux PM, LKML, Peter Zijlstra, Srinivas Pandruvada, Juri Lelli,
Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Monday, March 20, 2017 09:27:45 AM Viresh Kumar wrote:
> On 19-03-17, 14:34, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The PELT metric used by the schedutil governor underestimates the
> > CPU utilization in some cases. The reason for that may be time spent
> > in interrupt handlers and similar which is not accounted for by PELT.
> >
> > That can be easily demonstrated by running kernel compilation on
> > a Sandy Bridge Intel processor, running turbostat in parallel with
> > it and looking at the values written to the MSR_IA32_PERF_CTL
> > register. Namely, the expected result would be that when all CPUs
> > were 100% busy, all of them would be requested to run in the maximum
> > P-state, but observation shows that this clearly isn't the case.
> > The CPUs run in the maximum P-state for a while and then are
> > requested to run slower and go back to the maximum P-state after
> > a while again. That causes the actual frequency of the processor to
> > visibly oscillate below the sustainable maximum in a jittery fashion
> > which clearly is not desirable.
> >
> > To work around this issue use the observation that, from the
> > schedutil governor's perspective, CPUs that are never idle should
> > always run at the maximum frequency and make that happen.
> >
> > To that end, add a counter of idle calls to struct sugov_cpu and
> > modify cpuidle_idle_call() to increment that counter every time it
> > is about to put the given CPU into an idle state. Next, make the
> > schedutil governor look at that counter for the current CPU every
> > time before it is about to start heavy computations. If the counter
> > has not changed for over SUGOV_BUSY_THRESHOLD time (equal to 50 ms),
> > the CPU has not been idle for at least that long and the governor
> > will choose the maximum frequency for it without looking at the PELT
> > metric at all.
>
> Looks like we are fixing a PELT problem with a schedutil Hack :)
Did you notice the "To work around this issue" phrase above? This is a
workaround, not a fix.
But it is an optimization too, because avoiding heavy computations if they
are not necessary should be a win in any case.
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> > include/linux/sched/cpufreq.h | 6 ++++++
> > kernel/sched/cpufreq_schedutil.c | 38 ++++++++++++++++++++++++++++++++++++--
> > kernel/sched/idle.c | 3 +++
> > 3 files changed, 45 insertions(+), 2 deletions(-)
> >
> > Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> > ===================================================================
> > --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> > +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> > @@ -20,6 +20,7 @@
> > #include "sched.h"
> >
> > #define SUGOV_KTHREAD_PRIORITY 50
> > +#define SUGOV_BUSY_THRESHOLD (50 * NSEC_PER_MSEC)
> >
> > struct sugov_tunables {
> > struct gov_attr_set attr_set;
> > @@ -55,6 +56,9 @@ struct sugov_cpu {
> >
> > unsigned long iowait_boost;
> > unsigned long iowait_boost_max;
> > + unsigned long idle_calls;
> > + unsigned long saved_idle_calls;
The above two could be unsigned int, BTW.
> > + u64 busy_start;
> > u64 last_update;
> >
> > /* The fields below are only needed when sharing a policy. */
> > @@ -192,6 +196,34 @@ static void sugov_iowait_boost(struct su
> > sg_cpu->iowait_boost >>= 1;
> > }
> >
> > +void cpufreq_schedutil_idle_call(void)
> > +{
> > + struct sugov_cpu *sg_cpu = this_cpu_ptr(&sugov_cpu);
> > +
> > + sg_cpu->idle_calls++;
> > +}
> > +
> > +static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
> > +{
> > + if (sg_cpu->idle_calls != sg_cpu->saved_idle_calls) {
> > + sg_cpu->busy_start = 0;
> > + return false;
> > + }
> > +
> > + if (!sg_cpu->busy_start) {
> > + sg_cpu->busy_start = sg_cpu->last_update;
> > + return false;
> > + }
> > +
> > + return sg_cpu->last_update - sg_cpu->busy_start > SUGOV_BUSY_THRESHOLD;
> > +}
> > +
> > +static void sugov_save_idle_calls(struct sugov_cpu *sg_cpu)
> > +{
> > + if (!sg_cpu->busy_start)
> > + sg_cpu->saved_idle_calls = sg_cpu->idle_calls;
>
> Why aren't we doing this in sugov_cpu_is_busy() itself ?
No, we aren't.
sugov_cpu_is_busy() may not be called at all sometimes.
> And isn't it possible for idle_calls to get incremented by this time?
No, it isn't.
The idle loop does that, and only after it has disabled interrupts. If this
code is running, we surely are not in there on the same CPU, are we?
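To make the interplay between sugov_cpu_is_busy() and
sugov_save_idle_calls() easier to follow, here is a minimal Python
sketch of the same state machine (the 50 ms threshold matches the
patch; the 20 ms update cadence is hypothetical):

```python
NSEC_PER_MSEC = 1_000_000
SUGOV_BUSY_THRESHOLD = 50 * NSEC_PER_MSEC  # as in the patch

class SugovCpu:
    def __init__(self):
        self.idle_calls = 0        # bumped by the idle loop, IRQs off
        self.saved_idle_calls = 0
        self.busy_start = 0
        self.last_update = 0

    def is_busy(self):
        # Any new idle entry since the last snapshot resets the window.
        if self.idle_calls != self.saved_idle_calls:
            self.busy_start = 0
            return False
        # First busy observation: open the window, not "busy" yet.
        if not self.busy_start:
            self.busy_start = self.last_update
            return False
        return self.last_update - self.busy_start > SUGOV_BUSY_THRESHOLD

    def save_idle_calls(self):
        # Snapshot only while no busy window is open, mirroring
        # sugov_save_idle_calls() in the patch.
        if not self.busy_start:
            self.saved_idle_calls = self.idle_calls

cpu = SugovCpu()
busy = []
for t_ms in (20, 40, 60, 80):      # updates every 20 ms, CPU never idles
    cpu.last_update = t_ms * NSEC_PER_MSEC
    busy.append(cpu.is_busy())
    cpu.save_idle_calls()
print(busy)   # -> [False, False, False, True]
```

With no idle calls observed, the CPU is reported busy only once the
window has been open for more than the 50 ms threshold.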
> > +}
> > +
> > static void sugov_update_single(struct update_util_data *hook, u64 time,
> > unsigned int flags)
> > {
> > @@ -207,7 +239,7 @@ static void sugov_update_single(struct u
> > if (!sugov_should_update_freq(sg_policy, time))
> > return;
> >
> > - if (flags & SCHED_CPUFREQ_RT_DL) {
> > + if ((flags & SCHED_CPUFREQ_RT_DL) || sugov_cpu_is_busy(sg_cpu)) {
> > next_f = policy->cpuinfo.max_freq;
> > } else {
> > sugov_get_util(&util, &max);
> > @@ -215,6 +247,7 @@ static void sugov_update_single(struct u
> > next_f = get_next_freq(sg_policy, util, max);
> > }
> > sugov_update_commit(sg_policy, time, next_f);
> > + sugov_save_idle_calls(sg_cpu);
> > }
> >
> > static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
> > @@ -278,12 +311,13 @@ static void sugov_update_shared(struct u
> > sg_cpu->last_update = time;
> >
> > if (sugov_should_update_freq(sg_policy, time)) {
> > - if (flags & SCHED_CPUFREQ_RT_DL)
> > + if ((flags & SCHED_CPUFREQ_RT_DL) || sugov_cpu_is_busy(sg_cpu))
>
> What about others CPUs in this policy?
We'll check this for them when they get updated.
The point is that we can just bail out earlier here and avoid the heavy stuff,
but we don't have to. This is opportunistic.
> > next_f = sg_policy->policy->cpuinfo.max_freq;
> > else
> > next_f = sugov_next_freq_shared(sg_cpu);
> >
> > sugov_update_commit(sg_policy, time, next_f);
> > + sugov_save_idle_calls(sg_cpu);
> > }
> >
> > raw_spin_unlock(&sg_policy->update_lock);
>
>
Thanks,
Rafael
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-20 12:35 ` Rafael J. Wysocki
@ 2017-03-20 12:50 ` Peter Zijlstra
2017-03-20 13:04 ` Rafael J. Wysocki
2017-03-20 13:06 ` Patrick Bellasi
0 siblings, 2 replies; 71+ messages in thread
From: Peter Zijlstra @ 2017-03-20 12:50 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux PM, LKML, Srinivas Pandruvada, Viresh Kumar, Juri Lelli,
Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Mon, Mar 20, 2017 at 01:35:12PM +0100, Rafael J. Wysocki wrote:
> On Monday, March 20, 2017 11:36:45 AM Peter Zijlstra wrote:
> > On Sun, Mar 19, 2017 at 02:34:32PM +0100, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > The PELT metric used by the schedutil governor underestimates the
> > > CPU utilization in some cases. The reason for that may be time spent
> > > in interrupt handlers and similar which is not accounted for by PELT.
> > >
> > > That can be easily demonstrated by running kernel compilation on
> > > a Sandy Bridge Intel processor, running turbostat in parallel with
> > > it and looking at the values written to the MSR_IA32_PERF_CTL
> > > register. Namely, the expected result would be that when all CPUs
> > > were 100% busy, all of them would be requested to run in the maximum
> > > P-state, but observation shows that this clearly isn't the case.
> > > The CPUs run in the maximum P-state for a while and then are
> > > requested to run slower and go back to the maximum P-state after
> > > a while again. That causes the actual frequency of the processor to
> > > visibly oscillate below the sustainable maximum in a jittery fashion
> > > which clearly is not desirable.
> > >
> > > To work around this issue use the observation that, from the
> > > schedutil governor's perspective, CPUs that are never idle should
> > > always run at the maximum frequency and make that happen.
> > >
> > > To that end, add a counter of idle calls to struct sugov_cpu and
> > > modify cpuidle_idle_call() to increment that counter every time it
> > > is about to put the given CPU into an idle state. Next, make the
> > > schedutil governor look at that counter for the current CPU every
> > > time before it is about to start heavy computations. If the counter
> > > has not changed for over SUGOV_BUSY_THRESHOLD time (equal to 50 ms),
> > > the CPU has not been idle for at least that long and the governor
> > > will choose the maximum frequency for it without looking at the PELT
> > > metric at all.
> >
> > Why the time limit?
>
> One iteration appeared to be a bit too aggressive, but honestly I think
> I need to check again if this thing is regarded as viable at all.
>
I don't hate the idea; if we don't hit idle; we shouldn't shift down. I
just wonder if we don't already keep a idle-seqcount somewhere; NOHZ and
RCU come to mind as things that might already use something like that.
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-20 8:26 ` Vincent Guittot
2017-03-20 12:34 ` Patrick Bellasi
@ 2017-03-20 12:59 ` Rafael J. Wysocki
2017-03-20 13:20 ` Vincent Guittot
1 sibling, 1 reply; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-20 12:59 UTC (permalink / raw)
To: Vincent Guittot
Cc: Viresh Kumar, Linux PM, LKML, Peter Zijlstra,
Srinivas Pandruvada, Juri Lelli, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Monday, March 20, 2017 09:26:34 AM Vincent Guittot wrote:
> On 20 March 2017 at 04:57, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > On 19-03-17, 14:34, Rafael J. Wysocki wrote:
> >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>
> >> The PELT metric used by the schedutil governor underestimates the
> >> CPU utilization in some cases. The reason for that may be time spent
> >> in interrupt handlers and similar which is not accounted for by PELT.
>
> Are you sure of the root cause described above (time stolen by irq
> handler) or is it just a hypotheses ? That would be good to be sure of
> the root cause
No, I'm not sure. That's why I said "may be". :-)
> Furthermore, IIRC the time spent in irq context is also accounted as
> run time for the running cfs task but not RT and deadline task running
> time
OK
Anyway, the problem is that we have a 100% busy CPU which quite evidently
is not reported as 100% busy by the metric we use.
Now, if I was sure about the root cause, I would fix it rather than suggest
workarounds.
> So I'm not really aligned with the description of your problem: PELT
> metric underestimates the load of the CPU. The PELT is just about
> tracking CFS task utilization but not whole CPU utilization and
> according to your description of the problem (time stolen by irq),
> your problem doesn't come from an underestimation of CFS task but from
> time spent in something else but not accounted in the value used by
> schedutil
That's fair enough, but the assumption was that PELT would be sufficient for
that. To me, it clearly isn't, so the assumption was incorrect.
> That would be good to be sure of what is running during this not
> accounted time and find a way to account them
Yes, I agree.
I'm not sure if I can carry out a full investigation of that any time soon, however.
I sent this mostly to let everybody know that there was a problem and how it
could be worked around. That's why it is an RFC.
>
> >>
> >> That can be easily demonstrated by running kernel compilation on
> >> a Sandy Bridge Intel processor, running turbostat in parallel with
> >> it and looking at the values written to the MSR_IA32_PERF_CTL
> >> register. Namely, the expected result would be that when all CPUs
> >> were 100% busy, all of them would be requested to run in the maximum
> >> P-state, but observation shows that this clearly isn't the case.
> >> The CPUs run in the maximum P-state for a while and then are
> >> requested to run slower and go back to the maximum P-state after
> >> a while again. That causes the actual frequency of the processor to
> >> visibly oscillate below the sustainable maximum in a jittery fashion
> >> which clearly is not desirable.
> >>
> >> To work around this issue use the observation that, from the
> >> schedutil governor's perspective, CPUs that are never idle should
> >> always run at the maximum frequency and make that happen.
> >>
> >> To that end, add a counter of idle calls to struct sugov_cpu and
> >> modify cpuidle_idle_call() to increment that counter every time it
> >> is about to put the given CPU into an idle state. Next, make the
> >> schedutil governor look at that counter for the current CPU every
> >> time before it is about to start heavy computations. If the counter
> >> has not changed for over SUGOV_BUSY_THRESHOLD time (equal to 50 ms),
> >> the CPU has not been idle for at least that long and the governor
> >> will choose the maximum frequency for it without looking at the PELT
> >> metric at all.
> >
> > Looks like we are fixing a PELT problem with a schedutil Hack :)
>
> I would not say that it's a PELT problem (at least based on current
> description) but more that we don't track all
> activities of CPU
I generally agree. PELT does what it does.
However, using PELT the way we do that in schedutil turns out to be problematic.
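The sensitivity to underestimation is easy to see from the frequency
selection in the 2017-era get_next_freq(), which requests roughly
next_freq = 1.25 * max_freq * util / max. A sketch with a hypothetical
3.4 GHz maximum frequency:

```python
def get_next_freq(max_freq, util, max_cap):
    # next_freq = 1.25 * max_freq * util / max, as in the 2017-era
    # schedutil get_next_freq(); the 1.25 headroom means util must
    # reach 80% of capacity before the maximum frequency is requested.
    return min(max_freq, (max_freq + (max_freq >> 2)) * util // max_cap)

MAX_FREQ = 3_400_000  # kHz, hypothetical maximum frequency

# A fully busy CPU whose utilization is under-reported at ~70% of
# capacity gets a visibly lower request than a fully reported one:
print(get_next_freq(MAX_FREQ, 717, 1024))   # ~2.98 GHz instead of 3.4 GHz
print(get_next_freq(MAX_FREQ, 1024, 1024))  # full utilization -> max
```

So any systematic gap between actual busy time and the tracked
utilization translates directly into a lower P-state request, which
matches the oscillation described in the changelog.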
Thanks,
Rafael
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-20 12:50 ` Peter Zijlstra
@ 2017-03-20 13:04 ` Rafael J. Wysocki
2017-03-20 13:06 ` Patrick Bellasi
1 sibling, 0 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-20 13:04 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Linux PM, LKML, Srinivas Pandruvada, Viresh Kumar, Juri Lelli,
Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Monday, March 20, 2017 01:50:09 PM Peter Zijlstra wrote:
> On Mon, Mar 20, 2017 at 01:35:12PM +0100, Rafael J. Wysocki wrote:
> > On Monday, March 20, 2017 11:36:45 AM Peter Zijlstra wrote:
> > > On Sun, Mar 19, 2017 at 02:34:32PM +0100, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >
> > > > The PELT metric used by the schedutil governor underestimates the
> > > > CPU utilization in some cases. The reason for that may be time spent
> > > > in interrupt handlers and similar which is not accounted for by PELT.
> > > >
> > > > That can be easily demonstrated by running kernel compilation on
> > > > a Sandy Bridge Intel processor, running turbostat in parallel with
> > > > it and looking at the values written to the MSR_IA32_PERF_CTL
> > > > register. Namely, the expected result would be that when all CPUs
> > > > were 100% busy, all of them would be requested to run in the maximum
> > > > P-state, but observation shows that this clearly isn't the case.
> > > > The CPUs run in the maximum P-state for a while and then are
> > > > requested to run slower and go back to the maximum P-state after
> > > > a while again. That causes the actual frequency of the processor to
> > > > visibly oscillate below the sustainable maximum in a jittery fashion
> > > > which clearly is not desirable.
> > > >
> > > > To work around this issue use the observation that, from the
> > > > schedutil governor's perspective, CPUs that are never idle should
> > > > always run at the maximum frequency and make that happen.
> > > >
> > > > To that end, add a counter of idle calls to struct sugov_cpu and
> > > > modify cpuidle_idle_call() to increment that counter every time it
> > > > is about to put the given CPU into an idle state. Next, make the
> > > > schedutil governor look at that counter for the current CPU every
> > > > time before it is about to start heavy computations. If the counter
> > > > has not changed for over SUGOV_BUSY_THRESHOLD time (equal to 50 ms),
> > > > the CPU has not been idle for at least that long and the governor
> > > > will choose the maximum frequency for it without looking at the PELT
> > > > metric at all.
> > >
> > > Why the time limit?
> >
> > One iteration appeared to be a bit too aggressive, but honestly I think
> > I need to check again if this thing is regarded as viable at all.
> >
>
> I don't hate the idea; if we don't hit idle; we shouldn't shift down.
OK
> I just wonder if we don't already keep a idle-seqcount somewhere; NOHZ and
> RCU come to mind as things that might already use something like that.
NOHZ does that, but I didn't want this to artificially depend on NOHZ. That said,
yes, we can use that one too.
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-20 13:06 ` Patrick Bellasi
@ 2017-03-20 13:05 ` Rafael J. Wysocki
2017-03-20 14:13 ` Patrick Bellasi
0 siblings, 1 reply; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-20 13:05 UTC (permalink / raw)
To: Patrick Bellasi
Cc: Peter Zijlstra, Linux PM, LKML, Srinivas Pandruvada,
Viresh Kumar, Juri Lelli, Vincent Guittot, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Monday, March 20, 2017 01:06:15 PM Patrick Bellasi wrote:
> On 20-Mar 13:50, Peter Zijlstra wrote:
> > On Mon, Mar 20, 2017 at 01:35:12PM +0100, Rafael J. Wysocki wrote:
> > > On Monday, March 20, 2017 11:36:45 AM Peter Zijlstra wrote:
> > > > On Sun, Mar 19, 2017 at 02:34:32PM +0100, Rafael J. Wysocki wrote:
> > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > >
> > > > > The PELT metric used by the schedutil governor underestimates the
> > > > > CPU utilization in some cases. The reason for that may be time spent
> > > > > in interrupt handlers and similar which is not accounted for by PELT.
> > > > >
> > > > > That can be easily demonstrated by running kernel compilation on
> > > > > a Sandy Bridge Intel processor, running turbostat in parallel with
> > > > > it and looking at the values written to the MSR_IA32_PERF_CTL
> > > > > register. Namely, the expected result would be that when all CPUs
> > > > > were 100% busy, all of them would be requested to run in the maximum
> > > > > P-state, but observation shows that this clearly isn't the case.
> > > > > The CPUs run in the maximum P-state for a while and then are
> > > > > requested to run slower and go back to the maximum P-state after
> > > > > a while again. That causes the actual frequency of the processor to
> > > > > visibly oscillate below the sustainable maximum in a jittery fashion
> > > > > which clearly is not desirable.
> > > > >
> > > > > To work around this issue use the observation that, from the
> > > > > schedutil governor's perspective, CPUs that are never idle should
> > > > > always run at the maximum frequency and make that happen.
> > > > >
> > > > > To that end, add a counter of idle calls to struct sugov_cpu and
> > > > > modify cpuidle_idle_call() to increment that counter every time it
> > > > > is about to put the given CPU into an idle state. Next, make the
> > > > > schedutil governor look at that counter for the current CPU every
> > > > > time before it is about to start heavy computations. If the counter
> > > > > has not changed for over SUGOV_BUSY_THRESHOLD time (equal to 50 ms),
> > > > > the CPU has not been idle for at least that long and the governor
> > > > > will choose the maximum frequency for it without looking at the PELT
> > > > > metric at all.
> > > >
> > > > Why the time limit?
> > >
> > > One iteration appeared to be a bit too aggressive, but honestly I think
> > > I need to check again if this thing is regarded as viable at all.
> > >
> >
> > I don't hate the idea; if we don't hit idle; we shouldn't shift down. I
> > just wonder if we don't already keep a idle-seqcount somewhere; NOHZ and
> > RCU come to mind as things that might already use something like that.
>
> Maybe the problem is not going down (e.g. when there are only small
> CFS tasks it makes perfect sense) but instead not being fast enough
> in ramping up when a new RT task is activated.
>
> And this boils down to two main points:
> 1) throttling for up transitions perhaps is only harmful
> 2) the call sites for schedutil updates are not properly positioned
> in specific scheduler decision points.
>
> The proposed patch is adding yet another throttling mechanism, perhaps
> on top of one which already needs to be improved.
It is not throttling anything.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-20 12:50 ` Peter Zijlstra
2017-03-20 13:04 ` Rafael J. Wysocki
@ 2017-03-20 13:06 ` Patrick Bellasi
2017-03-20 13:05 ` Rafael J. Wysocki
1 sibling, 1 reply; 71+ messages in thread
From: Patrick Bellasi @ 2017-03-20 13:06 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rafael J. Wysocki, Linux PM, LKML, Srinivas Pandruvada,
Viresh Kumar, Juri Lelli, Vincent Guittot, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 20-Mar 13:50, Peter Zijlstra wrote:
> On Mon, Mar 20, 2017 at 01:35:12PM +0100, Rafael J. Wysocki wrote:
> > On Monday, March 20, 2017 11:36:45 AM Peter Zijlstra wrote:
> > > On Sun, Mar 19, 2017 at 02:34:32PM +0100, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >
> > > > The PELT metric used by the schedutil governor underestimates the
> > > > CPU utilization in some cases. The reason for that may be time spent
> > > > in interrupt handlers and similar which is not accounted for by PELT.
> > > >
> > > > That can be easily demonstrated by running kernel compilation on
> > > > a Sandy Bridge Intel processor, running turbostat in parallel with
> > > > it and looking at the values written to the MSR_IA32_PERF_CTL
> > > > register. Namely, the expected result would be that when all CPUs
> > > > were 100% busy, all of them would be requested to run in the maximum
> > > > P-state, but observation shows that this clearly isn't the case.
> > > > The CPUs run in the maximum P-state for a while and then are
> > > > requested to run slower and go back to the maximum P-state after
> > > > a while again. That causes the actual frequency of the processor to
> > > > visibly oscillate below the sustainable maximum in a jittery fashion
> > > > which clearly is not desirable.
> > > >
> > > > To work around this issue use the observation that, from the
> > > > schedutil governor's perspective, CPUs that are never idle should
> > > > always run at the maximum frequency and make that happen.
> > > >
> > > > To that end, add a counter of idle calls to struct sugov_cpu and
> > > > modify cpuidle_idle_call() to increment that counter every time it
> > > > is about to put the given CPU into an idle state. Next, make the
> > > > schedutil governor look at that counter for the current CPU every
> > > > time before it is about to start heavy computations. If the counter
> > > > has not changed for over SUGOV_BUSY_THRESHOLD time (equal to 50 ms),
> > > > the CPU has not been idle for at least that long and the governor
> > > > will choose the maximum frequency for it without looking at the PELT
> > > > metric at all.
> > >
> > > Why the time limit?
> >
> > One iteration appeared to be a bit too aggressive, but honestly I think
> > I need to check again if this thing is regarded as viable at all.
> >
>
> I don't hate the idea; if we don't hit idle; we shouldn't shift down. I
> just wonder if we don't already keep a idle-seqcount somewhere; NOHZ and
> RCU come to mind as things that might already use something like that.
Maybe the problem is not going down (e.g. when there are only small
CFS tasks it makes perfect sense) but instead not being fast enough
on ramping up when a new RT task is activated.
And this boils down to two main points:
1) throttling of up transitions is perhaps only harmful
2) the call sites for schedutil updates are not properly positioned
at specific scheduler decision points.
The proposed patch adds yet another throttling mechanism, perhaps
on top of one which already needs to be improved.
--
#include <best/regards.h>
Patrick Bellasi
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-20 12:59 ` Rafael J. Wysocki
@ 2017-03-20 13:20 ` Vincent Guittot
0 siblings, 0 replies; 71+ messages in thread
From: Vincent Guittot @ 2017-03-20 13:20 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Viresh Kumar, Linux PM, LKML, Peter Zijlstra,
Srinivas Pandruvada, Juri Lelli, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 20 March 2017 at 13:59, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Monday, March 20, 2017 09:26:34 AM Vincent Guittot wrote:
>> On 20 March 2017 at 04:57, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> > On 19-03-17, 14:34, Rafael J. Wysocki wrote:
>> >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> >>
>> >> The PELT metric used by the schedutil governor underestimates the
>> >> CPU utilization in some cases. The reason for that may be time spent
>> >> in interrupt handlers and similar which is not accounted for by PELT.
>>
>> Are you sure of the root cause described above (time stolen by irq
>> handlers) or is it just a hypothesis? It would be good to be sure of
>> the root cause
>
> No, I'm not sure. That's why I said "may be". :-)
>
>> Furthermore, IIRC the time spent in irq context is also accounted as
>> run time for the running cfs task but not RT and deadline task running
>> time
>
> OK
>
> Anyway, the problem is that we have a 100% busy CPU which quite evidently
> is not reported as 100% busy by the metric we use.
>
> Now, if I was sure about the root cause, I would fix it rather than suggest
> workarounds.
>
>> So I'm not really aligned with the description of your problem: PELT
>> metric underestimates the load of the CPU. The PELT is just about
>> tracking CFS task utilization but not whole CPU utilization and
>> according to your description of the problem (time stolen by irq),
>> your problem doesn't come from an underestimation of CFS task but from
>> time spent in something else but not accounted in the value used by
>> schedutil
>
> That's fair enough, but the assumption was that PELT would be sufficient for
> that. To me, it clearly isn't, so the assumption was incorrect.
So PELT is just about CFS utilization and we need metrics for the other
sched classes as well.
I know that Juri and other people are working on the deadline part.
In the current kernel, the only available metric is rt_avg (and
scale_rt_capacity), which gives you the available compute capacity for
CFS and, in some way, the amount of capacity used by RT and deadline.
It would be interesting to see whether the amount of capacity used by RT
tasks (I assume that you don't have deadline tasks in your use case)
is similar to the amount of underestimation of CFS.
>
>> It would be good to be sure of what is running during this unaccounted
>> time and find a way to account for it
>
> Yes, I agree.
>
> I'm not sure if I can carry out full investigation of that any time soon, however.
If you can share traces as Patrick proposed, we could estimate how
much time/capacity is used by RT tasks and whether this can explain the
underestimation.
>
> I sent this mostly to let everybody know that there was a problem and how it
> could be worked around. That's why it is an RFC.
Ok
>
>>
>> >>
>> >> That can be easily demonstrated by running kernel compilation on
>> >> a Sandy Bridge Intel processor, running turbostat in parallel with
>> >> it and looking at the values written to the MSR_IA32_PERF_CTL
>> >> register. Namely, the expected result would be that when all CPUs
>> >> were 100% busy, all of them would be requested to run in the maximum
>> >> P-state, but observation shows that this clearly isn't the case.
>> >> The CPUs run in the maximum P-state for a while and then are
>> >> requested to run slower and go back to the maximum P-state after
>> >> a while again. That causes the actual frequency of the processor to
>> >> visibly oscillate below the sustainable maximum in a jittery fashion
>> >> which clearly is not desirable.
>> >>
>> >> To work around this issue use the observation that, from the
>> >> schedutil governor's perspective, CPUs that are never idle should
>> >> always run at the maximum frequency and make that happen.
>> >>
>> >> To that end, add a counter of idle calls to struct sugov_cpu and
>> >> modify cpuidle_idle_call() to increment that counter every time it
>> >> is about to put the given CPU into an idle state. Next, make the
>> >> schedutil governor look at that counter for the current CPU every
>> >> time before it is about to start heavy computations. If the counter
>> >> has not changed for over SUGOV_BUSY_THRESHOLD time (equal to 50 ms),
>> >> the CPU has not been idle for at least that long and the governor
>> >> will choose the maximum frequency for it without looking at the PELT
>> >> metric at all.
>> >
>> > Looks like we are fixing a PELT problem with a schedutil Hack :)
>>
>> I would not say that it's a PELT problem (at least based on current
>> description) but more that we don't track all
>> activities of CPU
>
> I generally agree. PELT does what it does.
>
> However, using PELT the way we do that in schedutil turns out to be problematic.
Yes, PELT is not enough; it's just part of the equation.
>
> Thanks,
> Rafael
>
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-20 13:05 ` Rafael J. Wysocki
@ 2017-03-20 14:13 ` Patrick Bellasi
0 siblings, 0 replies; 71+ messages in thread
From: Patrick Bellasi @ 2017-03-20 14:13 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Peter Zijlstra, Linux PM, LKML, Srinivas Pandruvada,
Viresh Kumar, Juri Lelli, Vincent Guittot, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 20-Mar 14:05, Rafael J. Wysocki wrote:
> On Monday, March 20, 2017 01:06:15 PM Patrick Bellasi wrote:
> > On 20-Mar 13:50, Peter Zijlstra wrote:
> > > On Mon, Mar 20, 2017 at 01:35:12PM +0100, Rafael J. Wysocki wrote:
> > > > On Monday, March 20, 2017 11:36:45 AM Peter Zijlstra wrote:
> > > > > On Sun, Mar 19, 2017 at 02:34:32PM +0100, Rafael J. Wysocki wrote:
> > > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > >
> > > > > > The PELT metric used by the schedutil governor underestimates the
> > > > > > CPU utilization in some cases. The reason for that may be time spent
> > > > > > in interrupt handlers and similar which is not accounted for by PELT.
> > > > > >
> > > > > > That can be easily demonstrated by running kernel compilation on
> > > > > > a Sandy Bridge Intel processor, running turbostat in parallel with
> > > > > > it and looking at the values written to the MSR_IA32_PERF_CTL
> > > > > > register. Namely, the expected result would be that when all CPUs
> > > > > > were 100% busy, all of them would be requested to run in the maximum
> > > > > > P-state, but observation shows that this clearly isn't the case.
> > > > > > The CPUs run in the maximum P-state for a while and then are
> > > > > > requested to run slower and go back to the maximum P-state after
> > > > > > a while again. That causes the actual frequency of the processor to
> > > > > > visibly oscillate below the sustainable maximum in a jittery fashion
> > > > > > which clearly is not desirable.
> > > > > >
> > > > > > To work around this issue use the observation that, from the
> > > > > > schedutil governor's perspective, CPUs that are never idle should
> > > > > > always run at the maximum frequency and make that happen.
> > > > > >
> > > > > > To that end, add a counter of idle calls to struct sugov_cpu and
> > > > > > modify cpuidle_idle_call() to increment that counter every time it
> > > > > > is about to put the given CPU into an idle state. Next, make the
> > > > > > schedutil governor look at that counter for the current CPU every
> > > > > > time before it is about to start heavy computations. If the counter
> > > > > > has not changed for over SUGOV_BUSY_THRESHOLD time (equal to 50 ms),
> > > > > > the CPU has not been idle for at least that long and the governor
> > > > > > will choose the maximum frequency for it without looking at the PELT
> > > > > > metric at all.
> > > > >
> > > > > Why the time limit?
> > > >
> > > > One iteration appeared to be a bit too aggressive, but honestly I think
> > > > I need to check again if this thing is regarded as viable at all.
> > > >
> > >
> > > I don't hate the idea; if we don't hit idle; we shouldn't shift down. I
> > > just wonder if we don't already keep a idle-seqcount somewhere; NOHZ and
> > > RCU come to mind as things that might already use something like that.
> >
> > Maybe the problem is not going down (e.g. when there are only small
> > CFS tasks it makes perfect sense) but instead not being fast enough
> > on ramping up when a new RT task is activated.
> >
> > And this boils down to two main points:
> > 1) throttling of up transitions is perhaps only harmful
> > 2) the call sites for schedutil updates are not properly positioned
> > at specific scheduler decision points.
> >
> > The proposed patch adds yet another throttling mechanism, perhaps
> > on top of one which already needs to be improved.
>
> It is not throttling anything.
It's kind of...
- if (flags & SCHED_CPUFREQ_RT_DL) {
+ if ((flags & SCHED_CPUFREQ_RT_DL) || sugov_cpu_is_busy(sg_cpu)) {
next_f = policy->cpuinfo.max_freq;
This check disregards any signal the scheduler can provide.
A 60% CFS task with a 100ms period will, with such a policy, end up
running at the highest OPP for just 10% of its entire activation.
Moreover, when it completes, we are likely to enter idle while
still remaining at the highest OPP.
IMHO the ultimate goal of schedutil should be to be driven by the
scheduler, which has (or can have) all the required information to
support OPP selection.
If something is not working, well, then we should properly fix the
signals and/or provide (at least) a per-task tunable interface.
Adding a hardcoded threshold is an easy fix, but it will ultimately
increase the complexity of the governor.
--
#include <best/regards.h>
Patrick Bellasi
* [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-19 13:34 ` [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs Rafael J. Wysocki
` (2 preceding siblings ...)
2017-03-20 10:36 ` Peter Zijlstra
@ 2017-03-20 21:46 ` Rafael J. Wysocki
2017-03-21 6:40 ` Viresh Kumar
` (3 more replies)
3 siblings, 4 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-20 21:46 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
Juri Lelli, Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The way the schedutil governor uses the PELT metric causes it to
underestimate the CPU utilization in some cases.
That can be easily demonstrated by running kernel compilation on
a Sandy Bridge Intel processor, running turbostat in parallel with
it and looking at the values written to the MSR_IA32_PERF_CTL
register. Namely, the expected result would be that when all CPUs
were 100% busy, all of them would be requested to run in the maximum
P-state, but observation shows that this clearly isn't the case.
The CPUs run in the maximum P-state for a while and then are
requested to run slower and go back to the maximum P-state after
a while again. That causes the actual frequency of the processor to
visibly oscillate below the sustainable maximum in a jittery fashion
which clearly is not desirable.
To work around this issue use the observation that, from the
schedutil governor's perspective, it does not make sense to decrease
the frequency of a CPU that doesn't enter idle and avoid decreasing
the frequency of busy CPUs.
To that end, use the counter of idle calls in the timekeeping code.
Namely, make the schedutil governor look at that counter for the
current CPU every time before it is about to set a new frequency
for that CPU's policy. If the counter has not changed since the
previous iteration, the CPU has been busy for all that time and
its frequency should not be decreased, so if the new frequency would
be lower than the one set previously, the governor will skip the
frequency update.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
This is a slightly different approach (avoid decreasing frequency for busy CPUs
instead of bumping it for them to the max upfront) and it works around the
original problem too.
I tried to address a few of Peter's comments here and the result doesn't seem
to be too heavyweight.
Thanks,
Rafael
---
include/linux/tick.h | 1 +
kernel/sched/cpufreq_schedutil.c | 28 ++++++++++++++++++++++++----
kernel/time/tick-sched.c | 12 ++++++++++++
3 files changed, 37 insertions(+), 4 deletions(-)
Index: linux-pm/kernel/sched/cpufreq_schedutil.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
+++ linux-pm/kernel/sched/cpufreq_schedutil.c
@@ -56,6 +56,9 @@ struct sugov_cpu {
unsigned long iowait_boost;
unsigned long iowait_boost_max;
u64 last_update;
+#ifdef CONFIG_NO_HZ_COMMON
+ unsigned long saved_idle_calls;
+#endif
/* The fields below are only needed when sharing a policy. */
unsigned long util;
@@ -88,11 +91,28 @@ static bool sugov_should_update_freq(str
return delta_ns >= sg_policy->freq_update_delay_ns;
}
-static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
- unsigned int next_freq)
+#ifdef CONFIG_NO_HZ_COMMON
+static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
+{
+ unsigned long idle_calls = tick_nohz_get_idle_calls();
+ bool ret = idle_calls == sg_cpu->saved_idle_calls;
+
+ sg_cpu->saved_idle_calls = idle_calls;
+ return ret;
+}
+#else
+static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
+#endif /* CONFIG_NO_HZ_COMMON */
+
+static void sugov_update_commit(struct sugov_cpu *sg_cpu,
+ struct sugov_policy *sg_policy,
+ u64 time, unsigned int next_freq)
{
struct cpufreq_policy *policy = sg_policy->policy;
+ if (sugov_cpu_is_busy(sg_cpu) && next_freq < sg_policy->next_freq)
+ next_freq = sg_policy->next_freq;
+
if (policy->fast_switch_enabled) {
if (sg_policy->next_freq == next_freq) {
trace_cpu_frequency(policy->cur, smp_processor_id());
@@ -214,7 +234,7 @@ static void sugov_update_single(struct u
sugov_iowait_boost(sg_cpu, &util, &max);
next_f = get_next_freq(sg_policy, util, max);
}
- sugov_update_commit(sg_policy, time, next_f);
+ sugov_update_commit(sg_cpu, sg_policy, time, next_f);
}
static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
@@ -283,7 +303,7 @@ static void sugov_update_shared(struct u
else
next_f = sugov_next_freq_shared(sg_cpu);
- sugov_update_commit(sg_policy, time, next_f);
+ sugov_update_commit(sg_cpu, sg_policy, time, next_f);
}
raw_spin_unlock(&sg_policy->update_lock);
Index: linux-pm/include/linux/tick.h
===================================================================
--- linux-pm.orig/include/linux/tick.h
+++ linux-pm/include/linux/tick.h
@@ -117,6 +117,7 @@ extern void tick_nohz_idle_enter(void);
extern void tick_nohz_idle_exit(void);
extern void tick_nohz_irq_exit(void);
extern ktime_t tick_nohz_get_sleep_length(void);
+extern unsigned long tick_nohz_get_idle_calls(void);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
#else /* !CONFIG_NO_HZ_COMMON */
Index: linux-pm/kernel/time/tick-sched.c
===================================================================
--- linux-pm.orig/kernel/time/tick-sched.c
+++ linux-pm/kernel/time/tick-sched.c
@@ -993,6 +993,18 @@ ktime_t tick_nohz_get_sleep_length(void)
return ts->sleep_length;
}
+/**
+ * tick_nohz_get_idle_calls - return the current idle calls counter value
+ *
+ * Called from the schedutil frequency scaling governor in scheduler context.
+ */
+unsigned long tick_nohz_get_idle_calls(void)
+{
+ struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
+
+ return ts->idle_calls;
+}
+
static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
{
#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-20 21:46 ` [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of " Rafael J. Wysocki
@ 2017-03-21 6:40 ` Viresh Kumar
2017-03-21 12:30 ` Rafael J. Wysocki
2017-03-21 8:50 ` Vincent Guittot
` (2 subsequent siblings)
3 siblings, 1 reply; 71+ messages in thread
From: Viresh Kumar @ 2017-03-21 6:40 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux PM, LKML, Peter Zijlstra, Srinivas Pandruvada, Juri Lelli,
Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 20-03-17, 22:46, Rafael J. Wysocki wrote:
> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> +static void sugov_update_commit(struct sugov_cpu *sg_cpu,
> + struct sugov_policy *sg_policy,
> + u64 time, unsigned int next_freq)
> {
> struct cpufreq_policy *policy = sg_policy->policy;
>
> + if (sugov_cpu_is_busy(sg_cpu) && next_freq < sg_policy->next_freq)
> + next_freq = sg_policy->next_freq;
> +
In the earlier version you said that we want to be opportunistic and
don't want to do heavy computation, and so check only the current CPU.
But in this version, all those computations are already done by now.
Why shouldn't we check all CPUs in the policy now? I am asking because
we will still have the same problem we are trying to work around if the
current CPU isn't busy but others sharing the policy are.
Also, why not return directly from within the if block? To run
trace_cpu_frequency()?
I don't remember exactly, but why don't we run that for the
!fast-switch case? We can simplify the code a bit if we check for no
freq change at the top of the routine.
--
viresh
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-20 21:46 ` [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of " Rafael J. Wysocki
2017-03-21 6:40 ` Viresh Kumar
@ 2017-03-21 8:50 ` Vincent Guittot
2017-03-21 11:56 ` Patrick Bellasi
2017-03-21 13:22 ` Peter Zijlstra
2017-03-21 11:50 ` Patrick Bellasi
2017-03-21 23:08 ` [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely Rafael J. Wysocki
3 siblings, 2 replies; 71+ messages in thread
From: Vincent Guittot @ 2017-03-21 8:50 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux PM, LKML, Peter Zijlstra, Srinivas Pandruvada,
Viresh Kumar, Juri Lelli, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 20 March 2017 at 22:46, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The way the schedutil governor uses the PELT metric causes it to
> underestimate the CPU utilization in some cases.
>
> That can be easily demonstrated by running kernel compilation on
> a Sandy Bridge Intel processor, running turbostat in parallel with
> it and looking at the values written to the MSR_IA32_PERF_CTL
> register. Namely, the expected result would be that when all CPUs
> were 100% busy, all of them would be requested to run in the maximum
> P-state, but observation shows that this clearly isn't the case.
> The CPUs run in the maximum P-state for a while and then are
> requested to run slower and go back to the maximum P-state after
> a while again. That causes the actual frequency of the processor to
> visibly oscillate below the sustainable maximum in a jittery fashion
> which clearly is not desirable.
>
> To work around this issue use the observation that, from the
> schedutil governor's perspective, it does not make sense to decrease
> the frequency of a CPU that doesn't enter idle and avoid decreasing
> the frequency of busy CPUs.
I don't fully agree with that statement.
If there are 2 runnable tasks on CPU A and the scheduler migrates the
waiting task to another CPU B, so CPU A is less loaded now, it makes
sense to reduce the OPP. That's precisely the purpose for which we
decided to use scheduler metrics in the cpufreq governor: so we can
adjust the OPP immediately when tasks migrate.
That being said, I probably know why you see such OPP switches in your
use case. When we migrate a task, we also migrate/remove its
utilization from the CPU.
If the CPU is not overloaded, it means that the runnable tasks get all
the compute time that they need and don't have any reason to use more
when a task migrates to another CPU. So decreasing the OPP makes sense
because the utilization is decreasing.
If the CPU is overloaded, it means that the runnable tasks have to
share CPU time and probably don't get all the compute time that they
would like, so when a task migrates, the remaining tasks on the CPU
will increase their utilization and fill the space left by the task
that has just migrated. So the CPU's utilization (and as a result the
OPP) will decrease when a task migrates, but then its utilization will
increase again as the remaining tasks run for more time, and the OPP
with it.
So you need to distinguish between these 2 cases: is the CPU overloaded
or not? You can't really rely on the utilization to detect that, but
you could take advantage of the load, which takes into account the
waiting time of tasks.
Vincent
>
> To that end, use the counter of idle calls in the timekeeping code.
> Namely, make the schedutil governor look at that counter for the
> current CPU every time before it is about to set a new frequency
> for that CPU's policy. If the counter has not changed since the
> previous iteration, the CPU has been busy for all that time and
> its frequency should not be decreased, so if the new frequency would
> be lower than the one set previously, the governor will skip the
> frequency update.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> This is a slightly different approach (avoid decreasing frequency for busy CPUs
> instead of bumping it for them to the max upfront) and it works around the
> original problem too.
>
> I tried to address a few of Peter's comments here and the result doesn't seem
> to be too heavyweight.
>
> Thanks,
> Rafael
>
> ---
> include/linux/tick.h | 1 +
> kernel/sched/cpufreq_schedutil.c | 28 ++++++++++++++++++++++++----
> kernel/time/tick-sched.c | 12 ++++++++++++
> 3 files changed, 37 insertions(+), 4 deletions(-)
>
> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> @@ -56,6 +56,9 @@ struct sugov_cpu {
> unsigned long iowait_boost;
> unsigned long iowait_boost_max;
> u64 last_update;
> +#ifdef CONFIG_NO_HZ_COMMON
> + unsigned long saved_idle_calls;
> +#endif
>
> /* The fields below are only needed when sharing a policy. */
> unsigned long util;
> @@ -88,11 +91,28 @@ static bool sugov_should_update_freq(str
> return delta_ns >= sg_policy->freq_update_delay_ns;
> }
>
> -static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
> - unsigned int next_freq)
> +#ifdef CONFIG_NO_HZ_COMMON
> +static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
> +{
> + unsigned long idle_calls = tick_nohz_get_idle_calls();
> + bool ret = idle_calls == sg_cpu->saved_idle_calls;
> +
> + sg_cpu->saved_idle_calls = idle_calls;
> + return ret;
> +}
> +#else
> +static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
> +#endif /* CONFIG_NO_HZ_COMMON */
> +
> +static void sugov_update_commit(struct sugov_cpu *sg_cpu,
> + struct sugov_policy *sg_policy,
> + u64 time, unsigned int next_freq)
> {
> struct cpufreq_policy *policy = sg_policy->policy;
>
> + if (sugov_cpu_is_busy(sg_cpu) && next_freq < sg_policy->next_freq)
> + next_freq = sg_policy->next_freq;
> +
> if (policy->fast_switch_enabled) {
> if (sg_policy->next_freq == next_freq) {
> trace_cpu_frequency(policy->cur, smp_processor_id());
> @@ -214,7 +234,7 @@ static void sugov_update_single(struct u
> sugov_iowait_boost(sg_cpu, &util, &max);
> next_f = get_next_freq(sg_policy, util, max);
> }
> - sugov_update_commit(sg_policy, time, next_f);
> + sugov_update_commit(sg_cpu, sg_policy, time, next_f);
> }
>
> static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
> @@ -283,7 +303,7 @@ static void sugov_update_shared(struct u
> else
> next_f = sugov_next_freq_shared(sg_cpu);
>
> - sugov_update_commit(sg_policy, time, next_f);
> + sugov_update_commit(sg_cpu, sg_policy, time, next_f);
> }
>
> raw_spin_unlock(&sg_policy->update_lock);
> Index: linux-pm/include/linux/tick.h
> ===================================================================
> --- linux-pm.orig/include/linux/tick.h
> +++ linux-pm/include/linux/tick.h
> @@ -117,6 +117,7 @@ extern void tick_nohz_idle_enter(void);
> extern void tick_nohz_idle_exit(void);
> extern void tick_nohz_irq_exit(void);
> extern ktime_t tick_nohz_get_sleep_length(void);
> +extern unsigned long tick_nohz_get_idle_calls(void);
> extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
> extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
> #else /* !CONFIG_NO_HZ_COMMON */
> Index: linux-pm/kernel/time/tick-sched.c
> ===================================================================
> --- linux-pm.orig/kernel/time/tick-sched.c
> +++ linux-pm/kernel/time/tick-sched.c
> @@ -993,6 +993,18 @@ ktime_t tick_nohz_get_sleep_length(void)
> return ts->sleep_length;
> }
>
> +/**
> + * tick_nohz_get_idle_calls - return the current idle calls counter value
> + *
> + * Called from the schedutil frequency scaling governor in scheduler context.
> + */
> +unsigned long tick_nohz_get_idle_calls(void)
> +{
> + struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
> +
> + return ts->idle_calls;
> +}
> +
> static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
> {
> #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
>
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-20 21:46 ` [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of " Rafael J. Wysocki
2017-03-21 6:40 ` Viresh Kumar
2017-03-21 8:50 ` Vincent Guittot
@ 2017-03-21 11:50 ` Patrick Bellasi
2017-03-21 23:08 ` [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely Rafael J. Wysocki
3 siblings, 0 replies; 71+ messages in thread
From: Patrick Bellasi @ 2017-03-21 11:50 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux PM, LKML, Peter Zijlstra, Srinivas Pandruvada,
Viresh Kumar, Juri Lelli, Vincent Guittot, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 20-Mar 22:46, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The way the schedutil governor uses the PELT metric causes it to
> underestimate the CPU utilization in some cases.
>
> That can be easily demonstrated by running kernel compilation on
> a Sandy Bridge Intel processor, running turbostat in parallel with
> it and looking at the values written to the MSR_IA32_PERF_CTL
> register. Namely, the expected result would be that when all CPUs
> were 100% busy, all of them would be requested to run in the maximum
> P-state, but observation shows that this clearly isn't the case.
> The CPUs run in the maximum P-state for a while and then are
> requested to run slower and go back to the maximum P-state after
> a while again. That causes the actual frequency of the processor to
> visibly oscillate below the sustainable maximum in a jittery fashion
> which clearly is not desirable.
>
> To work around this issue use the observation that, from the
> schedutil governor's perspective, it does not make sense to decrease
> the frequency of a CPU that doesn't enter idle and avoid decreasing
> the frequency of busy CPUs.
>
> To that end, use the counter of idle calls in the timekeeping code.
> Namely, make the schedutil governor look at that counter for the
> current CPU every time before it is about to set a new frequency
> for that CPU's policy. If the counter has not changed since the
> previous iteration, the CPU has been busy for all that time and
> its frequency should not be decreased, so if the new frequency would
> be lower than the one set previously, the governor will skip the
> frequency update.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> This is a slightly different approach (avoid decreasing frequency for busy CPUs
> instead of bumping it for them to the max upfront) and it works around the
> original problem too.
I like this version much better: we no longer enforce the max frequency
and we have removed the hardcoded time threshold. ;-)
It also makes sense to me to avoid down-scaling until we hit an idle.
However, I also agree with Vincent's observation: this constraint should
apply only to "overutilized" CPUs... but unfortunately, in mainline we
are still missing that flag, so we should probably use the "overloaded"
one for the time being.
> I tried to address a few of Peter's comments here and the result doesn't seem
> to be too heavyweight.
Nice!
> Thanks,
> Rafael
>
> ---
> include/linux/tick.h | 1 +
> kernel/sched/cpufreq_schedutil.c | 28 ++++++++++++++++++++++++----
> kernel/time/tick-sched.c | 12 ++++++++++++
> 3 files changed, 37 insertions(+), 4 deletions(-)
>
> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> @@ -56,6 +56,9 @@ struct sugov_cpu {
> unsigned long iowait_boost;
> unsigned long iowait_boost_max;
> u64 last_update;
> +#ifdef CONFIG_NO_HZ_COMMON
> + unsigned long saved_idle_calls;
> +#endif
>
> /* The fields below are only needed when sharing a policy. */
> unsigned long util;
> @@ -88,11 +91,28 @@ static bool sugov_should_update_freq(str
> return delta_ns >= sg_policy->freq_update_delay_ns;
> }
>
> -static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
> - unsigned int next_freq)
> +#ifdef CONFIG_NO_HZ_COMMON
> +static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
> +{
> + unsigned long idle_calls = tick_nohz_get_idle_calls();
> + bool ret = idle_calls == sg_cpu->saved_idle_calls;
> +
> + sg_cpu->saved_idle_calls = idle_calls;
> + return ret;
> +}
> +#else
> +static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
> +#endif /* CONFIG_NO_HZ_COMMON */
> +
> +static void sugov_update_commit(struct sugov_cpu *sg_cpu,
> + struct sugov_policy *sg_policy,
> + u64 time, unsigned int next_freq)
> {
> struct cpufreq_policy *policy = sg_policy->policy;
>
> + if (sugov_cpu_is_busy(sg_cpu) && next_freq < sg_policy->next_freq)
> + next_freq = sg_policy->next_freq;
> +
> if (policy->fast_switch_enabled) {
> if (sg_policy->next_freq == next_freq) {
> trace_cpu_frequency(policy->cur, smp_processor_id());
> @@ -214,7 +234,7 @@ static void sugov_update_single(struct u
> sugov_iowait_boost(sg_cpu, &util, &max);
> next_f = get_next_freq(sg_policy, util, max);
> }
> - sugov_update_commit(sg_policy, time, next_f);
> + sugov_update_commit(sg_cpu, sg_policy, time, next_f);
> }
>
> static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
> @@ -283,7 +303,7 @@ static void sugov_update_shared(struct u
> else
> next_f = sugov_next_freq_shared(sg_cpu);
>
> - sugov_update_commit(sg_policy, time, next_f);
> + sugov_update_commit(sg_cpu, sg_policy, time, next_f);
> }
>
> raw_spin_unlock(&sg_policy->update_lock);
> Index: linux-pm/include/linux/tick.h
> ===================================================================
> --- linux-pm.orig/include/linux/tick.h
> +++ linux-pm/include/linux/tick.h
> @@ -117,6 +117,7 @@ extern void tick_nohz_idle_enter(void);
> extern void tick_nohz_idle_exit(void);
> extern void tick_nohz_irq_exit(void);
> extern ktime_t tick_nohz_get_sleep_length(void);
> +extern unsigned long tick_nohz_get_idle_calls(void);
> extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
> extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
> #else /* !CONFIG_NO_HZ_COMMON */
> Index: linux-pm/kernel/time/tick-sched.c
> ===================================================================
> --- linux-pm.orig/kernel/time/tick-sched.c
> +++ linux-pm/kernel/time/tick-sched.c
> @@ -993,6 +993,18 @@ ktime_t tick_nohz_get_sleep_length(void)
> return ts->sleep_length;
> }
>
> +/**
> + * tick_nohz_get_idle_calls - return the current idle calls counter value
> + *
> + * Called from the schedutil frequency scaling governor in scheduler context.
> + */
> +unsigned long tick_nohz_get_idle_calls(void)
> +{
> + struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
> +
> + return ts->idle_calls;
> +}
> +
> static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
> {
> #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
>
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 8:50 ` Vincent Guittot
@ 2017-03-21 11:56 ` Patrick Bellasi
2017-03-21 13:22 ` Peter Zijlstra
1 sibling, 0 replies; 71+ messages in thread
From: Patrick Bellasi @ 2017-03-21 11:56 UTC (permalink / raw)
To: Vincent Guittot
Cc: Rafael J. Wysocki, Linux PM, LKML, Peter Zijlstra,
Srinivas Pandruvada, Viresh Kumar, Juri Lelli, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 21-Mar 09:50, Vincent Guittot wrote:
> On 20 March 2017 at 22:46, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The way the schedutil governor uses the PELT metric causes it to
> > underestimate the CPU utilization in some cases.
> >
> > That can be easily demonstrated by running kernel compilation on
> > a Sandy Bridge Intel processor, running turbostat in parallel with
> > it and looking at the values written to the MSR_IA32_PERF_CTL
> > register. Namely, the expected result would be that when all CPUs
> > were 100% busy, all of them would be requested to run in the maximum
> > P-state, but observation shows that this clearly isn't the case.
> > The CPUs run in the maximum P-state for a while and then are
> > requested to run slower and go back to the maximum P-state after
> > a while again. That causes the actual frequency of the processor to
> > visibly oscillate below the sustainable maximum in a jittery fashion
> > which clearly is not desirable.
> >
> > To work around this issue use the observation that, from the
> > schedutil governor's perspective, it does not make sense to decrease
> > the frequency of a CPU that doesn't enter idle and avoid decreasing
> > the frequency of busy CPUs.
>
> I don't fully agree with that statement.
> If there are 2 runnable tasks on CPU A and the scheduler migrates the
> waiting task to another CPU B, so that CPU A is less loaded now, it
> makes sense to reduce the OPP. That's precisely the purpose for which
> we decided to use scheduler metrics in the cpufreq governor: so we can
> adjust the OPP immediately when tasks migrate.
> That being said, I probably know why you see such OPP switches in your
> use case. When we migrate a task, we also migrate/remove its
> utilization from the CPU.
> If the CPU is not overloaded, the runnable tasks get all the computation
> time they need and have no reason to use more when a task migrates to
> another CPU, so decreasing the OPP makes sense because the utilization
> is decreasing.
> If the CPU is overloaded, the runnable tasks have to share CPU time and
> probably don't get all the computation time they would like, so when a
> task migrates, the remaining tasks on the CPU will increase their
> utilization and fill the space left by the task that has just migrated.
> So the CPU's utilization will decrease when a task migrates (and as a
> result so will the OPP), but then its utilization will increase as the
> remaining tasks run more, and the OPP with it.
>
> So you need to distinguish between these 2 cases: is the CPU overloaded
> or not? You can't really rely on the utilization to detect that, but you
> could take advantage of the load, which takes into account the waiting
> time of tasks.
Right, we can use "overloaded" for the time being until we push the
"overutilized" bits.
[...]
> > +#ifdef CONFIG_NO_HZ_COMMON
> > +static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
> > +{
> > + unsigned long idle_calls = tick_nohz_get_idle_calls();
> > + bool ret = idle_calls == sg_cpu->saved_idle_calls;
Vincent: are you proposing something like this?
+ if (!this_rq()->rd->overload)
+ return false;
> > +
> > + sg_cpu->saved_idle_calls = idle_calls;
> > + return ret;
> > +}
> > +#else
> > +static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
> > +#endif /* CONFIG_NO_HZ_COMMON */
> > +
> > +static void sugov_update_commit(struct sugov_cpu *sg_cpu,
> > + struct sugov_policy *sg_policy,
> > + u64 time, unsigned int next_freq)
> > {
> > struct cpufreq_policy *policy = sg_policy->policy;
> >
> > + if (sugov_cpu_is_busy(sg_cpu) && next_freq < sg_policy->next_freq)
> > + next_freq = sg_policy->next_freq;
> > +
> > if (policy->fast_switch_enabled) {
> > if (sg_policy->next_freq == next_freq) {
> > trace_cpu_frequency(policy->cur, smp_processor_id());
> > @@ -214,7 +234,7 @@ static void sugov_update_single(struct u
> > sugov_iowait_boost(sg_cpu, &util, &max);
> > next_f = get_next_freq(sg_policy, util, max);
> > }
> > - sugov_update_commit(sg_policy, time, next_f);
> > + sugov_update_commit(sg_cpu, sg_policy, time, next_f);
> > }
> >
[...]
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 6:40 ` Viresh Kumar
@ 2017-03-21 12:30 ` Rafael J. Wysocki
0 siblings, 0 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-21 12:30 UTC (permalink / raw)
To: Viresh Kumar
Cc: Rafael J. Wysocki, Linux PM, LKML, Peter Zijlstra,
Srinivas Pandruvada, Juri Lelli, Vincent Guittot,
Patrick Bellasi, Joel Fernandes, Morten Rasmussen, Ingo Molnar
On Tue, Mar 21, 2017 at 7:40 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 20-03-17, 22:46, Rafael J. Wysocki wrote:
>> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
>
>> +static void sugov_update_commit(struct sugov_cpu *sg_cpu,
>> + struct sugov_policy *sg_policy,
>> + u64 time, unsigned int next_freq)
>> {
>> struct cpufreq_policy *policy = sg_policy->policy;
>>
>> + if (sugov_cpu_is_busy(sg_cpu) && next_freq < sg_policy->next_freq)
>> + next_freq = sg_policy->next_freq;
>> +
>
> In the earlier version you said that we want to be opportunistic and
> don't want to do heavy computation, and so check only the current CPU.
>
> But in this version, all those computations are already done by now.
> Why shouldn't we check all CPUs in the policy now? I am asking as we
> will still have the same problem we are trying to work around if the
> current CPU isn't busy but others sharing the policy are.
This isn't the way I'm looking at it.
This is an easy (and relatively cheap) check to make for the *current*
*CPU*, and our frequency selection algorithm turns out to have
problems, so it would be kind of unreasonable not to use the
opportunity to fix up the value coming from it - if we can do that
easily enough.
For the other CPUs in the policy, that would require extra
synchronization etc., so it is not that easy any more.
> Also, why not return directly from within the if block? To run
> trace_cpu_frequency()?
Yes.
> I don't remember exactly, but why don't we run that for !fast-switch
> case?
That's an interesting question.
We do that in the fast switch case, because otherwise utilities get
confused if the frequency is not updated for a long enough time.
I'm not really sure why they don't get confused in the other case,
though. [In that case the core calls trace_cpu_frequency() for us,
but only if we actually run the async work.]
It looks like it wouldn't hurt to always run trace_cpu_frequency()
when we want to bail out early for next_freq == sg_policy->next_freq.
Let me prepare a patch for that. :-)
> We can simplify the code a bit if we check for no freq change at
> the top of the routine.
Right.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 8:50 ` Vincent Guittot
2017-03-21 11:56 ` Patrick Bellasi
@ 2017-03-21 13:22 ` Peter Zijlstra
2017-03-21 13:37 ` Vincent Guittot
1 sibling, 1 reply; 71+ messages in thread
From: Peter Zijlstra @ 2017-03-21 13:22 UTC (permalink / raw)
To: Vincent Guittot
Cc: Rafael J. Wysocki, Linux PM, LKML, Srinivas Pandruvada,
Viresh Kumar, Juri Lelli, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Tue, Mar 21, 2017 at 09:50:28AM +0100, Vincent Guittot wrote:
> On 20 March 2017 at 22:46, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > To work around this issue use the observation that, from the
> > schedutil governor's perspective, it does not make sense to decrease
> > the frequency of a CPU that doesn't enter idle and avoid decreasing
> > the frequency of busy CPUs.
>
> I don't fully agree with that statement.
> If there are 2 runnable tasks on CPU A and the scheduler migrates the
> waiting task to another CPU B, so that CPU A is less loaded now, it
> makes sense to reduce the OPP. That's precisely the purpose for which
> we decided to use scheduler metrics in the cpufreq governor: so we can
> adjust the OPP immediately when tasks migrate.
> That being said, I probably know why you see such OPP switches in your
> use case. When we migrate a task, we also migrate/remove its
> utilization from the CPU.
> If the CPU is not overloaded, the runnable tasks get all the computation
> time they need and have no reason to use more when a task migrates to
> another CPU, so decreasing the OPP makes sense because the utilization
> is decreasing.
> If the CPU is overloaded, the runnable tasks have to share CPU time and
> probably don't get all the computation time they would like, so when a
> task migrates, the remaining tasks on the CPU will increase their
> utilization and fill the space left by the task that has just migrated.
> So the CPU's utilization will decrease when a task migrates (and as a
> result so will the OPP), but then its utilization will increase as the
> remaining tasks run more, and the OPP with it.
>
> So you need to distinguish between these 2 cases: is the CPU overloaded
> or not? You can't really rely on the utilization to detect that, but you
> could take advantage of the load, which takes into account the waiting
> time of tasks.
I'm confused. What two cases? You only list the overloaded case, but he
does exactly that. Note that the lack of idle time is an exact
equivalent of 100% utilized.
So even while we cannot currently detect the 100% utilized state through
the running state tracking - because of averages etc. - we can detect the
lack of idle time.
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 13:22 ` Peter Zijlstra
@ 2017-03-21 13:37 ` Vincent Guittot
2017-03-21 14:03 ` Peter Zijlstra
2017-03-21 14:26 ` Rafael J. Wysocki
0 siblings, 2 replies; 71+ messages in thread
From: Vincent Guittot @ 2017-03-21 13:37 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rafael J. Wysocki, Linux PM, LKML, Srinivas Pandruvada,
Viresh Kumar, Juri Lelli, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 21 March 2017 at 14:22, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Mar 21, 2017 at 09:50:28AM +0100, Vincent Guittot wrote:
>> On 20 March 2017 at 22:46, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
>> > To work around this issue use the observation that, from the
>> > schedutil governor's perspective, it does not make sense to decrease
>> > the frequency of a CPU that doesn't enter idle and avoid decreasing
>> > the frequency of busy CPUs.
>>
>> I don't fully agree with that statement.
>> If there are 2 runnable tasks on CPU A and the scheduler migrates the
>> waiting task to another CPU B, so that CPU A is less loaded now, it
>> makes sense to reduce the OPP. That's precisely the purpose for which
>> we decided to use scheduler metrics in the cpufreq governor: so we can
>> adjust the OPP immediately when tasks migrate.
>> That being said, I probably know why you see such OPP switches in your
>> use case. When we migrate a task, we also migrate/remove its
>> utilization from the CPU.
>> If the CPU is not overloaded, the runnable tasks get all the computation
>> time they need and have no reason to use more when a task migrates to
>> another CPU, so decreasing the OPP makes sense because the utilization
>> is decreasing.
>> If the CPU is overloaded, the runnable tasks have to share CPU time and
>> probably don't get all the computation time they would like, so when a
>> task migrates, the remaining tasks on the CPU will increase their
>> utilization and fill the space left by the task that has just migrated.
>> So the CPU's utilization will decrease when a task migrates (and as a
>> result so will the OPP), but then its utilization will increase as the
>> remaining tasks run more, and the OPP with it.
>>
>> So you need to distinguish between these 2 cases: is the CPU overloaded
>> or not? You can't really rely on the utilization to detect that, but you
>> could take advantage of the load, which takes into account the waiting
>> time of tasks.
>
> I'm confused. What two cases? You only list the overloaded case, but he
The overloaded vs the not overloaded case.
For the not overloaded case, it makes sense to immediately update the
OPP to be aligned with the new utilization of the CPU, even if it was
not idle in the past couple of ticks.
> does exactly that. Note that the lack of idle time is an exact
> equivalent of 100% utilized.
>
> So even while we cannot currently detect the 100% utilized state through
> the running state tracking; because averages etc.. we can detect the
> lack of idle time.
But after how much lack of idle time do we consider that we are overloaded?
>
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 13:37 ` Vincent Guittot
@ 2017-03-21 14:03 ` Peter Zijlstra
2017-03-21 14:18 ` Vincent Guittot
` (2 more replies)
2017-03-21 14:26 ` Rafael J. Wysocki
1 sibling, 3 replies; 71+ messages in thread
From: Peter Zijlstra @ 2017-03-21 14:03 UTC (permalink / raw)
To: Vincent Guittot
Cc: Rafael J. Wysocki, Linux PM, LKML, Srinivas Pandruvada,
Viresh Kumar, Juri Lelli, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Tue, Mar 21, 2017 at 02:37:08PM +0100, Vincent Guittot wrote:
> On 21 March 2017 at 14:22, Peter Zijlstra <peterz@infradead.org> wrote:
> For the not overloaded case, it makes sense to immediately update the
> OPP to be aligned with the new utilization of the CPU, even if it was
> not idle in the past couple of ticks
Yeah, but we cannot know. Also, who cares?
> > does exactly that. Note that the lack of idle time is an exact
> > equivalent of 100% utilized.
> >
> > So even while we cannot currently detect the 100% utilized state through
> > the running state tracking; because averages etc.. we can detect the
> > lack of idle time.
>
> But after how much lack of idle time do we consider that we are overloaded?
0 :-)
Note that utilization is an absolute metric, not a windowed one. That
is, there is no actual time associated with it. Now, for practical
purposes we end up using windowed things in many places,
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 14:03 ` Peter Zijlstra
@ 2017-03-21 14:18 ` Vincent Guittot
2017-03-21 14:25 ` Patrick Bellasi
[not found] ` <CAKfTPtALorn7HNpz4LOfWWSc3u+9y5iHB5byzfTHGQXDA+tVJQ@mail.gmail.com>
2 siblings, 0 replies; 71+ messages in thread
From: Vincent Guittot @ 2017-03-21 14:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rafael J. Wysocki, Linux PM, LKML, Srinivas Pandruvada,
Viresh Kumar, Juri Lelli, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 21 March 2017 at 15:03, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Mar 21, 2017 at 02:37:08PM +0100, Vincent Guittot wrote:
>> On 21 March 2017 at 14:22, Peter Zijlstra <peterz@infradead.org> wrote:
>
>> For the not overloaded case, it makes sense to immediately update the
>> OPP to be aligned with the new utilization of the CPU, even if it was
>> not idle in the past couple of ticks
>
> Yeah, but we cannot know. Also, who cares?
An embedded system that doesn't want to stay at the highest OPP if a
significant part of the utilization has moved away, as an example.
AFAICT, schedutil tries to select the best OPP according to the
current utilization of the CPU, so if the utilization decreases, the
OPP should also decrease.
>
>> > does exactly that. Note that the lack of idle time is an exact
>> > equivalent of 100% utilized.
>> >
>> > So even while we cannot currently detect the 100% utilized state through
>> > the running state tracking; because averages etc.. we can detect the
>> > lack of idle time.
>>
>> But after how much lack of idle time do we consider that we are overloaded?
>
> 0 :-)
>
> Note that utilization is an absolute metric, not a windowed one. That
> is, there is no actual time associated with it. Now, for practical
> purposes we end up using windowed things in many places,
>
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 14:03 ` Peter Zijlstra
2017-03-21 14:18 ` Vincent Guittot
@ 2017-03-21 14:25 ` Patrick Bellasi
[not found] ` <CAKfTPtALorn7HNpz4LOfWWSc3u+9y5iHB5byzfTHGQXDA+tVJQ@mail.gmail.com>
2 siblings, 0 replies; 71+ messages in thread
From: Patrick Bellasi @ 2017-03-21 14:25 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Vincent Guittot, Rafael J. Wysocki, Linux PM, LKML,
Srinivas Pandruvada, Viresh Kumar, Juri Lelli, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 21-Mar 15:03, Peter Zijlstra wrote:
> On Tue, Mar 21, 2017 at 02:37:08PM +0100, Vincent Guittot wrote:
> > On 21 March 2017 at 14:22, Peter Zijlstra <peterz@infradead.org> wrote:
>
> > For the not overloaded case, it makes sense to immediately update the
> > OPP to be aligned with the new utilization of the CPU, even if it was
> > not idle in the past couple of ticks
>
> Yeah, but we cannot know. Also, who cares?
>
> > > does exactly that. Note that the lack of idle time is an exact
> > > equivalent of 100% utilized.
> > >
> > > So even while we cannot currently detect the 100% utilized state through
> > > the running state tracking; because averages etc.. we can detect the
> > > lack of idle time.
> >
> > But after how much lack of idle time do we consider that we are overloaded?
>
> 0 :-)
If we were to use "utilization", this time could be non-zero: it depends,
for example, on how long PELT takes to build up a utilization value
which marks the CPU as "overutilized"... thus we already have a
suitable time, at least for CFS tasks.
> Note that utilization is an absolute metric, not a windowed one. That
> is, there is no actual time associated with it. Now, for practical
> purposes we end up using windowed things in many places,
>
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 13:37 ` Vincent Guittot
2017-03-21 14:03 ` Peter Zijlstra
@ 2017-03-21 14:26 ` Rafael J. Wysocki
2017-03-21 14:38 ` Patrick Bellasi
2017-03-21 15:02 ` Peter Zijlstra
1 sibling, 2 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-21 14:26 UTC (permalink / raw)
To: Vincent Guittot, Peter Zijlstra
Cc: Linux PM, LKML, Srinivas Pandruvada, Viresh Kumar, Juri Lelli,
Patrick Bellasi, Joel Fernandes, Morten Rasmussen, Ingo Molnar
On Tuesday, March 21, 2017 02:37:08 PM Vincent Guittot wrote:
> On 21 March 2017 at 14:22, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, Mar 21, 2017 at 09:50:28AM +0100, Vincent Guittot wrote:
> >> On 20 March 2017 at 22:46, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> >> > To work around this issue use the observation that, from the
> >> > schedutil governor's perspective, it does not make sense to decrease
> >> > the frequency of a CPU that doesn't enter idle and avoid decreasing
> >> > the frequency of busy CPUs.
> >>
> >> I don't fully agree with that statement.
> >> If there are 2 runnable tasks on CPU A and the scheduler migrates the
> >> waiting task to another CPU B, so that CPU A is less loaded now, it
> >> makes sense to reduce the OPP. That's precisely the purpose for which
> >> we decided to use scheduler metrics in the cpufreq governor: so we can
> >> adjust the OPP immediately when tasks migrate.
> >> That being said, I probably know why you see such OPP switches in your
> >> use case. When we migrate a task, we also migrate/remove its
> >> utilization from the CPU.
> >> If the CPU is not overloaded, the runnable tasks get all the computation
> >> time they need and have no reason to use more when a task migrates to
> >> another CPU, so decreasing the OPP makes sense because the utilization
> >> is decreasing.
> >> If the CPU is overloaded, the runnable tasks have to share CPU time and
> >> probably don't get all the computation time they would like, so when a
> >> task migrates, the remaining tasks on the CPU will increase their
> >> utilization and fill the space left by the task that has just migrated.
> >> So the CPU's utilization will decrease when a task migrates (and as a
> >> result so will the OPP), but then its utilization will increase as the
> >> remaining tasks run more, and the OPP with it.
> >>
> >> So you need to distinguish between these 2 cases: is the CPU overloaded
> >> or not? You can't really rely on the utilization to detect that, but you
> >> could take advantage of the load, which takes into account the waiting
> >> time of tasks.
> >
> > I'm confused. What two cases? You only list the overloaded case, but he
>
> The overloaded vs the not overloaded case.
> For the not overloaded case, it makes sense to immediately update the
> OPP to be aligned with the new utilization of the CPU, even if it was
> not idle in the past couple of ticks
Yes, if the OPP (or P-state if you will) can be changed immediately. If it can't,
conditions may change by the time we actually update it, and in that case it'd
be better to wait and see IMO.
In any case, the theory about migrating tasks made sense to me, so below is
what I tested. It works, and besides it has a nice feature: I don't need
to fetch the timekeeping data. :-)
I only wonder whether we want to do this, or only prevent the frequency from
decreasing in the overloaded case?
---
kernel/sched/cpufreq_schedutil.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
Index: linux-pm/kernel/sched/cpufreq_schedutil.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
+++ linux-pm/kernel/sched/cpufreq_schedutil.c
@@ -61,6 +61,7 @@ struct sugov_cpu {
unsigned long util;
unsigned long max;
unsigned int flags;
+ bool overload;
};
static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
@@ -207,7 +208,7 @@ static void sugov_update_single(struct u
if (!sugov_should_update_freq(sg_policy, time))
return;
- if (flags & SCHED_CPUFREQ_RT_DL) {
+ if ((flags & SCHED_CPUFREQ_RT_DL) || this_rq()->rd->overload) {
next_f = policy->cpuinfo.max_freq;
} else {
sugov_get_util(&util, &max);
@@ -242,7 +243,7 @@ static unsigned int sugov_next_freq_shar
j_sg_cpu->iowait_boost = 0;
continue;
}
- if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
+ if ((j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL) || j_sg_cpu->overload)
return policy->cpuinfo.max_freq;
j_util = j_sg_cpu->util;
@@ -273,12 +274,13 @@ static void sugov_update_shared(struct u
sg_cpu->util = util;
sg_cpu->max = max;
sg_cpu->flags = flags;
+ sg_cpu->overload = this_rq()->rd->overload;
sugov_set_iowait_boost(sg_cpu, time, flags);
sg_cpu->last_update = time;
if (sugov_should_update_freq(sg_policy, time)) {
- if (flags & SCHED_CPUFREQ_RT_DL)
+ if ((flags & SCHED_CPUFREQ_RT_DL) || sg_cpu->overload)
next_f = sg_policy->policy->cpuinfo.max_freq;
else
next_f = sugov_next_freq_shared(sg_cpu);
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 14:26 ` Rafael J. Wysocki
@ 2017-03-21 14:38 ` Patrick Bellasi
2017-03-21 14:46 ` Rafael J. Wysocki
2017-03-21 15:02 ` Peter Zijlstra
1 sibling, 1 reply; 71+ messages in thread
From: Patrick Bellasi @ 2017-03-21 14:38 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Vincent Guittot, Peter Zijlstra, Linux PM, LKML,
Srinivas Pandruvada, Viresh Kumar, Juri Lelli, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 21-Mar 15:26, Rafael J. Wysocki wrote:
> On Tuesday, March 21, 2017 02:37:08 PM Vincent Guittot wrote:
> > On 21 March 2017 at 14:22, Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Tue, Mar 21, 2017 at 09:50:28AM +0100, Vincent Guittot wrote:
> > >> On 20 March 2017 at 22:46, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >
> > >> > To work around this issue use the observation that, from the
> > >> > schedutil governor's perspective, it does not make sense to decrease
> > >> > the frequency of a CPU that doesn't enter idle and avoid decreasing
> > >> > the frequency of busy CPUs.
> > >>
> > >> I don't fully agree with that statement.
> > >> If there are 2 runnable tasks on CPU A and the scheduler migrates the
> > >> waiting task to another CPU B, so that CPU A is less loaded now, it
> > >> makes sense to reduce the OPP. That's precisely the purpose for which
> > >> we decided to use scheduler metrics in the cpufreq governor: so we can
> > >> adjust the OPP immediately when tasks migrate.
> > >> That being said, I probably know why you see such OPP switches in your
> > >> use case. When we migrate a task, we also migrate/remove its
> > >> utilization from the CPU.
> > >> If the CPU is not overloaded, the runnable tasks get all the computation
> > >
> > > I'm confused. What two cases? You only list the overloaded case, but he
> >
> > overloaded vs not overloaded use case.
> > For the not overloaded case, it makes sense to immediately update to
> > OPP to be aligned with the new utilization of the CPU even if it was
> > not idle in the past couple of ticks
>
> Yes, if the OPP (or P-state if you will) can be changed immediately. If it can't,
> conditions may change by the time we actually update it, and in that case it'd
> be better to wait and see IMO.
>
> In any case, the theory about migrating tasks made sense to me, so below is
> what I tested. It works, and besides it has a nice feature: I don't need
> to fetch the timekeeping data. :-)
>
> I only wonder whether we want to do this, or only prevent the frequency from
> decreasing in the overloaded case?
>
> ---
> kernel/sched/cpufreq_schedutil.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> @@ -61,6 +61,7 @@ struct sugov_cpu {
> unsigned long util;
> unsigned long max;
> unsigned int flags;
> + bool overload;
> };
>
> static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
> @@ -207,7 +208,7 @@ static void sugov_update_single(struct u
> if (!sugov_should_update_freq(sg_policy, time))
> return;
>
> - if (flags & SCHED_CPUFREQ_RT_DL) {
> + if ((flags & SCHED_CPUFREQ_RT_DL) || this_rq()->rd->overload) {
> next_f = policy->cpuinfo.max_freq;
Isn't this going to the max OPP every time we have more than 1 task on
that CPU?
It will not fit the case where we have, say, two 10% tasks on that CPU.
The previous solution was better IMO, apart from using "overloaded"
instead of "overutilized" (which is not yet there) :-/
> } else {
> sugov_get_util(&util, &max);
> @@ -242,7 +243,7 @@ static unsigned int sugov_next_freq_shar
> j_sg_cpu->iowait_boost = 0;
> continue;
> }
> - if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
> + if ((j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL) || j_sg_cpu->overload)
> return policy->cpuinfo.max_freq;
>
> j_util = j_sg_cpu->util;
> @@ -273,12 +274,13 @@ static void sugov_update_shared(struct u
> sg_cpu->util = util;
> sg_cpu->max = max;
> sg_cpu->flags = flags;
> + sg_cpu->overload = this_rq()->rd->overload;
>
> sugov_set_iowait_boost(sg_cpu, time, flags);
> sg_cpu->last_update = time;
>
> if (sugov_should_update_freq(sg_policy, time)) {
> - if (flags & SCHED_CPUFREQ_RT_DL)
> + if ((flags & SCHED_CPUFREQ_RT_DL) || sg_cpu->overload)
> next_f = sg_policy->policy->cpuinfo.max_freq;
> else
> next_f = sugov_next_freq_shared(sg_cpu);
>
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 14:38 ` Patrick Bellasi
@ 2017-03-21 14:46 ` Rafael J. Wysocki
2017-03-21 14:50 ` Rafael J. Wysocki
` (2 more replies)
0 siblings, 3 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-21 14:46 UTC (permalink / raw)
To: Patrick Bellasi
Cc: Vincent Guittot, Peter Zijlstra, Linux PM, LKML,
Srinivas Pandruvada, Viresh Kumar, Juri Lelli, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Tuesday, March 21, 2017 02:38:42 PM Patrick Bellasi wrote:
> On 21-Mar 15:26, Rafael J. Wysocki wrote:
> > On Tuesday, March 21, 2017 02:37:08 PM Vincent Guittot wrote:
> > > On 21 March 2017 at 14:22, Peter Zijlstra <peterz@infradead.org> wrote:
> > > > On Tue, Mar 21, 2017 at 09:50:28AM +0100, Vincent Guittot wrote:
> > > >> On 20 March 2017 at 22:46, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > >
> > > >> > To work around this issue use the observation that, from the
> > > >> > schedutil governor's perspective, it does not make sense to decrease
> > > >> > the frequency of a CPU that doesn't enter idle and avoid decreasing
> > > >> > the frequency of busy CPUs.
> > > >>
> > > >> I don't fully agree with that statement.
> > > >> If there are 2 runnable tasks on CPU A and the scheduler migrates the
> > > >> waiting task to another CPU B, so that CPU A is less loaded now, it
> > > >> makes sense to reduce the OPP. That's precisely why we decided to use
> > > >> scheduler metrics in the cpufreq governor: so we can adjust the OPP
> > > >> immediately when tasks migrate.
> > > >> That being said, I probably know why you see such OPP switches in your
> > > >> use case. When we migrate a task, we also migrate/remove its
> > > >> utilization from the CPU.
> > > >> If the CPU is not overloaded, the runnable tasks get all the CPU time
> > > >> they need and have no reason to use more when a task migrates to
> > > >> another CPU, so decreasing the OPP makes sense because the utilization
> > > >> is decreasing.
> > > >> If the CPU is overloaded, the runnable tasks have to share CPU time
> > > >> and probably don't get all the CPU time they would like, so when a
> > > >> task migrates, the remaining tasks on the CPU will increase their
> > > >> utilization and fill the space left by the task that has just
> > > >> migrated. So the CPU's utilization (and as a result the OPP) will
> > > >> decrease when a task migrates, but then it will increase again as the
> > > >> remaining tasks get to run more, and the OPP with it.
> > > >>
> > > >> So you need to distinguish between these two cases: is the CPU
> > > >> overloaded or not? You can't really rely on the utilization to detect
> > > >> that, but you could take advantage of the load, which takes into
> > > >> account the waiting time of tasks.
> > > >
> > > > I'm confused. What two cases? You only list the overloaded case, but he
> > >
> > > overloaded vs not overloaded use case.
> > > For the not-overloaded case, it makes sense to immediately update the
> > > OPP to align with the new utilization of the CPU, even if it was not
> > > idle in the past couple of ticks.
> >
> > Yes, if the OPP (or P-state if you will) can be changed immediately. If it can't,
> > conditions may change by the time we actually update it, and in that case
> > it'd be better to wait and see IMO.
> >
> > In any case, the theory about migrating tasks made sense to me, so below is
> > what I tested. It works, and besides it has the nice feature that I don't
> > need to fetch the timekeeping data. :-)
> >
> > I only wonder if we want to do this or only prevent the frequency from
> > decreasing in the overloaded case?
> >
> > [...]
> > - if (flags & SCHED_CPUFREQ_RT_DL) {
> > + if ((flags & SCHED_CPUFREQ_RT_DL) || this_rq()->rd->overload) {
> > next_f = policy->cpuinfo.max_freq;
>
> Isn't this going to force the max OPP every time we have more than one
> runnable task on that CPU?
>
> In that case it will not handle the case where we have two 10% tasks on
> that CPU.
Good point.
> The previous solution was better IMO, apart from using "overloaded" instead
> of "overutilized" (which is not yet there) :-/
OK, so the one below works too.
---
kernel/sched/cpufreq_schedutil.c | 11 +++++++++++
1 file changed, 11 insertions(+)
Index: linux-pm/kernel/sched/cpufreq_schedutil.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
+++ linux-pm/kernel/sched/cpufreq_schedutil.c
@@ -37,6 +37,7 @@ struct sugov_policy {
s64 freq_update_delay_ns;
unsigned int next_freq;
unsigned int cached_raw_freq;
+ bool overload;
/* The next fields are only needed if fast switch cannot be used. */
struct irq_work irq_work;
@@ -61,6 +62,7 @@ struct sugov_cpu {
unsigned long util;
unsigned long max;
unsigned int flags;
+ bool overload;
};
static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
@@ -93,6 +95,9 @@ static void sugov_update_commit(struct s
{
struct cpufreq_policy *policy = sg_policy->policy;
+ if (sg_policy->overload && next_freq < sg_policy->next_freq)
+ next_freq = sg_policy->next_freq;
+
if (policy->fast_switch_enabled) {
if (sg_policy->next_freq == next_freq) {
trace_cpu_frequency(policy->cur, smp_processor_id());
@@ -207,6 +212,8 @@ static void sugov_update_single(struct u
if (!sugov_should_update_freq(sg_policy, time))
return;
+ sg_policy->overload = this_rq()->rd->overload;
+
if (flags & SCHED_CPUFREQ_RT_DL) {
next_f = policy->cpuinfo.max_freq;
} else {
@@ -225,6 +232,8 @@ static unsigned int sugov_next_freq_shar
unsigned long util = 0, max = 1;
unsigned int j;
+ sg_policy->overload = false;
+
for_each_cpu(j, policy->cpus) {
struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
unsigned long j_util, j_max;
@@ -253,6 +262,7 @@ static unsigned int sugov_next_freq_shar
}
sugov_iowait_boost(j_sg_cpu, &util, &max);
+ sg_policy->overload = sg_policy->overload || j_sg_cpu->overload;
}
return get_next_freq(sg_policy, util, max);
@@ -273,6 +283,7 @@ static void sugov_update_shared(struct u
sg_cpu->util = util;
sg_cpu->max = max;
sg_cpu->flags = flags;
+ sg_cpu->overload = this_rq()->rd->overload;
sugov_set_iowait_boost(sg_cpu, time, flags);
sg_cpu->last_update = time;
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 14:46 ` Rafael J. Wysocki
@ 2017-03-21 14:50 ` Rafael J. Wysocki
2017-03-21 15:04 ` Peter Zijlstra
2017-03-21 15:08 ` Patrick Bellasi
2 siblings, 0 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-21 14:50 UTC (permalink / raw)
To: Patrick Bellasi
Cc: Vincent Guittot, Peter Zijlstra, Linux PM, LKML,
Srinivas Pandruvada, Viresh Kumar, Juri Lelli, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Tuesday, March 21, 2017 03:46:07 PM Rafael J. Wysocki wrote:
> [...]
> OK, so the one below works too.
Admittedly, we could check the idle condition and the overload flag at the same
time, though.
Let me try that too.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
[not found] ` <CAKfTPtALorn7HNpz4LOfWWSc3u+9y5iHB5byzfTHGQXDA+tVJQ@mail.gmail.com>
@ 2017-03-21 14:58 ` Peter Zijlstra
2017-03-21 17:00 ` Vincent Guittot
0 siblings, 1 reply; 71+ messages in thread
From: Peter Zijlstra @ 2017-03-21 14:58 UTC (permalink / raw)
To: Vincent Guittot
Cc: Rafael J. Wysocki, Linux PM, LKML, Srinivas Pandruvada,
Viresh Kumar, Juri Lelli, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Tue, Mar 21, 2017 at 03:16:19PM +0100, Vincent Guittot wrote:
> On 21 March 2017 at 15:03, Peter Zijlstra <peterz@infradead.org> wrote:
>
> > On Tue, Mar 21, 2017 at 02:37:08PM +0100, Vincent Guittot wrote:
> > > On 21 March 2017 at 14:22, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > > For the not-overloaded case, it makes sense to immediately update the
> > > OPP to align with the new utilization of the CPU, even if it was not
> > > idle in the past couple of ticks.
> >
> > Yeah, but we cannot know. Also, who cares?
> >
>
> An embedded system that doesn't want to stay at the highest OPP if a
> significant part of the utilization has moved away, as an example.
> AFAICT, schedutil tries to select the best OPP according to the current
> utilization of the CPU, so if the utilization decreases, the OPP should
> also decrease.
Sure I get that; but given the lack of crystal ball instructions we
cannot know if this is the case or not.
And if we really dropped below 100% utilization, we should hit idle
fairly soon.
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 14:26 ` Rafael J. Wysocki
2017-03-21 14:38 ` Patrick Bellasi
@ 2017-03-21 15:02 ` Peter Zijlstra
1 sibling, 0 replies; 71+ messages in thread
From: Peter Zijlstra @ 2017-03-21 15:02 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Vincent Guittot, Linux PM, LKML, Srinivas Pandruvada,
Viresh Kumar, Juri Lelli, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Tue, Mar 21, 2017 at 03:26:06PM +0100, Rafael J. Wysocki wrote:
> + if ((flags & SCHED_CPUFREQ_RT_DL) || this_rq()->rd->overload) {
> next_f = policy->cpuinfo.max_freq;
So this I think is wrong; rd->overload is set if _any_ of the CPUs in the
root domain is overloaded. And given the root domain is typically the
_entire_ machine, this would have a tendency to run at max_freq far too
often.
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 14:46 ` Rafael J. Wysocki
2017-03-21 14:50 ` Rafael J. Wysocki
@ 2017-03-21 15:04 ` Peter Zijlstra
2017-03-21 15:18 ` Rafael J. Wysocki
2017-03-21 15:08 ` Patrick Bellasi
2 siblings, 1 reply; 71+ messages in thread
From: Peter Zijlstra @ 2017-03-21 15:04 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Patrick Bellasi, Vincent Guittot, Linux PM, LKML,
Srinivas Pandruvada, Viresh Kumar, Juri Lelli, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Tue, Mar 21, 2017 at 03:46:07PM +0100, Rafael J. Wysocki wrote:
> @@ -207,6 +212,8 @@ static void sugov_update_single(struct u
> if (!sugov_should_update_freq(sg_policy, time))
> return;
>
> + sg_policy->overload = this_rq()->rd->overload;
> +
Same problem as before; rd->overload is set if _any_ CPU in the root
domain has more than 1 runnable task at a random point in history (when
we ran the load balance tick -- and since that is the same tick used for
timers, there's a bias to over-account there).
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 14:46 ` Rafael J. Wysocki
2017-03-21 14:50 ` Rafael J. Wysocki
2017-03-21 15:04 ` Peter Zijlstra
@ 2017-03-21 15:08 ` Patrick Bellasi
2017-03-21 15:18 ` Peter Zijlstra
2 siblings, 1 reply; 71+ messages in thread
From: Patrick Bellasi @ 2017-03-21 15:08 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Vincent Guittot, Peter Zijlstra, Linux PM, LKML,
Srinivas Pandruvada, Viresh Kumar, Juri Lelli, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 21-Mar 15:46, Rafael J. Wysocki wrote:
> [...]
> OK, so the one below works too.
Better... just one minor comment.
> ---
> kernel/sched/cpufreq_schedutil.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> @@ -37,6 +37,7 @@ struct sugov_policy {
> s64 freq_update_delay_ns;
> unsigned int next_freq;
> unsigned int cached_raw_freq;
> + bool overload;
Can we avoid using "overload" in favor of a more generic, schedutil-specific
name? Mainly because in the future we would probably like to switch from
"overloaded" to "overutilized".
What about something like "busy"?
>
> /* The next fields are only needed if fast switch cannot be used. */
> struct irq_work irq_work;
> @@ -61,6 +62,7 @@ struct sugov_cpu {
> unsigned long util;
> unsigned long max;
> unsigned int flags;
> + bool overload;
> };
>
> static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
> @@ -93,6 +95,9 @@ static void sugov_update_commit(struct s
> {
> struct cpufreq_policy *policy = sg_policy->policy;
>
> + if (sg_policy->overload && next_freq < sg_policy->next_freq)
> + next_freq = sg_policy->next_freq;
> +
> if (policy->fast_switch_enabled) {
> if (sg_policy->next_freq == next_freq) {
> trace_cpu_frequency(policy->cur, smp_processor_id());
> @@ -207,6 +212,8 @@ static void sugov_update_single(struct u
> if (!sugov_should_update_freq(sg_policy, time))
> return;
>
> + sg_policy->overload = this_rq()->rd->overload;
And then we can move this bit into an inline function, something like e.g.:

static inline bool sugov_this_cpu_is_busy(void)
{
	return this_rq()->rd->overload;
}

where in the future we can easily switch from using "overloaded" to using
"overutilized".
> +
> if (flags & SCHED_CPUFREQ_RT_DL) {
> next_f = policy->cpuinfo.max_freq;
> } else {
> @@ -225,6 +232,8 @@ static unsigned int sugov_next_freq_shar
> unsigned long util = 0, max = 1;
> unsigned int j;
>
> + sg_policy->overload = false;
> +
> for_each_cpu(j, policy->cpus) {
> struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
> unsigned long j_util, j_max;
> @@ -253,6 +262,7 @@ static unsigned int sugov_next_freq_shar
> }
>
> sugov_iowait_boost(j_sg_cpu, &util, &max);
> + sg_policy->overload = sg_policy->overload || j_sg_cpu->overload;
> }
>
> return get_next_freq(sg_policy, util, max);
> @@ -273,6 +283,7 @@ static void sugov_update_shared(struct u
> sg_cpu->util = util;
> sg_cpu->max = max;
> sg_cpu->flags = flags;
> + sg_cpu->overload = this_rq()->rd->overload;
>
> sugov_set_iowait_boost(sg_cpu, time, flags);
> sg_cpu->last_update = time;
>
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 15:08 ` Patrick Bellasi
@ 2017-03-21 15:18 ` Peter Zijlstra
2017-03-21 19:28 ` Patrick Bellasi
0 siblings, 1 reply; 71+ messages in thread
From: Peter Zijlstra @ 2017-03-21 15:18 UTC (permalink / raw)
To: Patrick Bellasi
Cc: Rafael J. Wysocki, Vincent Guittot, Linux PM, LKML,
Srinivas Pandruvada, Viresh Kumar, Juri Lelli, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
Seriously people, trim your replies.
On Tue, Mar 21, 2017 at 03:08:20PM +0000, Patrick Bellasi wrote:
> And then we can move this bit into an inline function, something like e.g.:
>
> static inline bool sugov_this_cpu_is_busy(void)
> {
>	return this_rq()->rd->overload;
> }
No, that's just entirely and utterly wrong. It being in rd means it's
very much not about _this_ CPU in any way.
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 15:04 ` Peter Zijlstra
@ 2017-03-21 15:18 ` Rafael J. Wysocki
2017-03-21 17:00 ` Peter Zijlstra
0 siblings, 1 reply; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-21 15:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Patrick Bellasi, Vincent Guittot, Linux PM, LKML,
Srinivas Pandruvada, Viresh Kumar, Juri Lelli, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Tuesday, March 21, 2017 04:04:03 PM Peter Zijlstra wrote:
> On Tue, Mar 21, 2017 at 03:46:07PM +0100, Rafael J. Wysocki wrote:
> > @@ -207,6 +212,8 @@ static void sugov_update_single(struct u
> > if (!sugov_should_update_freq(sg_policy, time))
> > return;
> >
> > + sg_policy->overload = this_rq()->rd->overload;
> > +
>
> Same problem as before; rd->overload is set if _any_ CPU in the root
> domain has more than 1 runnable task at a random point in history (when
> we ran the load balance tick -- and since that is the same tick used for
> timers, there's a bias to over-account there).
OK
What about the one below then?
It checks both the idle-calls count and the overload flag, and only when both
indicate a busy CPU does it prevent the frequency from being decreased.
It is sufficient for the case at hand.
I guess if rd->overload is not set, this means that none of the CPUs is
oversubscribed and we just saturate the capacity in a one-task-per-CPU kind
of fashion. Right?
---
include/linux/tick.h | 1 +
kernel/sched/cpufreq_schedutil.c | 27 +++++++++++++++++++++++++++
kernel/time/tick-sched.c | 12 ++++++++++++
3 files changed, 40 insertions(+)
Index: linux-pm/kernel/sched/cpufreq_schedutil.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
+++ linux-pm/kernel/sched/cpufreq_schedutil.c
@@ -37,6 +37,7 @@ struct sugov_policy {
s64 freq_update_delay_ns;
unsigned int next_freq;
unsigned int cached_raw_freq;
+ bool busy;
/* The next fields are only needed if fast switch cannot be used. */
struct irq_work irq_work;
@@ -56,11 +57,15 @@ struct sugov_cpu {
unsigned long iowait_boost;
unsigned long iowait_boost_max;
u64 last_update;
+#ifdef CONFIG_NO_HZ_COMMON
+ unsigned long saved_idle_calls;
+#endif
/* The fields below are only needed when sharing a policy. */
unsigned long util;
unsigned long max;
unsigned int flags;
+ bool busy;
};
static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
@@ -93,6 +98,9 @@ static void sugov_update_commit(struct s
{
struct cpufreq_policy *policy = sg_policy->policy;
+ if (sg_policy->busy && next_freq < sg_policy->next_freq)
+ next_freq = sg_policy->next_freq;
+
if (policy->fast_switch_enabled) {
if (sg_policy->next_freq == next_freq) {
trace_cpu_frequency(policy->cur, smp_processor_id());
@@ -192,6 +200,19 @@ static void sugov_iowait_boost(struct su
sg_cpu->iowait_boost >>= 1;
}
+#ifdef CONFIG_NO_HZ_COMMON
+static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
+{
+ unsigned long idle_calls = tick_nohz_get_idle_calls();
+ bool not_idle = idle_calls == sg_cpu->saved_idle_calls;
+
+ sg_cpu->saved_idle_calls = idle_calls;
+ return not_idle && this_rq()->rd->overload;
+}
+#else
+static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
+#endif /* CONFIG_NO_HZ_COMMON */
+
static void sugov_update_single(struct update_util_data *hook, u64 time,
unsigned int flags)
{
@@ -207,6 +228,8 @@ static void sugov_update_single(struct u
if (!sugov_should_update_freq(sg_policy, time))
return;
+ sg_policy->busy = sugov_cpu_is_busy(sg_cpu);
+
if (flags & SCHED_CPUFREQ_RT_DL) {
next_f = policy->cpuinfo.max_freq;
} else {
@@ -225,6 +248,8 @@ static unsigned int sugov_next_freq_shar
unsigned long util = 0, max = 1;
unsigned int j;
+ sg_policy->busy = false;
+
for_each_cpu(j, policy->cpus) {
struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
unsigned long j_util, j_max;
@@ -253,6 +278,7 @@ static unsigned int sugov_next_freq_shar
}
sugov_iowait_boost(j_sg_cpu, &util, &max);
+ sg_policy->busy = sg_policy->busy || j_sg_cpu->busy;
}
return get_next_freq(sg_policy, util, max);
@@ -273,6 +299,7 @@ static void sugov_update_shared(struct u
sg_cpu->util = util;
sg_cpu->max = max;
sg_cpu->flags = flags;
+ sg_cpu->busy = sugov_cpu_is_busy(sg_cpu);
sugov_set_iowait_boost(sg_cpu, time, flags);
sg_cpu->last_update = time;
Index: linux-pm/include/linux/tick.h
===================================================================
--- linux-pm.orig/include/linux/tick.h
+++ linux-pm/include/linux/tick.h
@@ -117,6 +117,7 @@ extern void tick_nohz_idle_enter(void);
extern void tick_nohz_idle_exit(void);
extern void tick_nohz_irq_exit(void);
extern ktime_t tick_nohz_get_sleep_length(void);
+extern unsigned long tick_nohz_get_idle_calls(void);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
#else /* !CONFIG_NO_HZ_COMMON */
Index: linux-pm/kernel/time/tick-sched.c
===================================================================
--- linux-pm.orig/kernel/time/tick-sched.c
+++ linux-pm/kernel/time/tick-sched.c
@@ -993,6 +993,18 @@ ktime_t tick_nohz_get_sleep_length(void)
return ts->sleep_length;
}
+/**
+ * tick_nohz_get_idle_calls - return the current idle calls counter value
+ *
+ * Called from the schedutil frequency scaling governor in scheduler context.
+ */
+unsigned long tick_nohz_get_idle_calls(void)
+{
+ struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
+
+ return ts->idle_calls;
+}
+
static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
{
#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 15:18 ` Rafael J. Wysocki
@ 2017-03-21 17:00 ` Peter Zijlstra
2017-03-21 17:17 ` Rafael J. Wysocki
0 siblings, 1 reply; 71+ messages in thread
From: Peter Zijlstra @ 2017-03-21 17:00 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Patrick Bellasi, Vincent Guittot, Linux PM, LKML,
Srinivas Pandruvada, Viresh Kumar, Juri Lelli, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Tue, Mar 21, 2017 at 04:18:52PM +0100, Rafael J. Wysocki wrote:
> +static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
> +{
> + unsigned long idle_calls = tick_nohz_get_idle_calls();
> + bool not_idle = idle_calls == sg_cpu->saved_idle_calls;
> +
> + sg_cpu->saved_idle_calls = idle_calls;
> + return not_idle && this_rq()->rd->overload;
> +}
So I really don't understand the rd->overload thing. What is it supposed
to do here?
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 14:58 ` Peter Zijlstra
@ 2017-03-21 17:00 ` Vincent Guittot
2017-03-21 17:01 ` Vincent Guittot
0 siblings, 1 reply; 71+ messages in thread
From: Vincent Guittot @ 2017-03-21 17:00 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rafael J. Wysocki, Linux PM, LKML, Srinivas Pandruvada,
Viresh Kumar, Juri Lelli, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 21 March 2017 at 15:58, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Mar 21, 2017 at 03:16:19PM +0100, Vincent Guittot wrote:
> > On 21 March 2017 at 15:03, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > > On Tue, Mar 21, 2017 at 02:37:08PM +0100, Vincent Guittot wrote:
> > > > On 21 March 2017 at 14:22, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > > For the not overloaded case, it makes sense to immediately update the
> > > > OPP to be aligned with the new utilization of the CPU even if it was
> > > > not idle in the past couple of ticks
> > >
> > > Yeah, but we cannot know. Also, who cares?
> > >
> >
> > an embedded system that doesn't want to stay at the highest OPP if a
> > significant part of the utilization has moved away, as an example
> > AFAICT, schedutil tries to select the best OPP according to the current
> > utilization of the CPU so if the utilization decreases, the OPP should also
> > decrease
>
> Sure I get that; but given the lack of crystal ball instructions we
> cannot know if this is the case or not.
cfs_rq->avg.load_avg accounts the waiting time of CPU (in addition to
the weight of tasks), so I was wondering if we can't use it to detect
whether we are in the overloaded case or not even if utilization is not
at max capacity because we have just migrated a task (and its
utilization) out
>
> And if we really dropped below 100% utilization, we should hit idle
> fairly soon.
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 17:00 ` Vincent Guittot
@ 2017-03-21 17:01 ` Vincent Guittot
0 siblings, 0 replies; 71+ messages in thread
From: Vincent Guittot @ 2017-03-21 17:01 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rafael J. Wysocki, Linux PM, LKML, Srinivas Pandruvada,
Viresh Kumar, Juri Lelli, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 21 March 2017 at 18:00, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> On 21 March 2017 at 15:58, Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> On Tue, Mar 21, 2017 at 03:16:19PM +0100, Vincent Guittot wrote:
>> > On 21 March 2017 at 15:03, Peter Zijlstra <peterz@infradead.org> wrote:
>> >
>> > > On Tue, Mar 21, 2017 at 02:37:08PM +0100, Vincent Guittot wrote:
>> > > > On 21 March 2017 at 14:22, Peter Zijlstra <peterz@infradead.org> wrote:
>> > >
>> > > > For the not overloaded case, it makes sense to immediately update the
>> > > > OPP to be aligned with the new utilization of the CPU even if it was
>> > > > not idle in the past couple of ticks
>> > >
>> > > Yeah, but we cannot know. Also, who cares?
>> > >
>> >
>> > an embedded system that doesn't want to stay at the highest OPP if a
>> > significant part of the utilization has moved away, as an example
>> > AFAICT, schedutil tries to select the best OPP according to the current
>> > utilization of the CPU so if the utilization decreases, the OPP should also
>> > decrease
>>
>> Sure I get that; but given the lack of crystal ball instructions we
>> cannot know if this is the case or not.
>
> cfs_rq->avg.load_avg accounts the waiting time of CPU (in addition to
sorry, I wanted to say the waiting time of tasks on the CPU
> the weight of tasks), so I was wondering if we can't use it to detect
> whether we are in the overloaded case or not even if utilization is not
> at max capacity because we have just migrated a task (and its
> utilization) out
>
>
>
>>
>> And if we really dropped below 100% utilization, we should hit idle
>> fairly soon.
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 17:00 ` Peter Zijlstra
@ 2017-03-21 17:17 ` Rafael J. Wysocki
0 siblings, 0 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-21 17:17 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Patrick Bellasi, Vincent Guittot, Linux PM, LKML,
Srinivas Pandruvada, Viresh Kumar, Juri Lelli, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On Tuesday, March 21, 2017 06:00:17 PM Peter Zijlstra wrote:
> On Tue, Mar 21, 2017 at 04:18:52PM +0100, Rafael J. Wysocki wrote:
> > +static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
> > +{
> > + unsigned long idle_calls = tick_nohz_get_idle_calls();
> > + bool not_idle = idle_calls == sg_cpu->saved_idle_calls;
> > +
> > + sg_cpu->saved_idle_calls = idle_calls;
> > + return not_idle && this_rq()->rd->overload;
> > +}
>
> So I really don't understand the rd->overload thing. What is it supposed
> to do here?
The idea was that if the CPU was running one task saturating the capacity which
then was migrated out of it, the frequency should still be reduced.
And since rd->overload covers all CPUs (in general) it kind of tells us whether
or not there are other tasks to replace the migrated one any time soon.
However, if there are no tasks to replace the migrated one, the CPU will go
idle quickly (as there are no tasks to run on it), in which case keeping the
current frequency on it shouldn't matter.
In all of the other cases keeping the current frequency is the right thing to
do IMO.
So, it looks like checking this_rq()->rd->overload doesn't really help after all. :-)
Thanks,
Rafael
* Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
2017-03-21 15:18 ` Peter Zijlstra
@ 2017-03-21 19:28 ` Patrick Bellasi
0 siblings, 0 replies; 71+ messages in thread
From: Patrick Bellasi @ 2017-03-21 19:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rafael J. Wysocki, Vincent Guittot, Linux PM, LKML,
Srinivas Pandruvada, Viresh Kumar, Juri Lelli, Joel Fernandes,
Morten Rasmussen, Ingo Molnar
On 21-Mar 16:18, Peter Zijlstra wrote:
> On Tue, Mar 21, 2017 at 03:08:20PM +0000, Patrick Bellasi wrote:
>
> > And than we can move this bit into an inline function, something like e.g.:
> >
> > static inline bool sugov_this_cpu_is_busy()
> > {
> > return this_rq()->rd->overloaded
> > }
>
> No, that's just entirely and utterly wrong. It being in rd means its
> very much not about _this_ CPU in any way.
You're right (of course), we cannot really use "this_" in the name of
a function with such code.
The suggestion here was at least to factor out whatever code we want
to use to check if the current CPU has to be subject to a down-scaling
constraint.
However, using rd->overload is not the best option, for the many reasons
you explained in your previous comment. Thus, we should probably stay
with the idle time tracking solution initially proposed by Rafael.
Sorry for the noise :-(
--
#include <best/regards.h>
Patrick Bellasi
* [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-03-20 21:46 ` [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of " Rafael J. Wysocki
` (2 preceding siblings ...)
2017-03-21 11:50 ` Patrick Bellasi
@ 2017-03-21 23:08 ` Rafael J. Wysocki
2017-03-22 9:26 ` Peter Zijlstra
` (5 more replies)
3 siblings, 6 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-21 23:08 UTC (permalink / raw)
To: Linux PM, Peter Zijlstra
Cc: LKML, Srinivas Pandruvada, Viresh Kumar, Juri Lelli,
Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar, Thomas Gleixner
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The way the schedutil governor uses the PELT metric causes it to
underestimate the CPU utilization in some cases.
That can be easily demonstrated by running kernel compilation on
a Sandy Bridge Intel processor, running turbostat in parallel with
it and looking at the values written to the MSR_IA32_PERF_CTL
register. Namely, the expected result would be that when all CPUs
were 100% busy, all of them would be requested to run in the maximum
P-state, but observation shows that this clearly isn't the case.
The CPUs run in the maximum P-state for a while and then are
requested to run slower and go back to the maximum P-state after
a while again. That causes the actual frequency of the processor to
visibly oscillate below the sustainable maximum in a jittery fashion
which clearly is not desirable.
That has been attributed to CPU utilization metric updates on task
migration that cause the total utilization value for the CPU to be
reduced by the utilization of the migrated task. If that happens,
the schedutil governor may see a CPU utilization reduction and will
attempt to reduce the CPU frequency accordingly right away. That
may be premature, though, for example if the system is generally
busy and there are other runnable tasks waiting to be run on that
CPU already.
This is unlikely to be an issue on systems where cpufreq policies are
shared between multiple CPUs, because in those cases the policy
utilization is computed as the maximum of the CPU utilization values
over the whole policy and if that turns out to be low, reducing the
frequency for the policy most likely is a good idea anyway. On
systems with one CPU per policy, however, it may affect performance
adversely and even lead to increased energy consumption in some cases.
On those systems it may be addressed by taking another utilization
metric into consideration, like whether or not the CPU whose
frequency is about to be reduced has been idle recently, because if
that's not the case, the CPU is likely to be busy in the near future
and its frequency should not be reduced.
To that end, use the counter of idle calls in the timekeeping code.
Namely, make the schedutil governor look at that counter for the
current CPU every time before its frequency is about to be reduced.
If the counter has not changed since the previous iteration of the
governor computations for that CPU, the CPU has been busy for all
that time and its frequency should not be decreased, so if the new
frequency would be lower than the one set previously, the governor
will skip the frequency update.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
include/linux/tick.h | 1 +
kernel/sched/cpufreq_schedutil.c | 27 +++++++++++++++++++++++++++
kernel/time/tick-sched.c | 12 ++++++++++++
3 files changed, 40 insertions(+)
Index: linux-pm/kernel/sched/cpufreq_schedutil.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
+++ linux-pm/kernel/sched/cpufreq_schedutil.c
@@ -61,6 +61,11 @@ struct sugov_cpu {
unsigned long util;
unsigned long max;
unsigned int flags;
+
+ /* The field below is for single-CPU policies only. */
+#ifdef CONFIG_NO_HZ_COMMON
+ unsigned long saved_idle_calls;
+#endif
};
static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
@@ -192,6 +197,19 @@ static void sugov_iowait_boost(struct su
sg_cpu->iowait_boost >>= 1;
}
+#ifdef CONFIG_NO_HZ_COMMON
+static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
+{
+ unsigned long idle_calls = tick_nohz_get_idle_calls();
+ bool ret = idle_calls == sg_cpu->saved_idle_calls;
+
+ sg_cpu->saved_idle_calls = idle_calls;
+ return ret;
+}
+#else
+static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
+#endif /* CONFIG_NO_HZ_COMMON */
+
static void sugov_update_single(struct update_util_data *hook, u64 time,
unsigned int flags)
{
@@ -200,6 +218,7 @@ static void sugov_update_single(struct u
struct cpufreq_policy *policy = sg_policy->policy;
unsigned long util, max;
unsigned int next_f;
+ bool busy;
sugov_set_iowait_boost(sg_cpu, time, flags);
sg_cpu->last_update = time;
@@ -207,12 +226,20 @@ static void sugov_update_single(struct u
if (!sugov_should_update_freq(sg_policy, time))
return;
+ busy = sugov_cpu_is_busy(sg_cpu);
+
if (flags & SCHED_CPUFREQ_RT_DL) {
next_f = policy->cpuinfo.max_freq;
} else {
sugov_get_util(&util, &max);
sugov_iowait_boost(sg_cpu, &util, &max);
next_f = get_next_freq(sg_policy, util, max);
+ /*
+ * Do not reduce the frequency if the CPU has not been idle
+ * recently, as the reduction is likely to be premature then.
+ */
+ if (busy && next_f < sg_policy->next_freq)
+ next_f = sg_policy->next_freq;
}
sugov_update_commit(sg_policy, time, next_f);
}
Index: linux-pm/include/linux/tick.h
===================================================================
--- linux-pm.orig/include/linux/tick.h
+++ linux-pm/include/linux/tick.h
@@ -117,6 +117,7 @@ extern void tick_nohz_idle_enter(void);
extern void tick_nohz_idle_exit(void);
extern void tick_nohz_irq_exit(void);
extern ktime_t tick_nohz_get_sleep_length(void);
+extern unsigned long tick_nohz_get_idle_calls(void);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
#else /* !CONFIG_NO_HZ_COMMON */
Index: linux-pm/kernel/time/tick-sched.c
===================================================================
--- linux-pm.orig/kernel/time/tick-sched.c
+++ linux-pm/kernel/time/tick-sched.c
@@ -993,6 +993,18 @@ ktime_t tick_nohz_get_sleep_length(void)
return ts->sleep_length;
}
+/**
+ * tick_nohz_get_idle_calls - return the current idle calls counter value
+ *
+ * Called from the schedutil frequency scaling governor in scheduler context.
+ */
+unsigned long tick_nohz_get_idle_calls(void)
+{
+ struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
+
+ return ts->idle_calls;
+}
+
static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
{
#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
* Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-03-21 23:08 ` [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely Rafael J. Wysocki
@ 2017-03-22 9:26 ` Peter Zijlstra
2017-03-22 9:54 ` Viresh Kumar
` (4 subsequent siblings)
5 siblings, 0 replies; 71+ messages in thread
From: Peter Zijlstra @ 2017-03-22 9:26 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux PM, LKML, Srinivas Pandruvada, Viresh Kumar, Juri Lelli,
Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar, Thomas Gleixner
On Wed, Mar 22, 2017 at 12:08:50AM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The way the schedutil governor uses the PELT metric causes it to
> underestimate the CPU utilization in some cases.
>
> That can be easily demonstrated by running kernel compilation on
> a Sandy Bridge Intel processor, running turbostat in parallel with
> it and looking at the values written to the MSR_IA32_PERF_CTL
> register. Namely, the expected result would be that when all CPUs
> were 100% busy, all of them would be requested to run in the maximum
> P-state, but observation shows that this clearly isn't the case.
> The CPUs run in the maximum P-state for a while and then are
> requested to run slower and go back to the maximum P-state after
> a while again. That causes the actual frequency of the processor to
> visibly oscillate below the sustainable maximum in a jittery fashion
> which clearly is not desirable.
>
> That has been attributed to CPU utilization metric updates on task
> migration that cause the total utilization value for the CPU to be
> reduced by the utilization of the migrated task. If that happens,
> the schedutil governor may see a CPU utilization reduction and will
> attempt to reduce the CPU frequency accordingly right away. That
> may be premature, though, for example if the system is generally
> busy and there are other runnable tasks waiting to be run on that
> CPU already.
>
> This is unlikely to be an issue on systems where cpufreq policies are
> shared between multiple CPUs, because in those cases the policy
> utilization is computed as the maximum of the CPU utilization values
> over the whole policy and if that turns out to be low, reducing the
> frequency for the policy most likely is a good idea anyway. On
> systems with one CPU per policy, however, it may affect performance
> adversely and even lead to increased energy consumption in some cases.
>
> On those systems it may be addressed by taking another utilization
> metric into consideration, like whether or not the CPU whose
> frequency is about to be reduced has been idle recently, because if
> that's not the case, the CPU is likely to be busy in the near future
> and its frequency should not be reduced.
>
> To that end, use the counter of idle calls in the timekeeping code.
> Namely, make the schedutil governor look at that counter for the
> current CPU every time before its frequency is about to be reduced.
> If the counter has not changed since the previous iteration of the
> governor computations for that CPU, the CPU has been busy for all
> that time and its frequency should not be decreased, so if the new
> frequency would be lower than the one set previously, the governor
> will skip the frequency update.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Right; this makes sense to me. Of course it would be good to have some
more measurements on this, but in principle:
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
* Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-03-21 23:08 ` [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely Rafael J. Wysocki
2017-03-22 9:26 ` Peter Zijlstra
@ 2017-03-22 9:54 ` Viresh Kumar
2017-03-23 1:04 ` Joel Fernandes
` (3 subsequent siblings)
5 siblings, 0 replies; 71+ messages in thread
From: Viresh Kumar @ 2017-03-22 9:54 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux PM, Peter Zijlstra, LKML, Srinivas Pandruvada, Juri Lelli,
Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar, Thomas Gleixner
On 22-03-17, 00:08, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The way the schedutil governor uses the PELT metric causes it to
> underestimate the CPU utilization in some cases.
>
> That can be easily demonstrated by running kernel compilation on
> a Sandy Bridge Intel processor, running turbostat in parallel with
> it and looking at the values written to the MSR_IA32_PERF_CTL
> register. Namely, the expected result would be that when all CPUs
> were 100% busy, all of them would be requested to run in the maximum
> P-state, but observation shows that this clearly isn't the case.
> The CPUs run in the maximum P-state for a while and then are
> requested to run slower and go back to the maximum P-state after
> a while again. That causes the actual frequency of the processor to
> visibly oscillate below the sustainable maximum in a jittery fashion
> which clearly is not desirable.
>
> That has been attributed to CPU utilization metric updates on task
> migration that cause the total utilization value for the CPU to be
> reduced by the utilization of the migrated task. If that happens,
> the schedutil governor may see a CPU utilization reduction and will
> attempt to reduce the CPU frequency accordingly right away. That
> may be premature, though, for example if the system is generally
> busy and there are other runnable tasks waiting to be run on that
> CPU already.
>
> This is unlikely to be an issue on systems where cpufreq policies are
> shared between multiple CPUs, because in those cases the policy
> utilization is computed as the maximum of the CPU utilization values
> over the whole policy and if that turns out to be low, reducing the
> frequency for the policy most likely is a good idea anyway. On
> systems with one CPU per policy, however, it may affect performance
> adversely and even lead to increased energy consumption in some cases.
>
> On those systems it may be addressed by taking another utilization
> metric into consideration, like whether or not the CPU whose
> frequency is about to be reduced has been idle recently, because if
> that's not the case, the CPU is likely to be busy in the near future
> and its frequency should not be reduced.
>
> To that end, use the counter of idle calls in the timekeeping code.
> Namely, make the schedutil governor look at that counter for the
> current CPU every time before its frequency is about to be reduced.
> If the counter has not changed since the previous iteration of the
> governor computations for that CPU, the CPU has been busy for all
> that time and its frequency should not be decreased, so if the new
> frequency would be lower than the one set previously, the governor
> will skip the frequency update.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> include/linux/tick.h | 1 +
> kernel/sched/cpufreq_schedutil.c | 27 +++++++++++++++++++++++++++
> kernel/time/tick-sched.c | 12 ++++++++++++
> 3 files changed, 40 insertions(+)
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
--
viresh
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-20 12:34 ` Patrick Bellasi
@ 2017-03-22 23:56 ` Joel Fernandes
2017-03-23 22:08 ` Vincent Guittot
0 siblings, 1 reply; 71+ messages in thread
From: Joel Fernandes @ 2017-03-22 23:56 UTC (permalink / raw)
To: Patrick Bellasi
Cc: Rafael J. Wysocki, Viresh Kumar, Vincent Guittot, Linux PM, LKML,
Peter Zijlstra, Srinivas Pandruvada, Juri Lelli,
Morten Rasmussen, Ingo Molnar
On Mon, Mar 20, 2017 at 5:34 AM, Patrick Bellasi
<patrick.bellasi@arm.com> wrote:
> On 20-Mar 09:26, Vincent Guittot wrote:
>> On 20 March 2017 at 04:57, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> > On 19-03-17, 14:34, Rafael J. Wysocki wrote:
>> >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> >>
>> >> The PELT metric used by the schedutil governor underestimates the
>> >> CPU utilization in some cases. The reason for that may be time spent
>> >> in interrupt handlers and similar which is not accounted for by PELT.
>>
>> Are you sure of the root cause described above (time stolen by irq
>> handler) or is it just a hypotheses ? That would be good to be sure of
>> the root cause
>> Furthermore, IIRC the time spent in irq context is also accounted as
>> run time for the running cfs task but not RT and deadline task running
>> time
>
> As long as the IRQ processing does not generate a context switch,
> which is happening (eventually) if the top half schedules some deferred
> work to be executed by a bottom half.
>
> Thus, I too would say that all the top-half time is accounted in
> PELT, since the current task is still RUNNABLE/RUNNING.
Sorry if I'm missing something but doesn't this depend on whether you
have CONFIG_IRQ_TIME_ACCOUNTING enabled?
__update_load_avg uses rq->clock_task for deltas which I think
shouldn't account IRQ time with that config option. So it should be
quite possible for IRQ time spent to reduce the PELT signal right?
>
>> So I'm not really aligned with the description of your problem: PELT
>> metric underestimates the load of the CPU. The PELT is just about
>> tracking CFS task utilization but not whole CPU utilization and
>> according to your description of the problem (time stolen by irq),
>> your problem doesn't come from an underestimation of CFS task but from
>> time spent in something else but not accounted in the value used by
>> schedutil
>
> Quite likely. Indeed, it can really be that the CFS task is preempted
> because of some RT activity generated by the IRQ handler.
>
> More in general, I've also noticed many suboptimal freq switches when
> RT tasks interleave with CFS ones, because of:
> - relatively long down _and up_ throttling times
> - the way schedutil's flags are tracked and updated
> - the callsites from where we call schedutil updates
>
> For example it can really happen that we are running at the highest
> OPP because of some RT activity. Then we switch back to a relatively
> low utilization CFS workload and then:
> 1. a tick happens which produces a frequency drop
Any idea why this frequency drop would happen? Say a running CFS task
gets preempted by an RT task, the PELT signal shouldn't drop for the
duration the CFS task is preempted because the task is runnable, so
once the CFS task gets CPU back, schedutil should still maintain the
capacity right?
Regards,
Joel
* Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-03-21 23:08 ` [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely Rafael J. Wysocki
2017-03-22 9:26 ` Peter Zijlstra
2017-03-22 9:54 ` Viresh Kumar
@ 2017-03-23 1:04 ` Joel Fernandes
2017-03-23 19:26 ` Sai Gurrappadi
` (2 subsequent siblings)
5 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2017-03-23 1:04 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux PM, Peter Zijlstra, LKML, Srinivas Pandruvada,
Viresh Kumar, Juri Lelli, Vincent Guittot, Patrick Bellasi,
Morten Rasmussen, Ingo Molnar, Thomas Gleixner
On Tue, Mar 21, 2017 at 4:08 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The way the schedutil governor uses the PELT metric causes it to
> underestimate the CPU utilization in some cases.
>
> That can be easily demonstrated by running kernel compilation on
> a Sandy Bridge Intel processor, running turbostat in parallel with
> it and looking at the values written to the MSR_IA32_PERF_CTL
> register. Namely, the expected result would be that when all CPUs
> were 100% busy, all of them would be requested to run in the maximum
> P-state, but observation shows that this clearly isn't the case.
> The CPUs run in the maximum P-state for a while and then are
> requested to run slower and go back to the maximum P-state after
> a while again. That causes the actual frequency of the processor to
> visibly oscillate below the sustainable maximum in a jittery fashion
> which clearly is not desirable.
>
> That has been attributed to CPU utilization metric updates on task
> migration that cause the total utilization value for the CPU to be
> reduced by the utilization of the migrated task. If that happens,
> the schedutil governor may see a CPU utilization reduction and will
> attempt to reduce the CPU frequency accordingly right away. That
> may be premature, though, for example if the system is generally
> busy and there are other runnable tasks waiting to be run on that
> CPU already.
>
> This is unlikely to be an issue on systems where cpufreq policies are
> shared between multiple CPUs, because in those cases the policy
> utilization is computed as the maximum of the CPU utilization values
> over the whole policy and if that turns out to be low, reducing the
> frequency for the policy most likely is a good idea anyway. On
> systems with one CPU per policy, however, it may affect performance
> adversely and even lead to increased energy consumption in some cases.
>
> On those systems it may be addressed by taking another utilization
> metric into consideration, like whether or not the CPU whose
> frequency is about to be reduced has been idle recently, because if
> that's not the case, the CPU is likely to be busy in the near future
> and its frequency should not be reduced.
>
> To that end, use the counter of idle calls in the timekeeping code.
> Namely, make the schedutil governor look at that counter for the
> current CPU every time before its frequency is about to be reduced.
> If the counter has not changed since the previous iteration of the
> governor computations for that CPU, the CPU has been busy for all
> that time and its frequency should not be decreased, so if the new
> frequency would be lower than the one set previously, the governor
> will skip the frequency update.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Makes sense,
Reviewed-by: Joel Fernandes <joelaf@google.com>
Thanks,
Joel
* Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-03-21 23:08 ` [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely Rafael J. Wysocki
` (2 preceding siblings ...)
2017-03-23 1:04 ` Joel Fernandes
@ 2017-03-23 19:26 ` Sai Gurrappadi
2017-03-23 20:48 ` Sai Gurrappadi
2017-03-24 1:39 ` Rafael J. Wysocki
2017-03-25 1:14 ` Sai Gurrappadi
2017-05-08 3:49 ` Wanpeng Li
5 siblings, 2 replies; 71+ messages in thread
From: Sai Gurrappadi @ 2017-03-23 19:26 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM, Peter Zijlstra
Cc: LKML, Srinivas Pandruvada, Viresh Kumar, Juri Lelli,
Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar, Thomas Gleixner,
Peter Boonstoppel
Hi Rafael,
On 03/21/2017 04:08 PM, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
<snip>
>
> That has been attributed to CPU utilization metric updates on task
> migration that cause the total utilization value for the CPU to be
> reduced by the utilization of the migrated task. If that happens,
> the schedutil governor may see a CPU utilization reduction and will
> attempt to reduce the CPU frequency accordingly right away. That
> may be premature, though, for example if the system is generally
> busy and there are other runnable tasks waiting to be run on that
> CPU already.
>
> This is unlikely to be an issue on systems where cpufreq policies are
> shared between multiple CPUs, because in those cases the policy
> utilization is computed as the maximum of the CPU utilization values
> over the whole policy and if that turns out to be low, reducing the
> frequency for the policy most likely is a good idea anyway. On
I have observed this issue even in the shared policy case (one clock domain for many CPUs). On migrate, the actual load update is split into two updates:
1. Add to removed_load on src_cpu (cpu_util(src_cpu) not updated yet)
2. Do wakeup on dst_cpu, add load to dst_cpu
Now if src_cpu manages to do a PELT update before 2. happens, e.g. because a small periodic task woke up on src_cpu, it'll end up subtracting the removed_load from its utilization and issuing a frequency update before 2. happens.
This causes a premature dip in frequency which doesn't get corrected until the next util update that fires after rate_limit_us. The dst_cpu freq. update from step 2. above gets rate limited in this scenario.
> systems with one CPU per policy, however, it may affect performance
> adversely and even lead to increased energy consumption in some cases.
>
> On those systems it may be addressed by taking another utilization
> metric into consideration, like whether or not the CPU whose
> frequency is about to be reduced has been idle recently, because if
> that's not the case, the CPU is likely to be busy in the near future
> and its frequency should not be reduced.
>
> To that end, use the counter of idle calls in the timekeeping code.
> Namely, make the schedutil governor look at that counter for the
> current CPU every time before its frequency is about to be reduced.
> If the counter has not changed since the previous iteration of the
> governor computations for that CPU, the CPU has been busy for all
> that time and its frequency should not be decreased, so if the new
> frequency would be lower than the one set previously, the governor
> will skip the frequency update.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> include/linux/tick.h | 1 +
> kernel/sched/cpufreq_schedutil.c | 27 +++++++++++++++++++++++++++
> kernel/time/tick-sched.c | 12 ++++++++++++
> 3 files changed, 40 insertions(+)
>
> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> @@ -61,6 +61,11 @@ struct sugov_cpu {
> unsigned long util;
> unsigned long max;
> unsigned int flags;
> +
> + /* The field below is for single-CPU policies only. */
> +#ifdef CONFIG_NO_HZ_COMMON
> + unsigned long saved_idle_calls;
> +#endif
> };
>
> static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
> @@ -192,6 +197,19 @@ static void sugov_iowait_boost(struct su
> sg_cpu->iowait_boost >>= 1;
> }
>
> +#ifdef CONFIG_NO_HZ_COMMON
> +static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
> +{
> + unsigned long idle_calls = tick_nohz_get_idle_calls();
> + bool ret = idle_calls == sg_cpu->saved_idle_calls;
> +
> + sg_cpu->saved_idle_calls = idle_calls;
> + return ret;
> +}
Hm, sorry, I am a bit confused; perhaps you could help me understand the problem/solution better :)
Say we have this simple case of only a single periodic task running on one CPU: wouldn't the PELT update on wakeup cause a frequency update, which updates the sg_cpu->saved_idle_calls value here? That would then cause the frequency update on idle entry to always skip dropping the frequency, right?
If I am reading this correctly, the PELT update on the dequeue for the periodic task (in the scenario above) happens _before_ the idle_calls++ which is in tick_nohz_idle_enter.
Thanks!
-Sai
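For reference, the heuristic being discussed is easy to model outside the kernel. A minimal userspace C sketch of the quoted sugov_cpu_is_busy() logic follows; the struct and the idle-calls counter here are simplified stand-ins for the kernel's per-CPU data, not the real interfaces:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the per-CPU governor state from the patch. */
struct sugov_cpu {
	unsigned long saved_idle_calls;
};

/* Stand-in for tick_nohz_get_idle_calls(): in the kernel this
 * counter advances each time the CPU enters the idle loop. */
static unsigned long idle_calls;

/* Busy iff the CPU has not entered idle since the previous
 * governor evaluation, i.e. the counter did not move. */
static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
{
	unsigned long calls = idle_calls;
	bool busy = (calls == sg_cpu->saved_idle_calls);

	sg_cpu->saved_idle_calls = calls;
	return busy;
}
```

In this model, any governor evaluation that runs between two idle entries reports "busy", which is exactly the interaction the question above is probing: the answer depends on whether idle_calls is incremented before or after the dequeue-time PELT update.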
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-03-23 19:26 ` Sai Gurrappadi
@ 2017-03-23 20:48 ` Sai Gurrappadi
2017-03-24 1:39 ` Rafael J. Wysocki
1 sibling, 0 replies; 71+ messages in thread
From: Sai Gurrappadi @ 2017-03-23 20:48 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM, Peter Zijlstra
Cc: LKML, Srinivas Pandruvada, Viresh Kumar, Juri Lelli,
Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar, Thomas Gleixner,
Peter Boonstoppel
On 03/23/2017 12:26 PM, Sai Gurrappadi wrote:
>
> Hm, sorry, I am a bit confused; perhaps you could help me understand the problem/solution better :)
>
> Say we have this simple case of only a single periodic task running on one CPU: wouldn't the PELT update on wakeup cause a frequency update, which updates the sg_cpu->saved_idle_calls value here? That would then cause the frequency update on idle entry to always skip dropping the frequency, right?
Apologies, this example here is flawed because on task dequeue, its utilization isn't removed. There is no problem in this case...
-Sai
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-22 23:56 ` Joel Fernandes
@ 2017-03-23 22:08 ` Vincent Guittot
2017-03-25 3:48 ` Joel Fernandes
0 siblings, 1 reply; 71+ messages in thread
From: Vincent Guittot @ 2017-03-23 22:08 UTC (permalink / raw)
To: Joel Fernandes
Cc: Patrick Bellasi, Rafael J. Wysocki, Viresh Kumar, Linux PM, LKML,
Peter Zijlstra, Srinivas Pandruvada, Juri Lelli,
Morten Rasmussen, Ingo Molnar
On 23 March 2017 at 00:56, Joel Fernandes <joelaf@google.com> wrote:
> On Mon, Mar 20, 2017 at 5:34 AM, Patrick Bellasi
> <patrick.bellasi@arm.com> wrote:
>> On 20-Mar 09:26, Vincent Guittot wrote:
>>> On 20 March 2017 at 04:57, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>> > On 19-03-17, 14:34, Rafael J. Wysocki wrote:
>>> >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>> >>
>>> >> The PELT metric used by the schedutil governor underestimates the
>>> >> CPU utilization in some cases. The reason for that may be time spent
>>> >> in interrupt handlers and similar which is not accounted for by PELT.
>>>
>>> Are you sure of the root cause described above (time stolen by irq
>>> handler) or is it just a hypotheses ? That would be good to be sure of
>>> the root cause
>>> Furthermore, IIRC the time spent in irq context is also accounted as
>>> run time for the running cfs task but not RT and deadline task running
>>> time
>>
>> As long as the IRQ processing does not generate a context switch,
>> which is happening (eventually) if the top half schedule some deferred
>> work to be executed by a bottom half.
>>
>> Thus, me too I would say that all the top half time is accounted in
>> PELT, since the current task is still RUNNABLE/RUNNING.
>
> Sorry if I'm missing something but doesn't this depend on whether you
> have CONFIG_IRQ_TIME_ACCOUNTING enabled?
>
> __update_load_avg uses rq->clock_task for deltas which I think
> shouldn't account IRQ time with that config option. So it should be
> quite possible for IRQ time spent to reduce the PELT signal right?
>
>>
>>> So I'm not really aligned with the description of your problem: PELT
>>> metric underestimates the load of the CPU. The PELT is just about
>>> tracking CFS task utilization but not whole CPU utilization and
>>> according to your description of the problem (time stolen by irq),
>>> your problem doesn't come from an underestimation of CFS task but from
>>> time spent in something else but not accounted in the value used by
>>> schedutil
>>
>> Quite likely. Indeed, it can really be that the CFS task is preempted
>> because of some RT activity generated by the IRQ handler.
>>
>> More in general, I've also noticed many suboptimal freq switches when
>> RT tasks interleave with CFS ones, because of:
>> - relatively long down _and up_ throttling times
>> - the way schedutil's flags are tracked and updated
>> - the callsites from where we call schedutil updates
>>
>> For example it can really happen that we are running at the highest
>> OPP because of some RT activity. Then we switch back to a relatively
>> low utilization CFS workload and then:
>> 1. a tick happens which produces a frequency drop
>
> Any idea why this frequency drop would happen? Say a running CFS task
> gets preempted by RT task, the PELT signal shouldn't drop for the
> duration the CFS task is preempted because the task is runnable, so
utilization only tracks the running state but not runnable state.
Runnable state is tracked in load_avg
> once the CFS task gets CPU back, schedutil should still maintain the
> capacity right?
>
> Regards,
> Joel
* Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-03-23 19:26 ` Sai Gurrappadi
2017-03-23 20:48 ` Sai Gurrappadi
@ 2017-03-24 1:39 ` Rafael J. Wysocki
2017-03-24 19:08 ` Sai Gurrappadi
1 sibling, 1 reply; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-24 1:39 UTC (permalink / raw)
To: Sai Gurrappadi
Cc: Rafael J. Wysocki, Linux PM, Peter Zijlstra, LKML,
Srinivas Pandruvada, Viresh Kumar, Juri Lelli, Vincent Guittot,
Patrick Bellasi, Joel Fernandes, Morten Rasmussen, Ingo Molnar,
Thomas Gleixner, Peter Boonstoppel
On Thu, Mar 23, 2017 at 8:26 PM, Sai Gurrappadi <sgurrappadi@nvidia.com> wrote:
> Hi Rafael,
Hi,
> On 03/21/2017 04:08 PM, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> <snip>
>
>>
>> That has been attributed to CPU utilization metric updates on task
>> migration that cause the total utilization value for the CPU to be
>> reduced by the utilization of the migrated task. If that happens,
>> the schedutil governor may see a CPU utilization reduction and will
>> attempt to reduce the CPU frequency accordingly right away. That
>> may be premature, though, for example if the system is generally
>> busy and there are other runnable tasks waiting to be run on that
>> CPU already.
>>
>> This is unlikely to be an issue on systems where cpufreq policies are
>> shared between multiple CPUs, because in those cases the policy
>> utilization is computed as the maximum of the CPU utilization values
>> over the whole policy and if that turns out to be low, reducing the
>> frequency for the policy most likely is a good idea anyway. On
>
> I have observed this issue even in the shared policy case (one clock domain for many CPUs). On migrate, the actual load update is split into two updates:
>
> 1. Add to removed_load on src_cpu (cpu_util(src_cpu) not updated yet)
> 2. Do wakeup on dst_cpu, add load to dst_cpu
>
> Now if src_cpu manages to do a PELT update before 2. happens, ex: say a small periodic task woke up on src_cpu, it'll end up subtracting the removed_load from its utilization and issue a frequency update before 2. happens.
>
> This causes a premature dip in frequency which doesn't get corrected until the next util update that fires after rate_limit_us. The dst_cpu freq. update from step 2. above gets rate limited in this scenario.
Interesting, and this seems to be related to last_freq_update_time
being per-policy (which it has to be, because frequency updates are
per-policy too and that's what we need to rate-limit).
Does this happen often enough to be a real concern in practice on
those configurations, though?
The other CPUs in the policy need to be either idle (so schedutil
doesn't take them into account at all) or lightly utilized for that to
happen, so that would affect workloads with one CPU hog type of task
that is migrated from one CPU to another within a policy and that
doesn't happen too often AFAICS.
Thanks,
Rafael
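The interaction between the two-step migration update and the per-policy rate limit can be sketched in a few lines of userspace C. The field and function names below imitate schedutil's, but this is a simulation under assumed constants, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-policy governor state, modelled on schedutil's. */
struct sugov_policy {
	unsigned long long last_freq_update_time;	/* ns */
	unsigned long long rate_limit_ns;
	unsigned int cached_freq;
};

/* A frequency request is honoured only if rate_limit_ns has elapsed
 * since the last honoured request on this policy; otherwise it is
 * silently dropped, as in sugov_should_update_freq(). */
static bool sugov_update(struct sugov_policy *sg, unsigned long long now,
			 unsigned int freq)
{
	if (now - sg->last_freq_update_time < sg->rate_limit_ns)
		return false;	/* rate-limited: request dropped */
	sg->last_freq_update_time = now;
	sg->cached_freq = freq;
	return true;
}
```

In the migration scenario above, src_cpu's low request (after subtracting removed_load) lands first and is honoured; dst_cpu's corrective request arrives within rate_limit_ns of it and is dropped, so the dip persists until the first update after the rate limit expires.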
* Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-03-24 1:39 ` Rafael J. Wysocki
@ 2017-03-24 19:08 ` Sai Gurrappadi
0 siblings, 0 replies; 71+ messages in thread
From: Sai Gurrappadi @ 2017-03-24 19:08 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, Linux PM, Peter Zijlstra, LKML,
Srinivas Pandruvada, Viresh Kumar, Juri Lelli, Vincent Guittot,
Patrick Bellasi, Joel Fernandes, Morten Rasmussen, Ingo Molnar,
Thomas Gleixner, Peter Boonstoppel
On 03/23/2017 06:39 PM, Rafael J. Wysocki wrote:
> On Thu, Mar 23, 2017 at 8:26 PM, Sai Gurrappadi <sgurrappadi@nvidia.com> wrote:
>> Hi Rafael,
>
> Hi,
>
>> On 03/21/2017 04:08 PM, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> <snip>
>>
>>>
>>> That has been attributed to CPU utilization metric updates on task
>>> migration that cause the total utilization value for the CPU to be
>>> reduced by the utilization of the migrated task. If that happens,
>>> the schedutil governor may see a CPU utilization reduction and will
>>> attempt to reduce the CPU frequency accordingly right away. That
>>> may be premature, though, for example if the system is generally
>>> busy and there are other runnable tasks waiting to be run on that
>>> CPU already.
>>>
>>> This is unlikely to be an issue on systems where cpufreq policies are
>>> shared between multiple CPUs, because in those cases the policy
>>> utilization is computed as the maximum of the CPU utilization values
>>> over the whole policy and if that turns out to be low, reducing the
>>> frequency for the policy most likely is a good idea anyway. On
>>
>> I have observed this issue even in the shared policy case (one clock domain for many CPUs). On migrate, the actual load update is split into two updates:
>>
>> 1. Add to removed_load on src_cpu (cpu_util(src_cpu) not updated yet)
>> 2. Do wakeup on dst_cpu, add load to dst_cpu
>>
>> Now if src_cpu manages to do a PELT update before 2. happens, ex: say a small periodic task woke up on src_cpu, it'll end up subtracting the removed_load from its utilization and issue a frequency update before 2. happens.
>>
>> This causes a premature dip in frequency which doesn't get corrected until the next util update that fires after rate_limit_us. The dst_cpu freq. update from step 2. above gets rate limited in this scenario.
>
> Interesting, and this seems to be related to last_freq_update_time
> being per-policy (which it has to be, because frequency updates are
> per-policy too and that's what we need to rate-limit).
>
Correct.
> Does this happen often enough to be a real concern in practice on
> those configurations, though?
>
> The other CPUs in the policy need to be either idle (so schedutil
> doesn't take them into account at all) or lightly utilized for that to
> happen, so that would affect workloads with one CPU hog type of task
> that is migrated from one CPU to another within a policy and that
> doesn't happen too often AFAICS.
So it is possible, even likely in some cases, for a heavy CPU task to migrate on wakeup between the policy->cpus via select_idle_sibling() if the prev_cpu it was on was !idle on wakeup.
This style of heavy thread + lots of light work is a common pattern on Android (games, browsing, etc.) given how Android does its threading for ipc (Binder stuff) + its rendering/audio pipelines.
I unfortunately don't have any numbers atm though.
-Sai
* Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-03-21 23:08 ` [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely Rafael J. Wysocki
` (3 preceding siblings ...)
2017-03-23 19:26 ` Sai Gurrappadi
@ 2017-03-25 1:14 ` Sai Gurrappadi
2017-03-25 1:39 ` Rafael J. Wysocki
2017-03-27 7:04 ` Vincent Guittot
2017-05-08 3:49 ` Wanpeng Li
5 siblings, 2 replies; 71+ messages in thread
From: Sai Gurrappadi @ 2017-03-25 1:14 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM, Peter Zijlstra
Cc: LKML, Srinivas Pandruvada, Viresh Kumar, Juri Lelli,
Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar, Thomas Gleixner,
Peter Boonstoppel
Hi Rafael,
On 03/21/2017 04:08 PM, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The way the schedutil governor uses the PELT metric causes it to
> underestimate the CPU utilization in some cases.
>
> That can be easily demonstrated by running kernel compilation on
> a Sandy Bridge Intel processor, running turbostat in parallel with
> it and looking at the values written to the MSR_IA32_PERF_CTL
> register. Namely, the expected result would be that when all CPUs
> were 100% busy, all of them would be requested to run in the maximum
> P-state, but observation shows that this clearly isn't the case.
> The CPUs run in the maximum P-state for a while and then are
> requested to run slower and go back to the maximum P-state after
> a while again. That causes the actual frequency of the processor to
> visibly oscillate below the sustainable maximum in a jittery fashion
> which clearly is not desirable.
>
> That has been attributed to CPU utilization metric updates on task
> migration that cause the total utilization value for the CPU to be
> reduced by the utilization of the migrated task. If that happens,
> the schedutil governor may see a CPU utilization reduction and will
> attempt to reduce the CPU frequency accordingly right away. That
> may be premature, though, for example if the system is generally
> busy and there are other runnable tasks waiting to be run on that
> CPU already.
>
Thinking out loud a bit, I wonder if what you really want to do is basically:
schedutil_cpu_util(cpu) = max(cpu_rq(cpu)->cfs.util_avg, total_cpu_util_avg);
Where total_cpu_util_avg tracks the average utilization of the CPU itself over time (% of time the CPU was busy) in the same PELT like manner. The difference here is that it doesn't change instantaneously as tasks migrate in/out but it decays/accumulates just like the per-entity util_avgs.
Over time, total_cpu_util_avg and cfs_rq(cpu)->util_avg will tend towards each other the lesser the amount of 'overlap' / overloading.
Yes, the above metric would 'overestimate' in case all tasks have migrated away and we are left with an idle CPU. A fix for that could be to just use the PELT value like so:
schedutil_cpu_util(cpu) = max(cpu_rq(cpu)->cfs.util_avg, idle_cpu(cpu) ? 0 : total_cpu_util_avg);
Note that the problem described here in the commit message doesn't need fully runnable threads, it just needs two threads to execute in parallel on the same CPU for a period of time. I don't think looking at just idle_calls necessarily covers all cases.
Thoughts?
Thanks,
-Sai
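The combination being proposed can be written down directly. In the sketch below, total_cpu_util_avg and the idle flag are hypothetical inputs (neither exists in this form in the kernel); the point is only the max() selection between the fast-moving per-rq CFS signal and a slower whole-CPU average:

```c
#include <assert.h>

/* Proposed policy: take the larger of the per-rq CFS utilization
 * (which drops instantly on task migration) and a slower, PELT-style
 * average of total CPU busy time.  An idle CPU falls back to the CFS
 * value so a fully migrated-away load is not overestimated. */
static unsigned long proposed_cpu_util(unsigned long cfs_util_avg,
				       unsigned long total_cpu_util_avg,
				       int cpu_is_idle)
{
	unsigned long slow = cpu_is_idle ? 0 : total_cpu_util_avg;

	return cfs_util_avg > slow ? cfs_util_avg : slow;
}
```

Right after a migration dip on a still-busy CPU, the slow average dominates and the frequency holds; once the CPU actually goes idle, the (now low) CFS value is used again.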
* Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-03-25 1:14 ` Sai Gurrappadi
@ 2017-03-25 1:39 ` Rafael J. Wysocki
2017-03-27 7:04 ` Vincent Guittot
1 sibling, 0 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-25 1:39 UTC (permalink / raw)
To: Sai Gurrappadi
Cc: Rafael J. Wysocki, Linux PM, Peter Zijlstra, LKML,
Srinivas Pandruvada, Viresh Kumar, Juri Lelli, Vincent Guittot,
Patrick Bellasi, Joel Fernandes, Morten Rasmussen, Ingo Molnar,
Thomas Gleixner, Peter Boonstoppel
On Sat, Mar 25, 2017 at 2:14 AM, Sai Gurrappadi <sgurrappadi@nvidia.com> wrote:
> Hi Rafael,
>
> On 03/21/2017 04:08 PM, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> The way the schedutil governor uses the PELT metric causes it to
>> underestimate the CPU utilization in some cases.
>>
>> That can be easily demonstrated by running kernel compilation on
>> a Sandy Bridge Intel processor, running turbostat in parallel with
>> it and looking at the values written to the MSR_IA32_PERF_CTL
>> register. Namely, the expected result would be that when all CPUs
>> were 100% busy, all of them would be requested to run in the maximum
>> P-state, but observation shows that this clearly isn't the case.
>> The CPUs run in the maximum P-state for a while and then are
>> requested to run slower and go back to the maximum P-state after
>> a while again. That causes the actual frequency of the processor to
>> visibly oscillate below the sustainable maximum in a jittery fashion
>> which clearly is not desirable.
>>
>> That has been attributed to CPU utilization metric updates on task
>> migration that cause the total utilization value for the CPU to be
>> reduced by the utilization of the migrated task. If that happens,
>> the schedutil governor may see a CPU utilization reduction and will
>> attempt to reduce the CPU frequency accordingly right away. That
>> may be premature, though, for example if the system is generally
>> busy and there are other runnable tasks waiting to be run on that
>> CPU already.
>>
>
> Thinking out loud a bit, I wonder if what you really want to do is basically:
>
> schedutil_cpu_util(cpu) = max(cpu_rq(cpu)->cfs.util_avg, total_cpu_util_avg);
>
> Where total_cpu_util_avg tracks the average utilization of the CPU itself over time (% of time the CPU was busy) in the same PELT like manner. The difference here is that it doesn't change instantaneously as tasks migrate in/out but it decays/accumulates just like the per-entity util_avgs.
>
> Over time, total_cpu_util_avg and cfs_rq(cpu)->util_avg will tend towards each other the lesser the amount of 'overlap' / overloading.
>
> Yes, the above metric would 'overestimate' in case all tasks have migrated away and we are left with an idle CPU. A fix for that could be to just use the PELT value like so:
>
> schedutil_cpu_util(cpu) = max(cpu_rq(cpu)->cfs.util_avg, idle_cpu(cpu) ? 0 : total_cpu_util_avg);
>
> Note that the problem described here in the commit message doesn't need fully runnable threads, it just needs two threads to execute in parallel on the same CPU for a period of time. I don't think looking at just idle_calls necessarily covers all cases.
>
> Thoughts?
Well, is the total_cpu_util_avg metric readily available?
Thanks,
Rafael
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-23 22:08 ` Vincent Guittot
@ 2017-03-25 3:48 ` Joel Fernandes
2017-03-27 6:59 ` Vincent Guittot
0 siblings, 1 reply; 71+ messages in thread
From: Joel Fernandes @ 2017-03-25 3:48 UTC (permalink / raw)
To: Vincent Guittot
Cc: Patrick Bellasi, Rafael J. Wysocki, Viresh Kumar, Linux PM, LKML,
Peter Zijlstra, Srinivas Pandruvada, Juri Lelli,
Morten Rasmussen, Ingo Molnar
Hi Vincent,
On Thu, Mar 23, 2017 at 3:08 PM, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
[..]
>>>
>>>> So I'm not really aligned with the description of your problem: PELT
>>>> metric underestimates the load of the CPU. The PELT is just about
>>>> tracking CFS task utilization but not whole CPU utilization and
>>>> according to your description of the problem (time stolen by irq),
>>>> your problem doesn't come from an underestimation of CFS task but from
>>>> time spent in something else but not accounted in the value used by
>>>> schedutil
>>>
>>> Quite likely. Indeed, it can really be that the CFS task is preempted
>>> because of some RT activity generated by the IRQ handler.
>>>
>>> More in general, I've also noticed many suboptimal freq switches when
>>> RT tasks interleave with CFS ones, because of:
>>> - relatively long down _and up_ throttling times
>>> - the way schedutil's flags are tracked and updated
>>> - the callsites from where we call schedutil updates
>>>
>>> For example it can really happen that we are running at the highest
>>> OPP because of some RT activity. Then we switch back to a relatively
>>> low utilization CFS workload and then:
>>> 1. a tick happens which produces a frequency drop
>>
>> Any idea why this frequency drop would happen? Say a running CFS task
>> gets preempted by RT task, the PELT signal shouldn't drop for the
>> duration the CFS task is preempted because the task is runnable, so
>
> utilization only tracks the running state but not runnable state.
> Runnable state is tracked in load_avg
Thanks. I got it now.
Correct me if I'm wrong, but strictly speaking, utilization for a cfs_rq
(which drives the frequency for CFS) still tracks the blocked/runnable
time of tasks, although it's decayed as time moves forward. Only when we
migrate the rq of a cfs task is the util_avg contribution removed from
the rq. But I can see now why running RT can decay this load tracking
signal.
Regards,
Joel
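Vincent's point can be quantified: while a CFS task is kept off the CPU by RT activity, its utilization contribution decays geometrically, halving every 32 ms under the PELT convention (per-period decay factor y with y^32 = 0.5). A floating-point sketch under those assumptions follows; the kernel itself uses fixed-point lookup tables, so the constant here is illustrative:

```c
#include <assert.h>

/* Decay a PELT-style signal over a number of ~1 ms periods during
 * which the entity did not run.  y is the 32nd root of 1/2, so the
 * signal halves every 32 periods. */
static double pelt_decay(double util, int periods)
{
	const double y = 0.97857206;	/* 0.5^(1/32), approx. */
	int i;

	for (i = 0; i < periods; i++)
		util *= y;
	return util;
}
```

So even ~10 ms of RT preemption already costs a saturated signal roughly 20% of its value, which is enough for a tick-time schedutil update to pick a visibly lower frequency.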
* Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
2017-03-25 3:48 ` Joel Fernandes
@ 2017-03-27 6:59 ` Vincent Guittot
0 siblings, 0 replies; 71+ messages in thread
From: Vincent Guittot @ 2017-03-27 6:59 UTC (permalink / raw)
To: Joel Fernandes
Cc: Patrick Bellasi, Rafael J. Wysocki, Viresh Kumar, Linux PM, LKML,
Peter Zijlstra, Srinivas Pandruvada, Juri Lelli,
Morten Rasmussen, Ingo Molnar
On 25 March 2017 at 04:48, Joel Fernandes <joelaf@google.com> wrote:
> Hi Vincent,
>
> On Thu, Mar 23, 2017 at 3:08 PM, Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> [..]
>>>>
>>>>> So I'm not really aligned with the description of your problem: PELT
>>>>> metric underestimates the load of the CPU. The PELT is just about
>>>>> tracking CFS task utilization but not whole CPU utilization and
>>>>> according to your description of the problem (time stolen by irq),
>>>>> your problem doesn't come from an underestimation of CFS task but from
>>>>> time spent in something else but not accounted in the value used by
>>>>> schedutil
>>>>
>>>> Quite likely. Indeed, it can really be that the CFS task is preempted
>>>> because of some RT activity generated by the IRQ handler.
>>>>
>>>> More in general, I've also noticed many suboptimal freq switches when
>>>> RT tasks interleave with CFS ones, because of:
>>>> - relatively long down _and up_ throttling times
>>>> - the way schedutil's flags are tracked and updated
>>>> - the callsites from where we call schedutil updates
>>>>
>>>> For example it can really happen that we are running at the highest
>>>> OPP because of some RT activity. Then we switch back to a relatively
>>>> low utilization CFS workload and then:
>>>> 1. a tick happens which produces a frequency drop
>>>
>>> Any idea why this frequency drop would happen? Say a running CFS task
>>> gets preempted by RT task, the PELT signal shouldn't drop for the
>>> duration the CFS task is preempted because the task is runnable, so
>>
>> utilization only tracks the running state but not runnable state.
>> Runnable state is tracked in load_avg
>
> Thanks. I got it now.
>
> Correct me if I'm wrong, but strictly speaking, utilization for a cfs_rq
> (which drives the frequency for CFS) still tracks the blocked/runnable
> time of tasks, although it's decayed as time moves forward. Only when we
> migrate the rq of a cfs task is the util_avg contribution removed from
> the rq. But I can see now why running RT can decay this load tracking
> signal.
Yes, you're right.
>
> Regards,
> Joel
* Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-03-25 1:14 ` Sai Gurrappadi
2017-03-25 1:39 ` Rafael J. Wysocki
@ 2017-03-27 7:04 ` Vincent Guittot
2017-03-27 21:01 ` Sai Gurrappadi
1 sibling, 1 reply; 71+ messages in thread
From: Vincent Guittot @ 2017-03-27 7:04 UTC (permalink / raw)
To: Sai Gurrappadi
Cc: Rafael J. Wysocki, Linux PM, Peter Zijlstra, LKML,
Srinivas Pandruvada, Viresh Kumar, Juri Lelli, Patrick Bellasi,
Joel Fernandes, Morten Rasmussen, Ingo Molnar, Thomas Gleixner,
Peter Boonstoppel
On 25 March 2017 at 02:14, Sai Gurrappadi <sgurrappadi@nvidia.com> wrote:
> Hi Rafael,
>
> On 03/21/2017 04:08 PM, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> The way the schedutil governor uses the PELT metric causes it to
>> underestimate the CPU utilization in some cases.
>>
>> That can be easily demonstrated by running kernel compilation on
>> a Sandy Bridge Intel processor, running turbostat in parallel with
>> it and looking at the values written to the MSR_IA32_PERF_CTL
>> register. Namely, the expected result would be that when all CPUs
>> were 100% busy, all of them would be requested to run in the maximum
>> P-state, but observation shows that this clearly isn't the case.
>> The CPUs run in the maximum P-state for a while and then are
>> requested to run slower and go back to the maximum P-state after
>> a while again. That causes the actual frequency of the processor to
>> visibly oscillate below the sustainable maximum in a jittery fashion
>> which clearly is not desirable.
>>
>> That has been attributed to CPU utilization metric updates on task
>> migration that cause the total utilization value for the CPU to be
>> reduced by the utilization of the migrated task. If that happens,
>> the schedutil governor may see a CPU utilization reduction and will
>> attempt to reduce the CPU frequency accordingly right away. That
>> may be premature, though, for example if the system is generally
>> busy and there are other runnable tasks waiting to be run on that
>> CPU already.
>>
>
> Thinking out loud a bit, I wonder if what you really want to do is basically:
>
> schedutil_cpu_util(cpu) = max(cpu_rq(cpu)->cfs.util_avg, total_cpu_util_avg);
>
> Where total_cpu_util_avg tracks the average utilization of the CPU itself over time (% of time the CPU was busy) in the same PELT like manner. The difference here is that it doesn't change instantaneously as tasks migrate in/out but it decays/accumulates just like the per-entity util_avgs.
But we lose the benefit of the immediate decrease when tasks migrate.
Instead of total_cpu_util_avg we should better track RT utilization in
the same manner, so with the ongoing work for deadline we will have:
total_utilization = cfs.util_avg + rt's util_avg + deadline's util_avg
and we still take advantage of the task migration effect
>
> Over time, total_cpu_util_avg and cfs_rq(cpu)->util_avg will tend towards each other the lesser the amount of 'overlap' / overloading.
>
> Yes, the above metric would 'overestimate' in case all tasks have migrated away and we are left with an idle CPU. A fix for that could be to just use the PELT value like so:
>
> schedutil_cpu_util(cpu) = max(cpu_rq(cpu)->cfs.util_avg, idle_cpu(cpu) ? 0 : total_cpu_util_avg);
>
> Note that the problem described here in the commit message doesn't need fully runnable threads, it just needs two threads to execute in parallel on the same CPU for a period of time. I don't think looking at just idle_calls necessarily covers all cases.
>
> Thoughts?
>
> Thanks,
> -Sai
* Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-03-27 7:04 ` Vincent Guittot
@ 2017-03-27 21:01 ` Sai Gurrappadi
2017-03-27 21:11 ` Rafael J. Wysocki
0 siblings, 1 reply; 71+ messages in thread
From: Sai Gurrappadi @ 2017-03-27 21:01 UTC (permalink / raw)
To: Vincent Guittot
Cc: Rafael J. Wysocki, Linux PM, Peter Zijlstra, LKML,
Srinivas Pandruvada, Viresh Kumar, Juri Lelli, Patrick Bellasi,
Joel Fernandes, Morten Rasmussen, Ingo Molnar, Thomas Gleixner,
Peter Boonstoppel
Hi Vincent,
On 03/27/2017 12:04 AM, Vincent Guittot wrote:
> On 25 March 2017 at 02:14, Sai Gurrappadi <sgurrappadi@nvidia.com> wrote:
>> Hi Rafael,
>>
>> On 03/21/2017 04:08 PM, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>
>>> The way the schedutil governor uses the PELT metric causes it to
>>> underestimate the CPU utilization in some cases.
>>>
>>> That can be easily demonstrated by running kernel compilation on
>>> a Sandy Bridge Intel processor, running turbostat in parallel with
>>> it and looking at the values written to the MSR_IA32_PERF_CTL
>>> register. Namely, the expected result would be that when all CPUs
>>> were 100% busy, all of them would be requested to run in the maximum
>>> P-state, but observation shows that this clearly isn't the case.
>>> The CPUs run in the maximum P-state for a while and then are
>>> requested to run slower and go back to the maximum P-state after
>>> a while again. That causes the actual frequency of the processor to
>>> visibly oscillate below the sustainable maximum in a jittery fashion
>>> which clearly is not desirable.
>>>
>>> That has been attributed to CPU utilization metric updates on task
>>> migration that cause the total utilization value for the CPU to be
>>> reduced by the utilization of the migrated task. If that happens,
>>> the schedutil governor may see a CPU utilization reduction and will
>>> attempt to reduce the CPU frequency accordingly right away. That
>>> may be premature, though, for example if the system is generally
>>> busy and there are other runnable tasks waiting to be run on that
>>> CPU already.
>>>
>>
>> Thinking out loud a bit, I wonder if what you really want to do is basically:
>>
>> schedutil_cpu_util(cpu) = max(cpu_rq(cpu)->cfs.util_avg, total_cpu_util_avg);
>>
>> Where total_cpu_util_avg tracks the average utilization of the CPU itself over time (% of time the CPU was busy) in the same PELT like manner. The difference here is that it doesn't change instantaneously as tasks migrate in/out but it decays/accumulates just like the per-entity util_avgs.
>
> But we lose the benefit of the immediate decrease when tasks migrate.
Indeed, this is not ideal.
> Instead of total_cpu_util_avg we should better track RT utilization in
> the same manner, so with the ongoing work for deadline we will have:
> total_utilization = cfs.util_avg + rt's util_avg + deadline's util_avg
> and we still take advantage of the task migration effect
I agree that we need better tracking for RT and DL tasks but that doesn't solve the overloaded case with more than one CFS thread sharing a CPU.
In the overloaded case, we care not just about the instant when the migration happens but also about the subsequent windows where the PELT metric is slowly ramping up to reflect the real utilization of a task now that it has a CPU to itself.
Maybe there are better ways to solve that though :-)
Thanks,
-Sai
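The slow ramp-up mentioned above can be made concrete with the same PELT convention (per-period decay factor y, y^32 = 0.5): a task that suddenly runs flat out closes only a (1 - y) ≈ 2% fraction of the gap to the 1024 ceiling per millisecond, so it takes about 32 ms to close even half of it. An illustrative floating-point sketch of one fully-run period, under the simplifying assumption that each period contributes 1024*(1 - y); the kernel's fixed-point arithmetic differs slightly:

```c
#include <assert.h>

/* One ~1 ms PELT period during which the task ran the whole time:
 * the old contribution decays by y and one full new period is added.
 * The signal converges geometrically towards the 1024 ceiling. */
static double pelt_run_period(double util)
{
	const double y = 0.97857206;	/* 0.5^(1/32), approx. */

	return util * y + 1024.0 * (1.0 - y);
}
```

Starting from 512 (a task that had been sharing the CPU 50/50), a single millisecond of exclusive running moves the signal only to about 523, which is why several rate_limit_us windows can pass before schedutil requests the frequency the task actually needs.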
* Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-03-27 21:01 ` Sai Gurrappadi
@ 2017-03-27 21:11 ` Rafael J. Wysocki
0 siblings, 0 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-03-27 21:11 UTC (permalink / raw)
To: Sai Gurrappadi
Cc: Vincent Guittot, Rafael J. Wysocki, Linux PM, Peter Zijlstra,
LKML, Srinivas Pandruvada, Viresh Kumar, Juri Lelli,
Patrick Bellasi, Joel Fernandes, Morten Rasmussen, Ingo Molnar,
Thomas Gleixner, Peter Boonstoppel
On Mon, Mar 27, 2017 at 11:01 PM, Sai Gurrappadi <sgurrappadi@nvidia.com> wrote:
> Hi Vincent,
>
> On 03/27/2017 12:04 AM, Vincent Guittot wrote:
>> On 25 March 2017 at 02:14, Sai Gurrappadi <sgurrappadi@nvidia.com> wrote:
>>> Hi Rafael,
>>>
>>> On 03/21/2017 04:08 PM, Rafael J. Wysocki wrote:
>>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>>
>>>> The way the schedutil governor uses the PELT metric causes it to
>>>> underestimate the CPU utilization in some cases.
>>>>
>>>> That can be easily demonstrated by running kernel compilation on
>>>> a Sandy Bridge Intel processor, running turbostat in parallel with
>>>> it and looking at the values written to the MSR_IA32_PERF_CTL
>>>> register. Namely, the expected result would be that when all CPUs
>>>> were 100% busy, all of them would be requested to run in the maximum
>>>> P-state, but observation shows that this clearly isn't the case.
>>>> The CPUs run in the maximum P-state for a while and then are
>>>> requested to run slower and go back to the maximum P-state after
>>>> a while again. That causes the actual frequency of the processor to
>>>> visibly oscillate below the sustainable maximum in a jittery fashion
>>>> which clearly is not desirable.
>>>>
>>>> That has been attributed to CPU utilization metric updates on task
>>>> migration that cause the total utilization value for the CPU to be
>>>> reduced by the utilization of the migrated task. If that happens,
>>>> the schedutil governor may see a CPU utilization reduction and will
>>>> attempt to reduce the CPU frequency accordingly right away. That
>>>> may be premature, though, for example if the system is generally
>>>> busy and there are other runnable tasks waiting to be run on that
>>>> CPU already.
>>>>
>>>
>>> Thinking out loud a bit, I wonder if what you really want to do is basically:
>>>
>>> schedutil_cpu_util(cpu) = max(cpu_rq(cpu)->cfs.util_avg, total_cpu_util_avg);
>>>
>>> Where total_cpu_util_avg tracks the average utilization of the CPU itself over time (% of time the CPU was busy) in the same PELT like manner. The difference here is that it doesn't change instantaneously as tasks migrate in/out but it decays/accumulates just like the per-entity util_avgs.
>>
>> But we lose the benefit of the immediate decrease when tasks migrate.
>
> Indeed, this is not ideal.
>
>> Instead of total_cpu_util_avg we should better track RT utilization in
>> the same manner so with ongoing work for deadline we will have :
>> total_utilization = cfs.util_avg + rt's util_avg + deadline's util avg
>> and we still take advantage of task migration effect
>
> I agree that we need better tracking for RT and DL tasks but that doesn't solve the overloaded case with more than one CFS thread sharing a CPU.
>
> In the overloaded case, we care not just about the instant where the migrate happens but also subsequent windows where the PELT metric is slowly ramping up to reflect the real utilization of a task now that it has a CPU to itself.
>
> Maybe there are better ways to solve that though :-)
I wonder if it's viable to postpone the utilization update on both the
source and target runqueues until the task has been fully migrated?
That would make the artificial utilization reductions go away.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-03-21 23:08 ` [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely Rafael J. Wysocki
` (4 preceding siblings ...)
2017-03-25 1:14 ` Sai Gurrappadi
@ 2017-05-08 3:49 ` Wanpeng Li
2017-05-08 4:01 ` Viresh Kumar
5 siblings, 1 reply; 71+ messages in thread
From: Wanpeng Li @ 2017-05-08 3:49 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux PM, Peter Zijlstra, LKML, Srinivas Pandruvada,
Viresh Kumar, Juri Lelli, Vincent Guittot, Patrick Bellasi,
Joel Fernandes, Morten Rasmussen, Ingo Molnar, Thomas Gleixner
Hi Rafael,
2017-03-22 7:08 GMT+08:00 Rafael J. Wysocki <rjw@rjwysocki.net>:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The way the schedutil governor uses the PELT metric causes it to
> underestimate the CPU utilization in some cases.
>
> That can be easily demonstrated by running kernel compilation on
> a Sandy Bridge Intel processor, running turbostat in parallel with
> it and looking at the values written to the MSR_IA32_PERF_CTL
> register. Namely, the expected result would be that when all CPUs
> were 100% busy, all of them would be requested to run in the maximum
> P-state, but observation shows that this clearly isn't the case.
> The CPUs run in the maximum P-state for a while and then are
> requested to run slower and go back to the maximum P-state after
> a while again. That causes the actual frequency of the processor to
> visibly oscillate below the sustainable maximum in a jittery fashion
> which clearly is not desirable.
>
> That has been attributed to CPU utilization metric updates on task
> migration that cause the total utilization value for the CPU to be
> reduced by the utilization of the migrated task. If that happens,
> the schedutil governor may see a CPU utilization reduction and will
> attempt to reduce the CPU frequency accordingly right away. That
> may be premature, though, for example if the system is generally
> busy and there are other runnable tasks waiting to be run on that
> CPU already.
>
> This is unlikely to be an issue on systems where cpufreq policies are
> shared between multiple CPUs, because in those cases the policy
> utilization is computed as the maximum of the CPU utilization values
Sorry for a question that may not be related to this patch. If the
cpufreq policy is shared between multiple CPUs, intel_cpufreq_target()
only updates the IA32_PERF_CTL MSR of the CPU that is managing the
policy. I wonder whether the other affected CPUs should also update
their per-logical-CPU IA32_PERF_CTL MSRs?
Regards,
Wanpeng Li
> over the whole policy and if that turns out to be low, reducing the
> frequency for the policy most likely is a good idea anyway. On
> systems with one CPU per policy, however, it may affect performance
> adversely and even lead to increased energy consumption in some cases.
>
> On those systems it may be addressed by taking another utilization
> metric into consideration, like whether or not the CPU whose
> frequency is about to be reduced has been idle recently, because if
> that's not the case, the CPU is likely to be busy in the near future
> and its frequency should not be reduced.
>
> To that end, use the counter of idle calls in the timekeeping code.
> Namely, make the schedutil governor look at that counter for the
> current CPU every time before its frequency is about to be reduced.
> If the counter has not changed since the previous iteration of the
> governor computations for that CPU, the CPU has been busy for all
> that time and its frequency should not be decreased, so if the new
> frequency would be lower than the one set previously, the governor
> will skip the frequency update.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> include/linux/tick.h | 1 +
> kernel/sched/cpufreq_schedutil.c | 27 +++++++++++++++++++++++++++
> kernel/time/tick-sched.c | 12 ++++++++++++
> 3 files changed, 40 insertions(+)
>
> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> @@ -61,6 +61,11 @@ struct sugov_cpu {
> unsigned long util;
> unsigned long max;
> unsigned int flags;
> +
> + /* The field below is for single-CPU policies only. */
> +#ifdef CONFIG_NO_HZ_COMMON
> + unsigned long saved_idle_calls;
> +#endif
> };
>
> static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
> @@ -192,6 +197,19 @@ static void sugov_iowait_boost(struct su
> sg_cpu->iowait_boost >>= 1;
> }
>
> +#ifdef CONFIG_NO_HZ_COMMON
> +static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
> +{
> + unsigned long idle_calls = tick_nohz_get_idle_calls();
> + bool ret = idle_calls == sg_cpu->saved_idle_calls;
> +
> + sg_cpu->saved_idle_calls = idle_calls;
> + return ret;
> +}
> +#else
> +static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
> +#endif /* CONFIG_NO_HZ_COMMON */
> +
> static void sugov_update_single(struct update_util_data *hook, u64 time,
> unsigned int flags)
> {
> @@ -200,6 +218,7 @@ static void sugov_update_single(struct u
> struct cpufreq_policy *policy = sg_policy->policy;
> unsigned long util, max;
> unsigned int next_f;
> + bool busy;
>
> sugov_set_iowait_boost(sg_cpu, time, flags);
> sg_cpu->last_update = time;
> @@ -207,12 +226,20 @@ static void sugov_update_single(struct u
> if (!sugov_should_update_freq(sg_policy, time))
> return;
>
> + busy = sugov_cpu_is_busy(sg_cpu);
> +
> if (flags & SCHED_CPUFREQ_RT_DL) {
> next_f = policy->cpuinfo.max_freq;
> } else {
> sugov_get_util(&util, &max);
> sugov_iowait_boost(sg_cpu, &util, &max);
> next_f = get_next_freq(sg_policy, util, max);
> + /*
> + * Do not reduce the frequency if the CPU has not been idle
> + * recently, as the reduction is likely to be premature then.
> + */
> + if (busy && next_f < sg_policy->next_freq)
> + next_f = sg_policy->next_freq;
> }
> sugov_update_commit(sg_policy, time, next_f);
> }
> Index: linux-pm/include/linux/tick.h
> ===================================================================
> --- linux-pm.orig/include/linux/tick.h
> +++ linux-pm/include/linux/tick.h
> @@ -117,6 +117,7 @@ extern void tick_nohz_idle_enter(void);
> extern void tick_nohz_idle_exit(void);
> extern void tick_nohz_irq_exit(void);
> extern ktime_t tick_nohz_get_sleep_length(void);
> +extern unsigned long tick_nohz_get_idle_calls(void);
> extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
> extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
> #else /* !CONFIG_NO_HZ_COMMON */
> Index: linux-pm/kernel/time/tick-sched.c
> ===================================================================
> --- linux-pm.orig/kernel/time/tick-sched.c
> +++ linux-pm/kernel/time/tick-sched.c
> @@ -993,6 +993,18 @@ ktime_t tick_nohz_get_sleep_length(void)
> return ts->sleep_length;
> }
>
> +/**
> + * tick_nohz_get_idle_calls - return the current idle calls counter value
> + *
> + * Called from the schedutil frequency scaling governor in scheduler context.
> + */
> +unsigned long tick_nohz_get_idle_calls(void)
> +{
> + struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
> +
> + return ts->idle_calls;
> +}
> +
> static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
> {
> #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
>
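The skip-reduction decision in the quoted patch can be sketched in plain user-space C; the names below are illustrative stand-ins, not the kernel API, and the idle-calls value would come from the tick code:

```c
#include <assert.h>
#include <stdbool.h>

static unsigned long saved_idle_calls;
static unsigned int next_freq;

/* Mirrors the governor logic above: if the per-CPU idle-calls counter
 * has not moved since the last evaluation, the CPU never went idle in
 * between, so refuse to lower the frequency. */
static unsigned int pick_freq(unsigned int computed_freq,
			      unsigned long idle_calls)
{
	bool busy = (idle_calls == saved_idle_calls);

	saved_idle_calls = idle_calls;
	/* Do not reduce the frequency of a CPU that has not been idle
	 * recently, as the reduction is likely to be premature. */
	if (busy && computed_freq < next_freq)
		computed_freq = next_freq;
	next_freq = computed_freq;
	return computed_freq;
}
```

Increases are always honored; only reductions on a continuously busy CPU are filtered out, which is what suppresses the oscillation described in the changelog.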
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-05-08 3:49 ` Wanpeng Li
@ 2017-05-08 4:01 ` Viresh Kumar
2017-05-08 5:15 ` Wanpeng Li
2017-05-08 22:16 ` Rafael J. Wysocki
0 siblings, 2 replies; 71+ messages in thread
From: Viresh Kumar @ 2017-05-08 4:01 UTC (permalink / raw)
To: Wanpeng Li
Cc: Rafael J. Wysocki, Linux PM, Peter Zijlstra, LKML,
Srinivas Pandruvada, Juri Lelli, Vincent Guittot,
Patrick Bellasi, Joel Fernandes, Morten Rasmussen, Ingo Molnar,
Thomas Gleixner
On 08-05-17, 11:49, Wanpeng Li wrote:
> Hi Rafael,
> 2017-03-22 7:08 GMT+08:00 Rafael J. Wysocki <rjw@rjwysocki.net>:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The way the schedutil governor uses the PELT metric causes it to
> > underestimate the CPU utilization in some cases.
> >
> > That can be easily demonstrated by running kernel compilation on
> > a Sandy Bridge Intel processor, running turbostat in parallel with
> > it and looking at the values written to the MSR_IA32_PERF_CTL
> > register. Namely, the expected result would be that when all CPUs
> > were 100% busy, all of them would be requested to run in the maximum
> > P-state, but observation shows that this clearly isn't the case.
> > The CPUs run in the maximum P-state for a while and then are
> > requested to run slower and go back to the maximum P-state after
> > a while again. That causes the actual frequency of the processor to
> > visibly oscillate below the sustainable maximum in a jittery fashion
> > which clearly is not desirable.
> >
> > That has been attributed to CPU utilization metric updates on task
> > migration that cause the total utilization value for the CPU to be
> > reduced by the utilization of the migrated task. If that happens,
> > the schedutil governor may see a CPU utilization reduction and will
> > attempt to reduce the CPU frequency accordingly right away. That
> > may be premature, though, for example if the system is generally
> > busy and there are other runnable tasks waiting to be run on that
> > CPU already.
> >
> > This is unlikely to be an issue on systems where cpufreq policies are
> > shared between multiple CPUs, because in those cases the policy
> > utilization is computed as the maximum of the CPU utilization values
>
> Sorry for one question maybe not associated with this patch. If the
> cpufreq policy is shared between multiple CPUs, the function
> intel_cpufreq_target() just updates IA32_PERF_CTL MSR of the cpu
> which is managing this policy, I wonder whether other cpus which are
> affected should also update their per-logical cpu's IA32_PERF_CTL MSR?
The CPUs share a policy when they share their frequency/voltage rails, so
changing the perf state of one CPU changes it for all the CPUs in that
policy. Otherwise, they couldn't be considered part of the same policy.
That's why this code changes it only for policy->cpu.
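That shared-rail behavior can be modeled with a small stand-alone C sketch; the struct below only loosely mirrors struct cpufreq_policy and is not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Toy model: a "policy" groups CPUs that share one frequency/voltage
 * rail, so a single hardware write on the managing CPU effectively
 * sets the frequency for every member. */
struct toy_policy {
	unsigned int cpu;	/* CPU that manages the policy */
	bool cpus[NR_CPUS];	/* members sharing the rail */
	unsigned int cur_khz;
};

static unsigned int rail_khz;	/* the one shared rail */

static void toy_set_freq(struct toy_policy *p, unsigned int khz)
{
	/* The driver programs only p->cpu; the shared rail is what
	 * makes the change visible to every member CPU. */
	rail_khz = khz;
	p->cur_khz = khz;
}

static unsigned int toy_cpu_freq(const struct toy_policy *p,
				 unsigned int cpu)
{
	/* Every member CPU observes the rail frequency; CPUs outside
	 * the policy are unaffected (0 here for simplicity). */
	return p->cpus[cpu] ? rail_khz : 0;
}
```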
--
viresh
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-05-08 4:01 ` Viresh Kumar
@ 2017-05-08 5:15 ` Wanpeng Li
2017-05-08 22:16 ` Rafael J. Wysocki
1 sibling, 0 replies; 71+ messages in thread
From: Wanpeng Li @ 2017-05-08 5:15 UTC (permalink / raw)
To: Viresh Kumar
Cc: Rafael J. Wysocki, Linux PM, Peter Zijlstra, LKML,
Srinivas Pandruvada, Juri Lelli, Vincent Guittot,
Patrick Bellasi, Joel Fernandes, Morten Rasmussen, Ingo Molnar,
Thomas Gleixner
2017-05-08 12:01 GMT+08:00 Viresh Kumar <viresh.kumar@linaro.org>:
> On 08-05-17, 11:49, Wanpeng Li wrote:
>> Hi Rafael,
>> 2017-03-22 7:08 GMT+08:00 Rafael J. Wysocki <rjw@rjwysocki.net>:
>> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> >
>> > The way the schedutil governor uses the PELT metric causes it to
>> > underestimate the CPU utilization in some cases.
>> >
>> > That can be easily demonstrated by running kernel compilation on
>> > a Sandy Bridge Intel processor, running turbostat in parallel with
>> > it and looking at the values written to the MSR_IA32_PERF_CTL
>> > register. Namely, the expected result would be that when all CPUs
>> > were 100% busy, all of them would be requested to run in the maximum
>> > P-state, but observation shows that this clearly isn't the case.
>> > The CPUs run in the maximum P-state for a while and then are
>> > requested to run slower and go back to the maximum P-state after
>> > a while again. That causes the actual frequency of the processor to
>> > visibly oscillate below the sustainable maximum in a jittery fashion
>> > which clearly is not desirable.
>> >
>> > That has been attributed to CPU utilization metric updates on task
>> > migration that cause the total utilization value for the CPU to be
>> > reduced by the utilization of the migrated task. If that happens,
>> > the schedutil governor may see a CPU utilization reduction and will
>> > attempt to reduce the CPU frequency accordingly right away. That
>> > may be premature, though, for example if the system is generally
>> > busy and there are other runnable tasks waiting to be run on that
>> > CPU already.
>> >
>> > This is unlikely to be an issue on systems where cpufreq policies are
>> > shared between multiple CPUs, because in those cases the policy
>> > utilization is computed as the maximum of the CPU utilization values
>>
>> Sorry for one question maybe not associated with this patch. If the
>> cpufreq policy is shared between multiple CPUs, the function
>> intel_cpufreq_target() just updates IA32_PERF_CTL MSR of the cpu
>> which is managing this policy, I wonder whether other cpus which are
>> affected should also update their per-logical cpu's IA32_PERF_CTL MSR?
>
> The CPUs share the policy when they share their freq/voltage rails and so
> changing perf state of one CPU should result in that changing for all the CPUs
> in that policy. Otherwise, they can't be considered to be part of the same
> policy.
>
> That's why this code is changing it only for policy->cpu alone.
I see, thanks for the explanation.
Regards,
Wanpeng Li
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-05-08 4:01 ` Viresh Kumar
2017-05-08 5:15 ` Wanpeng Li
@ 2017-05-08 22:16 ` Rafael J. Wysocki
2017-05-08 22:36 ` Wanpeng Li
1 sibling, 1 reply; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-05-08 22:16 UTC (permalink / raw)
To: Viresh Kumar
Cc: Wanpeng Li, Linux PM, Peter Zijlstra, LKML, Srinivas Pandruvada,
Juri Lelli, Vincent Guittot, Patrick Bellasi, Joel Fernandes,
Morten Rasmussen, Ingo Molnar, Thomas Gleixner
On Monday, May 08, 2017 09:31:19 AM Viresh Kumar wrote:
> On 08-05-17, 11:49, Wanpeng Li wrote:
> > Hi Rafael,
> > 2017-03-22 7:08 GMT+08:00 Rafael J. Wysocki <rjw@rjwysocki.net>:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > The way the schedutil governor uses the PELT metric causes it to
> > > underestimate the CPU utilization in some cases.
> > >
> > > That can be easily demonstrated by running kernel compilation on
> > > a Sandy Bridge Intel processor, running turbostat in parallel with
> > > it and looking at the values written to the MSR_IA32_PERF_CTL
> > > register. Namely, the expected result would be that when all CPUs
> > > were 100% busy, all of them would be requested to run in the maximum
> > > P-state, but observation shows that this clearly isn't the case.
> > > The CPUs run in the maximum P-state for a while and then are
> > > requested to run slower and go back to the maximum P-state after
> > > a while again. That causes the actual frequency of the processor to
> > > visibly oscillate below the sustainable maximum in a jittery fashion
> > > which clearly is not desirable.
> > >
> > > That has been attributed to CPU utilization metric updates on task
> > > migration that cause the total utilization value for the CPU to be
> > > reduced by the utilization of the migrated task. If that happens,
> > > the schedutil governor may see a CPU utilization reduction and will
> > > attempt to reduce the CPU frequency accordingly right away. That
> > > may be premature, though, for example if the system is generally
> > > busy and there are other runnable tasks waiting to be run on that
> > > CPU already.
> > >
> > > This is unlikely to be an issue on systems where cpufreq policies are
> > > shared between multiple CPUs, because in those cases the policy
> > > utilization is computed as the maximum of the CPU utilization values
> >
> > Sorry for one question maybe not associated with this patch. If the
> > cpufreq policy is shared between multiple CPUs, the function
> > intel_cpufreq_target() just updates IA32_PERF_CTL MSR of the cpu
> > which is managing this policy, I wonder whether other cpus which are
> > affected should also update their per-logical cpu's IA32_PERF_CTL MSR?
>
> The CPUs share the policy when they share their freq/voltage rails and so
> changing perf state of one CPU should result in that changing for all the CPUs
> in that policy. Otherwise, they can't be considered to be part of the same
> policy.
To be entirely precise, this depends on the granularity of the HW interface.
If the interface is per-logical-CPU, we will use it this way for efficiency
reasons, and even if there is some coordination on the HW side, the
information on how exactly it works is usually limited.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-05-08 22:16 ` Rafael J. Wysocki
@ 2017-05-08 22:36 ` Wanpeng Li
2017-05-08 23:01 ` Rafael J. Wysocki
0 siblings, 1 reply; 71+ messages in thread
From: Wanpeng Li @ 2017-05-08 22:36 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Viresh Kumar, Linux PM, Peter Zijlstra, LKML,
Srinivas Pandruvada, Juri Lelli, Vincent Guittot,
Patrick Bellasi, Joel Fernandes, Morten Rasmussen, Ingo Molnar,
Thomas Gleixner
2017-05-09 6:16 GMT+08:00 Rafael J. Wysocki <rjw@rjwysocki.net>:
> On Monday, May 08, 2017 09:31:19 AM Viresh Kumar wrote:
>> On 08-05-17, 11:49, Wanpeng Li wrote:
>> > Hi Rafael,
>> > 2017-03-22 7:08 GMT+08:00 Rafael J. Wysocki <rjw@rjwysocki.net>:
>> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> > >
>> > > The way the schedutil governor uses the PELT metric causes it to
>> > > underestimate the CPU utilization in some cases.
>> > >
>> > > That can be easily demonstrated by running kernel compilation on
>> > > a Sandy Bridge Intel processor, running turbostat in parallel with
>> > > it and looking at the values written to the MSR_IA32_PERF_CTL
>> > > register. Namely, the expected result would be that when all CPUs
>> > > were 100% busy, all of them would be requested to run in the maximum
>> > > P-state, but observation shows that this clearly isn't the case.
>> > > The CPUs run in the maximum P-state for a while and then are
>> > > requested to run slower and go back to the maximum P-state after
>> > > a while again. That causes the actual frequency of the processor to
>> > > visibly oscillate below the sustainable maximum in a jittery fashion
>> > > which clearly is not desirable.
>> > >
>> > > That has been attributed to CPU utilization metric updates on task
>> > > migration that cause the total utilization value for the CPU to be
>> > > reduced by the utilization of the migrated task. If that happens,
>> > > the schedutil governor may see a CPU utilization reduction and will
>> > > attempt to reduce the CPU frequency accordingly right away. That
>> > > may be premature, though, for example if the system is generally
>> > > busy and there are other runnable tasks waiting to be run on that
>> > > CPU already.
>> > >
>> > > This is unlikely to be an issue on systems where cpufreq policies are
>> > > shared between multiple CPUs, because in those cases the policy
>> > > utilization is computed as the maximum of the CPU utilization values
>> >
>> > Sorry for one question maybe not associated with this patch. If the
>> > cpufreq policy is shared between multiple CPUs, the function
>> > intel_cpufreq_target() just updates IA32_PERF_CTL MSR of the cpu
>> > which is managing this policy, I wonder whether other cpus which are
>> > affected should also update their per-logical cpu's IA32_PERF_CTL MSR?
>>
>> The CPUs share the policy when they share their freq/voltage rails and so
>> changing perf state of one CPU should result in that changing for all the CPUs
>> in that policy. Otherwise, they can't be considered to be part of the same
>> policy.
>
> To be entirely precise, this depends on the granularity of the HW interface.
>
> If the interface is per-logical-CPU, we will use it this way for efficiency
> reasons and even if there is some coordination on the HW side, the information
> on how exactly it works usually is limited.
I checked this on several Xeon servers at hand; however, I didn't find any
/sys/devices/system/cpu/cpufreq/policyX/affected_cpus covering more than
one logical CPU, so I guess most Xeon servers don't support shared
cpufreq policies. Which kinds of boxes do support that?
Regards,
Wanpeng Li
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
2017-05-08 22:36 ` Wanpeng Li
@ 2017-05-08 23:01 ` Rafael J. Wysocki
0 siblings, 0 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2017-05-08 23:01 UTC (permalink / raw)
To: Wanpeng Li
Cc: Viresh Kumar, Linux PM, Peter Zijlstra, LKML,
Srinivas Pandruvada, Juri Lelli, Vincent Guittot,
Patrick Bellasi, Joel Fernandes, Morten Rasmussen, Ingo Molnar,
Thomas Gleixner
On Tuesday, May 09, 2017 06:36:14 AM Wanpeng Li wrote:
> 2017-05-09 6:16 GMT+08:00 Rafael J. Wysocki <rjw@rjwysocki.net>:
> > On Monday, May 08, 2017 09:31:19 AM Viresh Kumar wrote:
> >> On 08-05-17, 11:49, Wanpeng Li wrote:
> >> > Hi Rafael,
> >> > 2017-03-22 7:08 GMT+08:00 Rafael J. Wysocki <rjw@rjwysocki.net>:
> >> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> > >
> >> > > The way the schedutil governor uses the PELT metric causes it to
> >> > > underestimate the CPU utilization in some cases.
> >> > >
> >> > > That can be easily demonstrated by running kernel compilation on
> >> > > a Sandy Bridge Intel processor, running turbostat in parallel with
> >> > > it and looking at the values written to the MSR_IA32_PERF_CTL
> >> > > register. Namely, the expected result would be that when all CPUs
> >> > > were 100% busy, all of them would be requested to run in the maximum
> >> > > P-state, but observation shows that this clearly isn't the case.
> >> > > The CPUs run in the maximum P-state for a while and then are
> >> > > requested to run slower and go back to the maximum P-state after
> >> > > a while again. That causes the actual frequency of the processor to
> >> > > visibly oscillate below the sustainable maximum in a jittery fashion
> >> > > which clearly is not desirable.
> >> > >
> >> > > That has been attributed to CPU utilization metric updates on task
> >> > > migration that cause the total utilization value for the CPU to be
> >> > > reduced by the utilization of the migrated task. If that happens,
> >> > > the schedutil governor may see a CPU utilization reduction and will
> >> > > attempt to reduce the CPU frequency accordingly right away. That
> >> > > may be premature, though, for example if the system is generally
> >> > > busy and there are other runnable tasks waiting to be run on that
> >> > > CPU already.
> >> > >
> >> > > This is unlikely to be an issue on systems where cpufreq policies are
> >> > > shared between multiple CPUs, because in those cases the policy
> >> > > utilization is computed as the maximum of the CPU utilization values
> >> >
> >> > Sorry for one question maybe not associated with this patch. If the
> >> > cpufreq policy is shared between multiple CPUs, the function
> >> > intel_cpufreq_target() just updates IA32_PERF_CTL MSR of the cpu
> >> > which is managing this policy, I wonder whether other cpus which are
> >> > affected should also update their per-logical cpu's IA32_PERF_CTL MSR?
> >>
> >> The CPUs share the policy when they share their freq/voltage rails and so
> >> changing perf state of one CPU should result in that changing for all the CPUs
> >> in that policy. Otherwise, they can't be considered to be part of the same
> >> policy.
> >
> > To be entirely precise, this depends on the granularity of the HW interface.
> >
> > If the interface is per-logical-CPU, we will use it this way for efficiency
> > reasons and even if there is some coordination on the HW side, the information
> > on how exactly it works usually is limited.
>
> I check it on several Xeon servers on hand, however, I didn't find
> /sys/devices/system/cpu/cpufreq/policyx/affected_cpus can affect more
> than one logical cpu, so I guess most of Xeon servers are not support
> shared cpufreq policy, then which kind of boxes support that?
On Intel the interface for performance scaling is per-logical-CPU in general.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 71+ messages in thread
end of thread, other threads:[~2017-05-08 23:08 UTC | newest]
Thread overview: 71+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-19 13:21 [PATCH 0/2] cpufreq: schedutil: Fix and optimization Rafael J. Wysocki
2017-03-19 13:30 ` [PATCH 1/2] cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start() Rafael J. Wysocki
2017-03-20 3:28 ` Viresh Kumar
2017-03-20 12:36 ` Rafael J. Wysocki
2017-03-19 13:34 ` [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs Rafael J. Wysocki
2017-03-19 21:24 ` Rafael J. Wysocki
2017-03-19 21:42 ` Rafael J. Wysocki
2017-03-20 10:38 ` Peter Zijlstra
2017-03-20 12:31 ` Rafael J. Wysocki
2017-03-20 3:57 ` Viresh Kumar
2017-03-20 8:26 ` Vincent Guittot
2017-03-20 12:34 ` Patrick Bellasi
2017-03-22 23:56 ` Joel Fernandes
2017-03-23 22:08 ` Vincent Guittot
2017-03-25 3:48 ` Joel Fernandes
2017-03-27 6:59 ` Vincent Guittot
2017-03-20 12:59 ` Rafael J. Wysocki
2017-03-20 13:20 ` Vincent Guittot
2017-03-20 12:48 ` Rafael J. Wysocki
2017-03-20 10:36 ` Peter Zijlstra
2017-03-20 12:35 ` Rafael J. Wysocki
2017-03-20 12:50 ` Peter Zijlstra
2017-03-20 13:04 ` Rafael J. Wysocki
2017-03-20 13:06 ` Patrick Bellasi
2017-03-20 13:05 ` Rafael J. Wysocki
2017-03-20 14:13 ` Patrick Bellasi
2017-03-20 21:46 ` [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of " Rafael J. Wysocki
2017-03-21 6:40 ` Viresh Kumar
2017-03-21 12:30 ` Rafael J. Wysocki
2017-03-21 8:50 ` Vincent Guittot
2017-03-21 11:56 ` Patrick Bellasi
2017-03-21 13:22 ` Peter Zijlstra
2017-03-21 13:37 ` Vincent Guittot
2017-03-21 14:03 ` Peter Zijlstra
2017-03-21 14:18 ` Vincent Guittot
2017-03-21 14:25 ` Patrick Bellasi
[not found] ` <CAKfTPtALorn7HNpz4LOfWWSc3u+9y5iHB5byzfTHGQXDA+tVJQ@mail.gmail.com>
2017-03-21 14:58 ` Peter Zijlstra
2017-03-21 17:00 ` Vincent Guittot
2017-03-21 17:01 ` Vincent Guittot
2017-03-21 14:26 ` Rafael J. Wysocki
2017-03-21 14:38 ` Patrick Bellasi
2017-03-21 14:46 ` Rafael J. Wysocki
2017-03-21 14:50 ` Rafael J. Wysocki
2017-03-21 15:04 ` Peter Zijlstra
2017-03-21 15:18 ` Rafael J. Wysocki
2017-03-21 17:00 ` Peter Zijlstra
2017-03-21 17:17 ` Rafael J. Wysocki
2017-03-21 15:08 ` Patrick Bellasi
2017-03-21 15:18 ` Peter Zijlstra
2017-03-21 19:28 ` Patrick Bellasi
2017-03-21 15:02 ` Peter Zijlstra
2017-03-21 11:50 ` Patrick Bellasi
2017-03-21 23:08 ` [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely Rafael J. Wysocki
2017-03-22 9:26 ` Peter Zijlstra
2017-03-22 9:54 ` Viresh Kumar
2017-03-23 1:04 ` Joel Fernandes
2017-03-23 19:26 ` Sai Gurrappadi
2017-03-23 20:48 ` Sai Gurrappadi
2017-03-24 1:39 ` Rafael J. Wysocki
2017-03-24 19:08 ` Sai Gurrappadi
2017-03-25 1:14 ` Sai Gurrappadi
2017-03-25 1:39 ` Rafael J. Wysocki
2017-03-27 7:04 ` Vincent Guittot
2017-03-27 21:01 ` Sai Gurrappadi
2017-03-27 21:11 ` Rafael J. Wysocki
2017-05-08 3:49 ` Wanpeng Li
2017-05-08 4:01 ` Viresh Kumar
2017-05-08 5:15 ` Wanpeng Li
2017-05-08 22:16 ` Rafael J. Wysocki
2017-05-08 22:36 ` Wanpeng Li
2017-05-08 23:01 ` Rafael J. Wysocki