* [PATCH 0/4] intel_pstate fixes for v3.16 @ 2014-05-29 16:32 dirk.brandewie 2014-05-29 16:32 ` [PATCH 1/4] intel_pstate: Remove C0 tracking dirk.brandewie ` (4 more replies) 0 siblings, 5 replies; 10+ messages in thread From: dirk.brandewie @ 2014-05-29 16:32 UTC (permalink / raw) To: linux-pm; +Cc: dirk.brandewie, rjw, dsmythies, Dirk Brandewie From: Dirk Brandewie <dirk.j.brandewie@intel.com> Based on Rafael's linux-next branch at: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git Commit 0c932c3ecbba60bc5f9e08d579891cdbedfdd1c1 Fixes the regression where intel_pstate was being too reactive reducing the P state caused by the C0 tracking which was added to ensure that on an idle system intel_pstate converge on the lowest reasonable P state. The C0 tracking mechanism is replaced with a mechanism that looks at sample time and scales the busy calculation only if the sample time is *much* larger than the requested sample time. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=75121 https://lkml.org/lkml/2014/5/6/597 Dirk Brandewie (3): intel_pstate: Remove C0 tracking intel_pstate: Correct rounding in busy calculation intel_pstate: add sample time scaling Doug Smythies (1): intel_pstate: Improve initial busy calculation drivers/cpufreq/intel_pstate.c | 52 +++++++++++++++++++++++++----------------- 1 file changed, 31 insertions(+), 21 deletions(-) -- 1.9.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/4] intel_pstate: Remove C0 tracking 2014-05-29 16:32 [PATCH 0/4] intel_pstate fixes for v3.16 dirk.brandewie @ 2014-05-29 16:32 ` dirk.brandewie 2014-05-29 16:32 ` [PATCH 2/4] intel_pstate: Correct rounding in busy calculation dirk.brandewie ` (3 subsequent siblings) 4 siblings, 0 replies; 10+ messages in thread From: dirk.brandewie @ 2014-05-29 16:32 UTC (permalink / raw) To: linux-pm; +Cc: dirk.brandewie, rjw, dsmythies, Dirk Brandewie, stable From: Dirk Brandewie <dirk.j.brandewie@intel.com> Commit fcb6a15c intel_pstate: Take core C0 time into account for core busy introduced a regression referenced below. The issue with "lockup" after suspend that this commit was addressing is now dealt with in the suspend path. References: https://bugzilla.kernel.org/show_bug.cgi?id=66581 https://bugzilla.kernel.org/show_bug.cgi?id=75121 Reported-by: Doug Smythies <dsmythies@telus.net> Cc: <stable@vger.kernel.org> # 3.14.x Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> --- drivers/cpufreq/intel_pstate.c | 13 +------------ 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index a6d5afa..ffef765 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -57,7 +57,6 @@ struct sample { int32_t core_pct_busy; u64 aperf; u64 mperf; - unsigned long long tsc; int freq; }; @@ -96,7 +95,6 @@ struct cpudata { u64 prev_aperf; u64 prev_mperf; - unsigned long long prev_tsc; struct sample sample; }; @@ -555,45 +553,36 @@ static inline void intel_pstate_calc_busy(struct cpudata *cpu) { struct sample *sample = &cpu->sample; int32_t core_pct; - int32_t c0_pct; core_pct = div_fp(int_tofp(sample->aperf), int_tofp(sample->mperf)); core_pct = mul_fp(core_pct, int_tofp(100)); FP_ROUNDUP(core_pct); - c0_pct = div_fp(int_tofp(sample->mperf), int_tofp(sample->tsc)); - sample->freq = fp_toint( mul_fp(int_tofp(cpu->pstate.max_pstate * 1000), core_pct)); - sample->core_pct_busy = mul_fp(core_pct, c0_pct); + sample->core_pct_busy = core_pct; } static inline void intel_pstate_sample(struct cpudata *cpu) { u64 aperf, mperf; - unsigned long long tsc; rdmsrl(MSR_IA32_APERF, aperf); rdmsrl(MSR_IA32_MPERF, mperf); - tsc = native_read_tsc(); aperf = aperf >> FRAC_BITS; mperf = mperf >> FRAC_BITS; - tsc = tsc >> FRAC_BITS; cpu->sample.aperf = aperf; cpu->sample.mperf = mperf; - cpu->sample.tsc = tsc; cpu->sample.aperf -= cpu->prev_aperf; cpu->sample.mperf -= cpu->prev_mperf; - cpu->sample.tsc -= cpu->prev_tsc; intel_pstate_calc_busy(cpu); cpu->prev_aperf = aperf; cpu->prev_mperf = mperf; - cpu->prev_tsc = tsc; } static inline void intel_pstate_set_sample_time(struct cpudata *cpu) -- 1.9.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/4] intel_pstate: Correct rounding in busy calculation 2014-05-29 16:32 [PATCH 0/4] intel_pstate fixes for v3.16 dirk.brandewie 2014-05-29 16:32 ` [PATCH 1/4] intel_pstate: Remove C0 tracking dirk.brandewie @ 2014-05-29 16:32 ` dirk.brandewie 2014-06-15 14:44 ` Doug Smythies 2014-05-29 16:32 ` [PATCH 3/4] intel_pstate: add sample time scaling dirk.brandewie ` (2 subsequent siblings) 4 siblings, 1 reply; 10+ messages in thread From: dirk.brandewie @ 2014-05-29 16:32 UTC (permalink / raw) To: linux-pm; +Cc: dirk.brandewie, rjw, dsmythies, Dirk Brandewie, stable From: Dirk Brandewie <dirk.j.brandewie@intel.com> Changing to fixed point math throughout the busy calculation in commit e66c1768 Change busy calculation to use fixed point math. Introduced some inaccuracies by rounding the busy value at two points in the calculation. This change removes roundings and moves the rounding to the output of the PID where the calculations are complete and the value returned as an integer. Reported-by: Doug Smythies <dsmythies@telus.net> Cc: <stable@vger.kernel.org> # 3.14.x Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> --- drivers/cpufreq/intel_pstate.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index ffef765..db8a992 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -38,10 +38,10 @@ #define BYT_TURBO_VIDS 0x66d -#define FRAC_BITS 6 +#define FRAC_BITS 8 #define int_tofp(X) ((int64_t)(X) << FRAC_BITS) #define fp_toint(X) ((X) >> FRAC_BITS) -#define FP_ROUNDUP(X) ((X) += 1 << FRAC_BITS) + static inline int32_t mul_fp(int32_t x, int32_t y) { @@ -194,7 +194,10 @@ static signed int pid_calc(struct _pid *pid, int32_t busy) pid->last_err = fp_error; result = pterm + mul_fp(pid->integral, pid->i_gain) + dterm; - + if (result >= 0) + result = result + (1 << (FRAC_BITS-1)); + else + result = result - (1 << (FRAC_BITS-1)); return (signed int)fp_toint(result); } @@ -556,7 +559,6 @@ static inline void intel_pstate_calc_busy(struct cpudata *cpu) core_pct = div_fp(int_tofp(sample->aperf), int_tofp(sample->mperf)); core_pct = mul_fp(core_pct, int_tofp(100)); - FP_ROUNDUP(core_pct); sample->freq = fp_toint( mul_fp(int_tofp(cpu->pstate.max_pstate * 1000), core_pct)); @@ -602,7 +604,7 @@ static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu) max_pstate = int_tofp(cpu->pstate.max_pstate); current_pstate = int_tofp(cpu->pstate.current_pstate); core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate)); - return FP_ROUNDUP(core_busy); + return core_busy; } static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu) -- 1.9.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* RE: [PATCH 2/4] intel_pstate: Correct rounding in busy calculation 2014-05-29 16:32 ` [PATCH 2/4] intel_pstate: Correct rounding in busy calculation dirk.brandewie @ 2014-06-15 14:44 ` Doug Smythies 2014-06-15 22:46 ` Rafael J. Wysocki 0 siblings, 1 reply; 10+ messages in thread From: Doug Smythies @ 2014-06-15 14:44 UTC (permalink / raw) To: dirk.brandewie, linux-pm; +Cc: rjw, 'Dirk Brandewie', stable On 2014.05.29 09:32 Dirk Brandewie wrote: There is a mistake in this patch. Even though the patch is from Dirk, the mistake is my fault, sorry. Severity: Not very serious, but can increase target pstate by one extra value. For real world work flows the issue should self correct (but I have no proof). It is the equivalent of different PID gains for positive and negative numbers. Why this e-mail? Why not just submit a patch? Because I do not yet know how, particularly for v3.16 This part of the patch: result = pterm + mul_fp(pid->integral, pid->i_gain) + dterm; - + if (result >= 0) + result = result + (1 << (FRAC_BITS-1)); + else + result = result - (1 << (FRAC_BITS-1)); return (signed int)fp_toint(result); Should actually be this: result = pterm + mul_fp(pid->integral, pid->i_gain) + dterm; + result = result + (1 << (FRAC_BITS-1)); return (signed int)fp_toint(result); Proof method 1: Use "perf record" and verify all the math by hand. Realize that some numbers are not what is expected, and work backwards. Proof method 2: make a table of all the numbers. I.E. /****************************************************************************/ /* /* round_truth_table.c 2014.06.15 Smythies /* It seems I introduced a mistake into intel_pstate.c. /* Since my brain hurts from all the thinking it through, verify /* by mindlessly generating all possibilities. /* /****************************************************************************/ #include <stdio.h> #include <stdlib.h> main(){ long int count, mistake_way; for(count = 1026; count >= -1026; count--){ /* from 4.008 to -4.008 in steps of 1 / 256ths */ if(count >= 0){ mistake_way = (count + (1 << 7)) >> 8; } else { mistake_way = (count - (1 << 7)) >> 8; } /* endif */ printf("%ld %ld %ld %ld %lf\n", count, count >> 8, mistake_way, (count + (1 << 7)) >> 8, (double)count / 256.0); } /* endfor */ } /* endprogram */ Which gives (edited): 1025 4 4 4 4.003906 1024 4 4 4 4.000000 1023 3 4 4 3.996094 897 3 4 4 3.503906 896 3 4 4 3.500000 895 3 3 3 3.496094 769 3 3 3 3.003906 768 3 3 3 3.000000 767 2 3 3 2.996094 641 2 3 3 2.503906 640 2 3 3 2.500000 639 2 2 2 2.496094 513 2 2 2 2.003906 512 2 2 2 2.000000 511 1 2 2 1.996094 385 1 2 2 1.503906 384 1 2 2 1.500000 383 1 1 1 1.496094 257 1 1 1 1.003906 256 1 1 1 1.000000 255 0 1 1 0.996094 129 0 1 1 0.503906 128 0 1 1 0.500000 127 0 0 0 0.496094 1 0 0 0 0.003906 0 0 0 0 0.000000 -1 -1 -1 0 -0.003906 -127 -1 -1 0 -0.496094 -128 -1 -1 0 -0.500000 -129 -1 -2 -1 -0.503906 -255 -1 -2 -1 -0.996094 -256 -1 -2 -1 -1.000000 -257 -2 -2 -1 -1.003906 -383 -2 -2 -1 -1.496094 -384 -2 -2 -1 -1.500000 -385 -2 -3 -2 -1.503906 -511 -2 -3 -2 -1.996094 -512 -2 -3 -2 -2.000000 -513 -3 -3 -2 -2.003906 -639 -3 -3 -2 -2.496094 -640 -3 -3 -2 -2.500000 -641 -3 -4 -3 -2.503906 -767 -3 -4 -3 -2.996094 -768 -3 -4 -3 -3.000000 -769 -4 -4 -3 -3.003906 -895 -4 -4 -3 -3.496094 -896 -4 -4 -3 -3.500000 -897 -4 -5 -4 -3.503906 -1024 -4 -5 -4 -4.000000 -1025 -5 -5 -4 -4.003906 -1026 -5 -5 -4 -4.007812 > From: Dirk Brandewie <dirk.j.brandewie@intel.com> > > Changing to fixed point math throughout the busy calculation in > commit e66c1768 Change busy calculation to use fixed point > math. Introduced some inaccuracies by rounding the busy value at two > points in the calculation. This change removes roundings and moves > the rounding to the output of the PID where the calculations are > complete and the value returned as an integer. > > Reported-by: Doug Smythies <dsmythies@telus.net> > Cc: <stable@vger.kernel.org> # 3.14.x > Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> > --- > drivers/cpufreq/intel_pstate.c | 12 +++++++----- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c > index ffef765..db8a992 100644 > --- a/drivers/cpufreq/intel_pstate.c > +++ b/drivers/cpufreq/intel_pstate.c > @@ -38,10 +38,10 @@ > #define BYT_TURBO_VIDS 0x66d > > > -#define FRAC_BITS 6 > +#define FRAC_BITS 8 > #define int_tofp(X) ((int64_t)(X) << FRAC_BITS) > #define fp_toint(X) ((X) >> FRAC_BITS) > -#define FP_ROUNDUP(X) ((X) += 1 << FRAC_BITS) >+ > > static inline int32_t mul_fp(int32_t x, int32_t y) > { >@@ -194,7 +194,10 @@ static signed int pid_calc(struct _pid *pid, int32_t busy) > pid->last_err = fp_error; > > result = pterm + mul_fp(pid->integral, pid->i_gain) + dterm; > - > + if (result >= 0) > + result = result + (1 << (FRAC_BITS-1)); > + else > + result = result - (1 << (FRAC_BITS-1)); > return (signed int)fp_toint(result); > } > > @@ -556,7 +559,6 @@ static inline void intel_pstate_calc_busy(struct cpudata *cpu) > > core_pct = div_fp(int_tofp(sample->aperf), int_tofp(sample->mperf)); > core_pct = mul_fp(core_pct, int_tofp(100)); > - FP_ROUNDUP(core_pct); > > sample->freq = fp_toint( > mul_fp(int_tofp(cpu->pstate.max_pstate * 1000), core_pct)); > @@ -602,7 +604,7 @@ static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu) > max_pstate = int_tofp(cpu->pstate.max_pstate); > current_pstate = int_tofp(cpu->pstate.current_pstate); > core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate)); > - return FP_ROUNDUP(core_busy); > + return core_busy; > } > > static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu) > -- > 1.9.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/4] intel_pstate: Correct rounding in busy calculation 2014-06-15 14:44 ` Doug Smythies @ 2014-06-15 22:46 ` Rafael J. Wysocki 0 siblings, 0 replies; 10+ messages in thread From: Rafael J. Wysocki @ 2014-06-15 22:46 UTC (permalink / raw) To: Doug Smythies; +Cc: dirk.brandewie, linux-pm, 'Dirk Brandewie', stable On Sunday, June 15, 2014 07:44:47 AM Doug Smythies wrote: > On 2014.05.29 09:32 Dirk Brandewie wrote: > > There is a mistake in this patch. > Even though the patch is from Dirk, the mistake is my fault, sorry. > > Severity: Not very serious, but can increase target pstate by one extra value. > For real world work flows the issue should self correct (but I have no proof). > It is the equivalent of different PID gains for positive and negative numbers. > > Why this e-mail? Why not just submit a patch? > Because I do not yet know how, particularly for v3.16 Quite straightforward. Create a patch with a changelog saying that it fixes a bug, put the hash and subject of the commit that introduced the bug in that changelog and send it to linux-pm@vger.kernel.org with a CC to me (preferably at rjw@rjwysocki.net, but you can also use rafael@kernel.org if you like). I will take care of the patch from that point on. Rafael ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 3/4] intel_pstate: add sample time scaling 2014-05-29 16:32 [PATCH 0/4] intel_pstate fixes for v3.16 dirk.brandewie 2014-05-29 16:32 ` [PATCH 1/4] intel_pstate: Remove C0 tracking dirk.brandewie 2014-05-29 16:32 ` [PATCH 2/4] intel_pstate: Correct rounding in busy calculation dirk.brandewie @ 2014-05-29 16:32 ` dirk.brandewie 2014-05-29 18:17 ` Yuyang Du 2014-06-09 15:58 ` Doug Smythies 2014-05-29 16:32 ` [PATCH 4/4] intel_pstate: Improve initial busy calculation dirk.brandewie [not found] ` <CADmjqpNoRjhO=BaRBXxONCWd8-i9tzDLGVF4fTBs4jfRtHbdxw@mail.gmail.com> 4 siblings, 2 replies; 10+ messages in thread From: dirk.brandewie @ 2014-05-29 16:32 UTC (permalink / raw) To: linux-pm; +Cc: dirk.brandewie, rjw, dsmythies, Dirk Brandewie, stable From: Dirk Brandewie <dirk.j.brandewie@intel.com> The PID assumes that samples are of equal time, which for a deferable timers this is not true when the system goes idle. This causes the PID to take a long time to converge to the min P state and depending on the pattern of the idle load can make the P state appear stuck. The hold-off value of three sample times before using the scaling is to give a grace period for applications that have high performance requirements and spend a lot of time idle, The poster child for this behavior is the ffmpeg benchmark in the Phoronix test suite. Cc: <stable@vger.kernel.org> # 3.14.x Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> --- drivers/cpufreq/intel_pstate.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index db8a992..c4dad16 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -58,6 +58,7 @@ struct sample { u64 aperf; u64 mperf; int freq; + ktime_t time; }; struct pstate_data { @@ -93,6 +94,7 @@ struct cpudata { struct vid_data vid; struct _pid pid; + ktime_t last_sample_time; u64 prev_aperf; u64 prev_mperf; struct sample sample; @@ -576,6 +578,8 @@ static inline void intel_pstate_sample(struct cpudata *cpu) aperf = aperf >> FRAC_BITS; mperf = mperf >> FRAC_BITS; + cpu->last_sample_time = cpu->sample.time; + cpu->sample.time = ktime_get(); cpu->sample.aperf = aperf; cpu->sample.mperf = mperf; cpu->sample.aperf -= cpu->prev_aperf; @@ -598,12 +602,24 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu) static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu) { - int32_t core_busy, max_pstate, current_pstate; + int32_t core_busy, max_pstate, current_pstate, sample_ratio; + u32 duration_us; + u32 sample_time; core_busy = cpu->sample.core_pct_busy; max_pstate = int_tofp(cpu->pstate.max_pstate); current_pstate = int_tofp(cpu->pstate.current_pstate); core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate)); + + sample_time = (pid_params.sample_rate_ms * USEC_PER_MSEC); + duration_us = (u32) ktime_us_delta(cpu->sample.time, + cpu->last_sample_time); + if (duration_us > sample_time * 3) { + sample_ratio = div_fp(int_tofp(sample_time), + int_tofp(duration_us)); + core_busy = mul_fp(core_busy, sample_ratio); + } + return core_busy; } -- 1.9.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 3/4] intel_pstate: add sample time scaling 2014-05-29 16:32 ` [PATCH 3/4] intel_pstate: add sample time scaling dirk.brandewie @ 2014-05-29 18:17 ` Yuyang Du 2014-06-09 15:58 ` Doug Smythies 1 sibling, 0 replies; 10+ messages in thread From: Yuyang Du @ 2014-05-29 18:17 UTC (permalink / raw) To: dirk.brandewie; +Cc: linux-pm, rjw, dsmythies, Dirk Brandewie, stable > static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu) > { > - int32_t core_busy, max_pstate, current_pstate; > + int32_t core_busy, max_pstate, current_pstate, sample_ratio; > + u32 duration_us; > + u32 sample_time; > > core_busy = cpu->sample.core_pct_busy; > max_pstate = int_tofp(cpu->pstate.max_pstate); > current_pstate = int_tofp(cpu->pstate.current_pstate); > core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate)); > + > + sample_time = (pid_params.sample_rate_ms * USEC_PER_MSEC); > + duration_us = (u32) ktime_us_delta(cpu->sample.time, > + cpu->last_sample_time); > + if (duration_us > sample_time * 3) { > + sample_ratio = div_fp(int_tofp(sample_time), > + int_tofp(duration_us)); > + core_busy = mul_fp(core_busy, sample_ratio); > + } > + > return core_busy; > } Hi Dirk, I am afraid I need to question again since you did not address my concern. So generally, this new patch will factor (sample_rate / duration) in (last_freq / last_request) if duration > 3*sample_time. This sample_rate / duration thing looks random. And otherwise it is still effectively performance governor behavior if what I reasoned before is right. So my opinion is, the C0 tracking is not the (true) root cause to perf regression. But if you really need this patch as a temp workaround, it is ok. Thanks, Yuyang ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [PATCH 3/4] intel_pstate: add sample time scaling 2014-05-29 16:32 ` [PATCH 3/4] intel_pstate: add sample time scaling dirk.brandewie 2014-05-29 18:17 ` Yuyang Du @ 2014-06-09 15:58 ` Doug Smythies 1 sibling, 0 replies; 10+ messages in thread From: Doug Smythies @ 2014-06-09 15:58 UTC (permalink / raw) To: dirk.brandewie, linux-pm Cc: rjw, 'Dirk Brandewie', stable, Doug Smythies On 2014.05.29 09:32 Dirk wrote: > From: Dirk Brandewie <dirk.j.brandewie@intel.com> > The PID assumes that samples are of equal time, which for a deferable > timers this is not true when the system goes idle. This causes the > PID to take a long time to converge to the min P state and depending > on the pattern of the idle load can make the P state appear stuck. > The hold-off value of three sample times before using the scaling is > to give a grace period for applications that have high performance > requirements and spend a lot of time idle, The poster child for this > behavior is the ffmpeg benchmark in the Phoronix test suite. > Cc: <stable@vger.kernel.org> # 3.14.x > Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> > --- > drivers/cpufreq/intel_pstate.c | 18 +++++++++++++++++- > 1 file changed, 17 insertions(+), 1 deletion(-) > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c > index db8a992..c4dad16 100644 > --- a/drivers/cpufreq/intel_pstate.c > +++ b/drivers/cpufreq/intel_pstate.c > @@ -58,6 +58,7 @@ struct sample { > u64 aperf; > u64 mperf; > int freq; > + ktime_t time; > }; > > struct pstate_data { > @@ -93,6 +94,7 @@ struct cpudata { > struct vid_data vid; > struct _pid pid; > > + ktime_t last_sample_time; > u64 prev_aperf; > u64 prev_mperf; > struct sample sample; > @@ -576,6 +578,8 @@ static inline void intel_pstate_sample(struct cpudata *cpu) > aperf = aperf >> FRAC_BITS; > mperf = mperf >> FRAC_BITS; > > + cpu->last_sample_time = cpu->sample.time; > + cpu->sample.time = ktime_get(); > cpu->sample.aperf = aperf; > cpu->sample.mperf = mperf; > cpu->sample.aperf -= cpu->prev_aperf; > @@ -598,12 +602,24 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu) > > static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu) > { > - int32_t core_busy, max_pstate, current_pstate; > + int32_t core_busy, max_pstate, current_pstate, sample_ratio; > + u32 duration_us; > + u32 sample_time; > > core_busy = cpu->sample.core_pct_busy; > max_pstate = int_tofp(cpu->pstate.max_pstate); > current_pstate = int_tofp(cpu->pstate.current_pstate); > core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate)); > + > + sample_time = (pid_params.sample_rate_ms * USEC_PER_MSEC); > + duration_us = (u32) ktime_us_delta(cpu->sample.time, > + cpu->last_sample_time); > + if (duration_us > sample_time * 3) { > + sample_ratio = div_fp(int_tofp(sample_time), > + int_tofp(duration_us)); > + core_busy = mul_fp(core_busy, sample_ratio); > + } > + > return core_busy; > } > > -- > 1.9.0 The inclusion of deferrable timer run times subjects this driver to any issues with them. Indeed, there seems to be a problem with the deferrable timers and periodic workflows that can result in the driver incorrectly pushing the CPU frequency downwards. This driver will suffer collateral damage if the root issue is not fixed. Summary: For the periodic cases where the CPU wakes up and does its chunk of work and goes back to sleep all within one jiffy boundary, the deferred timer system does not realize this situation has occurred and the driver is not called. Eventually the driver is called and this patch kicks in and applies downward pressure on the CPU frequency. A perfect example (1000 Hz kernel): Consider a periodic workflow that has 600 microseconds of work to do at 100 hertz, and the job is started asynchronously with respect to the current jiffy count. Then it is 40% probable that the CPU frequency will be pushed to minimum, whereas (on my system) it should be about 2.4 GHz. The intel_pstate driver gets called only once in every 4 seconds. I did this 360 times and got the CPU frequency forced down on 38% of the runs (136 times out of 360). A simple method to check using timer stats: The script: doug@s15:~/temp$ cat set_cpu_timer_stats #! /bin/bash # Reset the counters echo "0" > /proc/timer_stats echo "1" > /proc/timer_stats # Apply a 6 percent load at 100 hertz for 20 seconds taskset -c 5 /home/doug/c/consume 6.0 100.0 20.0 & sleep 10 # What is the CPU 5 frequency? echo -n "CPU 5 frequency (~ 1.6 GHz means forced low, ~2.4 GHz means O.K.): " cat /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_cur_freq sleep 11 cat /proc/timer_stats | grep intel An example where the work is all done within one jiffy: Notice the CPU frequency is pinned low and there are only a few timer calls in the 20 seconds. doug@s15:~/temp$ sudo ./set_cpu_timer_stats CPU 5 frequency (~ 1.6 GHz means forced low, ~2.4 GHz means O.K.): 1599992 consume: 6.0 100.0 20.0 PID: 7530 Elapsed: 20000099 Now: 1402323478790491 19D, 0 swapper/6 intel_pstate_timer_func (intel_pstate_timer_func) 28D, 0 swapper/5 intel_pstate_timer_func (intel_pstate_timer_func) 29D, 0 swapper/3 intel_pstate_timer_func (intel_pstate_timer_func) 20D, 0 swapper/1 intel_pstate_timer_func (intel_pstate_timer_func) 9D, 0 swapper/7 intel_pstate_timer_func (intel_pstate_timer_func) 35D, 0 swapper/2 intel_pstate_timer_func (intel_pstate_timer_func) 12D, 0 swapper/0 intel_pstate_timer_func (intel_pstate_timer_func) 16D, 0 swapper/4 intel_pstate_timer_func (intel_pstate_timer_func) An example where the work spans a jiffy boundary: Notice the CPU frequency is not pinned low and there are the expected ~ 2000 driver calls. doug@s15:~/temp$ sudo ./set_cpu_timer_stats CPU 5 frequency (~ 1.6 GHz means forced low, ~2.4 GHz means O.K.): 2409750 consume: 6.0 100.0 20.0 PID: 7543 Elapsed: 20000099 Now: 1402323597033729 8D, 0 swapper/4 intel_pstate_timer_func (intel_pstate_timer_func) 2004D, 0 swapper/5 intel_pstate_timer_func (intel_pstate_timer_func) 43D, 0 swapper/3 intel_pstate_timer_func (intel_pstate_timer_func) 35D, 0 swapper/2 intel_pstate_timer_func (intel_pstate_timer_func) 24D, 0 swapper/1 intel_pstate_timer_func (intel_pstate_timer_func) 20D, 0 swapper/6 intel_pstate_timer_func (intel_pstate_timer_func) 8D, 0 swapper/7 intel_pstate_timer_func (intel_pstate_timer_func) 10D, 0 swapper/0 intel_pstate_timer_func (intel_pstate_timer_func) O.K. the above was a perfect example, done specifically for dramatic effect and for best exposure of the issue. What about the more generic, but still periodic, case? Well, then it comes down to beat frequencies between the process and the jiffy boundaries. I have run a great many different work/sleep frequencies and there are lots of cases where the driver call is deferred enough to cause this patch code to engage incorrectly. However in these scenarios the CPU frequency will oscillate in step with the beat frequency (I can supply a graph if needed). Note: Yes, fixed time work chunks does not consider the more realistic scenario where the work time would scale with CPU frequency. That just changes the threshold for the issue, but not the root issue itself. Note: Work was done on a 1000 Hz (1 jiffy = 1 mS) kernel. The issue is 4 times worse for 250 Hz kernels. Note: It has taken me until now to get to this point, such that I could comment on-list. One, of many, issues hindering progress, was that I was originally using a 250 Hertz Kernel and Dirk Brandewie was using a 1000 Hertz Kernel. So I would ask Dirk to verify something and it would be fine for him. Note: I would need help to isolate the root issue further. I got lost in timer.c and the mapping .h file (I can not recall the name). ... Doug ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 4/4] intel_pstate: Improve initial busy calculation 2014-05-29 16:32 [PATCH 0/4] intel_pstate fixes for v3.16 dirk.brandewie ` (2 preceding siblings ...) 2014-05-29 16:32 ` [PATCH 3/4] intel_pstate: add sample time scaling dirk.brandewie @ 2014-05-29 16:32 ` dirk.brandewie [not found] ` <CADmjqpNoRjhO=BaRBXxONCWd8-i9tzDLGVF4fTBs4jfRtHbdxw@mail.gmail.com> 4 siblings, 0 replies; 10+ messages in thread From: dirk.brandewie @ 2014-05-29 16:32 UTC (permalink / raw) To: linux-pm; +Cc: dirk.brandewie, rjw, dsmythies, stable, Dirk Brandewie From: Doug Smythies <dsmythies@telus.net> This change makes the busy calculation using 64 bit math which prevents overflow for large values of aperf/mperf. Cc: <stable@vger.kernel.org> # 3.14.x Signed-off-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> --- drivers/cpufreq/intel_pstate.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index c4dad16..774d277 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -556,16 +556,19 @@ static void intel_pstate_get_cpu_pstates(struct cpudata *cpu) static inline void intel_pstate_calc_busy(struct cpudata *cpu) { - struct sample *sample = &cpu->sample; - int32_t core_pct; + int64_t core_pct; + int32_t rem; - core_pct = div_fp(int_tofp(sample->aperf), int_tofp(sample->mperf)); - core_pct = mul_fp(core_pct, int_tofp(100)); + core_pct = int_tofp(sample->aperf) * int_tofp(100); + core_pct = div_u64_rem(core_pct, int_tofp(sample->mperf), &rem); + + if ((rem << 1) >= int_tofp((sample->mperf))) + core_pct += 1; sample->freq = fp_toint( mul_fp(int_tofp(cpu->pstate.max_pstate * 1000), core_pct)); - sample->core_pct_busy = core_pct; + sample->core_pct_busy = (int32_t)core_pct; } static inline void intel_pstate_sample(struct cpudata *cpu) -- 1.9.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
[parent not found: <CADmjqpNoRjhO=BaRBXxONCWd8-i9tzDLGVF4fTBs4jfRtHbdxw@mail.gmail.com>]
* Re: [PATCH 0/4] intel_pstate fixes for v3.16 [not found] ` <CADmjqpNoRjhO=BaRBXxONCWd8-i9tzDLGVF4fTBs4jfRtHbdxw@mail.gmail.com> @ 2014-05-30 0:04 ` Stratos Karafotis 0 siblings, 0 replies; 10+ messages in thread From: Stratos Karafotis @ 2014-05-30 0:04 UTC (permalink / raw) To: Dirk Brandewie, Rafael J. Wysocki, Doug Smythies; +Cc: linux-pm, Dirk Brandewie On Thu, May 29, 2014 at 7:32 PM, <dirk.brandewie@gmail.com> wrote: > > From: Dirk Brandewie <dirk.j.brandewie@intel.com> > > Based on Rafael's linux-next branch at: > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git > Commit 0c932c3ecbba60bc5f9e08d579891cdbedfdd1c1 > > Fixes the regression where intel_pstate was being too reactive > reducing the P state caused by the C0 tracking which was added to > ensure that on an idle system intel_pstate converge on the lowest > reasonable P state. The C0 tracking mechanism is replaced with a > mechanism that looks at sample time and scales the busy calculation > only if the sample time is *much* larger than the requested sample time. > > Reference: > https://bugzilla.kernel.org/show_bug.cgi?id=75121 > https://lkml.org/lkml/2014/5/6/597 > > Dirk Brandewie (3): > intel_pstate: Remove C0 tracking > intel_pstate: Correct rounding in busy calculation > intel_pstate: add sample time scaling > > Doug Smythies (1): > intel_pstate: Improve initial busy calculation > > drivers/cpufreq/intel_pstate.c | 52 +++++++++++++++++++++++++----------------- > 1 file changed, 31 insertions(+), 21 deletions(-) > Hi, I tested the patch set, run some benchmarks and I would like to post the results: CPU: Core i7-3770 TIME (sec) ENERGY(J) Without With Diff Without With Diff Phoronix kernel build 265.37 260.33 -1.9% 15349.99 15437.57 +0.57% Phoronix Apache 293.69 193.61 -34.8% 9016.28 8544.00 -5.24% mp3 decoding 245.64 245.91 5057.73 5046.07 -0.23% Idle system (turbostat) 60 60 1210.80 1211.40 +0.05% In case you want to, you may add Tested-by: Stratos Karafotis <stratosk@semaphore.gr> Thanks, Stratos ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-06-15 22:46 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-05-29 16:32 [PATCH 0/4] intel_pstate fixes for v3.16 dirk.brandewie 2014-05-29 16:32 ` [PATCH 1/4] intel_pstate: Remove C0 tracking dirk.brandewie 2014-05-29 16:32 ` [PATCH 2/4] intel_pstate: Correct rounding in busy calculation dirk.brandewie 2014-06-15 14:44 ` Doug Smythies 2014-06-15 22:46 ` Rafael J. Wysocki 2014-05-29 16:32 ` [PATCH 3/4] intel_pstate: add sample time scaling dirk.brandewie 2014-05-29 18:17 ` Yuyang Du 2014-06-09 15:58 ` Doug Smythies 2014-05-29 16:32 ` [PATCH 4/4] intel_pstate: Improve initial busy calculation dirk.brandewie [not found] ` <CADmjqpNoRjhO=BaRBXxONCWd8-i9tzDLGVF4fTBs4jfRtHbdxw@mail.gmail.com> 2014-05-30 0:04 ` [PATCH 0/4] intel_pstate fixes for v3.16 Stratos Karafotis
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.