All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] intel_pstate fixes for v3.16
@ 2014-05-29 16:32 dirk.brandewie
  2014-05-29 16:32 ` [PATCH 1/4] intel_pstate: Remove C0 tracking dirk.brandewie
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: dirk.brandewie @ 2014-05-29 16:32 UTC (permalink / raw)
  To: linux-pm; +Cc: dirk.brandewie, rjw, dsmythies, Dirk Brandewie

From: Dirk Brandewie <dirk.j.brandewie@intel.com>

Based on Rafael's linux-next branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
Commit 0c932c3ecbba60bc5f9e08d579891cdbedfdd1c1

Fixes the regression where intel_pstate was being too reactive
reducing the P state caused by the C0 tracking which was added to
ensure that on an idle system intel_pstate converge on the lowest
reasonable P state.  The C0 tracking mechanism is replaced with a
mechanism that looks at sample time and scales the busy calculation
only if the sample time is *much* larger than the requested sample time.

Reference:
   https://bugzilla.kernel.org/show_bug.cgi?id=75121
   https://lkml.org/lkml/2014/5/6/597

Dirk Brandewie (3):
  intel_pstate: Remove C0 tracking
  intel_pstate: Correct rounding in busy calculation
  intel_pstate: add sample time scaling

Doug Smythies (1):
  intel_pstate: Improve initial busy calculation

 drivers/cpufreq/intel_pstate.c | 52 +++++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 21 deletions(-)

-- 
1.9.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] intel_pstate: Remove C0 tracking
  2014-05-29 16:32 [PATCH 0/4] intel_pstate fixes for v3.16 dirk.brandewie
@ 2014-05-29 16:32 ` dirk.brandewie
  2014-05-29 16:32 ` [PATCH 2/4] intel_pstate: Correct rounding in busy calculation dirk.brandewie
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: dirk.brandewie @ 2014-05-29 16:32 UTC (permalink / raw)
  To: linux-pm; +Cc: dirk.brandewie, rjw, dsmythies, Dirk Brandewie, stable

From: Dirk Brandewie <dirk.j.brandewie@intel.com>

Commit fcb6a15c intel_pstate: Take core C0 time into account for core busy
introduced a regression referenced below.  The issue with "lockup"
after suspend that this commit was addressing is now dealt with in the
suspend path.


References:
   https://bugzilla.kernel.org/show_bug.cgi?id=66581
   https://bugzilla.kernel.org/show_bug.cgi?id=75121

Reported-by: Doug Smythies <dsmythies@telus.net>
Cc: <stable@vger.kernel.org> # 3.14.x
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
---
 drivers/cpufreq/intel_pstate.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index a6d5afa..ffef765 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -57,7 +57,6 @@ struct sample {
 	int32_t core_pct_busy;
 	u64 aperf;
 	u64 mperf;
-	unsigned long long tsc;
 	int freq;
 };
 
@@ -96,7 +95,6 @@ struct cpudata {
 
 	u64	prev_aperf;
 	u64	prev_mperf;
-	unsigned long long prev_tsc;
 	struct sample sample;
 };
 
@@ -555,45 +553,36 @@ static inline void intel_pstate_calc_busy(struct cpudata *cpu)
 {
 	struct sample *sample = &cpu->sample;
 	int32_t core_pct;
-	int32_t c0_pct;
 
 	core_pct = div_fp(int_tofp(sample->aperf), int_tofp(sample->mperf));
 	core_pct = mul_fp(core_pct, int_tofp(100));
 	FP_ROUNDUP(core_pct);
 
-	c0_pct = div_fp(int_tofp(sample->mperf), int_tofp(sample->tsc));
-
 	sample->freq = fp_toint(
 		mul_fp(int_tofp(cpu->pstate.max_pstate * 1000), core_pct));
 
-	sample->core_pct_busy = mul_fp(core_pct, c0_pct);
+	sample->core_pct_busy = core_pct;
 }
 
 static inline void intel_pstate_sample(struct cpudata *cpu)
 {
 	u64 aperf, mperf;
-	unsigned long long tsc;
 
 	rdmsrl(MSR_IA32_APERF, aperf);
 	rdmsrl(MSR_IA32_MPERF, mperf);
-	tsc = native_read_tsc();
 
 	aperf = aperf >> FRAC_BITS;
 	mperf = mperf >> FRAC_BITS;
-	tsc = tsc >> FRAC_BITS;
 
 	cpu->sample.aperf = aperf;
 	cpu->sample.mperf = mperf;
-	cpu->sample.tsc = tsc;
 	cpu->sample.aperf -= cpu->prev_aperf;
 	cpu->sample.mperf -= cpu->prev_mperf;
-	cpu->sample.tsc -= cpu->prev_tsc;
 
 	intel_pstate_calc_busy(cpu);
 
 	cpu->prev_aperf = aperf;
 	cpu->prev_mperf = mperf;
-	cpu->prev_tsc = tsc;
 }
 
 static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/4] intel_pstate: Correct rounding in busy calculation
  2014-05-29 16:32 [PATCH 0/4] intel_pstate fixes for v3.16 dirk.brandewie
  2014-05-29 16:32 ` [PATCH 1/4] intel_pstate: Remove C0 tracking dirk.brandewie
@ 2014-05-29 16:32 ` dirk.brandewie
  2014-06-15 14:44   ` Doug Smythies
  2014-05-29 16:32 ` [PATCH 3/4] intel_pstate: add sample time scaling dirk.brandewie
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: dirk.brandewie @ 2014-05-29 16:32 UTC (permalink / raw)
  To: linux-pm; +Cc: dirk.brandewie, rjw, dsmythies, Dirk Brandewie, stable

From: Dirk Brandewie <dirk.j.brandewie@intel.com>

Changing to fixed point math throughout the busy calculation in
commit e66c1768 Change busy calculation to use fixed point
math. Introduced some inaccuracies by rounding the busy value at two
points in the calculation.  This change removes roundings and moves
the rounding to the output of the PID where the calculations are
complete and the value returned as an integer.

Reported-by: Doug Smythies <dsmythies@telus.net>
Cc: <stable@vger.kernel.org> # 3.14.x
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
---
 drivers/cpufreq/intel_pstate.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index ffef765..db8a992 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -38,10 +38,10 @@
 #define BYT_TURBO_VIDS		0x66d
 
 
-#define FRAC_BITS 6
+#define FRAC_BITS 8
 #define int_tofp(X) ((int64_t)(X) << FRAC_BITS)
 #define fp_toint(X) ((X) >> FRAC_BITS)
-#define FP_ROUNDUP(X) ((X) += 1 << FRAC_BITS)
+
 
 static inline int32_t mul_fp(int32_t x, int32_t y)
 {
@@ -194,7 +194,10 @@ static signed int pid_calc(struct _pid *pid, int32_t busy)
 	pid->last_err = fp_error;
 
 	result = pterm + mul_fp(pid->integral, pid->i_gain) + dterm;
-
+	if (result >= 0)
+		result = result + (1 << (FRAC_BITS-1));
+	else
+		result = result - (1 << (FRAC_BITS-1));
 	return (signed int)fp_toint(result);
 }
 
@@ -556,7 +559,6 @@ static inline void intel_pstate_calc_busy(struct cpudata *cpu)
 
 	core_pct = div_fp(int_tofp(sample->aperf), int_tofp(sample->mperf));
 	core_pct = mul_fp(core_pct, int_tofp(100));
-	FP_ROUNDUP(core_pct);
 
 	sample->freq = fp_toint(
 		mul_fp(int_tofp(cpu->pstate.max_pstate * 1000), core_pct));
@@ -602,7 +604,7 @@ static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
 	max_pstate = int_tofp(cpu->pstate.max_pstate);
 	current_pstate = int_tofp(cpu->pstate.current_pstate);
 	core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
-	return FP_ROUNDUP(core_busy);
+	return core_busy;
 }
 
 static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/4] intel_pstate: add sample time scaling
  2014-05-29 16:32 [PATCH 0/4] intel_pstate fixes for v3.16 dirk.brandewie
  2014-05-29 16:32 ` [PATCH 1/4] intel_pstate: Remove C0 tracking dirk.brandewie
  2014-05-29 16:32 ` [PATCH 2/4] intel_pstate: Correct rounding in busy calculation dirk.brandewie
@ 2014-05-29 16:32 ` dirk.brandewie
  2014-05-29 18:17   ` Yuyang Du
  2014-06-09 15:58   ` Doug Smythies
  2014-05-29 16:32 ` [PATCH 4/4] intel_pstate: Improve initial busy calculation dirk.brandewie
       [not found] ` <CADmjqpNoRjhO=BaRBXxONCWd8-i9tzDLGVF4fTBs4jfRtHbdxw@mail.gmail.com>
  4 siblings, 2 replies; 10+ messages in thread
From: dirk.brandewie @ 2014-05-29 16:32 UTC (permalink / raw)
  To: linux-pm; +Cc: dirk.brandewie, rjw, dsmythies, Dirk Brandewie, stable

From: Dirk Brandewie <dirk.j.brandewie@intel.com>

The PID assumes that samples are of equal time, which for a deferable
timers this is not true when the system goes idle.  This causes the
PID to take a long time to converge to the min P state and depending
on the pattern of the idle load can make the P state appear stuck.

The hold-off value of three sample times before using the scaling is
to give a grace period for applications that have high performance
requirements and spend a lot of time idle,  The poster child for this
behavior is the ffmpeg benchmark in the Phoronix test suite.

Cc: <stable@vger.kernel.org> # 3.14.x
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
---
 drivers/cpufreq/intel_pstate.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index db8a992..c4dad16 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -58,6 +58,7 @@ struct sample {
 	u64 aperf;
 	u64 mperf;
 	int freq;
+	ktime_t time;
 };
 
 struct pstate_data {
@@ -93,6 +94,7 @@ struct cpudata {
 	struct vid_data vid;
 	struct _pid pid;
 
+	ktime_t last_sample_time;
 	u64	prev_aperf;
 	u64	prev_mperf;
 	struct sample sample;
@@ -576,6 +578,8 @@ static inline void intel_pstate_sample(struct cpudata *cpu)
 	aperf = aperf >> FRAC_BITS;
 	mperf = mperf >> FRAC_BITS;
 
+	cpu->last_sample_time = cpu->sample.time;
+	cpu->sample.time = ktime_get();
 	cpu->sample.aperf = aperf;
 	cpu->sample.mperf = mperf;
 	cpu->sample.aperf -= cpu->prev_aperf;
@@ -598,12 +602,24 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
 
 static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
 {
-	int32_t core_busy, max_pstate, current_pstate;
+	int32_t core_busy, max_pstate, current_pstate, sample_ratio;
+	u32 duration_us;
+	u32 sample_time;
 
 	core_busy = cpu->sample.core_pct_busy;
 	max_pstate = int_tofp(cpu->pstate.max_pstate);
 	current_pstate = int_tofp(cpu->pstate.current_pstate);
 	core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
+
+	sample_time = (pid_params.sample_rate_ms  * USEC_PER_MSEC);
+	duration_us = (u32) ktime_us_delta(cpu->sample.time,
+					cpu->last_sample_time);
+	if (duration_us > sample_time * 3) {
+		sample_ratio = div_fp(int_tofp(sample_time),
+				int_tofp(duration_us));
+		core_busy = mul_fp(core_busy, sample_ratio);
+	}
+
 	return core_busy;
 }
 
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/4] intel_pstate: Improve initial busy calculation
  2014-05-29 16:32 [PATCH 0/4] intel_pstate fixes for v3.16 dirk.brandewie
                   ` (2 preceding siblings ...)
  2014-05-29 16:32 ` [PATCH 3/4] intel_pstate: add sample time scaling dirk.brandewie
@ 2014-05-29 16:32 ` dirk.brandewie
       [not found] ` <CADmjqpNoRjhO=BaRBXxONCWd8-i9tzDLGVF4fTBs4jfRtHbdxw@mail.gmail.com>
  4 siblings, 0 replies; 10+ messages in thread
From: dirk.brandewie @ 2014-05-29 16:32 UTC (permalink / raw)
  To: linux-pm; +Cc: dirk.brandewie, rjw, dsmythies, stable, Dirk Brandewie

From: Doug Smythies <dsmythies@telus.net>

This change makes the busy calculation using 64 bit math which prevents
overflow for large values of aperf/mperf.

Cc: <stable@vger.kernel.org> # 3.14.x
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
---
 drivers/cpufreq/intel_pstate.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index c4dad16..774d277 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -556,16 +556,19 @@ static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
 
 static inline void intel_pstate_calc_busy(struct cpudata *cpu)
 {
-	struct sample *sample = &cpu->sample;
-	int32_t core_pct;
+	int64_t core_pct;
+	int32_t rem;
 
-	core_pct = div_fp(int_tofp(sample->aperf), int_tofp(sample->mperf));
-	core_pct = mul_fp(core_pct, int_tofp(100));
+	core_pct = int_tofp(sample->aperf) * int_tofp(100);
+	core_pct = div_u64_rem(core_pct, int_tofp(sample->mperf), &rem);
+
+	if ((rem << 1) >= int_tofp((sample->mperf)))
+		core_pct += 1;
 
 	sample->freq = fp_toint(
 		mul_fp(int_tofp(cpu->pstate.max_pstate * 1000), core_pct));
 
-	sample->core_pct_busy = core_pct;
+	sample->core_pct_busy = (int32_t)core_pct;
 }
 
 static inline void intel_pstate_sample(struct cpudata *cpu)
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/4] intel_pstate: add sample time scaling
  2014-05-29 16:32 ` [PATCH 3/4] intel_pstate: add sample time scaling dirk.brandewie
@ 2014-05-29 18:17   ` Yuyang Du
  2014-06-09 15:58   ` Doug Smythies
  1 sibling, 0 replies; 10+ messages in thread
From: Yuyang Du @ 2014-05-29 18:17 UTC (permalink / raw)
  To: dirk.brandewie; +Cc: linux-pm, rjw, dsmythies, Dirk Brandewie, stable

>  static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
>  {
> -	int32_t core_busy, max_pstate, current_pstate;
> +	int32_t core_busy, max_pstate, current_pstate, sample_ratio;
> +	u32 duration_us;
> +	u32 sample_time;
>  
>  	core_busy = cpu->sample.core_pct_busy;
>  	max_pstate = int_tofp(cpu->pstate.max_pstate);
>  	current_pstate = int_tofp(cpu->pstate.current_pstate);
>  	core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
> +
> +	sample_time = (pid_params.sample_rate_ms  * USEC_PER_MSEC);
> +	duration_us = (u32) ktime_us_delta(cpu->sample.time,
> +					cpu->last_sample_time);
> +	if (duration_us > sample_time * 3) {
> +		sample_ratio = div_fp(int_tofp(sample_time),
> +				int_tofp(duration_us));
> +		core_busy = mul_fp(core_busy, sample_ratio);
> +	}
> +
>  	return core_busy;
>  }

Hi Dirk,

I am afraid I need to question again since you did not address my concern.

So generally, this new patch will factor (sample_rate / duration) in
(last_freq / last_request) if duration > 3*sample_time.

This sample_rate / duration thing looks random. And otherwise it is still
effectively performance governor behavior if what I reasoned before is right.

So my opinion is, the C0 tracking is not the (true) root cause to perf regression.
But if you really need this patch as a temp workaround, it is ok.

Thanks,
Yuyang

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/4] intel_pstate fixes for v3.16
       [not found] ` <CADmjqpNoRjhO=BaRBXxONCWd8-i9tzDLGVF4fTBs4jfRtHbdxw@mail.gmail.com>
@ 2014-05-30  0:04   ` Stratos Karafotis
  0 siblings, 0 replies; 10+ messages in thread
From: Stratos Karafotis @ 2014-05-30  0:04 UTC (permalink / raw)
  To: Dirk Brandewie, Rafael J. Wysocki, Doug Smythies; +Cc: linux-pm, Dirk Brandewie

On Thu, May 29, 2014 at 7:32 PM,  <dirk.brandewie@gmail.com> wrote:
> 
> From: Dirk Brandewie <dirk.j.brandewie@intel.com>
> 
> Based on Rafael's linux-next branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
> Commit 0c932c3ecbba60bc5f9e08d579891cdbedfdd1c1
> 
> Fixes the regression where intel_pstate was being too reactive
> reducing the P state caused by the C0 tracking which was added to
> ensure that on an idle system intel_pstate converge on the lowest
> reasonable P state.  The C0 tracking mechanism is replaced with a
> mechanism that looks at sample time and scales the busy calculation
> only if the sample time is *much* larger than the requested sample time.
> 
> Reference:
>    https://bugzilla.kernel.org/show_bug.cgi?id=75121
>    https://lkml.org/lkml/2014/5/6/597
> 
> Dirk Brandewie (3):
>   intel_pstate: Remove C0 tracking
>   intel_pstate: Correct rounding in busy calculation
>   intel_pstate: add sample time scaling
> 
> Doug Smythies (1):
>   intel_pstate: Improve initial busy calculation
> 
>  drivers/cpufreq/intel_pstate.c | 52 +++++++++++++++++++++++++-----------------
>  1 file changed, 31 insertions(+), 21 deletions(-)
> 

Hi,

I tested the patch set, run some benchmarks and I would like
to post the results:

CPU: Core i7-3770

			    TIME (sec)			ENERGY(J)
			Without	With	Diff	Without		With	  Diff
Phoronix kernel build	265.37	260.33	 -1.9%	15349.99	15437.57  +0.57%
Phoronix Apache		293.69	193.61	-34.8%	 9016.28	 8544.00  -5.24%
mp3 decoding 		245.64	245.91		 5057.73	 5046.07  -0.23%
Idle system (turbostat)  60	 60		 1210.80	 1211.40  +0.05%


In case you want to, you may add

Tested-by: Stratos Karafotis <stratosk@semaphore.gr>


Thanks,
Stratos

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH 3/4] intel_pstate: add sample time scaling
  2014-05-29 16:32 ` [PATCH 3/4] intel_pstate: add sample time scaling dirk.brandewie
  2014-05-29 18:17   ` Yuyang Du
@ 2014-06-09 15:58   ` Doug Smythies
  1 sibling, 0 replies; 10+ messages in thread
From: Doug Smythies @ 2014-06-09 15:58 UTC (permalink / raw)
  To: dirk.brandewie, linux-pm
  Cc: rjw, 'Dirk Brandewie', stable, Doug Smythies


On 2014.05.29 09:32 Dirk wrote:

> From: Dirk Brandewie <dirk.j.brandewie@intel.com>

> The PID assumes that samples are of equal time, which for a deferable
> timers this is not true when the system goes idle.  This causes the
> PID to take a long time to converge to the min P state and depending
> on the pattern of the idle load can make the P state appear stuck.

> The hold-off value of three sample times before using the scaling is
> to give a grace period for applications that have high performance
> requirements and spend a lot of time idle,  The poster child for this
> behavior is the ffmpeg benchmark in the Phoronix test suite.

> Cc: <stable@vger.kernel.org> # 3.14.x
> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
> ---
> drivers/cpufreq/intel_pstate.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)

> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index db8a992..c4dad16 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -58,6 +58,7 @@ struct sample {
> 	u64 aperf;
> 	u64 mperf;
> 	int freq;
> +	ktime_t time;
> };
> 
> struct pstate_data {
> @@ -93,6 +94,7 @@ struct cpudata {
> 	struct vid_data vid;
> 	struct _pid pid;
> 
> +	ktime_t last_sample_time;
> 	u64	prev_aperf;
> 	u64	prev_mperf;
> 	struct sample sample;
> @@ -576,6 +578,8 @@ static inline void intel_pstate_sample(struct cpudata *cpu)
> 	aperf = aperf >> FRAC_BITS;
> 	mperf = mperf >> FRAC_BITS;
> 
> +	cpu->last_sample_time = cpu->sample.time;
> +	cpu->sample.time = ktime_get();
> 	cpu->sample.aperf = aperf;
>	cpu->sample.mperf = mperf;
> 	cpu->sample.aperf -= cpu->prev_aperf;
> @@ -598,12 +602,24 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
> 
> static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
> {
> -	int32_t core_busy, max_pstate, current_pstate;
> +	int32_t core_busy, max_pstate, current_pstate, sample_ratio;
> +	u32 duration_us;
> +	u32 sample_time;
> 
> 	core_busy = cpu->sample.core_pct_busy;
> 	max_pstate = int_tofp(cpu->pstate.max_pstate);
> 	current_pstate = int_tofp(cpu->pstate.current_pstate);
> 	core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
> +
> +	sample_time = (pid_params.sample_rate_ms  * USEC_PER_MSEC);
> +	duration_us = (u32) ktime_us_delta(cpu->sample.time,
> +					cpu->last_sample_time);
> +	if (duration_us > sample_time * 3) {
> +		sample_ratio = div_fp(int_tofp(sample_time),
> +				int_tofp(duration_us));
> +		core_busy = mul_fp(core_busy, sample_ratio);
> +	}
> +
> 	return core_busy;
> }
> 
> -- 
> 1.9.0

The inclusion of deferrable timer run times subjects
this driver to any issues with them. Indeed, there
seems to be a problem with the deferrable timers
and periodic workflows that can result in the driver
incorrectly pushing the CPU frequency downwards.

This driver will suffer collateral damage if the
root issue is not fixed.

Summary:

For the periodic cases where the CPU wakes up and does
its chunk of work and goes back to sleep all within
one jiffy boundary, the deferred timer system does not
realize this situation has occurred and the driver is
not called. Eventually the driver is called and this
patch kicks in and applies downward pressure on
the CPU frequency.

A perfect example (1000 Hz kernel):

Consider a periodic workflow that has 600 microseconds
of work to do at 100 hertz, and the job is started
asynchronously with respect to the current jiffy count.
Then it is 40% probable that the CPU frequency will
be pushed to minimum, whereas (on my system) it should
be about 2.4 GHz. The intel_pstate driver gets called
only once in every 4 seconds.

I did this 360 times and got the CPU frequency forced
down on 38% of the runs (136 times out of 360).

A simple method to check using timer stats:

The script:

doug@s15:~/temp$ cat set_cpu_timer_stats
#! /bin/bash
# Reset the counters
echo "0" > /proc/timer_stats
echo "1" > /proc/timer_stats
# Apply a 6 percent load at 100 hertz for 20 seconds
taskset -c 5 /home/doug/c/consume 6.0 100.0 20.0 &
sleep 10
# What is the CPU 5 frequency?
echo -n "CPU 5 frequency (~ 1.6 GHz means forced low, ~2.4 GHz means O.K.): "
cat /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_cur_freq
sleep 11
cat /proc/timer_stats | grep intel

An example where the work is all done within one jiffy:

Notice the CPU frequency is pinned low and there are only
a few timer calls in the 20 seconds.
 
doug@s15:~/temp$ sudo ./set_cpu_timer_stats
CPU 5 frequency (~ 1.6 GHz means forced low, ~2.4 GHz means O.K.): 1599992
consume: 6.0 100.0 20.0  PID: 7530  Elapsed: 20000099  Now: 1402323478790491
  19D,     0 swapper/6        intel_pstate_timer_func (intel_pstate_timer_func)
  28D,     0 swapper/5        intel_pstate_timer_func (intel_pstate_timer_func)
  29D,     0 swapper/3        intel_pstate_timer_func (intel_pstate_timer_func)
  20D,     0 swapper/1        intel_pstate_timer_func (intel_pstate_timer_func)
   9D,     0 swapper/7        intel_pstate_timer_func (intel_pstate_timer_func)
  35D,     0 swapper/2        intel_pstate_timer_func (intel_pstate_timer_func)
  12D,     0 swapper/0        intel_pstate_timer_func (intel_pstate_timer_func)
  16D,     0 swapper/4        intel_pstate_timer_func (intel_pstate_timer_func)

An example where the work spans a jiffy boundary:
Notice the CPU frequency is not pinned low and there are
the expected ~ 2000 driver calls.

doug@s15:~/temp$ sudo ./set_cpu_timer_stats
CPU 5 frequency (~ 1.6 GHz means forced low, ~2.4 GHz means O.K.): 2409750
consume: 6.0 100.0 20.0  PID: 7543  Elapsed: 20000099  Now: 1402323597033729
   8D,     0 swapper/4        intel_pstate_timer_func (intel_pstate_timer_func)
2004D,     0 swapper/5        intel_pstate_timer_func (intel_pstate_timer_func)
  43D,     0 swapper/3        intel_pstate_timer_func (intel_pstate_timer_func)
  35D,     0 swapper/2        intel_pstate_timer_func (intel_pstate_timer_func)
  24D,     0 swapper/1        intel_pstate_timer_func (intel_pstate_timer_func)
  20D,     0 swapper/6        intel_pstate_timer_func (intel_pstate_timer_func)
   8D,     0 swapper/7        intel_pstate_timer_func (intel_pstate_timer_func)
  10D,     0 swapper/0        intel_pstate_timer_func (intel_pstate_timer_func)

O.K. the above was a perfect example, done specifically
for dramatic effect and for best exposure of the issue.
What about the more generic, but still periodic, case?

Well, then it comes down to beat frequencies between the
process and the jiffy boundaries. I have run a great many
different work/sleep frequencies and there are lots of
cases where the driver call is deferred enough to cause
this patch code to engage incorrectly. However in these
scenarios the CPU frequency will oscillate in step with
the beat frequency (I can supply a graph if needed).

Note: Yes, fixed time work chunks does not consider the
more realistic scenario where the work time would scale
with CPU frequency. That just changes the threshold for
the issue, but not the root issue itself.

Note: Work was done on a 1000 Hz (1 jiffy = 1 mS) kernel.
The issue is 4 times worse for 250 Hz kernels.

Note: It has taken me until now to get to this point,
such that I could comment on-list. One, of many, issues
hindering progress, was that I was originally using
a 250 Hertz Kernel and Dirk Brandewie was using a
1000 Hertz Kernel. So I would ask Dirk to verify
something and it would be fine for him.

Note: I would need help to isolate the root issue
further. I got lost in timer.c and the mapping
.h file (I can not recall the name).

... Doug



^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH 2/4] intel_pstate: Correct rounding in busy calculation
  2014-05-29 16:32 ` [PATCH 2/4] intel_pstate: Correct rounding in busy calculation dirk.brandewie
@ 2014-06-15 14:44   ` Doug Smythies
  2014-06-15 22:46     ` Rafael J. Wysocki
  0 siblings, 1 reply; 10+ messages in thread
From: Doug Smythies @ 2014-06-15 14:44 UTC (permalink / raw)
  To: dirk.brandewie, linux-pm; +Cc: rjw, 'Dirk Brandewie', stable

On 2014.05.29 09:32 Dirk Brandewie wrote:

There is a mistake in this patch.
Even though the patch is from Dirk, the mistake is my fault, sorry.

Severity: Not very serious, but can increase target pstate by one extra value.
For real world work flows the issue should self correct (but I have no proof).
It is the equivalent of different PID gains for positive and negative numbers.

Why this e-mail? Why not just submit a patch?
Because I do not yet know how, particularly for v3.16

This part of the patch:

 	result = pterm + mul_fp(pid->integral, pid->i_gain) + dterm;
 -
 +	if (result >= 0)
 +		result = result + (1 << (FRAC_BITS-1));
 +	else
 +		result = result - (1 << (FRAC_BITS-1));
 	return (signed int)fp_toint(result);

Should actually be this:

        result = pterm + mul_fp(pid->integral, pid->i_gain) + dterm;
+       result = result + (1 << (FRAC_BITS-1));
        return (signed int)fp_toint(result);

Proof method 1: Use "perf record" and verify all the math by hand.
Realize that some numbers are not what is expected, and work backwards.

Proof method 2: make a table of all the numbers. I.E.

/****************************************************************************/
/*
/* round_truth_table.c  2014.06.15  Smythies
/*       It seems I introduced a mistake into intel_pstate.c.
/*       Since my brain hurts from all the thinking it through, verify
/*       by mindlessly generating all possibilities.
/*
/****************************************************************************/

#include <stdio.h>
#include <stdlib.h>

main(){

   long int count, mistake_way;

   for(count = 1026; count >= -1026; count--){ /* from 4.008 to -4.008 in steps of 1 / 256ths */
      if(count >= 0){
         mistake_way = (count + (1 << 7)) >> 8;
      } else {
         mistake_way = (count - (1 << 7)) >> 8;
      } /* endif */
      printf("%ld %ld %ld %ld %lf\n", count, count >> 8, mistake_way, (count + (1 << 7)) >> 8, (double)count / 256.0);
   } /* endfor */
} /* endprogram */

Which gives (edited):

1025 4 4 4 4.003906
1024 4 4 4 4.000000
1023 3 4 4 3.996094

897 3 4 4 3.503906
896 3 4 4 3.500000
895 3 3 3 3.496094

769 3 3 3 3.003906
768 3 3 3 3.000000
767 2 3 3 2.996094

641 2 3 3 2.503906
640 2 3 3 2.500000
639 2 2 2 2.496094

513 2 2 2 2.003906
512 2 2 2 2.000000
511 1 2 2 1.996094

385 1 2 2 1.503906
384 1 2 2 1.500000
383 1 1 1 1.496094

257 1 1 1 1.003906
256 1 1 1 1.000000
255 0 1 1 0.996094

129 0 1 1 0.503906
128 0 1 1 0.500000
127 0 0 0 0.496094

1 0 0 0 0.003906
0 0 0 0 0.000000
-1 -1 -1 0 -0.003906

-127 -1 -1 0 -0.496094
-128 -1 -1 0 -0.500000
-129 -1 -2 -1 -0.503906

-255 -1 -2 -1 -0.996094
-256 -1 -2 -1 -1.000000
-257 -2 -2 -1 -1.003906

-383 -2 -2 -1 -1.496094
-384 -2 -2 -1 -1.500000
-385 -2 -3 -2 -1.503906

-511 -2 -3 -2 -1.996094
-512 -2 -3 -2 -2.000000
-513 -3 -3 -2 -2.003906

-639 -3 -3 -2 -2.496094
-640 -3 -3 -2 -2.500000
-641 -3 -4 -3 -2.503906

-767 -3 -4 -3 -2.996094
-768 -3 -4 -3 -3.000000
-769 -4 -4 -3 -3.003906

-895 -4 -4 -3 -3.496094
-896 -4 -4 -3 -3.500000
-897 -4 -5 -4 -3.503906

-1024 -4 -5 -4 -4.000000
-1025 -5 -5 -4 -4.003906
-1026 -5 -5 -4 -4.007812

> From: Dirk Brandewie <dirk.j.brandewie@intel.com>
>
> Changing to fixed point math throughout the busy calculation in
> commit e66c1768 Change busy calculation to use fixed point
> math. Introduced some inaccuracies by rounding the busy value at two
> points in the calculation.  This change removes roundings and moves
> the rounding to the output of the PID where the calculations are
> complete and the value returned as an integer.
>
> Reported-by: Doug Smythies <dsmythies@telus.net>
> Cc: <stable@vger.kernel.org> # 3.14.x
> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
> ---
> drivers/cpufreq/intel_pstate.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index ffef765..db8a992 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -38,10 +38,10 @@
> #define BYT_TURBO_VIDS		0x66d
> 
> 
> -#define FRAC_BITS 6
> +#define FRAC_BITS 8
> #define int_tofp(X) ((int64_t)(X) << FRAC_BITS)
> #define fp_toint(X) ((X) >> FRAC_BITS)
> -#define FP_ROUNDUP(X) ((X) += 1 << FRAC_BITS)
>+
> 
> static inline int32_t mul_fp(int32_t x, int32_t y)
> {
>@@ -194,7 +194,10 @@ static signed int pid_calc(struct _pid *pid, int32_t busy)
> 	pid->last_err = fp_error;
> 
> 	result = pterm + mul_fp(pid->integral, pid->i_gain) + dterm;
> -
> +	if (result >= 0)
> +		result = result + (1 << (FRAC_BITS-1));
> +	else
> +		result = result - (1 << (FRAC_BITS-1));
> 	return (signed int)fp_toint(result);
> }
> 
> @@ -556,7 +559,6 @@ static inline void intel_pstate_calc_busy(struct cpudata *cpu)
> 
> 	core_pct = div_fp(int_tofp(sample->aperf), int_tofp(sample->mperf));
> 	core_pct = mul_fp(core_pct, int_tofp(100));
> -	FP_ROUNDUP(core_pct);
> 
> 	sample->freq = fp_toint(
> 		mul_fp(int_tofp(cpu->pstate.max_pstate * 1000), core_pct));
> @@ -602,7 +604,7 @@ static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
> 	max_pstate = int_tofp(cpu->pstate.max_pstate);
> 	current_pstate = int_tofp(cpu->pstate.current_pstate);
> 	core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
> -	return FP_ROUNDUP(core_busy);
> +	return core_busy;
> }
> 
> static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
> -- 
> 1.9.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/4] intel_pstate: Correct rounding in busy calculation
  2014-06-15 14:44   ` Doug Smythies
@ 2014-06-15 22:46     ` Rafael J. Wysocki
  0 siblings, 0 replies; 10+ messages in thread
From: Rafael J. Wysocki @ 2014-06-15 22:46 UTC (permalink / raw)
  To: Doug Smythies; +Cc: dirk.brandewie, linux-pm, 'Dirk Brandewie', stable

On Sunday, June 15, 2014 07:44:47 AM Doug Smythies wrote:
> On 2014.05.29 09:32 Dirk Brandewie wrote:
> 
> There is a mistake in this patch.
> Even though the patch is from Dirk, the mistake is my fault, sorry.
> 
> Severity: Not very serious, but can increase target pstate by one extra value.
> For real world work flows the issue should self correct (but I have no proof).
> It is the equivalent of different PID gains for positive and negative numbers.
> 
> Why this e-mail? Why not just submit a patch?
> Because I do not yet know how, particularly for v3.16

Quite straightforward.  Create a patch with a changelog saying that it
fixes a bug, put the hash and subject of the commit that introduced
the bug in that changelog and send it to linux-pm@vger.kernel.org with
a CC to me (preferably at rjw@rjwysocki.net, but you can also use
rafael@kernel.org if you like).  I will take care of the patch from
that point on.

Rafael

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-06-15 22:46 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-29 16:32 [PATCH 0/4] intel_pstate fixes for v3.16 dirk.brandewie
2014-05-29 16:32 ` [PATCH 1/4] intel_pstate: Remove C0 tracking dirk.brandewie
2014-05-29 16:32 ` [PATCH 2/4] intel_pstate: Correct rounding in busy calculation dirk.brandewie
2014-06-15 14:44   ` Doug Smythies
2014-06-15 22:46     ` Rafael J. Wysocki
2014-05-29 16:32 ` [PATCH 3/4] intel_pstate: add sample time scaling dirk.brandewie
2014-05-29 18:17   ` Yuyang Du
2014-06-09 15:58   ` Doug Smythies
2014-05-29 16:32 ` [PATCH 4/4] intel_pstate: Improve initial busy calculation dirk.brandewie
     [not found] ` <CADmjqpNoRjhO=BaRBXxONCWd8-i9tzDLGVF4fTBs4jfRtHbdxw@mail.gmail.com>
2014-05-30  0:04   ` [PATCH 0/4] intel_pstate fixes for v3.16 Stratos Karafotis

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.