* [PATCH 0/5] intel_pstate: Use C0 time, calculate target pstate directly
@ 2015-04-12  4:10 Doug Smythies
  2015-04-12  4:10 ` [PATCH 1/5] intel_pstate: Add tsc collection and keep previous target pstate. Add both to trace Doug Smythies
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Doug Smythies @ 2015-04-12  4:10 UTC (permalink / raw)
  To: kristen, rjw; +Cc: dsmythies, linux-pm

This patch set attempts to address some issues with
the intel_pstate driver:

The driver has finicky tradeoffs between fixed-point
integer math, gain settings, and the setpoint.

The application is not really best suited to a PID
(Proportional Integral Derivative) controller.

The duration method can incorrectly drive the target
pstate downwards, because a long duration might not
be due to idle.

The driver can incorrectly drive the target pstate
upwards, when the C0 time is virtually 0.

The patch set brings back the use of C0 time as the driving
input for deciding the next target pstate.
Recall that C0 time was used once before and was eventually
removed due to situations of pstate lockup at a low pstate.
Why is it better this time? Because it is now scaled properly,
whereas it was not before. Also, the response curve is
flexible and adjustable.

Patch 1 basically adds better debug information to the
perf data. It was originally requested in June, and Dirk agreed
to include it in September. Regardless of the acceptance or
rejection of this whole patch set, please accept this one.
scripts/checkpatch.pl complains about one line as being over
80 characters; I am not sure how to split that line.

Patch 2 brings back the use of C0 time for the calculation
of core_busy. Floor and ceiling hold values can be adjusted.
Once the floor load is reached, the response is linear until
the ceiling load is reached. The equations are just a simple
y = mx + b line, where m = slope and b = intercept.
Equation 1: min = m * floor + b
Equation 2: max = m * ceiling + b
Two equations and two unknowns, solved and coded.
scripts/checkpatch.pl O.K.
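The floor/ceiling mapping above can be sketched outside the kernel in plain C. This is illustrative only: the function and parameter names are mine, the upper clamp is added for the sketch, and the driver itself does this with its own fixed-point helpers.

```c
#include <stdint.h>

/*
 * Map load (tenths of a percent of C0 time) onto a scaled-busy value,
 * linear between the floor and the ceiling. The line comes from solving
 *   min = m * floor + b
 *   max = m * ceiling + b
 * for m = (max - min) / (ceiling - floor) and b = min - m * floor.
 * All values carry one extra digit (tenths of a percent).
 */
static int32_t scaled_busy(int32_t load, int32_t floor, int32_t ceiling,
			   int32_t min, int32_t max)
{
	if (load <= floor)
		return 0;	/* hold the target pstate at minimum */
	if (load >= ceiling)
		return max;	/* clamped here for the sketch */
	/* y = m * load + b, computed without materializing m and b */
	return (int32_t)(((int64_t)(max - min) * (load - floor)) /
			 (ceiling - floor) + min);
}
```

For example, with a floor of 45.0 percent, a ceiling of 95.0 percent, and a pstate range of 50.0 to 100.0 percent (all in tenths), a 70.0 percent load lands halfway up the line.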

Patch 3 bypasses the PID controller and calculates the
target pstate directly. An IIR (Infinite Impulse
Response) filter smooths the output.
Again a simple y=mx+b line.
Equation 1: pmin = m * min + b
Equation 2: pmax = m * max + b
Again, two simple equations and two unknowns, solved
and coded.
scripts/checkpatch.pl O.K.
Note: not the same min and max as in patch 2.
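The IIR output filter of patch 3 is a first-order exponential smoother. A minimal sketch, in plain floating point rather than the driver's fixed-point arithmetic (naming is mine):

```c
/*
 * One step of a first-order IIR filter: blend the newly calculated
 * target into the history. gain_pct = 100 flushes the history (pure
 * new value); small gains damp oscillation at the cost of a slower
 * response.
 */
static double iir_step(double prev, double new_target, int gain_pct)
{
	return ((100 - gain_pct) * prev + gain_pct * new_target) / 100.0;
}
```

With a 10 percent gain, each sample moves the filtered target one tenth of the way toward the newly calculated value.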

Note: There are two definitions of percent within
this driver, one goes to 100% at the max turbo (if
it is enabled) and the other is 100% at the max
non-turbo (enabled or not). It can be very confusing.
The min used in patch 3 must be the mathematical min
for the processor under the current turbo enabled or
disabled state, which might not be the same
as /sys/devices/system/cpu/intel_pstate/min_perf_pct
when turbo is enabled.

Patch 4 adds weighting to longer-duration events to,
at least partially, compensate for the typically asymmetric
rising- and falling-edge load durations.
scripts/checkpatch.pl O.K.
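The duration weighting of patch 4 can be sketched as follows (illustrative C, my naming; the driver computes this in fixed point):

```c
#include <stdint.h>

/*
 * Scale the IIR gain by how far the measured duration overran the
 * nominal sample time, clamped between the base gain (short durations
 * have been observed) and 100 percent (a very long duration
 * effectively flushes the filter history).
 */
static int effective_gain_pct(uint32_t duration_us,
			      uint32_t sample_time_us, int base_gain_pct)
{
	int64_t gain = (int64_t)duration_us * base_gain_pct / sample_time_us;

	if (gain < base_gain_pct)
		gain = base_gain_pct;
	if (gain > 100)
		gain = 100;
	return (int)gain;
}
```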

Patch 5 just adjusts the default IIR gain.
10 percent gives a rising edge response time similar
to the current version of the driver.
scripts/checkpatch.pl O.K.

Anticipated questions:

Q: Shouldn't the response curve be normalized to a reference
pstate, perhaps the max non-turbo pstate?

A: Doing so can create a lockup scenario, depending on the
response curve settings and the overall pstate range of the
particular processor. A downside to not normalizing has not been
detected.

Q: One of the patches in the set is titled V2. What was V1?

A: To a limit, V1 ran the IIR filter N times when the duration
was N times nominal. The method was expensive in terms of
multiplies and divides. V2 saves a lot of clock cycles, and
is good enough.
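The difference between V1 and V2 can be sketched like this (illustrative C, my naming; the driver works in fixed point):

```c
/*
 * v1: run the IIR step N times when the duration is N times nominal.
 * v2: scale the gain once, clamped at 100 percent, saving the repeated
 *     multiplies and divides at a small cost in accuracy.
 */
static double v1_filter(double prev, double target, int gain_pct, int n)
{
	for (int i = 0; i < n; i++)
		prev = ((100 - gain_pct) * prev + gain_pct * target) / 100.0;
	return prev;
}

static double v2_filter(double prev, double target, int gain_pct, int n)
{
	int gain = gain_pct * n;

	if (gain > 100)
		gain = 100;
	return ((100 - gain) * prev + gain * target) / 100.0;
}
```

With prev = 0, target = 100, a 10 percent gain, and a duration twice nominal, V1 yields 19 while V2 yields 20: close enough, at a fraction of the arithmetic.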

Migration path:

It should not be assumed that the current pstate is actually
what was in effect during the last sample time. If a higher
pstate was actually used, due to another active CPU, then
that should override the use of the current pstate in the filter.
This issue also exists in the current version of the driver,
and was only discovered a few days ago. See also:
https://bugzilla.kernel.org/show_bug.cgi?id=93521#c55
This will be addressed, if possible, in a future patch.

There are likely some math (multiply and divide) savings that
can be realized in future patches.

Although the use of the PID controller has been eliminated,
the code has been left intact, because the change is likely
to be controversial. It will be removed in a future patch.

Doug Smythies (5):
  intel_pstate: Add tsc collection and keep previous target pstate. Add
    both to trace.
  intel_pstate: Use C0 time for busy calculations (again).
  intel_pstate: Calculate target pstate directly.
  intel_pstate: Compensate for intermediate durations (v2).
  intel_pstate: Adjust default IIR filter gain.

 Documentation/cpu-freq/intel-pstate.txt |   2 +
 drivers/cpufreq/intel_pstate.c          | 188 ++++++++++++++++++++++++--------
 include/trace/events/power.h            |  25 +++--
 3 files changed, 163 insertions(+), 52 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/5] intel_pstate: Add tsc collection and keep previous target pstate. Add both to trace.
  2015-04-12  4:10 [PATCH 0/5] intel_pstate: Use C0 time, calculate target pstate directly Doug Smythies
@ 2015-04-12  4:10 ` Doug Smythies
  2015-04-29 16:57   ` Kristen Carlson Accardi
  2015-04-12  4:10 ` [PATCH 2/5] intel_pstate: Use C0 time for busy calculations (again) Doug Smythies
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Doug Smythies @ 2015-04-12  4:10 UTC (permalink / raw)
  To: kristen, rjw; +Cc: dsmythies, linux-pm

The intel_pstate driver is difficult to debug and investigate without tsc.
Also, it is likely that tsc, and some version of C0 percentage,
will be re-introduced in the future.
There have also been occasions where it is desirable to know, and
confirm, the previous target pstate.
This patch brings back tsc, adds the previous target pstate,
and adds both to the trace data collection.

Signed-off-by: Doug Smythies <dsmythies@telus.net>
---
 drivers/cpufreq/intel_pstate.c | 31 +++++++++++++++++++++----------
 include/trace/events/power.h   | 25 +++++++++++++++++--------
 2 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 872c577..f181ce5 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -67,6 +67,7 @@ struct sample {
 	int32_t core_pct_busy;
 	u64 aperf;
 	u64 mperf;
+	u64 tsc;
 	int freq;
 	ktime_t time;
 };
@@ -108,6 +109,7 @@ struct cpudata {
 	ktime_t last_sample_time;
 	u64	prev_aperf;
 	u64	prev_mperf;
+	u64	prev_tsc;
 	struct sample sample;
 };
 
@@ -725,23 +727,28 @@ static inline void intel_pstate_sample(struct cpudata *cpu)
 {
 	u64 aperf, mperf;
 	unsigned long flags;
+	u64 tsc;
 
 	local_irq_save(flags);
 	rdmsrl(MSR_IA32_APERF, aperf);
 	rdmsrl(MSR_IA32_MPERF, mperf);
+	tsc = native_read_tsc();
 	local_irq_restore(flags);
 
 	cpu->last_sample_time = cpu->sample.time;
 	cpu->sample.time = ktime_get();
 	cpu->sample.aperf = aperf;
 	cpu->sample.mperf = mperf;
+	cpu->sample.tsc =  tsc;
 	cpu->sample.aperf -= cpu->prev_aperf;
 	cpu->sample.mperf -= cpu->prev_mperf;
+	cpu->sample.tsc -= cpu->prev_tsc;
 
 	intel_pstate_calc_busy(cpu);
 
 	cpu->prev_aperf = aperf;
 	cpu->prev_mperf = mperf;
+	cpu->prev_tsc = tsc;
 }
 
 static inline void intel_hwp_set_sample_time(struct cpudata *cpu)
@@ -806,6 +813,10 @@ static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
 	int32_t busy_scaled;
 	struct _pid *pid;
 	signed int ctl;
+	int from;
+	struct sample *sample;
+
+	from = cpu->pstate.current_pstate;
 
 	pid = &cpu->pid;
 	busy_scaled = intel_pstate_get_scaled_busy(cpu);
@@ -814,6 +825,16 @@ static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
 
 	/* Negative values of ctl increase the pstate and vice versa */
 	intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate - ctl);
+
+	sample = &cpu->sample;
+	trace_pstate_sample(fp_toint(sample->core_pct_busy),
+		fp_toint(busy_scaled),
+		from,
+		cpu->pstate.current_pstate,
+		sample->mperf,
+		sample->aperf,
+		sample->tsc,
+		sample->freq);
 }
 
 static void intel_hwp_timer_func(unsigned long __data)
@@ -827,21 +848,11 @@ static void intel_hwp_timer_func(unsigned long __data)
 static void intel_pstate_timer_func(unsigned long __data)
 {
 	struct cpudata *cpu = (struct cpudata *) __data;
-	struct sample *sample;
 
 	intel_pstate_sample(cpu);
 
-	sample = &cpu->sample;
-
 	intel_pstate_adjust_busy_pstate(cpu);
 
-	trace_pstate_sample(fp_toint(sample->core_pct_busy),
-			fp_toint(intel_pstate_get_scaled_busy(cpu)),
-			cpu->pstate.current_pstate,
-			sample->mperf,
-			sample->aperf,
-			sample->freq);
-
 	intel_pstate_set_sample_time(cpu);
 }
 
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index d19840b..630d1e5 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -42,45 +42,54 @@ TRACE_EVENT(pstate_sample,
 
 	TP_PROTO(u32 core_busy,
 		u32 scaled_busy,
-		u32 state,
+		u32 from,
+		u32 to,
 		u64 mperf,
 		u64 aperf,
+		u64 tsc,
 		u32 freq
 		),
 
 	TP_ARGS(core_busy,
 		scaled_busy,
-		state,
+		from,
+		to,
 		mperf,
 		aperf,
+		tsc,
 		freq
 		),
 
 	TP_STRUCT__entry(
 		__field(u32, core_busy)
 		__field(u32, scaled_busy)
-		__field(u32, state)
+		__field(u32, from)
+		__field(u32, to)
 		__field(u64, mperf)
 		__field(u64, aperf)
+		__field(u64, tsc)
 		__field(u32, freq)
-
-	),
+		),
 
 	TP_fast_assign(
 		__entry->core_busy = core_busy;
 		__entry->scaled_busy = scaled_busy;
-		__entry->state = state;
+		__entry->from = from;
+		__entry->to = to;
 		__entry->mperf = mperf;
 		__entry->aperf = aperf;
+		__entry->tsc = tsc;
 		__entry->freq = freq;
 		),
 
-	TP_printk("core_busy=%lu scaled=%lu state=%lu mperf=%llu aperf=%llu freq=%lu ",
+	TP_printk("core_busy=%lu scaled=%lu from=%lu to=%lu mperf=%llu aperf=%llu tsc=%llu freq=%lu ",
 		(unsigned long)__entry->core_busy,
 		(unsigned long)__entry->scaled_busy,
-		(unsigned long)__entry->state,
+		(unsigned long)__entry->from,
+		(unsigned long)__entry->to,
 		(unsigned long long)__entry->mperf,
 		(unsigned long long)__entry->aperf,
+		(unsigned long long)__entry->tsc,
 		(unsigned long)__entry->freq
 		)
 
-- 
1.9.1



* [PATCH 2/5] intel_pstate: Use C0 time for busy calculations (again).
  2015-04-12  4:10 [PATCH 0/5] intel_pstate: Use C0 time, calculate target pstate directly Doug Smythies
  2015-04-12  4:10 ` [PATCH 1/5] intel_pstate: Add tsc collection and keep previous target pstate. Add both to trace Doug Smythies
@ 2015-04-12  4:10 ` Doug Smythies
  2015-05-06 19:20   ` Kristen Carlson Accardi
  2015-04-12  4:10 ` [PATCH 3/5] intel_pstate: Calculate target pstate directly Doug Smythies
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Doug Smythies @ 2015-04-12  4:10 UTC (permalink / raw)
  To: kristen, rjw; +Cc: dsmythies, linux-pm

This patch brings back the inclusion of C0 time
for the calculation of core_busy.
scaled_busy ultimately defines the target pstate
(CPU frequency) versus load (C0) response curve.
The target pstate will be held at minimum until the load
is larger than the c0_floor. Thereafter, the response
is roughly linear until the maximum target pstate is
reached at the c0_ceiling.
A larger c0_floor and a lower c0_ceiling tend towards
minimum energy, at a cost of performance and slower
rising-edge load response times. A lower c0_floor and a
larger c0_ceiling tend towards more energy consumption,
but better performance and faster rising-edge load
response times. Note that for falling-edge loads, response
times are dominated by durations, and this driver runs
very rarely.
c0_floor and c0_ceiling are available in debugfs,
in units of tenths of a percent.

Signed-off-by: Doug Smythies <dsmythies@telus.net>
---
 Documentation/cpu-freq/intel-pstate.txt |  2 +
 drivers/cpufreq/intel_pstate.c          | 87 +++++++++++++++++++++++----------
 2 files changed, 63 insertions(+), 26 deletions(-)

diff --git a/Documentation/cpu-freq/intel-pstate.txt b/Documentation/cpu-freq/intel-pstate.txt
index 6557507..583a048 100644
--- a/Documentation/cpu-freq/intel-pstate.txt
+++ b/Documentation/cpu-freq/intel-pstate.txt
@@ -56,6 +56,8 @@ For legacy mode debugfs files have also been added to allow tuning of
 the internal governor algorythm. These files are located at
 /sys/kernel/debug/pstate_snb/ These files are NOT present in HWP mode.
 
+      c0_ceiling
+      c0_floor
       deadband
       d_gain_pct
       i_gain_pct
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index f181ce5..ddc3602 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -121,6 +121,8 @@ struct pstate_adjust_policy {
 	int p_gain_pct;
 	int d_gain_pct;
 	int i_gain_pct;
+	int c0_ceiling;
+	int c0_floor;
 };
 
 struct pstate_funcs {
@@ -313,6 +315,8 @@ static struct pid_param pid_files[] = {
 	{"deadband", &pid_params.deadband},
 	{"setpoint", &pid_params.setpoint},
 	{"p_gain_pct", &pid_params.p_gain_pct},
+	{"c0_ceiling", &pid_params.c0_ceiling},
+	{"c0_floor", &pid_params.c0_floor},
 	{NULL, NULL}
 };
 
@@ -624,6 +628,8 @@ static struct cpu_defaults core_params = {
 		.p_gain_pct = 20,
 		.d_gain_pct = 0,
 		.i_gain_pct = 0,
+		.c0_ceiling = 950,
+		.c0_floor = 450,
 	},
 	.funcs = {
 		.get_max = core_get_max_pstate,
@@ -642,6 +648,8 @@ static struct cpu_defaults byt_params = {
 		.p_gain_pct = 14,
 		.d_gain_pct = 0,
 		.i_gain_pct = 4,
+		.c0_ceiling = 950,
+		.c0_floor = 450,
 	},
 	.funcs = {
 		.get_max = byt_get_max_pstate,
@@ -720,6 +728,14 @@ static inline void intel_pstate_calc_busy(struct cpudata *cpu)
 			cpu->pstate.max_pstate * cpu->pstate.scaling / 100),
 			core_pct));
 
+	core_pct = int_tofp(sample->mperf) * int_tofp(1000);
+	core_pct = div64_u64(core_pct, int_tofp(sample->tsc));
+
+	/*
+	 * Basically C0 (or load) has been calculated
+	 * in units of tenths of a percent.
+	 */
+
 	sample->core_pct_busy = (int32_t)core_pct;
 }
 
@@ -769,43 +785,60 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
 
 static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
 {
-	int32_t core_busy, max_pstate, current_pstate, sample_ratio;
+	int64_t scaled_busy, max, min, nom;
 	u32 duration_us;
-	u32 sample_time;
 
 	/*
-	 * core_busy is the ratio of actual performance to max
-	 * max_pstate is the max non turbo pstate available
-	 * current_pstate was the pstate that was requested during
-	 * 	the last sample period.
+	 * The target pstate versus CPU load is adjusted
+	 * as per the desired floor and ceiling values.
+	 * This is a simple y = mx + b line defined by:
+	 * c0_floor results in minimum pstate percent
+	 * c0_ceiling results in maximum pstate percent
 	 *
-	 * We normalize core_busy, which was our actual percent
-	 * performance to what we requested during the last sample
-	 * period. The result will be a percentage of busy at a
-	 * specified pstate.
+	 * Carry an extra digit herein.
 	 */
-	core_busy = cpu->sample.core_pct_busy;
-	max_pstate = int_tofp(cpu->pstate.max_pstate);
-	current_pstate = int_tofp(cpu->pstate.current_pstate);
-	core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
+
+	if (limits.no_turbo || limits.turbo_disabled)
+		max = int_tofp(cpu->pstate.max_pstate);
+	else
+		max = int_tofp(cpu->pstate.turbo_pstate);
+
+	nom = int_tofp(cpu->pstate.max_pstate);
+	min = int_tofp(cpu->pstate.min_pstate);
+	max = div_u64(max * int_tofp(1000), nom);
+	min = div_u64(min * int_tofp(1000), nom);
+	nom = int_tofp(pid_params.c0_floor);
 
 	/*
-	 * Since we have a deferred timer, it will not fire unless
-	 * we are in C0.  So, determine if the actual elapsed time
-	 * is significantly greater (3x) than our sample interval.  If it
-	 * is, then we were idle for a long enough period of time
-	 * to adjust our busyness.
+	 * Idle check.
+	 * Since we have a deferrable timer, it will not fire unless
+	 * we are in the C0 state on a jiffy boundary.  Very long
+	 * durations can be either due to long idle (C0 time near 0),
+	 * or due to short idle times that spanned jiffy boundaries
+	 * (C0 time not near zero).
+	 * The very long durations are 0.5 seconds or more.
+	 * The very low C0 threshold of 0.1 percent is arbitrary,
+	 * but it should be a small number.
+	 * recall that the units of core_pct_busy are tenths of a percent.
+	 *
+	 * Note: the use of this calculation will become clear in the next patch
 	 */
-	sample_time = pid_params.sample_rate_ms  * USEC_PER_MSEC;
 	duration_us = (u32) ktime_us_delta(cpu->sample.time,
 					   cpu->last_sample_time);
-	if (duration_us > sample_time * 3) {
-		sample_ratio = div_fp(int_tofp(sample_time),
-				      int_tofp(duration_us));
-		core_busy = mul_fp(core_busy, sample_ratio);
-	}
+	if (duration_us > 500000 && cpu->sample.core_pct_busy < int_tofp(1))
+		return (int32_t) 0;
+
+	if (cpu->sample.core_pct_busy <= nom)
+		return (int32_t) 0;
+
+	scaled_busy = div_u64((max - min) * (cpu->sample.core_pct_busy - nom),
+		(int_tofp(pid_params.c0_ceiling) - nom)) + min;
+
+	/*
+	 * Return an extra digit, tenths of a percent.
+	 */
+	return (int32_t) scaled_busy;
 
-	return core_busy;
 }
 
 static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
@@ -1065,6 +1098,8 @@ static void copy_pid_params(struct pstate_adjust_policy *policy)
 	pid_params.d_gain_pct = policy->d_gain_pct;
 	pid_params.deadband = policy->deadband;
 	pid_params.setpoint = policy->setpoint;
+	pid_params.c0_ceiling = policy->c0_ceiling;
+	pid_params.c0_floor = policy->c0_floor;
 }
 
 static void copy_cpu_funcs(struct pstate_funcs *funcs)
-- 
1.9.1



* [PATCH 3/5] intel_pstate: Calculate target pstate directly.
  2015-04-12  4:10 [PATCH 0/5] intel_pstate: Use C0 time, calculate target pstate directly Doug Smythies
  2015-04-12  4:10 ` [PATCH 1/5] intel_pstate: Add tsc collection and keep previous target pstate. Add both to trace Doug Smythies
  2015-04-12  4:10 ` [PATCH 2/5] intel_pstate: Use C0 time for busy calculations (again) Doug Smythies
@ 2015-04-12  4:10 ` Doug Smythies
  2015-04-12  4:10 ` [PATCH 4/5] intel_pstate: Compensate for intermediate durations (v2) Doug Smythies
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Doug Smythies @ 2015-04-12  4:10 UTC (permalink / raw)
  To: kristen, rjw; +Cc: dsmythies, linux-pm

This patch eliminates the use of the PID controller
and calculates the target pstate directly.
This driver is subject to many complex situations,
beat frequencies, and enormous variations in sample
time periods, so it is not well suited to PID control.
In signal processing, derivatives are difficult at
the best of times, and more so with the extremely
variable sampling times and the load / sleep sampling
interactions (beat frequencies).
For the integral term, this is not really an integrate
and eventually null out the error scenario.
The remaining proportional term can be better handled with
a different type of filter.
The PID driver, with fractional gains, also gets into
tradeoff difficulties between the setpoint, optimal gain
settings, and finite integer arithmetic.
For now, the PID code, and a dummy call to it, remain.
Calculate the target pstate directly and add a simple
IIR output filter to the target pstate.
As a function of load / sleep frequency and how it beats
against this driver's sampling times, the driver has a
tendency to be underdamped and to oscillate, requiring
a bandwidth-limiting filter on the target pstate.
It is a simple IIR (Infinite Impulse Response) filter with the
purpose of damping down the inherent oscillations caused by a
sampled system that can have measured load extremes in any
given sample. The /sys/kernel/debug/pstate_snb/p_gain_pct has
been re-tasked to be the gain for this filter. The optimal gain
setting is a tradeoff between response time and adequate
damping. The idle check flushes the filter if required,
as do the initialization routines.

Signed-off-by: Doug Smythies <dsmythies@telus.net>
---
 drivers/cpufreq/intel_pstate.c | 49 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index ddc3602..0b38d17 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -68,6 +68,7 @@ struct sample {
 	u64 aperf;
 	u64 mperf;
 	u64 tsc;
+	u64 target;
 	int freq;
 	ktime_t time;
 };
@@ -645,7 +646,7 @@ static struct cpu_defaults byt_params = {
 		.sample_rate_ms = 10,
 		.deadband = 0,
 		.setpoint = 97,
-		.p_gain_pct = 14,
+		.p_gain_pct = 20,
 		.d_gain_pct = 0,
 		.i_gain_pct = 4,
 		.c0_ceiling = 950,
@@ -713,6 +714,7 @@ static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
 	if (pstate_funcs.get_vid)
 		pstate_funcs.get_vid(cpu);
 	intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate);
+	cpu->sample.target = int_tofp(cpu->pstate.min_pstate);
 }
 
 static inline void intel_pstate_calc_busy(struct cpudata *cpu)
@@ -820,13 +822,14 @@ static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
 	 * The very low C0 threshold of 0.1 percent is arbitrary,
 	 * but it should be a small number.
 	 * recall that the units of core_pct_busy are tenths of a percent.
-	 *
-	 * Note: the use of this calculation will become clear in the next patch
+	 * If prolonged idle is detected, then flush the IIR filter,
+	 * otherwise falling edge load response times can be on the order
+	 * of tens of seconds, because this driver runs very rarely.
 	 */
 	duration_us = (u32) ktime_us_delta(cpu->sample.time,
 					   cpu->last_sample_time);
 	if (duration_us > 500000 && cpu->sample.core_pct_busy < int_tofp(1))
-		return (int32_t) 0;
+		cpu->sample.target = int_tofp(cpu->pstate.min_pstate);
 
 	if (cpu->sample.core_pct_busy <= nom)
 		return (int32_t) 0;
@@ -838,7 +841,6 @@ static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
 	 * Return an extra digit, tenths of a percent.
 	 */
 	return (int32_t) scaled_busy;
-
 }
 
 static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
@@ -848,16 +850,48 @@ static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
 	signed int ctl;
 	int from;
 	struct sample *sample;
+	int64_t max, min, nom, pmin, prange, scaled, target;
 
 	from = cpu->pstate.current_pstate;
 
 	pid = &cpu->pid;
 	busy_scaled = intel_pstate_get_scaled_busy(cpu);
 
+	scaled = (int64_t) busy_scaled;
+
+	/*
+	 * a null, for now. Will be removed in a future patch.
+	 * strictly: ctl = pid_calc(pid, busy_scaled / 10);
+	 * but it is now a temporary dummy call, so do not waste
+	 * the divide clock cycles.
+	 */
 	ctl = pid_calc(pid, busy_scaled);
 
-	/* Negative values of ctl increase the pstate and vice versa */
-	intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate - ctl);
+	if (limits.no_turbo || limits.turbo_disabled)
+		max = int_tofp(cpu->pstate.max_pstate);
+	else
+		max = int_tofp(cpu->pstate.turbo_pstate);
+
+	pmin = int_tofp(cpu->pstate.min_pstate);
+	prange = max - pmin;
+	nom = int_tofp(cpu->pstate.max_pstate);
+	max = div_u64(max * int_tofp(1000), nom);
+	min = div_u64(pmin * int_tofp(1000), nom);
+
+	if ((scaled - min) <= 0)
+		target = int_tofp(cpu->pstate.min_pstate);
+	else
+		target = div_u64(prange * (scaled-min), (max - min)) + pmin;
+	/*
+	 * Bandwidth limit the output. Re-task p_gain_pct for this purpose.
+	 */
+	target = div_u64((int_tofp(100 - pid_params.p_gain_pct) *
+		cpu->sample.target + int_tofp(pid_params.p_gain_pct) *
+		target), int_tofp(100));
+	cpu->sample.target = target;
+
+	target = target + (1 << (FRAC_BITS-1));
+	intel_pstate_set_pstate(cpu, fp_toint(target));
 
 	sample = &cpu->sample;
 	trace_pstate_sample(fp_toint(sample->core_pct_busy),
@@ -1020,6 +1054,7 @@ static void intel_pstate_stop_cpu(struct cpufreq_policy *policy)
 		return;
 
 	intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate);
+	cpu->sample.target = int_tofp(cpu->pstate.min_pstate);
 }
 
 static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
-- 
1.9.1



* [PATCH 4/5] intel_pstate: Compensate for intermediate durations (v2).
  2015-04-12  4:10 [PATCH 0/5] intel_pstate: Use C0 time, calculate target pstate directly Doug Smythies
                   ` (2 preceding siblings ...)
  2015-04-12  4:10 ` [PATCH 3/5] intel_pstate: Calculate target pstate directly Doug Smythies
@ 2015-04-12  4:10 ` Doug Smythies
  2015-04-12  4:10 ` [PATCH 5/5] intel_pstate: Adjust default IIR filter gain Doug Smythies
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Doug Smythies @ 2015-04-12  4:10 UTC (permalink / raw)
  To: kristen, rjw; +Cc: dsmythies, linux-pm

On the falling edge of lower-frequency periodic loads, the
duration is always longer than on the rising edge. The result
is a tendency for the average target pstate to end up a
little high, due to what basically amounts to asymmetric
weighting. Note that at some limit point a lower-frequency
periodic load has to be considered as a separate 100 percent
load followed by idle events.
This patch modifies the IIR filter gain as a function of
duration so as to more properly weight the longer-duration
cases. In the limit, the IIR filter history is flushed with the
new value.

Signed-off-by: Doug Smythies <dsmythies@telus.net>
---
 drivers/cpufreq/intel_pstate.c | 79 ++++++++++++++++++++++++++----------------
 1 file changed, 49 insertions(+), 30 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 0b38d17..66e662d 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -788,7 +788,6 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
 static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
 {
 	int64_t scaled_busy, max, min, nom;
-	u32 duration_us;
 
 	/*
 	 * The target pstate versus CPU load is adjusted
@@ -811,26 +810,6 @@ static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
 	min = div_u64(min * int_tofp(1000), nom);
 	nom = int_tofp(pid_params.c0_floor);
 
-	/*
-	 * Idle check.
-	 * Since we have a deferrable timer, it will not fire unless
-	 * we are in the C0 state on a jiffy boundary.  Very long
-	 * durations can be either due to long idle (C0 time near 0),
-	 * or due to short idle times that spanned jiffy boundaries
-	 * (C0 time not near zero).
-	 * The very long durations are 0.5 seconds or more.
-	 * The very low C0 threshold of 0.1 percent is arbitrary,
-	 * but it should be a small number.
-	 * recall that the units of core_pct_busy are tenths of a percent.
-	 * If prolonged idle is detected, then flush the IIR filter,
-	 * otherwise falling edge load response times can be on the order
-	 * of tens of seconds, because this driver runs very rarely.
-	 */
-	duration_us = (u32) ktime_us_delta(cpu->sample.time,
-					   cpu->last_sample_time);
-	if (duration_us > 500000 && cpu->sample.core_pct_busy < int_tofp(1))
-		cpu->sample.target = int_tofp(cpu->pstate.min_pstate);
-
 	if (cpu->sample.core_pct_busy <= nom)
 		return (int32_t) 0;
 
@@ -850,7 +829,9 @@ static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
 	signed int ctl;
 	int from;
 	struct sample *sample;
-	int64_t max, min, nom, pmin, prange, scaled, target;
+	int64_t max, min, nom, pmin, prange, scaled, unfiltered_target;
+	u32 duration_us;
+	u32 sample_time;
 
 	from = cpu->pstate.current_pstate;
 
@@ -879,19 +860,57 @@ static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
 	min = div_u64(pmin * int_tofp(1000), nom);
 
 	if ((scaled - min) <= 0)
-		target = int_tofp(cpu->pstate.min_pstate);
+		unfiltered_target = int_tofp(cpu->pstate.min_pstate);
 	else
-		target = div_u64(prange * (scaled-min), (max - min)) + pmin;
+		unfiltered_target = div_u64(prange * (scaled-min),
+		(max - min)) + pmin;
+
+	/*
+	 * Idle check.
+	 * Since we have a deferrable timer, it will not fire unless
+	 * we are in the C0 state on a jiffy boundary.  Very long
+	 * durations can be either due to long idle (C0 time near 0),
+	 * or due to short idle times that spanned jiffy boundaries
+	 * (C0 time not near zero).
+	 * The very long durations are 0.5 seconds or more.
+	 * Recall that the units of core_pct_busy are tenths of a percent.
+	 * Either way, a very long duration will effectively flush
+	 * the IIR filter, otherwise falling edge load response times
+	 * can be on the order of tens of seconds, because this driver
+	 * runs very rarely. Furthermore, for higher periodic loads that
+	 * just so happen to not be in the C0 state on jiffy boundaries,
+	 * the long ago history should be forgotten.
+	 * For cases of durations that are a few times the set sample
+	 * period, increase the IIR filter gain so as to weight
+	 * the sample more appropriately.
+	 *
+	 * To Do: sample_time should be forced to be accurate. For
+	 * example if the kernel is a 250 Hz kernel, then a
+	 * sample_rate_ms of 10 should result in a sample_time of 12.
+	 */
+	sample_time = pid_params.sample_rate_ms  * USEC_PER_MSEC;
+	duration_us = (u32) ktime_us_delta(cpu->sample.time,
+		cpu->last_sample_time);
+	scaled = div_u64(int_tofp(duration_us) *
+		int_tofp(pid_params.p_gain_pct), int_tofp(sample_time));
+	if (scaled > int_tofp(100))
+		scaled = int_tofp(100);
+	/*
+	 * This code should not be required,
+	 * but short duration times have been observed
+	 */
+	if (scaled < int_tofp(pid_params.p_gain_pct))
+		scaled = int_tofp(pid_params.p_gain_pct);
+
 	/*
 	 * Bandwidth limit the output. Re-task p_gain_pct for this purpose.
 	 */
-	target = div_u64((int_tofp(100 - pid_params.p_gain_pct) *
-		cpu->sample.target + int_tofp(pid_params.p_gain_pct) *
-		target), int_tofp(100));
-	cpu->sample.target = target;
+	cpu->sample.target = div_u64((int_tofp(100) - scaled) *
+			cpu->sample.target + scaled *
+			unfiltered_target, int_tofp(100));
 
-	target = target + (1 << (FRAC_BITS-1));
-	intel_pstate_set_pstate(cpu, fp_toint(target));
+	intel_pstate_set_pstate(cpu, fp_toint(cpu->sample.target +
+			(1 << (FRAC_BITS-1))));
 
 	sample = &cpu->sample;
 	trace_pstate_sample(fp_toint(sample->core_pct_busy),
-- 
1.9.1



* [PATCH 5/5] intel_pstate: Adjust default IIR filter gain.
  2015-04-12  4:10 [PATCH 0/5] intel_pstate: Use C0 time, calculate target pstate directly Doug Smythies
                   ` (3 preceding siblings ...)
  2015-04-12  4:10 ` [PATCH 4/5] intel_pstate: Compensate for intermediate durations (v2) Doug Smythies
@ 2015-04-12  4:10 ` Doug Smythies
  2015-04-13 16:32 ` [PATCH 0/5] intel_pstate: Use C0 time, calculate target pstate directly Kristen Carlson Accardi
  2015-04-16 16:35 ` Doug Smythies
  6 siblings, 0 replies; 14+ messages in thread
From: Doug Smythies @ 2015-04-12  4:10 UTC (permalink / raw)
  To: kristen, rjw; +Cc: dsmythies, linux-pm

A default IIR filter gain of 10 percent is
recommended. It best matches the rising-edge step-function
load response time of the previous version of this
driver. Although not by a lot, it also better damps
lower-frequency load / sleep rate response oscillations.

Signed-off-by: Doug Smythies <dsmythies@telus.net>
---
 drivers/cpufreq/intel_pstate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 66e662d..a374309 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -626,7 +626,7 @@ static struct cpu_defaults core_params = {
 		.sample_rate_ms = 10,
 		.deadband = 0,
 		.setpoint = 97,
-		.p_gain_pct = 20,
+		.p_gain_pct = 10,
 		.d_gain_pct = 0,
 		.i_gain_pct = 0,
 		.c0_ceiling = 950,
@@ -646,7 +646,7 @@ static struct cpu_defaults byt_params = {
 		.sample_rate_ms = 10,
 		.deadband = 0,
 		.setpoint = 97,
-		.p_gain_pct = 20,
+		.p_gain_pct = 10,
 		.d_gain_pct = 0,
 		.i_gain_pct = 4,
 		.c0_ceiling = 950,
-- 
1.9.1



* Re: [PATCH 0/5] intel_pstate: Use C0 time, calculate target pstate directly
  2015-04-12  4:10 [PATCH 0/5] intel_pstate: Use C0 time, calculate target pstate directly Doug Smythies
                   ` (4 preceding siblings ...)
  2015-04-12  4:10 ` [PATCH 5/5] intel_pstate: Adjust default IIR filter gain Doug Smythies
@ 2015-04-13 16:32 ` Kristen Carlson Accardi
  2015-04-13 21:16   ` Doug Smythies
  2015-04-16 16:35 ` Doug Smythies
  6 siblings, 1 reply; 14+ messages in thread
From: Kristen Carlson Accardi @ 2015-04-13 16:32 UTC (permalink / raw)
  To: Doug Smythies; +Cc: rjw, dsmythies, linux-pm

Hi Doug,

On Sat, 11 Apr 2015 21:10:25 -0700
Doug Smythies <doug.smythies@gmail.com> wrote:

> This patch set attempts to address some issues with
> the intel_pstate driver:
> 
> The driver has finicky tradeoffs between fixed-point
> integer math, gain settings, and the setpoint.
> 
> The application is not really well suited to a PID
> (Proportional Integral Derivative) controller.

Agree!

> 
> The duration method can incorrectly drive the target
> pstate downwards, because a long duration might not
> be due to idle.
> 
> The driver can incorrectly drive the target pstate
> upwards, when the C0 time is virtually 0.
> 
> The patch set brings back the use of C0 time as the driving
> input as to what to do next in terms of target pstate.
> Recall that C0 time was used once before and was eventually
> removed due to situations of pstate lockup at a low pstate.
> Why is it better this time? Because the C0 time is now
> scaled properly, whereas it was not before. Also, the
> response curve is flexible and adjustable.
> 
> Patch 1 basically adds better debug information to the
> perf data. It was originally requested in June, and Dirk agreed
> to include it in September. Regardless of the acceptance or
> rejection of this whole patch set, please accept this one.
> scripts/checkpatch.pl complains about one line as over 80
> characters, not sure how to split the line.
> 
> Patch 2 brings back the use of C0 time for the calculation
> of core_busy. Floor and ceiling hold values can be adjusted.
> Once the floor load is reached the response is linear until
> the ceiling load is reached. The equations are just a simple
> y=mx+b line, where m = slope and b = intercept.
> Equation 1: min = m * floor + b
> Equation 2: max = m * ceiling + b
> Two equations and two unknowns, solved and coded.
> scripts/checkpatch.pl O.K.
> 
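The two-point solve above can be sketched as follows, using the default
c0_floor=450 and c0_ceiling=950 (tenths of a percent) from patch 2; the
0 to 100 output range is illustrative only, not the driver's actual
min/max values:

```python
def line_through(x1, y1, x2, y2):
    """Solve y = m*x + b through two points: two equations, two unknowns."""
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1
    return m, b

# Map C0 load (tenths of a percent) onto an illustrative 0..100 busy range.
m, b = line_through(450, 0.0, 950, 100.0)   # floor -> min, ceiling -> max

def scaled_busy(load_tenths):
    """Hold at min below the floor, linear in between, clamp at max above."""
    return max(0.0, min(100.0, m * load_tenths + b))
```

So a 45.0% load maps to the minimum, 95.0% to the maximum, and 70.0%
lands halfway up the line.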
> Patch 3 bypasses the PID controller and calculates the
> target pstate directly. An IIR (Infinite Impulse
> Response) filter smooths the result.
> Again, a simple y=mx+b line.
> Equation 1: pmin = m * min + b
> Equation 2: pmax = m * max + b
> Again, two simple equations and two unknowns, solved
> and coded.
> scripts/checkpatch.pl O.K.
> Note: not the same min and max as in patch 2.
> 
> Note: There are two definitions of percent within
> this driver, one goes to 100% at the max turbo (if
> it is enabled) and the other is 100% at the max
> non-turbo (enabled or not). It can be very confusing.
> The min used in patch 3 must be the mathematical min
> for the processor under the current turbo enabled or
> disabled state, which might not be the same
> as /sys/devices/system/cpu/intel_pstate/min_perf_pct
> when turbo is enabled.
> 
> Patch 4 adds weighting to longer-duration events to,
> at least partially, compensate for the typically
> asymmetric durations of rising and falling loads.
> scripts/checkpatch.pl O.K.
> 
> Patch 5 just adjusts the default IIR gain.
> 10 percent gives a rising edge response time similar
> to the current version of the driver.
> scripts/checkpatch.pl O.K.
> 
> Anticipated questions:
> 
> Q: Shouldn't the response curve be normalized to a reference
> pstate, perhaps the max non-turbo pstate?
> 
> A: Doing so can create a lockup scenario, depending on the
> response curve settings and the overall pstate range of the
> particular processor. A downside to not normalizing has not been
> detected.
> 
> Q: One of the patches in the set is titled V2. What was V1?
> 
> A: To a limit, V1 ran the IIR filter N times when the duration
> was N times nominal. The method was expensive in terms of
> multiplies and divides. V2 saves a lot of clock cycles, and
> is good enough.
> 
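As a rough illustration of the V1-versus-V2 point: applying a first-order
IIR filter N times has a closed form, so the repeated updates V1 performed
are equivalent to a single expression. The values below are made up; gain
is in percent, as in the driver:

```python
def iir(prev, new, gain_pct):
    """One first-order IIR update: ((100 - gain)*prev + gain*new) / 100."""
    return ((100 - gain_pct) * prev + gain_pct * new) / 100.0

def iir_n(prev, new, gain_pct, n):
    """Apply the filter n times (what V1 did for an n-times-nominal duration)."""
    for _ in range(n):
        prev = iir(prev, new, gain_pct)
    return prev

def iir_n_closed(prev, new, gain_pct, n):
    """Closed form of the same n applications."""
    decay = ((100 - gain_pct) / 100.0) ** n
    return new - (new - prev) * decay
```

In kernel fixed-point math the power term is still costly, which is
presumably why V2 uses a cheaper approximation instead.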
> Migration path:
> 
> It should not be assumed that the current pstate is actually
> what was in effect during the last sample time. If a higher
> pstate was actually used, due to another active CPU, then
> that should override the use of the current pstate in the filter.
> This issue also exists in the current version of the driver,
> and was only discovered a few days ago. see also:
> https://bugzilla.kernel.org/show_bug.cgi?id=93521#c55
> This will be addressed, if possible, in a future patch.
> 
> There are likely some math (multiply and divide) savings that
> can be realized in future patches.
> 
> While the use of the PID controller has been eliminated, and
> because the change is likely to be controversial, the
> code has been left intact. It will be removed in a future patch.

I am completely supportive of removing the PID controller.

> 
> Doug Smythies (5):
>   intel_pstate: Add tsc collection and keep previous target pstate. Add
>     both to trace.
>   intel_pstate: Use C0 time for busy calculations (again).
>   intel_pstate: Calculate target pstate directly.
>   intel_pstate: Compensate for intermediate durations (v2).
>   intel_pstate: Adjust default IIR filter gain.
> 
>  Documentation/cpu-freq/intel-pstate.txt |   2 +
>  drivers/cpufreq/intel_pstate.c          | 188 ++++++++++++++++++++++++--------
>  include/trace/events/power.h            |  25 +++--
>  3 files changed, 163 insertions(+), 52 deletions(-)
> 

Thanks for your patches, I'll be doing some testing and review and let
you know how things go.

Kristen


* RE: [PATCH 0/5] intel_pstate: Use C0 time, calculate target pstate directly
  2015-04-13 16:32 ` [PATCH 0/5] intel_pstate: Use C0 time, calculate target pstate directly Kristen Carlson Accardi
@ 2015-04-13 21:16   ` Doug Smythies
  0 siblings, 0 replies; 14+ messages in thread
From: Doug Smythies @ 2015-04-13 21:16 UTC (permalink / raw)
  To: 'Kristen Carlson Accardi'; +Cc: rjw, linux-pm

Hi Kristen,

Thanks for your quick and positive reply.

On 2015.04.13 09:32 Kristen wrote:
> On Sat, 11 Apr 2015 21:10:25 -0700 Doug Smythies wrote:

> Thanks for your patches, I'll be doing some testing and review and let
> you know how things go.

And from the other thread:

> I'm going to need to do a lot of
> benchmarking on a variety of platforms to make sure we haven't regressed
> anything.

Yes, of course.

I make no claim to have found the best default operating parameters:
IIR gain, c0_floor, c0_ceiling, sample_rate_ms.

I only have an i7-2600K to test with.

Other results:

Phoronix: Kernel compile:
acpi-cpufreq: 122.4 seconds
4.0RC7, unpatched: 121.8 seconds
Patched: 122.4 seconds

Phoronix: Apache:
acpi-cpufreq: 19429
4.0RC7, unpatched: 20089
Patched: 20410

Phoronix: ffmpeg:
acpi-cpufreq: 17.06 seconds and 1444.36 Joules Package Energy.
4.0RC7, unpatched: 17.65 seconds and 1444.36 Joules
Patched: 23.1 seconds and 1302 Joules

Note: Dirk proved that the ffmpeg test does odd switching of CPUs
when they become 50% loaded. With the settings I have used for c0_floor
and c0_ceiling, these results are not a surprise. 
 
2000 Seconds idle (server): 
Unpatched: 7704 Joules
Patched: 7625 Joules.

An i5-4690K running some Ubuntu desktop, but otherwise idle:
Average CPU frequency goes way down and power consumption drops by just over half a watt.

An i7-4790K running xorg and Arch distro, but otherwise idle:
Average CPU frequency drops considerably, and power consumption is about the same.
Note: unpatched CPU frequencies were not as high as previously reported on that bugzilla report.
I do not know why.
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=93521

Still waiting to hear from the other i7-4790K user who had high CPU frequency issues when idle.

There is a bit of evidence that the unpatched reference kernel might do a little
better at holding CPU frequencies lower in the low load 20 Hz work / sleep frequency range.
I'm still looking at that scenario.

There are many graphs at:
www.smythies.com/~doug/linux/intel_pstate/build220/index.html

... Doug




* RE: [PATCH 0/5] intel_pstate: Use C0 time, calculate target pstate directly
  2015-04-12  4:10 [PATCH 0/5] intel_pstate: Use C0 time, calculate target pstate directly Doug Smythies
                   ` (5 preceding siblings ...)
  2015-04-13 16:32 ` [PATCH 0/5] intel_pstate: Use C0 time, calculate target pstate directly Kristen Carlson Accardi
@ 2015-04-16 16:35 ` Doug Smythies
  2015-04-17  0:46   ` Rafael J. Wysocki
  6 siblings, 1 reply; 14+ messages in thread
From: Doug Smythies @ 2015-04-16 16:35 UTC (permalink / raw)
  To: kristen, rjw; +Cc: linux-pm

On 2015.04.11 09:10 Doug Smythies wrote:

> Patch 1 basically adds better debug information to the
> perf data. It was originally requested in June, and Dirk agreed
> to include it in September. Regardless of the acceptance or
> rejection of this whole patch set, please accept this one.
> scripts/checkpatch.pl complains about one line as over 80
> characters, not sure how to split the line.

Please consider accepting patch 1 of 5 for inclusion
in kernel 4.1-rc1.

I understand the need to do regression and multiple platform
testing on the rest of the patch set, but this one
just adds some useful debug information.




* Re: [PATCH 0/5] intel_pstate: Use C0 time, calculate target pstate directly
  2015-04-16 16:35 ` Doug Smythies
@ 2015-04-17  0:46   ` Rafael J. Wysocki
  0 siblings, 0 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2015-04-17  0:46 UTC (permalink / raw)
  To: Doug Smythies, kristen; +Cc: linux-pm

On Thursday, April 16, 2015 09:35:26 AM Doug Smythies wrote:
> On 2015.04.11 09:10 Doug Smythies wrote:
> 
> > Patch 1 basically adds better debug information to the
> > perf data. It was originally requested in June, and Dirk agreed
> > to include it in September. Regardless of the acceptance or
> > rejection of this whole patch set, please accept this one.
> > scripts/checkpatch.pl complains about one line as over 80
> > characters, not sure how to split the line.
> 
> Please consider accepting patch 1 of 5 for inclusion
> in kernel 4.1-rc1.
> 
> I understand the need to do regression and multiple platform
> testing on the rest of the patch set, but this one
> just adds some useful debug information.

Kristen, what do you think?


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [PATCH 1/5] intel_pstate: Add tsc collection and keep previous target pstate. Add both to trace.
  2015-04-12  4:10 ` [PATCH 1/5] intel_pstate: Add tsc collection and keep previous target pstate. Add both to trace Doug Smythies
@ 2015-04-29 16:57   ` Kristen Carlson Accardi
  2015-05-04 23:34     ` Rafael J. Wysocki
  0 siblings, 1 reply; 14+ messages in thread
From: Kristen Carlson Accardi @ 2015-04-29 16:57 UTC (permalink / raw)
  To: Doug Smythies; +Cc: rjw, dsmythies, linux-pm

On Sat, 11 Apr 2015 21:10:26 -0700
Doug Smythies <doug.smythies@gmail.com> wrote:

> The intel_pstate driver is difficult to debug and investigate without tsc.
> Also, it is likely that use of tsc, and some version of C0 percentage,
> will be re-introduced in the future.
> There have also been occasions where it is desirable to know, and
> confirm, the previous target pstate.
> This patch brings back tsc, adds previous target pstate,
> and adds both to the trace data collection.
> 
> Signed-off-by: Doug Smythies <dsmythies@telus.net>

Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>

> ---
>  drivers/cpufreq/intel_pstate.c | 31 +++++++++++++++++++++----------
>  include/trace/events/power.h   | 25 +++++++++++++++++--------
>  2 files changed, 38 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 872c577..f181ce5 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -67,6 +67,7 @@ struct sample {
>  	int32_t core_pct_busy;
>  	u64 aperf;
>  	u64 mperf;
> +	u64 tsc;
>  	int freq;
>  	ktime_t time;
>  };
> @@ -108,6 +109,7 @@ struct cpudata {
>  	ktime_t last_sample_time;
>  	u64	prev_aperf;
>  	u64	prev_mperf;
> +	u64	prev_tsc;
>  	struct sample sample;
>  };
>  
> @@ -725,23 +727,28 @@ static inline void intel_pstate_sample(struct cpudata *cpu)
>  {
>  	u64 aperf, mperf;
>  	unsigned long flags;
> +	u64 tsc;
>  
>  	local_irq_save(flags);
>  	rdmsrl(MSR_IA32_APERF, aperf);
>  	rdmsrl(MSR_IA32_MPERF, mperf);
> +	tsc = native_read_tsc();
>  	local_irq_restore(flags);
>  
>  	cpu->last_sample_time = cpu->sample.time;
>  	cpu->sample.time = ktime_get();
>  	cpu->sample.aperf = aperf;
>  	cpu->sample.mperf = mperf;
> +	cpu->sample.tsc =  tsc;
>  	cpu->sample.aperf -= cpu->prev_aperf;
>  	cpu->sample.mperf -= cpu->prev_mperf;
> +	cpu->sample.tsc -= cpu->prev_tsc;
>  
>  	intel_pstate_calc_busy(cpu);
>  
>  	cpu->prev_aperf = aperf;
>  	cpu->prev_mperf = mperf;
> +	cpu->prev_tsc = tsc;
>  }
>  
>  static inline void intel_hwp_set_sample_time(struct cpudata *cpu)
> @@ -806,6 +813,10 @@ static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
>  	int32_t busy_scaled;
>  	struct _pid *pid;
>  	signed int ctl;
> +	int from;
> +	struct sample *sample;
> +
> +	from = cpu->pstate.current_pstate;
>  
>  	pid = &cpu->pid;
>  	busy_scaled = intel_pstate_get_scaled_busy(cpu);
> @@ -814,6 +825,16 @@ static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
>  
>  	/* Negative values of ctl increase the pstate and vice versa */
>  	intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate - ctl);
> +
> +	sample = &cpu->sample;
> +	trace_pstate_sample(fp_toint(sample->core_pct_busy),
> +		fp_toint(busy_scaled),
> +		from,
> +		cpu->pstate.current_pstate,
> +		sample->mperf,
> +		sample->aperf,
> +		sample->tsc,
> +		sample->freq);
>  }
>  
>  static void intel_hwp_timer_func(unsigned long __data)
> @@ -827,21 +848,11 @@ static void intel_hwp_timer_func(unsigned long __data)
>  static void intel_pstate_timer_func(unsigned long __data)
>  {
>  	struct cpudata *cpu = (struct cpudata *) __data;
> -	struct sample *sample;
>  
>  	intel_pstate_sample(cpu);
>  
> -	sample = &cpu->sample;
> -
>  	intel_pstate_adjust_busy_pstate(cpu);
>  
> -	trace_pstate_sample(fp_toint(sample->core_pct_busy),
> -			fp_toint(intel_pstate_get_scaled_busy(cpu)),
> -			cpu->pstate.current_pstate,
> -			sample->mperf,
> -			sample->aperf,
> -			sample->freq);
> -
>  	intel_pstate_set_sample_time(cpu);
>  }
>  
> diff --git a/include/trace/events/power.h b/include/trace/events/power.h
> index d19840b..630d1e5 100644
> --- a/include/trace/events/power.h
> +++ b/include/trace/events/power.h
> @@ -42,45 +42,54 @@ TRACE_EVENT(pstate_sample,
>  
>  	TP_PROTO(u32 core_busy,
>  		u32 scaled_busy,
> -		u32 state,
> +		u32 from,
> +		u32 to,
>  		u64 mperf,
>  		u64 aperf,
> +		u64 tsc,
>  		u32 freq
>  		),
>  
>  	TP_ARGS(core_busy,
>  		scaled_busy,
> -		state,
> +		from,
> +		to,
>  		mperf,
>  		aperf,
> +		tsc,
>  		freq
>  		),
>  
>  	TP_STRUCT__entry(
>  		__field(u32, core_busy)
>  		__field(u32, scaled_busy)
> -		__field(u32, state)
> +		__field(u32, from)
> +		__field(u32, to)
>  		__field(u64, mperf)
>  		__field(u64, aperf)
> +		__field(u64, tsc)
>  		__field(u32, freq)
> -
> -	),
> +		),
>  
>  	TP_fast_assign(
>  		__entry->core_busy = core_busy;
>  		__entry->scaled_busy = scaled_busy;
> -		__entry->state = state;
> +		__entry->from = from;
> +		__entry->to = to;
>  		__entry->mperf = mperf;
>  		__entry->aperf = aperf;
> +		__entry->tsc = tsc;
>  		__entry->freq = freq;
>  		),
>  
> -	TP_printk("core_busy=%lu scaled=%lu state=%lu mperf=%llu aperf=%llu freq=%lu ",
> +	TP_printk("core_busy=%lu scaled=%lu from=%lu to=%lu mperf=%llu aperf=%llu tsc=%llu freq=%lu ",
>  		(unsigned long)__entry->core_busy,
>  		(unsigned long)__entry->scaled_busy,
> -		(unsigned long)__entry->state,
> +		(unsigned long)__entry->from,
> +		(unsigned long)__entry->to,
>  		(unsigned long long)__entry->mperf,
>  		(unsigned long long)__entry->aperf,
> +		(unsigned long long)__entry->tsc,
>  		(unsigned long)__entry->freq
>  		)
>  



* Re: [PATCH 1/5] intel_pstate: Add tsc collection and keep previous target pstate. Add both to trace.
  2015-04-29 16:57   ` Kristen Carlson Accardi
@ 2015-05-04 23:34     ` Rafael J. Wysocki
  0 siblings, 0 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2015-05-04 23:34 UTC (permalink / raw)
  To: Kristen Carlson Accardi; +Cc: Doug Smythies, dsmythies, linux-pm

On Wednesday, April 29, 2015 09:57:07 AM Kristen Carlson Accardi wrote:
> On Sat, 11 Apr 2015 21:10:26 -0700
> Doug Smythies <doug.smythies@gmail.com> wrote:
> 
> > The intel_pstate driver is difficult to debug and investigate without tsc.
> > Also, it is likely that use of tsc, and some version of C0 percentage,
> > will be re-introduced in the future.
> > There have also been occasions where it is desirable to know, and
> > confirm, the previous target pstate.
> > This patch brings back tsc, adds previous target pstate,
> > and adds both to the trace data collection.
> > 
> > Signed-off-by: Doug Smythies <dsmythies@telus.net>
> 
> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>

Queued up for 4.2 with minor changes in the changelog/subject, thanks!

> > ---
> >  drivers/cpufreq/intel_pstate.c | 31 +++++++++++++++++++++----------
> >  include/trace/events/power.h   | 25 +++++++++++++++++--------
> >  2 files changed, 38 insertions(+), 18 deletions(-)
> > 
> > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> > index 872c577..f181ce5 100644
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -67,6 +67,7 @@ struct sample {
> >  	int32_t core_pct_busy;
> >  	u64 aperf;
> >  	u64 mperf;
> > +	u64 tsc;
> >  	int freq;
> >  	ktime_t time;
> >  };
> > @@ -108,6 +109,7 @@ struct cpudata {
> >  	ktime_t last_sample_time;
> >  	u64	prev_aperf;
> >  	u64	prev_mperf;
> > +	u64	prev_tsc;
> >  	struct sample sample;
> >  };
> >  
> > @@ -725,23 +727,28 @@ static inline void intel_pstate_sample(struct cpudata *cpu)
> >  {
> >  	u64 aperf, mperf;
> >  	unsigned long flags;
> > +	u64 tsc;
> >  
> >  	local_irq_save(flags);
> >  	rdmsrl(MSR_IA32_APERF, aperf);
> >  	rdmsrl(MSR_IA32_MPERF, mperf);
> > +	tsc = native_read_tsc();
> >  	local_irq_restore(flags);
> >  
> >  	cpu->last_sample_time = cpu->sample.time;
> >  	cpu->sample.time = ktime_get();
> >  	cpu->sample.aperf = aperf;
> >  	cpu->sample.mperf = mperf;
> > +	cpu->sample.tsc =  tsc;
> >  	cpu->sample.aperf -= cpu->prev_aperf;
> >  	cpu->sample.mperf -= cpu->prev_mperf;
> > +	cpu->sample.tsc -= cpu->prev_tsc;
> >  
> >  	intel_pstate_calc_busy(cpu);
> >  
> >  	cpu->prev_aperf = aperf;
> >  	cpu->prev_mperf = mperf;
> > +	cpu->prev_tsc = tsc;
> >  }
> >  
> >  static inline void intel_hwp_set_sample_time(struct cpudata *cpu)
> > @@ -806,6 +813,10 @@ static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
> >  	int32_t busy_scaled;
> >  	struct _pid *pid;
> >  	signed int ctl;
> > +	int from;
> > +	struct sample *sample;
> > +
> > +	from = cpu->pstate.current_pstate;
> >  
> >  	pid = &cpu->pid;
> >  	busy_scaled = intel_pstate_get_scaled_busy(cpu);
> > @@ -814,6 +825,16 @@ static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
> >  
> >  	/* Negative values of ctl increase the pstate and vice versa */
> >  	intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate - ctl);
> > +
> > +	sample = &cpu->sample;
> > +	trace_pstate_sample(fp_toint(sample->core_pct_busy),
> > +		fp_toint(busy_scaled),
> > +		from,
> > +		cpu->pstate.current_pstate,
> > +		sample->mperf,
> > +		sample->aperf,
> > +		sample->tsc,
> > +		sample->freq);
> >  }
> >  
> >  static void intel_hwp_timer_func(unsigned long __data)
> > @@ -827,21 +848,11 @@ static void intel_hwp_timer_func(unsigned long __data)
> >  static void intel_pstate_timer_func(unsigned long __data)
> >  {
> >  	struct cpudata *cpu = (struct cpudata *) __data;
> > -	struct sample *sample;
> >  
> >  	intel_pstate_sample(cpu);
> >  
> > -	sample = &cpu->sample;
> > -
> >  	intel_pstate_adjust_busy_pstate(cpu);
> >  
> > -	trace_pstate_sample(fp_toint(sample->core_pct_busy),
> > -			fp_toint(intel_pstate_get_scaled_busy(cpu)),
> > -			cpu->pstate.current_pstate,
> > -			sample->mperf,
> > -			sample->aperf,
> > -			sample->freq);
> > -
> >  	intel_pstate_set_sample_time(cpu);
> >  }
> >  
> > diff --git a/include/trace/events/power.h b/include/trace/events/power.h
> > index d19840b..630d1e5 100644
> > --- a/include/trace/events/power.h
> > +++ b/include/trace/events/power.h
> > @@ -42,45 +42,54 @@ TRACE_EVENT(pstate_sample,
> >  
> >  	TP_PROTO(u32 core_busy,
> >  		u32 scaled_busy,
> > -		u32 state,
> > +		u32 from,
> > +		u32 to,
> >  		u64 mperf,
> >  		u64 aperf,
> > +		u64 tsc,
> >  		u32 freq
> >  		),
> >  
> >  	TP_ARGS(core_busy,
> >  		scaled_busy,
> > -		state,
> > +		from,
> > +		to,
> >  		mperf,
> >  		aperf,
> > +		tsc,
> >  		freq
> >  		),
> >  
> >  	TP_STRUCT__entry(
> >  		__field(u32, core_busy)
> >  		__field(u32, scaled_busy)
> > -		__field(u32, state)
> > +		__field(u32, from)
> > +		__field(u32, to)
> >  		__field(u64, mperf)
> >  		__field(u64, aperf)
> > +		__field(u64, tsc)
> >  		__field(u32, freq)
> > -
> > -	),
> > +		),
> >  
> >  	TP_fast_assign(
> >  		__entry->core_busy = core_busy;
> >  		__entry->scaled_busy = scaled_busy;
> > -		__entry->state = state;
> > +		__entry->from = from;
> > +		__entry->to = to;
> >  		__entry->mperf = mperf;
> >  		__entry->aperf = aperf;
> > +		__entry->tsc = tsc;
> >  		__entry->freq = freq;
> >  		),
> >  
> > -	TP_printk("core_busy=%lu scaled=%lu state=%lu mperf=%llu aperf=%llu freq=%lu ",
> > +	TP_printk("core_busy=%lu scaled=%lu from=%lu to=%lu mperf=%llu aperf=%llu tsc=%llu freq=%lu ",
> >  		(unsigned long)__entry->core_busy,
> >  		(unsigned long)__entry->scaled_busy,
> > -		(unsigned long)__entry->state,
> > +		(unsigned long)__entry->from,
> > +		(unsigned long)__entry->to,
> >  		(unsigned long long)__entry->mperf,
> >  		(unsigned long long)__entry->aperf,
> > +		(unsigned long long)__entry->tsc,
> >  		(unsigned long)__entry->freq
> >  		)
> >  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [PATCH 2/5] intel_pstate: Use C0 time for busy calculations (again).
  2015-04-12  4:10 ` [PATCH 2/5] intel_pstate: Use C0 time for busy calculations (again) Doug Smythies
@ 2015-05-06 19:20   ` Kristen Carlson Accardi
  2015-05-07  6:17     ` Doug Smythies
  0 siblings, 1 reply; 14+ messages in thread
From: Kristen Carlson Accardi @ 2015-05-06 19:20 UTC (permalink / raw)
  To: Doug Smythies; +Cc: rjw, dsmythies, linux-pm

On Sat, 11 Apr 2015 21:10:27 -0700
Doug Smythies <doug.smythies@gmail.com> wrote:

> This patch brings back the inclusion of C0 time
> for the calculation of core_busy.
> scaled_busy ultimately defines the target pstate
> (CPU frequency) versus load (C0) response curve.
> The target pstate will be held at minimum until the load
> is larger than the c0_floor. Thereafter, the response
> is roughly linear until the maximum target pstate is
> reached at the c0_ceiling.
> A larger c0_floor and a lower c0_ceiling tend towards
> minimum energy, at a cost of performance and slower rising
> edge load response times. A lower c0_floor and a larger
> c0_ceiling tend towards more energy consumption, but
> better performance and faster rising edge load response
> times. Note that for falling edge loads, response times are
> dominated by durations, and this driver runs very rarely.
> c0_floor and c0_ceiling are available in the debugfs.
> c0_floor and c0_ceiling are in units of tenths of a percent.
> 
> Signed-off-by: Doug Smythies <dsmythies@telus.net>
> ---
>  Documentation/cpu-freq/intel-pstate.txt |  2 +
>  drivers/cpufreq/intel_pstate.c          | 87 +++++++++++++++++++++++----------
>  2 files changed, 63 insertions(+), 26 deletions(-)
> 
> diff --git a/Documentation/cpu-freq/intel-pstate.txt b/Documentation/cpu-freq/intel-pstate.txt
> index 6557507..583a048 100644
> --- a/Documentation/cpu-freq/intel-pstate.txt
> +++ b/Documentation/cpu-freq/intel-pstate.txt
> @@ -56,6 +56,8 @@ For legacy mode debugfs files have also been added to allow tuning of
>  the internal governor algorythm. These files are located at
>  /sys/kernel/debug/pstate_snb/ These files are NOT present in HWP mode.
>  
> +      c0_ceiling
> +      c0_floor
>        deadband
>        d_gain_pct
>        i_gain_pct
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index f181ce5..ddc3602 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -121,6 +121,8 @@ struct pstate_adjust_policy {
>  	int p_gain_pct;
>  	int d_gain_pct;
>  	int i_gain_pct;
> +	int c0_ceiling;
> +	int c0_floor;
>  };
>  
>  struct pstate_funcs {
> @@ -313,6 +315,8 @@ static struct pid_param pid_files[] = {
>  	{"deadband", &pid_params.deadband},
>  	{"setpoint", &pid_params.setpoint},
>  	{"p_gain_pct", &pid_params.p_gain_pct},
> +	{"c0_ceiling", &pid_params.c0_ceiling},
> +	{"c0_floor", &pid_params.c0_floor},
>  	{NULL, NULL}
>  };
>  
> @@ -624,6 +628,8 @@ static struct cpu_defaults core_params = {
>  		.p_gain_pct = 20,
>  		.d_gain_pct = 0,
>  		.i_gain_pct = 0,
> +		.c0_ceiling = 950,
> +		.c0_floor = 450,
>  	},
>  	.funcs = {
>  		.get_max = core_get_max_pstate,
> @@ -642,6 +648,8 @@ static struct cpu_defaults byt_params = {
>  		.p_gain_pct = 14,
>  		.d_gain_pct = 0,
>  		.i_gain_pct = 4,
> +		.c0_ceiling = 950,
> +		.c0_floor = 450,
>  	},
>  	.funcs = {
>  		.get_max = byt_get_max_pstate,
> @@ -720,6 +728,14 @@ static inline void intel_pstate_calc_busy(struct cpudata *cpu)
>  			cpu->pstate.max_pstate * cpu->pstate.scaling / 100),
>  			core_pct));
>  
> +	core_pct = int_tofp(sample->mperf) * int_tofp(1000);
> +	core_pct = div64_u64(core_pct, int_tofp(sample->tsc));

FYI - It's not actually valid to use the contents of mperf without aperf
according to the Intel SDM:

"Only the IA32_APERF/IA32_MPERF ratio is architecturally defined;
software should not attach meaning to the content of the individual of IA32_APERF or IA32_MPERF MSRs."

There is no guarantee mperf and tsc are using the same clock.
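To make the distinction concrete, here is a sketch with made-up counter
deltas. Only the APERF/MPERF ratio is architecturally meaningful; the
patch's MPERF/TSC quotient additionally assumes the two advance at the
same rate while the CPU is in C0:

```python
# Hypothetical counter deltas over one sample interval (illustrative only).
aperf_delta = 20_000_000   # cycles at the actual (possibly turbo) clock
mperf_delta = 10_000_000   # cycles at the invariant reference clock, C0 only
tsc_delta   = 34_000_000   # reference cycles over the whole interval

# Architecturally defined: the performance ratio (here, 2x the base clock).
perf_ratio = aperf_delta / mperf_delta

# What the patch computes (C0 time in tenths of a percent), valid only if
# MPERF and TSC tick at the same rate while the CPU is in C0.
c0_tenths = mperf_delta * 1000 // tsc_delta
```

If MPERF and TSC use different clocks, c0_tenths is silently skewed even
though the APERF/MPERF ratio remains correct.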


* RE: [PATCH 2/5] intel_pstate: Use C0 time for busy calculations (again).
  2015-05-06 19:20   ` Kristen Carlson Accardi
@ 2015-05-07  6:17     ` Doug Smythies
  0 siblings, 0 replies; 14+ messages in thread
From: Doug Smythies @ 2015-05-07  6:17 UTC (permalink / raw)
  To: 'Kristen Carlson Accardi'; +Cc: rjw, linux-pm, Doug Smythies

On 2015.05.06 Kristen Carlson Accardi wrote:
> On Sat, 11 Apr 2015 21:10:27 -0700 Doug Smythies <doug.smythies@gmail.com> wrote:

>> This patch brings back the inclusion of C0 time
>> for the calculation of core_busy.

...

>> +	core_pct = int_tofp(sample->mperf) * int_tofp(1000);
>> +	core_pct = div64_u64(core_pct, int_tofp(sample->tsc));

> FYI - It's not actually valid to use the contents of mperf without aperf
> according to the Intel SDM:

> "Only the IA32_APERF/IA32_MPERF ratio is architecturally defined;
> software should not attach meaning to the content of the individual of IA32_APERF or IA32_MPERF MSRs."
> There is no guarantee mperf and tsc are using the same clock.

I was just bringing back code that used to be there.
Reference: fcb6a15c2e7e76d493e6f91ea889ab40e1c643a4 2014.02.03 Dirk Brandewie

I do not know of another way to calculate load, but I will try to look at
how the acpi-cpufreq driver does it.

On my computer, available clock sources are: tsc hpet acpi_pm
and it seems plenty close enough using any one of the 3.
 



end of thread, other threads:[~2015-05-07  6:17 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-12  4:10 [PATCH 0/5] intel_pstate: Use C0 time, calculate target pstate directly Doug Smythies
2015-04-12  4:10 ` [PATCH 1/5] intel_pstate: Add tsc collection and keep previous target pstate. Add both to trace Doug Smythies
2015-04-29 16:57   ` Kristen Carlson Accardi
2015-05-04 23:34     ` Rafael J. Wysocki
2015-04-12  4:10 ` [PATCH 2/5] intel_pstate: Use C0 time for busy calculations (again) Doug Smythies
2015-05-06 19:20   ` Kristen Carlson Accardi
2015-05-07  6:17     ` Doug Smythies
2015-04-12  4:10 ` [PATCH 3/5] intel_pstate: Calculate target pstate directly Doug Smythies
2015-04-12  4:10 ` [PATCH 4/5] intel_pstate: Compensate for intermediate durations (v2) Doug Smythies
2015-04-12  4:10 ` [PATCH 5/5] intel_pstate: Adjust default IIR filter gain Doug Smythies
2015-04-13 16:32 ` [PATCH 0/5] intel_pstate: Use C0 time, calculate target pstate directly Kristen Carlson Accardi
2015-04-13 21:16   ` Doug Smythies
2015-04-16 16:35 ` Doug Smythies
2015-04-17  0:46   ` Rafael J. Wysocki
