* [PATCH V5 0/3] cpufreq: intel_pstate: account non C0 time
@ 2015-12-03 17:55 Philippe Longepe
  2015-12-03 17:55 ` Philippe Longepe
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Philippe Longepe @ 2015-12-03 17:55 UTC (permalink / raw)
  To: linux-pm; +Cc: srinivas.pandruvada, Philippe Longepe

From: Philippe Longepe <philippe.longepe@intel.com>

V5: fixes a small typo in a 'Signed-off-by' tag.

The patch descriptions have been reworked as proposed by Srinivas.

The series includes one patch from Srinivas making the target P-state
selection algorithm configurable, and two patches from Stephane and me
adding the load-based and IO-boost busyness calculations.

Philippe Longepe (1):
  cpufreq: intel_pstate: account for non C0 time

Srinivas Pandruvada (1):
  cpufreq: intel_pstate: configurable algorithm to get target pstate

Stephane Gasparini (1):
  cpufreq: intel_pstate: Account for IO wait time

 drivers/cpufreq/intel_pstate.c | 64 +++++++++++++++++++++++++++++++++---------
 1 file changed, 51 insertions(+), 13 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH V5 1/3] cpufreq: intel_pstate: configurable algorithm to get target pstate
  2015-12-03 17:55 [PATCH V5 0/3] cpufreq: intel_pstate: account non C0 time Philippe Longepe
  2015-12-03 17:55 ` Philippe Longepe
@ 2015-12-03 17:55 ` Philippe Longepe
  2015-12-04  1:41   ` Rafael J. Wysocki
  2015-12-03 17:55 ` Philippe Longepe
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Philippe Longepe @ 2015-12-03 17:55 UTC (permalink / raw)
  To: linux-pm; +Cc: srinivas.pandruvada

From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

Current approach of one size fits all is not sustainable. Only per cpuid
based configuration, we can do now is to select different PID parameters.
But that is not enough. The target systems using different cpus have
different power and performance requirement. They may use different
algorithms to get next pstate based on their power or performance
preference. For example low powered systems may want to ramp up pstates
more conservatively than a core desktop or server platforms. A server
platforms may want to run as close to max to get better performance, than
a laptop like systems.
This change allows cpuid based selection of algorithm to get target pstate.
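
As a rough standalone illustration of the pattern (simplified types and
values, not the driver's actual declarations), the change boils down to
a function pointer in the per-CPU-family defaults, copied once at init
time and called on every sample:

#include <stdio.h>

struct cpudata { int current_pstate; };

/* Each CPU family plugs its own policy in here. */
struct pstate_funcs {
	int (*get_target_pstate)(struct cpudata *cpu);
};

/* Performance-oriented policy (sketch). */
static int get_target_pstate_use_performance(struct cpudata *cpu)
{
	return cpu->current_pstate + 1;
}

int main(void)
{
	/* Selected per CPU ID when the matching cpu_defaults
	 * entry is looked up (core/knl vs. silvermont/airmont). */
	struct pstate_funcs funcs = {
		.get_target_pstate = get_target_pstate_use_performance,
	};
	struct cpudata cpu = { .current_pstate = 8 };

	printf("target pstate: %d\n", funcs.get_target_pstate(&cpu));
	return 0;
}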

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/cpufreq/intel_pstate.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index d3b0e50..c2553a1 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -66,6 +66,7 @@ static inline int ceiling_fp(int32_t x)
 
 struct sample {
 	int32_t core_pct_busy;
+	int32_t busy_scaled;
 	u64 aperf;
 	u64 mperf;
 	u64 tsc;
@@ -133,6 +134,7 @@ struct pstate_funcs {
 	int (*get_scaling)(void);
 	void (*set)(struct cpudata*, int pstate);
 	void (*get_vid)(struct cpudata *);
+	int32_t (*get_target_pstate)(struct cpudata *);
 };
 
 struct cpu_defaults {
@@ -140,6 +142,8 @@ struct cpu_defaults {
 	struct pstate_funcs funcs;
 };
 
+static inline int32_t get_target_pstate_use_performance(struct cpudata *cpu);
+
 static struct pstate_adjust_policy pid_params;
 static struct pstate_funcs pstate_funcs;
 static int hwp_active;
@@ -738,6 +742,7 @@ static struct cpu_defaults core_params = {
 		.get_turbo = core_get_turbo_pstate,
 		.get_scaling = core_get_scaling,
 		.set = core_set_pstate,
+		.get_target_pstate = get_target_pstate_use_performance,
 	},
 };
 
@@ -758,6 +763,7 @@ static struct cpu_defaults silvermont_params = {
 		.set = atom_set_pstate,
 		.get_scaling = silvermont_get_scaling,
 		.get_vid = atom_get_vid,
+		.get_target_pstate = get_target_pstate_use_performance,
 	},
 };
 
@@ -778,6 +784,7 @@ static struct cpu_defaults airmont_params = {
 		.set = atom_set_pstate,
 		.get_scaling = airmont_get_scaling,
 		.get_vid = atom_get_vid,
+		.get_target_pstate = get_target_pstate_use_performance,
 	},
 };
 
@@ -797,6 +804,7 @@ static struct cpu_defaults knl_params = {
 		.get_turbo = knl_get_turbo_pstate,
 		.get_scaling = core_get_scaling,
 		.set = core_set_pstate,
+		.get_target_pstate = get_target_pstate_use_performance,
 	},
 };
 
@@ -922,7 +930,7 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
 	mod_timer_pinned(&cpu->timer, jiffies + delay);
 }
 
-static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
+static inline int32_t get_target_pstate_use_performance(struct cpudata *cpu)
 {
 	int32_t core_busy, max_pstate, current_pstate, sample_ratio;
 	s64 duration_us;
@@ -960,30 +968,24 @@ static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
 		core_busy = mul_fp(core_busy, sample_ratio);
 	}
 
-	return core_busy;
+	cpu->sample.busy_scaled = core_busy;
+	return (cpu->pstate.current_pstate - pid_calc(&cpu->pid, core_busy));
 }
 
 static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
 {
-	int32_t busy_scaled;
-	struct _pid *pid;
-	signed int ctl;
-	int from;
+	int from, target_pstate;
 	struct sample *sample;
 
 	from = cpu->pstate.current_pstate;
 
-	pid = &cpu->pid;
-	busy_scaled = intel_pstate_get_scaled_busy(cpu);
+	target_pstate = pstate_funcs.get_target_pstate(cpu);
 
-	ctl = pid_calc(pid, busy_scaled);
-
-	/* Negative values of ctl increase the pstate and vice versa */
-	intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate - ctl, true);
+	intel_pstate_set_pstate(cpu, target_pstate, true);
 
 	sample = &cpu->sample;
 	trace_pstate_sample(fp_toint(sample->core_pct_busy),
-		fp_toint(busy_scaled),
+		fp_toint(sample->busy_scaled),
 		from,
 		cpu->pstate.current_pstate,
 		sample->mperf,
@@ -1238,6 +1240,8 @@ static void copy_cpu_funcs(struct pstate_funcs *funcs)
 	pstate_funcs.get_scaling = funcs->get_scaling;
 	pstate_funcs.set       = funcs->set;
 	pstate_funcs.get_vid   = funcs->get_vid;
+	pstate_funcs.get_target_pstate = funcs->get_target_pstate;
+
 }
 
 #if IS_ENABLED(CONFIG_ACPI)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH V5 2/3] cpufreq: intel_pstate: account for non C0 time
  2015-12-03 17:55 [PATCH V5 0/3] cpufreq: intel_pstate: account non C0 time Philippe Longepe
                   ` (2 preceding siblings ...)
  2015-12-03 17:55 ` Philippe Longepe
@ 2015-12-03 17:55 ` Philippe Longepe
  2015-12-04  2:12   ` Rafael J. Wysocki
  2015-12-03 17:55 ` Philippe Longepe
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Philippe Longepe @ 2015-12-03 17:55 UTC (permalink / raw)
  To: linux-pm; +Cc: srinivas.pandruvada, Stephane Gasparini

The current function to calculate business uses the ratio of actual
performance to max non turbo state requested during the last sample
period. This causes overestimation of busyness which results in higher
power consumption. This is a problem for low power systems.
The algorithm here uses cpu busyness to select next target P state.
The Percent Busy (or load) can be estimated as the ratio of the mperf
counter running at a constant frequency only during active periods (C0)
and the time stamp counter running at the same frequency but also during
idle. So,
Percent Busy = 100 * (delta_mperf / delta_tsc)
Use this algorithm for platforms with SoCs based on airmont and silvermont
cores.
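
As a rough standalone sketch of the arithmetic (the counter deltas below
are made up; the driver reads the real values from the APERF/MPERF MSRs
and the TSC):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Hypothetical deltas over one sample period. */
	uint64_t delta_mperf = 12000000;  /* counts only while in C0 */
	uint64_t delta_tsc   = 48000000;  /* counts in C0 and in idle */

	/* Percent Busy = 100 * (delta_mperf / delta_tsc) */
	uint64_t load = (100 * delta_mperf) / delta_tsc;

	printf("load: %llu%%\n", (unsigned long long)load);  /* 25% */
	return 0;
}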

Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Signed-off-by: Stephane Gasparini <stephane.gasparini@linux.intel.com>
---
 drivers/cpufreq/intel_pstate.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index c2553a1..2cf8bb6 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -143,6 +143,7 @@ struct cpu_defaults {
 };
 
 static inline int32_t get_target_pstate_use_performance(struct cpudata *cpu);
+static inline int32_t get_target_pstate_use_cpu_load(struct cpudata *cpu);
 
 static struct pstate_adjust_policy pid_params;
 static struct pstate_funcs pstate_funcs;
@@ -763,7 +764,7 @@ static struct cpu_defaults silvermont_params = {
 		.set = atom_set_pstate,
 		.get_scaling = silvermont_get_scaling,
 		.get_vid = atom_get_vid,
-		.get_target_pstate = get_target_pstate_use_performance,
+		.get_target_pstate = get_target_pstate_use_cpu_load,
 	},
 };
 
@@ -784,7 +785,7 @@ static struct cpu_defaults airmont_params = {
 		.set = atom_set_pstate,
 		.get_scaling = airmont_get_scaling,
 		.get_vid = atom_get_vid,
-		.get_target_pstate = get_target_pstate_use_performance,
+		.get_target_pstate = get_target_pstate_use_cpu_load,
 	},
 };
 
@@ -930,6 +931,26 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
 	mod_timer_pinned(&cpu->timer, jiffies + delay);
 }
 
+static inline int32_t get_target_pstate_use_cpu_load(struct cpudata *cpu)
+{
+	struct sample *sample = &cpu->sample;
+	int32_t cpu_load;
+
+	/*
+	 * The load can be estimated as the ratio of the mperf counter
+	 * running at a constant frequency during active periods
+	 * (C0) and the time stamp counter running at the same frequency
+	 * also during C-states.
+	 */
+	cpu_load = div64_u64(100 * sample->mperf, sample->tsc);
+
+	cpu->sample.busy_scaled = int_tofp(cpu_load);
+
+	return (cpu->pstate.current_pstate - pid_calc(&cpu->pid,
+						cpu->sample.busy_scaled));
+}
+
+
 static inline int32_t get_target_pstate_use_performance(struct cpudata *cpu)
 {
 	int32_t core_busy, max_pstate, current_pstate, sample_ratio;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH V5 3/3] cpufreq: intel_pstate: Account for IO wait time
  2015-12-03 17:55 [PATCH V5 0/3] cpufreq: intel_pstate: account non C0 time Philippe Longepe
                   ` (4 preceding siblings ...)
  2015-12-03 17:55 ` Philippe Longepe
@ 2015-12-03 17:55 ` Philippe Longepe
  2015-12-04  2:27   ` Rafael J. Wysocki
  2015-12-03 17:55 ` Philippe Longepe
  6 siblings, 1 reply; 16+ messages in thread
From: Philippe Longepe @ 2015-12-03 17:55 UTC (permalink / raw)
  To: linux-pm; +Cc: srinivas.pandruvada, Stephane Gasparini, Philippe Longepe

From: Stephane Gasparini <stephane.gasparini@intel.com>

To improve IO bound work, account IO wait time in calculating
CPU busy. This change gets IO wait time using get_cpu_iowait_time_us,
and converts time into number of IO cycles spent at max non turbo
frequency. This IO cycle count is added to mperf value to account
for IO wait time.
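
As a rough standalone sketch of the conversion with made-up numbers (the
driver obtains the real inputs from get_cpu_iowait_time_us() and struct
pstate; a frequency in kHz is also a cycle count per millisecond, hence
the division by 1000):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t delta_iowait_us = 2000;  /* 2 ms spent waiting on IO */
	uint64_t scaling = 100000;        /* kHz per P-state unit */
	uint64_t max_pstate = 24;         /* max non-turbo ratio: 2.4 GHz */

	/* Cycles the CPU would have retired at the max non-turbo
	 * frequency during the IO wait, credited to mperf as busy time. */
	uint64_t delta_iowait_mperf =
		delta_iowait_us * scaling * max_pstate / 1000;

	printf("%llu cycles\n", (unsigned long long)delta_iowait_mperf);
	return 0;  /* prints 4800000 */
}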

Signed-off-by: Philippe Longepe <philippe.longepe@intel.com>
Signed-off-by: Stephane Gasparini <stephane.gasparini@intel.com>
---
 drivers/cpufreq/intel_pstate.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 2cf8bb6..fb92402 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -114,6 +114,7 @@ struct cpudata {
 	u64	prev_mperf;
 	u64	prev_tsc;
 	struct sample sample;
+	u64 prev_cummulative_iowait;
 };
 
 static struct cpudata **all_cpu_data;
@@ -934,16 +935,28 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
 static inline int32_t get_target_pstate_use_cpu_load(struct cpudata *cpu)
 {
 	struct sample *sample = &cpu->sample;
+	u64 cummulative_iowait, delta_iowait_us;
+	u64 delta_iowait_mperf;
+	u64 mperf, now;
 	int32_t cpu_load;
 
+	cummulative_iowait = get_cpu_iowait_time_us(cpu->cpu, &now);
+
+	/* Convert iowait time into number of IO cycles spent at max_freq */
+	delta_iowait_us = cummulative_iowait - cpu->prev_cummulative_iowait;
+	delta_iowait_mperf = div64_u64(delta_iowait_us * cpu->pstate.scaling *
+		cpu->pstate.max_pstate, 1000);
+
+	mperf = cpu->sample.mperf + delta_iowait_mperf;
+	cpu->prev_cummulative_iowait = cummulative_iowait;
+
 	/*
 	 * The load can be estimated as the ratio of the mperf counter
 	 * running at a constant frequency during active periods
 	 * (C0) and the time stamp counter running at the same frequency
 	 * also during C-states.
 	 */
-	cpu_load = div64_u64(100 * sample->mperf, sample->tsc);
-
+	cpu_load = div64_u64(100 * mperf, sample->tsc);
 	cpu->sample.busy_scaled = int_tofp(cpu_load);
 
 	return (cpu->pstate.current_pstate - pid_calc(&cpu->pid,
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH V5 1/3] cpufreq: intel_pstate: configurable algorithm to get target pstate
  2015-12-03 17:55 ` [PATCH V5 1/3] cpufreq: intel_pstate: configurable algorithm to get target pstate Philippe Longepe
@ 2015-12-04  1:41   ` Rafael J. Wysocki
  0 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2015-12-04  1:41 UTC (permalink / raw)
  To: Philippe Longepe; +Cc: linux-pm, srinivas.pandruvada

On Thursday, December 03, 2015 06:55:54 PM Philippe Longepe wrote:
> From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

If you submit somebody else's patches (like in this case), please always
add a Signed-off-by tag from yourself to them in addition to the original
Signed-off-by tag from the author.  Otherwise, they can't be applied for
formal reasons.

> Current approach of one size fits all is not sustainable. Only per cpuid
> based configuration, we can do now is to select different PID parameters.
> But that is not enough. 

The above part of the changelog can go away in my view.  It doesn't add
anything useful.

> The target systems using different cpus have

"Target systems ..."

> different power and performance requirement.

"requrements" (s missing).

> They may use different
> algorithms to get next pstate based on their power or performance

"the next P-state"

> preference. For example low powered systems may want to ramp up pstates
> more conservatively than a core desktop or server platforms.

What about: "For example, power-constrained systems may not want to use
high-performance P-states as aggressively as a full-size desktop or a server
platform."

> A server
> platforms may want to run as close to max to get better performance, than
> a laptop like systems.

"A server platform may want to run close to the max to achieve better performance,
while laptop-like systems may prefer sacrificing performance for longer
battery life."

> This change allows cpuid based selection of algorithm to get target pstate.

"For the above reasons, modify intel_pstate to allow the target P-state
selection algorithm to depend on the CPU ID."

> 
> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

The change itself looks reasonable to me.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V5 2/3] cpufreq: intel_pstate: account for non C0 time
  2015-12-03 17:55 ` [PATCH V5 2/3] cpufreq: intel_pstate: account for non C0 time Philippe Longepe
@ 2015-12-04  2:12   ` Rafael J. Wysocki
  2015-12-04 14:33     ` Philippe Longepe
  0 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2015-12-04  2:12 UTC (permalink / raw)
  To: Philippe Longepe; +Cc: linux-pm, srinivas.pandruvada, Stephane Gasparini

On Thursday, December 03, 2015 06:55:56 PM Philippe Longepe wrote:
> The current function to calculate business uses the ratio of actual

"Utilization" would be a better word here.

> performance to max non turbo state requested during the last sample
> period.

What it really does is to use the measured percentage of busy cycles
scaled by the ratio of the current P-state to the max available
non-turbo one.

> This causes overestimation of busyness which results in higher
> power consumption.

"This leads to an overestimation of utilization which causes higher-performance
P-states to be selected more often and that leads to increased energy
consumption."  [Power can't be consumed.]

> This is a problem for low power systems.

"This is a problem for low-power systems, so it is better to use a different
utilization calculation algorithm for them."

> The algorithm here uses cpu busyness to select next target P state.
> The Percent Busy (or load) can be estimated as the ratio of the mperf
> counter running at a constant frequency only during active periods (C0)
> and the time stamp counter running at the same frequency but also during
> idle.

"Namely, the Percent Busy value (or load) can be estimated as the ratio of the
MPERF counter that runs at a constant rate only during active periods (C0) to
the time stamp counter (TSC) that also runs (at the same rate) during idle. 
That is"

> Percent Busy = 100 * (delta_mperf / delta_tsc)

> Use this algorithm for platforms with SoCs based on airmont and silvermont
> cores.

"based on the Airmont and Silvermont Atom cores".

> 
> Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
> Signed-off-by: Stephane Gasparini <stephane.gasparini@linux.intel.com>

Is Stephane a co-author of the patch?  Or if not, then what's his role?

> ---
>  drivers/cpufreq/intel_pstate.c | 25 +++++++++++++++++++++++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index c2553a1..2cf8bb6 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -143,6 +143,7 @@ struct cpu_defaults {
>  };
>  
>  static inline int32_t get_target_pstate_use_performance(struct cpudata *cpu);
> +static inline int32_t get_target_pstate_use_cpu_load(struct cpudata *cpu);
>  
>  static struct pstate_adjust_policy pid_params;
>  static struct pstate_funcs pstate_funcs;
> @@ -763,7 +764,7 @@ static struct cpu_defaults silvermont_params = {
>  		.set = atom_set_pstate,
>  		.get_scaling = silvermont_get_scaling,
>  		.get_vid = atom_get_vid,
> -		.get_target_pstate = get_target_pstate_use_performance,
> +		.get_target_pstate = get_target_pstate_use_cpu_load,
>  	},
>  };
>  
> @@ -784,7 +785,7 @@ static struct cpu_defaults airmont_params = {
>  		.set = atom_set_pstate,
>  		.get_scaling = airmont_get_scaling,
>  		.get_vid = atom_get_vid,
> -		.get_target_pstate = get_target_pstate_use_performance,
> +		.get_target_pstate = get_target_pstate_use_cpu_load,
>  	},
>  };
>  
> @@ -930,6 +931,26 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
>  	mod_timer_pinned(&cpu->timer, jiffies + delay);
>  }
>  
> +static inline int32_t get_target_pstate_use_cpu_load(struct cpudata *cpu)

The function name might have been better, but well.

> +{
> +	struct sample *sample = &cpu->sample;
> +	int32_t cpu_load;
> +
> +	/*
> +	 * The load can be estimated as the ratio of the mperf counter
> +	 * running at a constant frequency during active periods
> +	 * (C0) and the time stamp counter running at the same frequency
> +	 * also during C-states.
> +	 */
> +	cpu_load = div64_u64(100 * sample->mperf, sample->tsc);
> +
> +	cpu->sample.busy_scaled = int_tofp(cpu_load);

This is questionable somewhat.

Why do you convert the result of an integer calculation to fixed point
instead of doing the calculation in fixed point in the first place?
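
For instance (an untested sketch, assuming int_tofp() is the usual
shift-by-FRAC_BITS helper), the division itself could be done in fixed
point:

	cpu->sample.busy_scaled = div64_u64(int_tofp(100) * sample->mperf,
					    sample->tsc);

which would keep the fractional part of the load instead of truncating
it before the conversion.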

> +
> +	return (cpu->pstate.current_pstate - pid_calc(&cpu->pid,
> +						cpu->sample.busy_scaled));
> +}
> +
> +
>  static inline int32_t get_target_pstate_use_performance(struct cpudata *cpu)
>  {
>  	int32_t core_busy, max_pstate, current_pstate, sample_ratio;
> 

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V5 3/3] cpufreq: intel_pstate: Account for IO wait time
  2015-12-03 17:55 ` [PATCH V5 3/3] cpufreq: intel_pstate: Account for IO wait time Philippe Longepe
@ 2015-12-04  2:27   ` Rafael J. Wysocki
  0 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2015-12-04  2:27 UTC (permalink / raw)
  To: Philippe Longepe
  Cc: linux-pm, srinivas.pandruvada, Stephane Gasparini, Philippe Longepe

On Thursday, December 03, 2015 06:55:58 PM Philippe Longepe wrote:
> From: Stephane Gasparini <stephane.gasparini@intel.com>
> 
> To improve IO bound work,

Improve how?

> account IO wait time in calculating
> CPU busy. This change gets IO wait time using get_cpu_iowait_time_us,

"uses get_cpu_iowait_time_us() to obtain the IO wait time value "

> and converts time into number of IO cycles spent at max non turbo

Surely CPU cycles spent waiting on IO?

> frequency. This IO cycle count is added to mperf value to account
> for IO wait time.

This also should say that the trick is only used for Atom.

What about:

"The CPU utilization calculation algorithm used for the Airmont/Silvermont
Atom cores is modified by adding that "IO cycles count" to the MPERF value
to account for the IO wait time."

You should also say what's the expected impact of this change.

> 
> Signed-off-by: Philippe Longepe <philippe.longepe@intel.com>
> Signed-off-by: Stephane Gasparini <stephane.gasparini@intel.com>
> ---
>  drivers/cpufreq/intel_pstate.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 2cf8bb6..fb92402 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -114,6 +114,7 @@ struct cpudata {
>  	u64	prev_mperf;
>  	u64	prev_tsc;
>  	struct sample sample;
> +	u64 prev_cummulative_iowait;
>  };
>  
>  static struct cpudata **all_cpu_data;
> @@ -934,16 +935,28 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
>  static inline int32_t get_target_pstate_use_cpu_load(struct cpudata *cpu)
>  {
>  	struct sample *sample = &cpu->sample;
> +	u64 cummulative_iowait, delta_iowait_us;
> +	u64 delta_iowait_mperf;
> +	u64 mperf, now;
>  	int32_t cpu_load;
>  
> +	cummulative_iowait = get_cpu_iowait_time_us(cpu->cpu, &now);
> +
> +	/* Convert iowait time into number of IO cycles spent at max_freq */

It would be good to say why this is done here in the comment.

> +	delta_iowait_us = cummulative_iowait - cpu->prev_cummulative_iowait;
> +	delta_iowait_mperf = div64_u64(delta_iowait_us * cpu->pstate.scaling *
> +		cpu->pstate.max_pstate, 1000);

MSEC_PER_SEC should be used here if I'm not mistaken.
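
i.e., something like (sketch):

	delta_iowait_mperf = div64_u64(delta_iowait_us * cpu->pstate.scaling *
		cpu->pstate.max_pstate, MSEC_PER_SEC);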

> +
> +	mperf = cpu->sample.mperf + delta_iowait_mperf;
> +	cpu->prev_cummulative_iowait = cummulative_iowait;
> +
>  	/*
>  	 * The load can be estimated as the ratio of the mperf counter
>  	 * running at a constant frequency during active periods
>  	 * (C0) and the time stamp counter running at the same frequency
>  	 * also during C-states.
>  	 */
> -	cpu_load = div64_u64(100 * sample->mperf, sample->tsc);
> -
> +	cpu_load = div64_u64(100 * mperf, sample->tsc);
>  	cpu->sample.busy_scaled = int_tofp(cpu_load);
>  
>  	return (cpu->pstate.current_pstate - pid_calc(&cpu->pid,
> 

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V5 2/3] cpufreq: intel_pstate: account for non C0 time
  2015-12-04  2:12   ` Rafael J. Wysocki
@ 2015-12-04 14:33     ` Philippe Longepe
  2015-12-04 17:39       ` Srinivas Pandruvada
  0 siblings, 1 reply; 16+ messages in thread
From: Philippe Longepe @ 2015-12-04 14:33 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-pm, srinivas.pandruvada, Stephane Gasparini

Thank you Rafael,

I'll try to send a new patch set taking your remarks into account for
this series before tonight.

Also, do you think we could have, in some corner case, a division by
zero when prev_tsc == tsc?

Normally it should never happen, but just in case, I think we should
extend the following test:

if ((cpu->prev_mperf == mperf) || (cpu->prev_tsc == tsc)) {
...
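
e.g. (a rough sketch of the idea, assuming the check sits in
intel_pstate_sample() while interrupts are still disabled around the
counter reads -- skip the sample when either counter has not advanced):

	if ((cpu->prev_mperf == mperf) || (cpu->prev_tsc == tsc)) {
		local_irq_restore(flags);
		return;
	}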

Thx & Regards

Philippe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V5 2/3] cpufreq: intel_pstate: account for non C0 time
  2015-12-04 14:33     ` Philippe Longepe
@ 2015-12-04 17:39       ` Srinivas Pandruvada
  2015-12-04 23:01         ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Srinivas Pandruvada @ 2015-12-04 17:39 UTC (permalink / raw)
  To: Philippe Longepe; +Cc: Rafael J. Wysocki, linux-pm, Stephane Gasparini

On Fri, 2015-12-04 at 15:33 +0100, Philippe Longepe wrote:
> Thank you Rafael,
> 
> I'll try to send a new patch set taking your remarks into account for
> this series before tonight.
> 
> Also, do you think we could have, in some corner case, a division by
> zero when prev_tsc == tsc?
> 
> Normally it should never happen, but just in case, I think we should
> extend the following test:
> 
> if ((cpu->prev_mperf == mperf) || (cpu->prev_tsc == tsc)) {
> ...
On the Knights Landing platform the first one is possible and the second
one should never happen. But this function will never be called on such
platforms.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V5 2/3] cpufreq: intel_pstate: account for non C0 time
  2015-12-04 23:01         ` Rafael J. Wysocki
@ 2015-12-04 22:48           ` Srinivas Pandruvada
  2015-12-04 23:16             ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Srinivas Pandruvada @ 2015-12-04 22:48 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Philippe Longepe, linux-pm, Stephane Gasparini

On Sat, 2015-12-05 at 00:01 +0100, Rafael J. Wysocki wrote:
> On Friday, December 04, 2015 09:39:03 AM Srinivas Pandruvada wrote:
> > On Fri, 2015-12-04 at 15:33 +0100, Philippe Longepe wrote:
> > > Thank you Rafael,
> > > 
> > > I'll try to send a new patch set taking your remarks into account for
> > > this series before tonight.
> > > 
> > > Also, do you think we could have, in some corner case, a division by
> > > zero when prev_tsc == tsc?
> > > 
> > > Normally it should never happen, but just in case, I think we should
> > > extend the following test:
> > > 
> > > if ((cpu->prev_mperf == mperf) || (cpu->prev_tsc == tsc)) {
> > > ...
> > On the Knights Landing platform the first one is possible and the second
> > one should never happen. But this function will never be called on such
> > platforms.
> 
> Still, in case there's a future platform where those counters don't advance
> for extended time intervals, would it really hurt to add a check for that
> and skip the sample when it triggers?
Doesn't hurt.

Thanks,
Srinivas


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V5 2/3] cpufreq: intel_pstate: account for non C0 time
  2015-12-04 17:39       ` Srinivas Pandruvada
@ 2015-12-04 23:01         ` Rafael J. Wysocki
  2015-12-04 22:48           ` Srinivas Pandruvada
  0 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2015-12-04 23:01 UTC (permalink / raw)
  To: Srinivas Pandruvada; +Cc: Philippe Longepe, linux-pm, Stephane Gasparini

On Friday, December 04, 2015 09:39:03 AM Srinivas Pandruvada wrote:
> On Fri, 2015-12-04 at 15:33 +0100, Philippe Longepe wrote:
> > Thank you Rafael,
> > 
> > I'll try to send a new patch set taking your remarks into account for
> > this series before tonight.
> > 
> > Also, do you think we could have, in some corner case, a division by
> > zero when prev_tsc == tsc?
> > 
> > Normally it should never happen, but just in case, I think we should
> > extend the following test:
> > 
> > if ((cpu->prev_mperf == mperf) || (cpu->prev_tsc == tsc)) {
> > ...
> On the Knights Landing platform the first one is possible and the second
> one should never happen. But this function will never be called on such
> platforms.

Still, in case there's a future platform where those counters don't advance
for extended time intervals, would it really hurt to add a check for that
and skip the sample when it triggers?

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V5 2/3] cpufreq: intel_pstate: account for non C0 time
  2015-12-04 22:48           ` Srinivas Pandruvada
@ 2015-12-04 23:16             ` Rafael J. Wysocki
  0 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2015-12-04 23:16 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: Rafael J. Wysocki, Philippe Longepe, linux-pm, Stephane Gasparini

On Fri, Dec 4, 2015 at 11:48 PM, Srinivas Pandruvada
<srinivas.pandruvada@linux.intel.com> wrote:
> On Sat, 2015-12-05 at 00:01 +0100, Rafael J. Wysocki wrote:
>> On Friday, December 04, 2015 09:39:03 AM Srinivas Pandruvada wrote:
>> > On Fri, 2015-12-04 at 15:33 +0100, Philippe Longepe wrote:
>> > > Thank you Rafael,
>> > >
>> > > I'll try to send a new patch set taking your remarks into account for
>> > > this series before tonight.
>> > >
>> > > Also, do you think we could have, in some corner case, a division by
>> > > zero when prev_tsc == tsc?
>> > >
>> > > Normally it should never happen, but just in case, I think we should
>> > > extend the following test:
>> > >
>> > > if ((cpu->prev_mperf == mperf) || (cpu->prev_tsc == tsc)) {
>> > > ...
>> > On the Knights Landing platform the first one is possible and the second
>> > one should never happen. But this function will never be called on such
>> > platforms.
>>
>> Still, in case there's a future platform where those counters don't advance
>> for extended time intervals, would it really hurt to add a check for that
>> and skip the sample when it triggers?
> Doesn't hurt.

OK

Maybe checking the TSC would be overkill as there would be other
significant problems if it didn't advance, but IMO it would be good to
have a sanity check for the MPERF in there.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-12-04 23:16 UTC | newest]

Thread overview: 16+ messages
2015-12-03 17:55 [PATCH V5 0/3] cpufreq: intel_pstate: account non C0 time Philippe Longepe
2015-12-03 17:55 ` [PATCH V5 1/3] cpufreq: intel_pstate: configurable algorithm to get target pstate Philippe Longepe
2015-12-04  1:41   ` Rafael J. Wysocki
2015-12-03 17:55 ` [PATCH V5 2/3] cpufreq: intel_pstate: account for non C0 time Philippe Longepe
2015-12-04  2:12   ` Rafael J. Wysocki
2015-12-04 14:33     ` Philippe Longepe
2015-12-04 17:39       ` Srinivas Pandruvada
2015-12-04 23:01         ` Rafael J. Wysocki
2015-12-04 22:48           ` Srinivas Pandruvada
2015-12-04 23:16             ` Rafael J. Wysocki
2015-12-03 17:55 ` [PATCH V5 3/3] cpufreq: intel_pstate: Account for IO wait time Philippe Longepe
2015-12-04  2:27   ` Rafael J. Wysocki
