linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] intel_pstate: Improvements related to the APERF/MPERF computation
@ 2016-05-06 23:42 Rafael J. Wysocki
  2016-05-06 23:44 ` [PATCH 1/3] intel_pstate: Clarify average performance computation Rafael J. Wysocki
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2016-05-06 23:42 UTC (permalink / raw)
  To: Linux PM list; +Cc: Srinivas Pandruvada, Linux Kernel Mailing List

Hi,

As per the subject, on top of the current linux-next branch of linux-pm.git.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] intel_pstate: Clarify average performance computation
  2016-05-06 23:42 [PATCH 0/3] intel_pstate: Improvements related to the APERF/MPERF computation Rafael J. Wysocki
@ 2016-05-06 23:44 ` Rafael J. Wysocki
  2016-05-10  1:18   ` Srinivas Pandruvada
  2016-05-06 23:45 ` [PATCH 2/3] intel_pstate: Use sample.core_avg_perf in get_avg_pstate() Rafael J. Wysocki
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2016-05-06 23:44 UTC (permalink / raw)
  To: Linux PM list; +Cc: Srinivas Pandruvada, Linux Kernel Mailing List

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The core_pct_busy field of struct sample actually contains the
average performance during the last sampling period (in percent)
and not the utilization of the core as suggested by its name,
which is confusing.

For this reason, change the name of that field to core_avg_perf
and rename the function that computes its value accordingly.

Also notice that it would be more useful if it was a "raw" fraction
rather than percentage, so change its meaning too and update the
code using it accordingly (it is better to change the name of
the field along with its meaning in one go than to make those
two changes separately, as that would likely lead to more
confusion).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/intel_pstate.c |   24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -72,10 +72,10 @@ static inline int ceiling_fp(int32_t x)
 
 /**
  * struct sample -	Store performance sample
- * @core_pct_busy:	Ratio of APERF/MPERF in percent, which is actual
+ * @core_avg_perf:	Ratio of APERF/MPERF which is the actual average
  *			performance during last sample period
  * @busy_scaled:	Scaled busy value which is used to calculate next
- *			P state. This can be different than core_pct_busy
+ *			P state. This can be different than core_avg_perf
  *			to account for cpu idle period
  * @aperf:		Difference of actual performance frequency clock count
  *			read from APERF MSR between last and current sample
@@ -90,7 +90,7 @@ static inline int ceiling_fp(int32_t x)
  * data for choosing next P State.
  */
 struct sample {
-	int32_t core_pct_busy;
+	int32_t core_avg_perf;
 	int32_t busy_scaled;
 	u64 aperf;
 	u64 mperf;
@@ -1147,15 +1147,11 @@ static void intel_pstate_get_cpu_pstates
 	intel_pstate_set_min_pstate(cpu);
 }
 
-static inline void intel_pstate_calc_busy(struct cpudata *cpu)
+static inline void intel_pstate_calc_avg_perf(struct cpudata *cpu)
 {
 	struct sample *sample = &cpu->sample;
-	int64_t core_pct;
 
-	core_pct = sample->aperf * int_tofp(100);
-	core_pct = div64_u64(core_pct, sample->mperf);
-
-	sample->core_pct_busy = (int32_t)core_pct;
+	sample->core_avg_perf = div_fp(sample->aperf, sample->mperf);
 }
 
 static inline bool intel_pstate_sample(struct cpudata *cpu, u64 time)
@@ -1198,9 +1194,9 @@ static inline bool intel_pstate_sample(s
 
 static inline int32_t get_avg_frequency(struct cpudata *cpu)
 {
-	return fp_toint(mul_fp(cpu->sample.core_pct_busy,
+	return fp_toint(mul_fp(cpu->sample.core_avg_perf,
 			       int_tofp(cpu->pstate.max_pstate_physical *
-						cpu->pstate.scaling / 100)));
+						cpu->pstate.scaling)));
 }
 
 static inline int32_t get_avg_pstate(struct cpudata *cpu)
@@ -1260,7 +1256,7 @@ static inline int32_t get_target_pstate_
 	 * period. The result will be a percentage of busy at a
 	 * specified pstate.
 	 */
-	core_busy = cpu->sample.core_pct_busy;
+	core_busy = 100 * cpu->sample.core_avg_perf;
 	max_pstate = cpu->pstate.max_pstate_physical;
 	current_pstate = cpu->pstate.current_pstate;
 	core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
@@ -1312,7 +1308,7 @@ static inline void intel_pstate_adjust_b
 	intel_pstate_update_pstate(cpu, target_pstate);
 
 	sample = &cpu->sample;
-	trace_pstate_sample(fp_toint(sample->core_pct_busy),
+	trace_pstate_sample(fp_toint(100 * sample->core_avg_perf),
 		fp_toint(sample->busy_scaled),
 		from,
 		cpu->pstate.current_pstate,
@@ -1332,7 +1328,7 @@ static void intel_pstate_update_util(str
 		bool sample_taken = intel_pstate_sample(cpu, time);
 
 		if (sample_taken) {
-			intel_pstate_calc_busy(cpu);
+			intel_pstate_calc_avg_perf(cpu);
 			if (!hwp_active)
 				intel_pstate_adjust_busy_pstate(cpu);
 		}


* [PATCH 2/3] intel_pstate: Use sample.core_avg_perf in get_avg_pstate()
  2016-05-06 23:42 [PATCH 0/3] intel_pstate: Improvements related to the APERF/MPERF computation Rafael J. Wysocki
  2016-05-06 23:44 ` [PATCH 1/3] intel_pstate: Clarify average performance computation Rafael J. Wysocki
@ 2016-05-06 23:45 ` Rafael J. Wysocki
  2016-05-06 23:47 ` [PATCH 3/3] intel_pstate: Clean up get_target_pstate_use_performance() Rafael J. Wysocki
  2016-05-11 17:06 ` [PATCH v2, 0/3] intel_pstate: Improvements related to the APERF/MPERF computation Rafael J. Wysocki
  3 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2016-05-06 23:45 UTC (permalink / raw)
  To: Linux PM list; +Cc: Srinivas Pandruvada, Linux Kernel Mailing List

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Notice that get_avg_pstate() can use sample.core_avg_perf instead of
carrying the same division again, so make it do that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/intel_pstate.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -1201,8 +1201,7 @@ static inline int32_t get_avg_frequency(
 
 static inline int32_t get_avg_pstate(struct cpudata *cpu)
 {
-	return div64_u64(cpu->pstate.max_pstate_physical * cpu->sample.aperf,
-			 cpu->sample.mperf);
+	return fp_toint(cpu->pstate.max_pstate_physical * cpu->sample.core_avg_perf);
 }
 
 static inline int32_t get_target_pstate_use_cpu_load(struct cpudata *cpu)


* [PATCH 3/3] intel_pstate: Clean up get_target_pstate_use_performance()
  2016-05-06 23:42 [PATCH 0/3] intel_pstate: Improvements related to the APERF/MPERF computation Rafael J. Wysocki
  2016-05-06 23:44 ` [PATCH 1/3] intel_pstate: Clarify average performance computation Rafael J. Wysocki
  2016-05-06 23:45 ` [PATCH 2/3] intel_pstate: Use sample.core_avg_perf in get_avg_pstate() Rafael J. Wysocki
@ 2016-05-06 23:47 ` Rafael J. Wysocki
  2016-05-10  1:24   ` Srinivas Pandruvada
  2016-05-11 17:06 ` [PATCH v2, 0/3] intel_pstate: Improvements related to the APERF/MPERF computation Rafael J. Wysocki
  3 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2016-05-06 23:47 UTC (permalink / raw)
  To: Linux PM list; +Cc: Srinivas Pandruvada, Linux Kernel Mailing List

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The way the code in get_target_pstate_use_performance() is arranged
and the comments in there are totally confusing, so modify them to
reflect what's going on.

The results of the computations should be the same as before.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/intel_pstate.c |   32 +++++++++++++-------------------
 1 file changed, 13 insertions(+), 19 deletions(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -1241,43 +1241,37 @@ static inline int32_t get_target_pstate_
 
 static inline int32_t get_target_pstate_use_performance(struct cpudata *cpu)
 {
-	int32_t core_busy, max_pstate, current_pstate, sample_ratio;
+	int32_t perf_scaled, sample_ratio;
 	u64 duration_ns;
 
 	/*
-	 * core_busy is the ratio of actual performance to max
-	 * max_pstate is the max non turbo pstate available
-	 * current_pstate was the pstate that was requested during
-	 * 	the last sample period.
-	 *
-	 * We normalize core_busy, which was our actual percent
-	 * performance to what we requested during the last sample
-	 * period. The result will be a percentage of busy at a
-	 * specified pstate.
+	 * perf_scaled is the average performance during the last sampling
+	 * period (in percent) scaled by the ratio of the P-state requested
+	 * last time to the maximum P-state.  That measures the system's
+	 * response to the previous P-state selection.
 	 */
-	core_busy = 100 * cpu->sample.core_avg_perf;
-	max_pstate = cpu->pstate.max_pstate_physical;
-	current_pstate = cpu->pstate.current_pstate;
-	core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
+	perf_scaled = div_fp(cpu->pstate.max_pstate_physical,
+			     cpu->pstate.current_pstate);
+	perf_scaled = mul_fp(perf_scaled, 100 * cpu->sample.core_avg_perf);
 
 	/*
 	 * Since our utilization update callback will not run unless we are
 	 * in C0, check if the actual elapsed time is significantly greater (3x)
 	 * than our sample interval.  If it is, then we were idle for a long
-	 * enough period of time to adjust our busyness.
+	 * enough period of time to adjust our performance metric.
 	 */
 	duration_ns = cpu->sample.time - cpu->last_sample_time;
 	if ((s64)duration_ns > pid_params.sample_rate_ns * 3) {
 		sample_ratio = div_fp(pid_params.sample_rate_ns, duration_ns);
-		core_busy = mul_fp(core_busy, sample_ratio);
+		perf_scaled = mul_fp(perf_scaled, sample_ratio);
 	} else {
 		sample_ratio = div_fp(100 * cpu->sample.mperf, cpu->sample.tsc);
 		if (sample_ratio < int_tofp(1))
-			core_busy = 0;
+			perf_scaled = 0;
 	}
 
-	cpu->sample.busy_scaled = core_busy;
-	return cpu->pstate.current_pstate - pid_calc(&cpu->pid, core_busy);
+	cpu->sample.busy_scaled = perf_scaled;
+	return cpu->pstate.current_pstate - pid_calc(&cpu->pid, perf_scaled);
 }
 
 static inline void intel_pstate_update_pstate(struct cpudata *cpu, int pstate)


* Re: [PATCH 1/3] intel_pstate: Clarify average performance computation
  2016-05-06 23:44 ` [PATCH 1/3] intel_pstate: Clarify average performance computation Rafael J. Wysocki
@ 2016-05-10  1:18   ` Srinivas Pandruvada
  2016-05-10 19:21     ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Srinivas Pandruvada @ 2016-05-10  1:18 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM list; +Cc: Linux Kernel Mailing List

On Sat, 2016-05-07 at 01:44 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The core_pct_busy field of struct sample actually contains the
> average performance during the last sampling period (in percent)
> and not the utilization of the core as suggested by its name
> which is confusing.
> 
> For this reason, change the name of that field to core_avg_perf
> and rename the function that computes its value accordingly.
> 
Makes perfect sense.

> Also notice that it would be more useful if it was a "raw" fraction
> rather than percentage, so change its meaning too and update the
> code using it accordingly (it is better to change the name of
> the field along with its meaning in one go than to make those
> two changes separately, as that would likely lead to more
> confusion).
Due to the calculation, the results from the old and new methods will be
similar but not the same. For example, in one scenario the
get_avg_frequency difference is 4.3 kHz (printed side by side using both
the old style with pct and the new one with fraction):
Frequency with old calc: 2996093 Hz
Frequency with old calc: 3000460 Hz

How much do you think the performance gain is, changing fraction vs pct?

> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpufreq/intel_pstate.c |   24 ++++++++++--------------
>  1 file changed, 10 insertions(+), 14 deletions(-)
> 
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -72,10 +72,10 @@ static inline int ceiling_fp(int32_t x)
>  
>  /**
>   * struct sample -	Store performance sample
> - * @core_pct_busy:	Ratio of APERF/MPERF in percent, which is
> actual
> + * @core_avg_perf:	Ratio of APERF/MPERF which is the actual
> average
>   *			performance during last sample period
>   * @busy_scaled:	Scaled busy value which is used to calculate
> next
> - *			P state. This can be different than
> core_pct_busy
> + *			P state. This can be different than
> core_avg_perf
>   *			to account for cpu idle period
>   * @aperf:		Difference of actual performance frequency
> clock count
>   *			read from APERF MSR between last and
> current sample
> @@ -90,7 +90,7 @@ static inline int ceiling_fp(int32_t x)
>   * data for choosing next P State.
>   */
>  struct sample {
> -	int32_t core_pct_busy;
> +	int32_t core_avg_perf;
>  	int32_t busy_scaled;
>  	u64 aperf;
>  	u64 mperf;
> @@ -1147,15 +1147,11 @@ static void intel_pstate_get_cpu_pstates
>  	intel_pstate_set_min_pstate(cpu);
>  }
>  
> -static inline void intel_pstate_calc_busy(struct cpudata *cpu)
> +static inline void intel_pstate_calc_avg_perf(struct cpudata *cpu)
>  {
>  	struct sample *sample = &cpu->sample;
> -	int64_t core_pct;
>  
> -	core_pct = sample->aperf * int_tofp(100);
> -	core_pct = div64_u64(core_pct, sample->mperf);
> -
> -	sample->core_pct_busy = (int32_t)core_pct;
> +	sample->core_avg_perf = div_fp(sample->aperf, sample-
> >mperf);
>  }
>  
>  static inline bool intel_pstate_sample(struct cpudata *cpu, u64
> time)
> @@ -1198,9 +1194,9 @@ static inline bool intel_pstate_sample(s
>  
>  static inline int32_t get_avg_frequency(struct cpudata *cpu)
>  {
> -	return fp_toint(mul_fp(cpu->sample.core_pct_busy,
> +	return fp_toint(mul_fp(cpu->sample.core_avg_perf,
>  			       int_tofp(cpu-
> >pstate.max_pstate_physical *
> -						cpu->pstate.scaling
> / 100)));
> +						cpu-
> >pstate.scaling)));
>  }
>  
>  static inline int32_t get_avg_pstate(struct cpudata *cpu)
> @@ -1260,7 +1256,7 @@ static inline int32_t get_target_pstate_
>  	 * period. The result will be a percentage of busy at a
>  	 * specified pstate.
>  	 */
> -	core_busy = cpu->sample.core_pct_busy;
> +	core_busy = 100 * cpu->sample.core_avg_perf;
>  	max_pstate = cpu->pstate.max_pstate_physical;
>  	current_pstate = cpu->pstate.current_pstate;
>  	core_busy = mul_fp(core_busy, div_fp(max_pstate,
> current_pstate));
> @@ -1312,7 +1308,7 @@ static inline void intel_pstate_adjust_b
>  	intel_pstate_update_pstate(cpu, target_pstate);
>  
>  	sample = &cpu->sample;
> -	trace_pstate_sample(fp_toint(sample->core_pct_busy),
> +	trace_pstate_sample(fp_toint(100 * sample->core_avg_perf),
>  		fp_toint(sample->busy_scaled),
>  		from,
>  		cpu->pstate.current_pstate,
> @@ -1332,7 +1328,7 @@ static void intel_pstate_update_util(str
>  		bool sample_taken = intel_pstate_sample(cpu, time);
>  
>  		if (sample_taken) {
> -			intel_pstate_calc_busy(cpu);
> +			intel_pstate_calc_avg_perf(cpu);
>  			if (!hwp_active)
>  				intel_pstate_adjust_busy_pstate(cpu)
> ;
>  		}
> 


* Re: [PATCH 3/3] intel_pstate: Clean up get_target_pstate_use_performance()
  2016-05-06 23:47 ` [PATCH 3/3] intel_pstate: Clean up get_target_pstate_use_performance() Rafael J. Wysocki
@ 2016-05-10  1:24   ` Srinivas Pandruvada
  0 siblings, 0 replies; 16+ messages in thread
From: Srinivas Pandruvada @ 2016-05-10  1:24 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM list; +Cc: Linux Kernel Mailing List

On Sat, 2016-05-07 at 01:47 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The way the code in get_target_pstate_use_performance() is arranged
> and the comments in there are totally confusing, so modify them to
> reflect what's going on.
> 
> The results of the computations should be the same as before.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

> ---
>  drivers/cpufreq/intel_pstate.c |   32 +++++++++++++-----------------
> --
>  1 file changed, 13 insertions(+), 19 deletions(-)
> 
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -1241,43 +1241,37 @@ static inline int32_t get_target_pstate_
>  
>  static inline int32_t get_target_pstate_use_performance(struct
> cpudata *cpu)
>  {
> -	int32_t core_busy, max_pstate, current_pstate, sample_ratio;
> +	int32_t perf_scaled, sample_ratio;
>  	u64 duration_ns;
>  
>  	/*
> -	 * core_busy is the ratio of actual performance to max
> -	 * max_pstate is the max non turbo pstate available
> -	 * current_pstate was the pstate that was requested during
> -	 * 	the last sample period.
> -	 *
> -	 * We normalize core_busy, which was our actual percent
> -	 * performance to what we requested during the last sample
> -	 * period. The result will be a percentage of busy at a
> -	 * specified pstate.
> +	 * perf_scaled is the average performance during the last
> sampling
> +	 * period (in percent) scaled by the ratio of the P-state
> requested
> +	 * last time to the maximum P-state.  That measures the
> system's
> +	 * response to the previous P-state selection.
>  	 */
> -	core_busy = 100 * cpu->sample.core_avg_perf;
> -	max_pstate = cpu->pstate.max_pstate_physical;
> -	current_pstate = cpu->pstate.current_pstate;
> -	core_busy = mul_fp(core_busy, div_fp(max_pstate,
> current_pstate));
> +	perf_scaled = div_fp(cpu->pstate.max_pstate_physical,
> +			     cpu->pstate.current_pstate);
> +	perf_scaled = mul_fp(perf_scaled, 100 * cpu-
> >sample.core_avg_perf);
>  
>  	/*
>  	 * Since our utilization update callback will not run unless
> we are
>  	 * in C0, check if the actual elapsed time is significantly
> greater (3x)
>  	 * than our sample interval.  If it is, then we were idle
> for a long
> -	 * enough period of time to adjust our busyness.
> +	 * enough period of time to adjust our performance metric.
>  	 */
>  	duration_ns = cpu->sample.time - cpu->last_sample_time;
>  	if ((s64)duration_ns > pid_params.sample_rate_ns * 3) {
>  		sample_ratio = div_fp(pid_params.sample_rate_ns,
> duration_ns);
> -		core_busy = mul_fp(core_busy, sample_ratio);
> +		perf_scaled = mul_fp(perf_scaled, sample_ratio);
>  	} else {
>  		sample_ratio = div_fp(100 * cpu->sample.mperf, cpu-
> >sample.tsc);
>  		if (sample_ratio < int_tofp(1))
> -			core_busy = 0;
> +			perf_scaled = 0;
>  	}
>  
> -	cpu->sample.busy_scaled = core_busy;
> -	return cpu->pstate.current_pstate - pid_calc(&cpu->pid,
> core_busy);
> +	cpu->sample.busy_scaled = perf_scaled;
> +	return cpu->pstate.current_pstate - pid_calc(&cpu->pid,
> perf_scaled);
>  }
>  
>  static inline void intel_pstate_update_pstate(struct cpudata *cpu,
> int pstate)
> 


* Re: [PATCH 1/3] intel_pstate: Clarify average performance computation
  2016-05-10  1:18   ` Srinivas Pandruvada
@ 2016-05-10 19:21     ` Rafael J. Wysocki
  2016-05-10 19:58       ` Srinivas Pandruvada
  0 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2016-05-10 19:21 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: Rafael J. Wysocki, Linux PM list, Linux Kernel Mailing List

On Tue, May 10, 2016 at 3:18 AM, Srinivas Pandruvada
<srinivas.pandruvada@linux.intel.com> wrote:
> On Sat, 2016-05-07 at 01:44 +0200, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> The core_pct_busy field of struct sample actually contains the
>> average performance during the last sampling period (in percent)
>> and not the utilization of the core as suggested by its name
>> which is confusing.
>>
>> For this reason, change the name of that field to core_avg_perf
>> and rename the function that computes its value accordingly.
>>
> Makes perfect sense.
>
>> Also notice that it would be more useful if it was a "raw" fraction
>> rather than percentage, so change its meaning too and update the
>> code using it accordingly (it is better to change the name of
>> the field along with its meaning in one go than to make those
>> two changes separately, as that would likely lead to more
>> confusion).
> Due to the calculation the results from old and new method will be
> similar but not same. For example in one scenario the
> get_avg_frequency difference is 4.3KHz (printed side by side using both
> old style using pct and new using fraction)
> Frequency with old calc: 2996093 Hz

I guess the above is the new one?

> Frequency with old calc: 3000460 Hz

So the relative difference is of the order of 0.1% and that number is
not what is used in PID computations.  That's what is printed, but I'm
not sure if that's really that important. :-)

Here, the sample.aperf bits lost because the 100 was moved away from
intel_pstate_calc_busy() would be multiplied by a relatively large
number to produce the difference that looks significant, but the
numbers actually used in computations are a few orders of magnitude
smaller.

> How much do you think the performance gain changing fraction vs pct?

I'm more concerned about latency than about performance.  On HWP, for
example, the costly multiplication removed by this from the hot path
is of the order of the half of the work done.

That said, I can do something to retain the bits in question for as
long as possible, although the patch will be slightly more complicated
then. :-)


* Re: [PATCH 1/3] intel_pstate: Clarify average performance computation
  2016-05-10 19:21     ` Rafael J. Wysocki
@ 2016-05-10 19:58       ` Srinivas Pandruvada
  2016-05-10 20:57         ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Srinivas Pandruvada @ 2016-05-10 19:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM list, Linux Kernel Mailing List

On Tue, 2016-05-10 at 21:21 +0200, Rafael J. Wysocki wrote:
> On Tue, May 10, 2016 at 3:18 AM, Srinivas Pandruvada
> <srinivas.pandruvada@linux.intel.com> wrote:
> > 
> > On Sat, 2016-05-07 at 01:44 +0200, Rafael J. Wysocki wrote:
> > > 
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > 
> > > The core_pct_busy field of struct sample actually contains the
> > > average performance during the last sampling period (in percent)
> > > and not the utilization of the core as suggested by its name
> > > which is confusing.
> > > 
> > > For this reason, change the name of that field to core_avg_perf
> > > and rename the function that computes its value accordingly.
> > > 
> > Makes perfect sense.
> > 
> > > 
> > > Also notice that it would be more useful if it was a "raw"
> > > fraction
> > > rather than percentage, so change its meaning too and update the
> > > code using it accordingly (it is better to change the name of
> > > the field along with its meaning in one go than to make those
> > > two changes separately, as that would likely lead to more
> > > confusion).
> > Due to the calculation the results from old and new method will be
> > similar but not same. For example in one scenario the
> > get_avg_frequency difference is 4.3KHz (printed side by side using
> > both
> > old style using pct and new using fraction)
> > Frequency with old calc: 2996093 Hz
> I guess the above is the new one?
> 
> > 
> > Frequency with old calc: 3000460 Hz
> So the relative difference is of the order of 0.1% and that number is
> not what is used in PID computations.  That's what is printed, but
> I'm
> not sure if that's really that important. :-)

This difference will appear in the cpufreq sysfs, as its granularity
for the current frequency is in kHz. But the difference is very small,
so I guess no one will notice.

Thanks,
Srinivas
> 
> Here, the sample.aperf bits lost because the 100 was moved away from
> intel_pstate_calc_busy() would be multiplied by a relatively large
> number to produce the difference that looks significant, but the
> numbers actually used in computations are a few orders of magnitude
> smaller.
> 
> > 
> > How much do you think the performance gain changing fraction vs
> > pct?
> I'm more concerned about latency than about performance.  On HWP, for
> example, the costly multiplication removed by this from the hot path
> is of the order of the half of the work done.
> 
> That said, I can do something to retain the bits in question for as
> long as possible, although the patch will be slightly more
> complicated
> then. :-)


* Re: [PATCH 1/3] intel_pstate: Clarify average performance computation
  2016-05-10 19:58       ` Srinivas Pandruvada
@ 2016-05-10 20:57         ` Rafael J. Wysocki
  2016-05-11  5:01           ` Srinivas Pandruvada
  0 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2016-05-10 20:57 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: Rafael J. Wysocki, Linux PM list, Linux Kernel Mailing List

On Tuesday, May 10, 2016 12:58:04 PM Srinivas Pandruvada wrote:
> On Tue, 2016-05-10 at 21:21 +0200, Rafael J. Wysocki wrote:
> > On Tue, May 10, 2016 at 3:18 AM, Srinivas Pandruvada
> > <srinivas.pandruvada@linux.intel.com> wrote:
> > > 
> > > On Sat, 2016-05-07 at 01:44 +0200, Rafael J. Wysocki wrote:
> > > > 
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > 
> > > > The core_pct_busy field of struct sample actually contains the
> > > > average performance during the last sampling period (in percent)
> > > > and not the utilization of the core as suggested by its name
> > > > which is confusing.
> > > > 
> > > > For this reason, change the name of that field to core_avg_perf
> > > > and rename the function that computes its value accordingly.
> > > > 
> > > Makes perfect sense.
> > > 
> > > > 
> > > > Also notice that it would be more useful if it was a "raw"
> > > > fraction
> > > > rather than percentage, so change its meaning too and update the
> > > > code using it accordingly (it is better to change the name of
> > > > the field along with its meaning in one go than to make those
> > > > two changes separately, as that would likely lead to more
> > > > confusion).
> > > Due to the calculation the results from old and new method will be
> > > similar but not same. For example in one scenario the
> > > get_avg_frequency difference is 4.3KHz (printed side by side using
> > > both
> > > old style using pct and new using fraction)
> > > Frequency with old calc: 2996093 Hz
> > I guess the above is the new one?
> > 
> > > 
> > > Frequency with old calc: 3000460 Hz
> > So the relative difference is of the order of 0.1% and that number is
> > not what is used in PID computations.  That's what is printed, but
> > I'm
> > not sure if that's really that important. :-)
> 
> This difference will appear in the cpufreq sysfs, as its granularity
> for the current frequency is in kHz. But the difference is very small,
> so I guess no one will notice.

:-)

> > Here, the sample.aperf bits lost because the 100 was moved away from
> > intel_pstate_calc_busy() would be multiplied by a relatively large
> > number to produce the difference that looks significant, but the
> > numbers actually used in computations are a few orders of magnitude
> > smaller.
> > 
> > > 
> > > How much do you think the performance gain changing fraction vs
> > > pct?
> > I'm more concerned about latency than about performance.  On HWP, for
> > example, the costly multiplication removed by this from the hot path
> > is of the order of the half of the work done.
> > 
> > That said, I can do something to retain the bits in question for as
> > long as possible, although the patch will be slightly more
> > complicated
> > then. :-)

The patch below should do the trick.  I haven't tested it yet, but it is simple
enough IMO (of course, the [2-3/3] will need to be rebased on top of this one).

---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: [PATCH] intel_pstate: Clarify average performance computation

The core_pct_busy field of struct sample actually contains the
average performance during the last sampling period (in percent)
and not the utilization of the core as suggested by its name,
which is confusing.

For this reason, change the name of that field to core_avg_perf
and rename the function that computes its value accordingly.

Also notice that storing this value as a percentage requires a costly
integer multiplication to be carried out in a hot path, so instead
store it as an "extended fixed point" value with more fraction bits
and update the code using it accordingly (it is better to change the
name of the field along with its meaning in one go than to make those
two changes separately, as that would likely lead to more
confusion).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/intel_pstate.c |   31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -49,6 +49,9 @@
 #define int_tofp(X) ((int64_t)(X) << FRAC_BITS)
 #define fp_toint(X) ((X) >> FRAC_BITS)
 
+#define EXT_BITS 6
+#define EXT_FRAC_BITS (EXT_BITS + FRAC_BITS)
+
 static inline int32_t mul_fp(int32_t x, int32_t y)
 {
 	return ((int64_t)x * (int64_t)y) >> FRAC_BITS;
@@ -72,10 +75,10 @@ static inline int ceiling_fp(int32_t x)
 
 /**
  * struct sample -	Store performance sample
- * @core_pct_busy:	Ratio of APERF/MPERF in percent, which is actual
+ * @core_avg_perf:	Ratio of APERF/MPERF which is the actual average
  *			performance during last sample period
  * @busy_scaled:	Scaled busy value which is used to calculate next
- *			P state. This can be different than core_pct_busy
+ *			P state. This can be different than core_avg_perf
  *			to account for cpu idle period
  * @aperf:		Difference of actual performance frequency clock count
  *			read from APERF MSR between last and current sample
@@ -90,8 +93,8 @@ static inline int ceiling_fp(int32_t x)
  * data for choosing next P State.
  */
 struct sample {
-	int32_t core_pct_busy;
 	int32_t busy_scaled;
+	u64 core_avg_perf;
 	u64 aperf;
 	u64 mperf;
 	u64 tsc;
@@ -1147,15 +1150,12 @@ static void intel_pstate_get_cpu_pstates
 	intel_pstate_set_min_pstate(cpu);
 }
 
-static inline void intel_pstate_calc_busy(struct cpudata *cpu)
+static inline void intel_pstate_calc_avg_perf(struct cpudata *cpu)
 {
 	struct sample *sample = &cpu->sample;
-	int64_t core_pct;
-
-	core_pct = sample->aperf * int_tofp(100);
-	core_pct = div64_u64(core_pct, sample->mperf);
 
-	sample->core_pct_busy = (int32_t)core_pct;
+	sample->core_avg_perf = div64_u64(sample->aperf << EXT_FRAC_BITS,
+					  sample->mperf);
 }
 
 static inline bool intel_pstate_sample(struct cpudata *cpu, u64 time)
@@ -1198,9 +1198,8 @@ static inline bool intel_pstate_sample(s
 
 static inline int32_t get_avg_frequency(struct cpudata *cpu)
 {
-	return fp_toint(mul_fp(cpu->sample.core_pct_busy,
-			       int_tofp(cpu->pstate.max_pstate_physical *
-						cpu->pstate.scaling / 100)));
+	return (cpu->sample.core_avg_perf * cpu->pstate.max_pstate_physical *
+			cpu->pstate.scaling) >> EXT_FRAC_BITS;
 }
 
 static inline int32_t get_avg_pstate(struct cpudata *cpu)
@@ -1260,10 +1259,10 @@ static inline int32_t get_target_pstate_
 	 * period. The result will be a percentage of busy at a
 	 * specified pstate.
 	 */
-	core_busy = cpu->sample.core_pct_busy;
 	max_pstate = cpu->pstate.max_pstate_physical;
 	current_pstate = cpu->pstate.current_pstate;
-	core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
+	core_busy = (100 * cpu->sample.core_avg_perf *
+			div_fp(max_pstate, current_pstate)) >> EXT_FRAC_BITS;
 
 	/*
 	 * Since our utilization update callback will not run unless we are
@@ -1312,7 +1311,7 @@ static inline void intel_pstate_adjust_b
 	intel_pstate_update_pstate(cpu, target_pstate);
 
 	sample = &cpu->sample;
-	trace_pstate_sample(fp_toint(sample->core_pct_busy),
+	trace_pstate_sample((100 * sample->core_avg_perf) >> EXT_FRAC_BITS,
 		fp_toint(sample->busy_scaled),
 		from,
 		cpu->pstate.current_pstate,
@@ -1332,7 +1331,7 @@ static void intel_pstate_update_util(str
 		bool sample_taken = intel_pstate_sample(cpu, time);
 
 		if (sample_taken) {
-			intel_pstate_calc_busy(cpu);
+			intel_pstate_calc_avg_perf(cpu);
 			if (!hwp_active)
 				intel_pstate_adjust_busy_pstate(cpu);
 		}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] intel_pstate: Clarify average performance computation
  2016-05-10 20:57         ` Rafael J. Wysocki
@ 2016-05-11  5:01           ` Srinivas Pandruvada
  2016-05-11 13:46             ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Srinivas Pandruvada @ 2016-05-11  5:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM list, Linux Kernel Mailing List

On Tue, 2016-05-10 at 22:57 +0200, Rafael J. Wysocki wrote:

[...]

> ---
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: [PATCH] intel_pstate: Clarify average performance
> computation
> 
> The core_pct_busy field of struct sample actually contains the
> average performance during the last sampling period (in percent)
> and not the utilization of the core as suggested by its name,
> which is confusing.
> 
> For this reason, change the name of that field to core_avg_perf
> and rename the function that computes its value accordingly.
> 
> Also notice that storing this value as percentage requires a costly
> integer multiplication to be carried out in a hot path, so instead
> store it as an "extended fixed point" value with more fraction bits
> and update the code using it accordingly (it is better to change the
> name of the field along with its meaning in one go than to make those
> two changes separately, as that would likely lead to more
> confusion).
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> [...]
> 
> @@ -1198,9 +1198,8 @@ static inline bool intel_pstate_sample(s
>  
>  static inline int32_t get_avg_frequency(struct cpudata *cpu)
>  {
> -	return fp_toint(mul_fp(cpu->sample.core_pct_busy,
> -			       int_tofp(cpu-
> >pstate.max_pstate_physical *
> -						cpu->pstate.scaling
> / 100)));
> +	return (cpu->sample.core_avg_perf * cpu-
> >pstate.max_pstate_physical *
> +			cpu->pstate.scaling) >> EXT_FRAC_BITS;

This breaks frequency display. Needs cast
return ((u64)cpu->sample.core_avg_perf * cpu->
	pstate.max_pstate_physical * cpu->pstate.scaling) >>
EXT_FRAC_BITS;

Otherwise results are very close to the version without this change.

Thanks,
Srinivas

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] intel_pstate: Clarify average performance computation
  2016-05-11  5:01           ` Srinivas Pandruvada
@ 2016-05-11 13:46             ` Rafael J. Wysocki
  0 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2016-05-11 13:46 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM list,
	Linux Kernel Mailing List

On Wed, May 11, 2016 at 7:01 AM, Srinivas Pandruvada
<srinivas.pandruvada@linux.intel.com> wrote:
> On Tue, 2016-05-10 at 22:57 +0200, Rafael J. Wysocki wrote:
>
> [...]
>
>> ---
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Subject: [PATCH] intel_pstate: Clarify average performance
>> computation
>>
>> [...]
>>
>> @@ -1198,9 +1198,8 @@ static inline bool intel_pstate_sample(s
>>
>>  static inline int32_t get_avg_frequency(struct cpudata *cpu)
>>  {
>> -     return fp_toint(mul_fp(cpu->sample.core_pct_busy,
>> -                            int_tofp(cpu-
>> >pstate.max_pstate_physical *
>> -                                             cpu->pstate.scaling
>> / 100)));
>> +     return (cpu->sample.core_avg_perf * cpu-
>> >pstate.max_pstate_physical *
>> +                     cpu->pstate.scaling) >> EXT_FRAC_BITS;
>
> This breaks frequency display. Needs cast
> return ((u64)cpu->sample.core_avg_perf * cpu->
>         pstate.max_pstate_physical * cpu->pstate.scaling) >>
> EXT_FRAC_BITS;

Well, that's strange, because sample.core_avg_perf is a u64 after this
patch already.

But if we are to make explicit type conversions, I'd rather store
sample.core_avg_perf in 32 bits.

> Otherwise results are very close to the version without this change.

OK, let me resend the series with this patch reworked once again.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2, 0/3] intel_pstate: Improvements related to the APERF/MPERF computation
  2016-05-06 23:42 [PATCH 0/3] intel_pstate: Improvements related to the APERF/MPERF computation Rafael J. Wysocki
                   ` (2 preceding siblings ...)
  2016-05-06 23:47 ` [PATCH 3/3] intel_pstate: Clean up get_target_pstate_use_performance() Rafael J. Wysocki
@ 2016-05-11 17:06 ` Rafael J. Wysocki
  2016-05-11 17:09   ` [PATCH v2, 1/3] intel_pstate: Clarify average performance computation Rafael J. Wysocki
                     ` (3 more replies)
  3 siblings, 4 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2016-05-11 17:06 UTC (permalink / raw)
  To: Linux PM list, Srinivas Pandruvada; +Cc: Linux Kernel Mailing List

On Saturday, May 07, 2016 01:42:53 AM Rafael J. Wysocki wrote:
> Hi,
> 
> As per the subject, on top of the current linux-next branch of linux-pm.git.

This version has been reworked to retain more APERF bits after carrying out the
\Delta_{APERF}/\Delta_{MPERF} division, which should lead to a bit more precise
numbers in sysfs etc.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2, 1/3] intel_pstate: Clarify average performance computation
  2016-05-11 17:06 ` [PATCH v2, 0/3] intel_pstate: Improvements related to the APERF/MPERF computation Rafael J. Wysocki
@ 2016-05-11 17:09   ` Rafael J. Wysocki
  2016-05-11 17:10   ` [PATCH v2, 2/3] intel_pstate: Use sample.core_avg_perf in get_avg_pstate() Rafael J. Wysocki
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2016-05-11 17:09 UTC (permalink / raw)
  To: Linux PM list, Srinivas Pandruvada; +Cc: Linux Kernel Mailing List

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The core_pct_busy field of struct sample actually contains the
average performance during the last sampling period (in percent)
and not the utilization of the core as suggested by its name,
which is confusing.

For this reason, change the name of that field to core_avg_perf
and rename the function that computes its value accordingly.

Also notice that storing this value as percentage requires a costly
integer multiplication to be carried out in a hot path, so instead
store it as an "extended fixed point" value with more fraction bits
and update the code using it accordingly (it is better to change the
name of the field along with its meaning in one go than to make those
two changes separately, as that would likely lead to more
confusion).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v2: Preserve more APERF bits from APERF/MPERF to increase precision

---
 drivers/cpufreq/intel_pstate.c |   40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -49,6 +49,9 @@
 #define int_tofp(X) ((int64_t)(X) << FRAC_BITS)
 #define fp_toint(X) ((X) >> FRAC_BITS)
 
+#define EXT_BITS 6
+#define EXT_FRAC_BITS (EXT_BITS + FRAC_BITS)
+
 static inline int32_t mul_fp(int32_t x, int32_t y)
 {
 	return ((int64_t)x * (int64_t)y) >> FRAC_BITS;
@@ -70,12 +73,22 @@ static inline int ceiling_fp(int32_t x)
 	return ret;
 }
 
+static inline u64 mul_ext_fp(u64 x, u64 y)
+{
+	return (x * y) >> EXT_FRAC_BITS;
+}
+
+static inline u64 div_ext_fp(u64 x, u64 y)
+{
+	return div64_u64(x << EXT_FRAC_BITS, y);
+}
+
 /**
  * struct sample -	Store performance sample
- * @core_pct_busy:	Ratio of APERF/MPERF in percent, which is actual
+ * @core_avg_perf:	Ratio of APERF/MPERF which is the actual average
  *			performance during last sample period
  * @busy_scaled:	Scaled busy value which is used to calculate next
- *			P state. This can be different than core_pct_busy
+ *			P state. This can be different than core_avg_perf
  *			to account for cpu idle period
  * @aperf:		Difference of actual performance frequency clock count
  *			read from APERF MSR between last and current sample
@@ -90,7 +103,7 @@ static inline int ceiling_fp(int32_t x)
  * data for choosing next P State.
  */
 struct sample {
-	int32_t core_pct_busy;
+	int32_t core_avg_perf;
 	int32_t busy_scaled;
 	u64 aperf;
 	u64 mperf;
@@ -1147,15 +1160,11 @@ static void intel_pstate_get_cpu_pstates
 	intel_pstate_set_min_pstate(cpu);
 }
 
-static inline void intel_pstate_calc_busy(struct cpudata *cpu)
+static inline void intel_pstate_calc_avg_perf(struct cpudata *cpu)
 {
 	struct sample *sample = &cpu->sample;
-	int64_t core_pct;
-
-	core_pct = sample->aperf * int_tofp(100);
-	core_pct = div64_u64(core_pct, sample->mperf);
 
-	sample->core_pct_busy = (int32_t)core_pct;
+	sample->core_avg_perf = div_ext_fp(sample->aperf, sample->mperf);
 }
 
 static inline bool intel_pstate_sample(struct cpudata *cpu, u64 time)
@@ -1198,9 +1207,8 @@ static inline bool intel_pstate_sample(s
 
 static inline int32_t get_avg_frequency(struct cpudata *cpu)
 {
-	return fp_toint(mul_fp(cpu->sample.core_pct_busy,
-			       int_tofp(cpu->pstate.max_pstate_physical *
-						cpu->pstate.scaling / 100)));
+	return mul_ext_fp(cpu->sample.core_avg_perf,
+			  cpu->pstate.max_pstate_physical * cpu->pstate.scaling);
 }
 
 static inline int32_t get_avg_pstate(struct cpudata *cpu)
@@ -1260,10 +1268,10 @@ static inline int32_t get_target_pstate_
 	 * period. The result will be a percentage of busy at a
 	 * specified pstate.
 	 */
-	core_busy = cpu->sample.core_pct_busy;
 	max_pstate = cpu->pstate.max_pstate_physical;
 	current_pstate = cpu->pstate.current_pstate;
-	core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
+	core_busy = mul_ext_fp(cpu->sample.core_avg_perf,
+			       div_fp(100 * max_pstate, current_pstate));
 
 	/*
 	 * Since our utilization update callback will not run unless we are
@@ -1312,7 +1320,7 @@ static inline void intel_pstate_adjust_b
 	intel_pstate_update_pstate(cpu, target_pstate);
 
 	sample = &cpu->sample;
-	trace_pstate_sample(fp_toint(sample->core_pct_busy),
+	trace_pstate_sample(mul_ext_fp(100, sample->core_avg_perf),
 		fp_toint(sample->busy_scaled),
 		from,
 		cpu->pstate.current_pstate,
@@ -1332,7 +1340,7 @@ static void intel_pstate_update_util(str
 		bool sample_taken = intel_pstate_sample(cpu, time);
 
 		if (sample_taken) {
-			intel_pstate_calc_busy(cpu);
+			intel_pstate_calc_avg_perf(cpu);
 			if (!hwp_active)
 				intel_pstate_adjust_busy_pstate(cpu);
 		}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2, 2/3] intel_pstate: Use sample.core_avg_perf in get_avg_pstate()
  2016-05-11 17:06 ` [PATCH v2, 0/3] intel_pstate: Improvements related to the APERF/MPERF computation Rafael J. Wysocki
  2016-05-11 17:09   ` [PATCH v2, 1/3] intel_pstate: Clarify average performance computation Rafael J. Wysocki
@ 2016-05-11 17:10   ` Rafael J. Wysocki
  2016-05-11 17:11   ` [PATCH v2, 3/3] intel_pstate: Clean up get_target_pstate_use_performance() Rafael J. Wysocki
  2016-05-13  0:34   ` [PATCH v2, 0/3] intel_pstate: Improvements related to the APERF/MPERF computation Srinivas Pandruvada
  3 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2016-05-11 17:10 UTC (permalink / raw)
  To: Linux PM list, Srinivas Pandruvada; +Cc: Linux Kernel Mailing List

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Notice that get_avg_pstate() can use sample.core_avg_perf instead of
carrying out the same division again, so make it do that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v2: Use mul_ext_fp() to carry out the computation in get_avg_pstate().

---
 drivers/cpufreq/intel_pstate.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -1213,8 +1213,8 @@ static inline int32_t get_avg_frequency(
 
 static inline int32_t get_avg_pstate(struct cpudata *cpu)
 {
-	return div64_u64(cpu->pstate.max_pstate_physical * cpu->sample.aperf,
-			 cpu->sample.mperf);
+	return mul_ext_fp(cpu->pstate.max_pstate_physical,
+			  cpu->sample.core_avg_perf);
 }
 
 static inline int32_t get_target_pstate_use_cpu_load(struct cpudata *cpu)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2, 3/3] intel_pstate: Clean up get_target_pstate_use_performance()
  2016-05-11 17:06 ` [PATCH v2, 0/3] intel_pstate: Improvements related to the APERF/MPERF computation Rafael J. Wysocki
  2016-05-11 17:09   ` [PATCH v2, 1/3] intel_pstate: Clarify average performance computation Rafael J. Wysocki
  2016-05-11 17:10   ` [PATCH v2, 2/3] intel_pstate: Use sample.core_avg_perf in get_avg_pstate() Rafael J. Wysocki
@ 2016-05-11 17:11   ` Rafael J. Wysocki
  2016-05-13  0:34   ` [PATCH v2, 0/3] intel_pstate: Improvements related to the APERF/MPERF computation Srinivas Pandruvada
  3 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2016-05-11 17:11 UTC (permalink / raw)
  To: Linux PM list, Srinivas Pandruvada; +Cc: Linux Kernel Mailing List

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The comments and the core_busy variable name in
get_target_pstate_use_performance() are totally confusing,
so modify them to reflect what's going on.

The results of the computations should be the same as before.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/intel_pstate.c |   27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -1254,43 +1254,38 @@ static inline int32_t get_target_pstate_
 
 static inline int32_t get_target_pstate_use_performance(struct cpudata *cpu)
 {
-	int32_t core_busy, max_pstate, current_pstate, sample_ratio;
+	int32_t perf_scaled, max_pstate, current_pstate, sample_ratio;
 	u64 duration_ns;
 
 	/*
-	 * core_busy is the ratio of actual performance to max
-	 * max_pstate is the max non turbo pstate available
-	 * current_pstate was the pstate that was requested during
-	 * 	the last sample period.
-	 *
-	 * We normalize core_busy, which was our actual percent
-	 * performance to what we requested during the last sample
-	 * period. The result will be a percentage of busy at a
-	 * specified pstate.
+	 * perf_scaled is the average performance during the last sampling
+	 * period scaled by the ratio of the maximum P-state to the P-state
+	 * requested last time (in percent).  That measures the system's
+	 * response to the previous P-state selection.
 	 */
 	max_pstate = cpu->pstate.max_pstate_physical;
 	current_pstate = cpu->pstate.current_pstate;
-	core_busy = mul_ext_fp(cpu->sample.core_avg_perf,
+	perf_scaled = mul_ext_fp(cpu->sample.core_avg_perf,
 			       div_fp(100 * max_pstate, current_pstate));
 
 	/*
 	 * Since our utilization update callback will not run unless we are
 	 * in C0, check if the actual elapsed time is significantly greater (3x)
 	 * than our sample interval.  If it is, then we were idle for a long
-	 * enough period of time to adjust our busyness.
+	 * enough period of time to adjust our performance metric.
 	 */
 	duration_ns = cpu->sample.time - cpu->last_sample_time;
 	if ((s64)duration_ns > pid_params.sample_rate_ns * 3) {
 		sample_ratio = div_fp(pid_params.sample_rate_ns, duration_ns);
-		core_busy = mul_fp(core_busy, sample_ratio);
+		perf_scaled = mul_fp(perf_scaled, sample_ratio);
 	} else {
 		sample_ratio = div_fp(100 * cpu->sample.mperf, cpu->sample.tsc);
 		if (sample_ratio < int_tofp(1))
-			core_busy = 0;
+			perf_scaled = 0;
 	}
 
-	cpu->sample.busy_scaled = core_busy;
-	return cpu->pstate.current_pstate - pid_calc(&cpu->pid, core_busy);
+	cpu->sample.busy_scaled = perf_scaled;
+	return cpu->pstate.current_pstate - pid_calc(&cpu->pid, perf_scaled);
 }
 
 static inline void intel_pstate_update_pstate(struct cpudata *cpu, int pstate)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2, 0/3] intel_pstate: Improvements related to the APERF/MPERF computation
  2016-05-11 17:06 ` [PATCH v2, 0/3] intel_pstate: Improvements related to the APERF/MPERF computation Rafael J. Wysocki
                     ` (2 preceding siblings ...)
  2016-05-11 17:11   ` [PATCH v2, 3/3] intel_pstate: Clean up get_target_pstate_use_performance() Rafael J. Wysocki
@ 2016-05-13  0:34   ` Srinivas Pandruvada
  3 siblings, 0 replies; 16+ messages in thread
From: Srinivas Pandruvada @ 2016-05-13  0:34 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM list; +Cc: Linux Kernel Mailing List

On Wed, 2016-05-11 at 19:06 +0200, Rafael J. Wysocki wrote:
> On Saturday, May 07, 2016 01:42:53 AM Rafael J. Wysocki wrote:
> > Hi,
> > 
> > As per the subject, on top of the current linux-next branch of
> > linux-pm.git.
> 
> This version has been reworked to retain more APERF bits after carrying
> out the \Delta_{APERF}/\Delta_{MPERF} division, which should lead to a bit
> more precise numbers in sysfs etc.

Tested the series and looks fine now.

Thanks,
Srinivas

> 
> Thanks,
> Rafael

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-05-13  0:34 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-06 23:42 [PATCH 0/3] intel_pstate: Improvements related to the APERF/MPERF computation Rafael J. Wysocki
2016-05-06 23:44 ` [PATCH 1/3] intel_pstate: Clarify average performance computation Rafael J. Wysocki
2016-05-10  1:18   ` Srinivas Pandruvada
2016-05-10 19:21     ` Rafael J. Wysocki
2016-05-10 19:58       ` Srinivas Pandruvada
2016-05-10 20:57         ` Rafael J. Wysocki
2016-05-11  5:01           ` Srinivas Pandruvada
2016-05-11 13:46             ` Rafael J. Wysocki
2016-05-06 23:45 ` [PATCH 2/3] intel_pstate: Use sample.core_avg_perf in get_avg_pstate() Rafael J. Wysocki
2016-05-06 23:47 ` [PATCH 3/3] intel_pstate: Clean up get_target_pstate_use_performance() Rafael J. Wysocki
2016-05-10  1:24   ` Srinivas Pandruvada
2016-05-11 17:06 ` [PATCH v2, 0/3] intel_pstate: Improvements related to the APERF/MPERF computation Rafael J. Wysocki
2016-05-11 17:09   ` [PATCH v2, 1/3] intel_pstate: Clarify average performance computation Rafael J. Wysocki
2016-05-11 17:10   ` [PATCH v2, 2/3] intel_pstate: Use sample.core_avg_perf in get_avg_pstate() Rafael J. Wysocki
2016-05-11 17:11   ` [PATCH v2, 3/3] intel_pstate: Clean up get_target_pstate_use_performance() Rafael J. Wysocki
2016-05-13  0:34   ` [PATCH v2, 0/3] intel_pstate: Improvements related to the APERF/MPERF computation Srinivas Pandruvada

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).