linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] cpufreq: schedutil: Optimize operations in hot path frequency switch
@ 2022-12-07 10:17 Lukasz Luba
  2022-12-07 10:17 ` [PATCH v2 1/2] cpufreq: schedutil: Introduce single max CPU capacity for freqency domain Lukasz Luba
  2022-12-07 10:17 ` [PATCH v2 2/2] cpufreq: schedutil: Optimize operations with single max CPU capacity Lukasz Luba
  0 siblings, 2 replies; 11+ messages in thread
From: Lukasz Luba @ 2022-12-07 10:17 UTC (permalink / raw)
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, viresh.kumar, dietmar.eggemann, vincent.guittot,
	saravanak, wusamuel, isaacmanjarres, kernel-team, juri.lelli,
	peterz, mingo, rostedt, bsegall, mgorman

Hi all,

This v2 is an attempt to optimize some not needed operations in the frequency
change code path for the shared freq. domain. There was different approach
which was too aggressive and failed do to working on a state CPU capacity
information.

changes:
v2:
- split the patch into two (Viresh)
v1:
- simple approach which fetches CPU capacity every time it's needed,
  not relaying on the setup value, which was causing issues.

Regards,
Lukasz

Lukasz Luba (2):
  cpufreq: schedutil: Introduce single max CPU capacity for freqency
    domain
  cpufreq: schedutil: Optimize operations with single max CPU capacity

 kernel/sched/cpufreq_schedutil.c | 35 +++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 14 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/2] cpufreq: schedutil: Introduce single max CPU capacity for freqency domain
  2022-12-07 10:17 [PATCH v2 0/2] cpufreq: schedutil: Optimize operations in hot path frequency switch Lukasz Luba
@ 2022-12-07 10:17 ` Lukasz Luba
  2022-12-08  3:57   ` Viresh Kumar
  2022-12-07 10:17 ` [PATCH v2 2/2] cpufreq: schedutil: Optimize operations with single max CPU capacity Lukasz Luba
  1 sibling, 1 reply; 11+ messages in thread
From: Lukasz Luba @ 2022-12-07 10:17 UTC (permalink / raw)
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, viresh.kumar, dietmar.eggemann, vincent.guittot,
	saravanak, wusamuel, isaacmanjarres, kernel-team, juri.lelli,
	peterz, mingo, rostedt, bsegall, mgorman

Prepare the code for optimizations and move the max CPU capacity
into the sugov_policy struct. This will allow to leverage the fact that
the max CPU capacity is the same for all CPUs in the frequency domain.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 kernel/sched/cpufreq_schedutil.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 1207c78f85c1..c19d6de67b7a 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -25,6 +25,9 @@ struct sugov_policy {
 	unsigned int		next_freq;
 	unsigned int		cached_raw_freq;
 
+	/* max CPU capacity, which is equal for all CPUs in freq. domain */
+	unsigned long           max;
+
 	/* The next fields are only needed if fast switch cannot be used: */
 	struct			irq_work irq_work;
 	struct			kthread_work work;
@@ -48,7 +51,6 @@ struct sugov_cpu {
 
 	unsigned long		util;
 	unsigned long		bw_dl;
-	unsigned long		max;
 
 	/* The field below is for single-CPU policies only: */
 #ifdef CONFIG_NO_HZ_COMMON
@@ -156,9 +158,10 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 
 static void sugov_get_util(struct sugov_cpu *sg_cpu)
 {
+	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	struct rq *rq = cpu_rq(sg_cpu->cpu);
 
-	sg_cpu->max = arch_scale_cpu_capacity(sg_cpu->cpu);
+	sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
 	sg_cpu->bw_dl = cpu_bw_dl(rq);
 	sg_cpu->util = effective_cpu_util(sg_cpu->cpu, cpu_util_cfs(sg_cpu->cpu),
 					  FREQUENCY_UTIL, NULL);
@@ -253,6 +256,7 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
  */
 static void sugov_iowait_apply(struct sugov_cpu *sg_cpu, u64 time)
 {
+	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	unsigned long boost;
 
 	/* No boost currently required */
@@ -280,7 +284,8 @@ static void sugov_iowait_apply(struct sugov_cpu *sg_cpu, u64 time)
 	 * sg_cpu->util is already in capacity scale; convert iowait_boost
 	 * into the same scale so we can compare.
 	 */
-	boost = (sg_cpu->iowait_boost * sg_cpu->max) >> SCHED_CAPACITY_SHIFT;
+	boost = sg_cpu->iowait_boost * sg_policy->max;
+	boost >>= SCHED_CAPACITY_SHIFT;
 	boost = uclamp_rq_util_with(cpu_rq(sg_cpu->cpu), boost, NULL);
 	if (sg_cpu->util < boost)
 		sg_cpu->util = boost;
@@ -337,7 +342,7 @@ static void sugov_update_single_freq(struct update_util_data *hook, u64 time,
 	if (!sugov_update_single_common(sg_cpu, time, flags))
 		return;
 
-	next_f = get_next_freq(sg_policy, sg_cpu->util, sg_cpu->max);
+	next_f = get_next_freq(sg_policy, sg_cpu->util, sg_policy->max);
 	/*
 	 * Do not reduce the frequency if the CPU has not been idle
 	 * recently, as the reduction is likely to be premature then.
@@ -373,6 +378,7 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
 				     unsigned int flags)
 {
 	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
+	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	unsigned long prev_util = sg_cpu->util;
 
 	/*
@@ -399,7 +405,7 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
 		sg_cpu->util = prev_util;
 
 	cpufreq_driver_adjust_perf(sg_cpu->cpu, map_util_perf(sg_cpu->bw_dl),
-				   map_util_perf(sg_cpu->util), sg_cpu->max);
+				   map_util_perf(sg_cpu->util), sg_policy->max);
 
 	sg_cpu->sg_policy->last_freq_update_time = time;
 }
@@ -760,6 +766,7 @@ static int sugov_start(struct cpufreq_policy *policy)
 	sg_policy->work_in_progress		= false;
 	sg_policy->limits_changed		= false;
 	sg_policy->cached_raw_freq		= 0;
+	sg_policy->max				= 0;
 
 	sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/2] cpufreq: schedutil: Optimize operations with single max CPU capacity
  2022-12-07 10:17 [PATCH v2 0/2] cpufreq: schedutil: Optimize operations in hot path frequency switch Lukasz Luba
  2022-12-07 10:17 ` [PATCH v2 1/2] cpufreq: schedutil: Introduce single max CPU capacity for freqency domain Lukasz Luba
@ 2022-12-07 10:17 ` Lukasz Luba
  2022-12-08  4:09   ` Viresh Kumar
  2022-12-08  8:37   ` Vincent Guittot
  1 sibling, 2 replies; 11+ messages in thread
From: Lukasz Luba @ 2022-12-07 10:17 UTC (permalink / raw)
  To: linux-kernel, linux-pm, rafael
  Cc: lukasz.luba, viresh.kumar, dietmar.eggemann, vincent.guittot,
	saravanak, wusamuel, isaacmanjarres, kernel-team, juri.lelli,
	peterz, mingo, rostedt, bsegall, mgorman

The max CPU capacity is the same for all CPUs sharing frequency domain
and thus 'policy' object. There is a way to avoid heavy operations
in a loop for each CPU by leveraging this knowledge. Thus, simplify
the looping code in the sugov_next_freq_shared() and drop heavy
multiplications. Instead, use simple max() to get the highest utilization
from these CPUs. This is useful for platforms with many (4 or 6) little
CPUs.

The max CPU capacity must be fetched every time we are called, due to
difficulties during the policy setup, where we are not able to get the
normalized CPU capacity at the right time.

The stored value in sugov_policy::max is also than used in
sugov_iowait_apply() to calculate the right boost. Thus, that field is
useful to have in that sugov_policy struct.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 kernel/sched/cpufreq_schedutil.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index c19d6de67b7a..f9881f3d9488 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -158,10 +158,8 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 
 static void sugov_get_util(struct sugov_cpu *sg_cpu)
 {
-	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	struct rq *rq = cpu_rq(sg_cpu->cpu);
 
-	sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
 	sg_cpu->bw_dl = cpu_bw_dl(rq);
 	sg_cpu->util = effective_cpu_util(sg_cpu->cpu, cpu_util_cfs(sg_cpu->cpu),
 					  FREQUENCY_UTIL, NULL);
@@ -317,6 +315,8 @@ static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu)
 static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu,
 					      u64 time, unsigned int flags)
 {
+	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
+
 	sugov_iowait_boost(sg_cpu, time, flags);
 	sg_cpu->last_update = time;
 
@@ -325,6 +325,9 @@ static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu,
 	if (!sugov_should_update_freq(sg_cpu->sg_policy, time))
 		return false;
 
+	/* Fetch the latest CPU capcity to avoid stale data */
+	sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
+
 	sugov_get_util(sg_cpu);
 	sugov_iowait_apply(sg_cpu, time);
 
@@ -414,25 +417,22 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 {
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	struct cpufreq_policy *policy = sg_policy->policy;
-	unsigned long util = 0, max = 1;
+	unsigned long util = 0;
 	unsigned int j;
 
+	/* Fetch the latest CPU capcity to avoid stale data */
+	sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
+
 	for_each_cpu(j, policy->cpus) {
 		struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
-		unsigned long j_util, j_max;
 
 		sugov_get_util(j_sg_cpu);
 		sugov_iowait_apply(j_sg_cpu, time);
-		j_util = j_sg_cpu->util;
-		j_max = j_sg_cpu->max;
 
-		if (j_util * max > j_max * util) {
-			util = j_util;
-			max = j_max;
-		}
+		util = max(j_sg_cpu->util, util);
 	}
 
-	return get_next_freq(sg_policy, util, max);
+	return get_next_freq(sg_policy, util, sg_policy->max);
 }
 
 static void
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] cpufreq: schedutil: Introduce single max CPU capacity for freqency domain
  2022-12-07 10:17 ` [PATCH v2 1/2] cpufreq: schedutil: Introduce single max CPU capacity for freqency domain Lukasz Luba
@ 2022-12-08  3:57   ` Viresh Kumar
  0 siblings, 0 replies; 11+ messages in thread
From: Viresh Kumar @ 2022-12-08  3:57 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, linux-pm, rafael, dietmar.eggemann,
	vincent.guittot, saravanak, wusamuel, isaacmanjarres,
	kernel-team, juri.lelli, peterz, mingo, rostedt, bsegall,
	mgorman

On 07-12-22, 10:17, Lukasz Luba wrote:
> Prepare the code for optimizations and move the max CPU capacity
> into the sugov_policy struct. This will allow to leverage the fact that
> the max CPU capacity is the same for all CPUs in the frequency domain.
> 
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
>  kernel/sched/cpufreq_schedutil.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] cpufreq: schedutil: Optimize operations with single max CPU capacity
  2022-12-07 10:17 ` [PATCH v2 2/2] cpufreq: schedutil: Optimize operations with single max CPU capacity Lukasz Luba
@ 2022-12-08  4:09   ` Viresh Kumar
  2022-12-08  8:43     ` Lukasz Luba
  2022-12-08  8:37   ` Vincent Guittot
  1 sibling, 1 reply; 11+ messages in thread
From: Viresh Kumar @ 2022-12-08  4:09 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, linux-pm, rafael, dietmar.eggemann,
	vincent.guittot, saravanak, wusamuel, isaacmanjarres,
	kernel-team, juri.lelli, peterz, mingo, rostedt, bsegall,
	mgorman

On 07-12-22, 10:17, Lukasz Luba wrote:
> The max CPU capacity is the same for all CPUs sharing frequency domain
> and thus 'policy' object. There is a way to avoid heavy operations
> in a loop for each CPU by leveraging this knowledge. Thus, simplify
> the looping code in the sugov_next_freq_shared() and drop heavy
> multiplications. Instead, use simple max() to get the highest utilization
> from these CPUs. This is useful for platforms with many (4 or 6) little
> CPUs.
> 
> The max CPU capacity must be fetched every time we are called, due to
> difficulties during the policy setup, where we are not able to get the
> normalized CPU capacity at the right time.
> 
> The stored value in sugov_policy::max is also than used in
> sugov_iowait_apply() to calculate the right boost. Thus, that field is
> useful to have in that sugov_policy struct.
> 
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
>  kernel/sched/cpufreq_schedutil.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)

Looks okay to me.

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] cpufreq: schedutil: Optimize operations with single max CPU capacity
  2022-12-07 10:17 ` [PATCH v2 2/2] cpufreq: schedutil: Optimize operations with single max CPU capacity Lukasz Luba
  2022-12-08  4:09   ` Viresh Kumar
@ 2022-12-08  8:37   ` Vincent Guittot
  2022-12-08 10:06     ` Lukasz Luba
  1 sibling, 1 reply; 11+ messages in thread
From: Vincent Guittot @ 2022-12-08  8:37 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, linux-pm, rafael, viresh.kumar, dietmar.eggemann,
	saravanak, wusamuel, isaacmanjarres, kernel-team, juri.lelli,
	peterz, mingo, rostedt, bsegall, mgorman

On Wed, 7 Dec 2022 at 11:17, Lukasz Luba <lukasz.luba@arm.com> wrote:
>
> The max CPU capacity is the same for all CPUs sharing frequency domain
> and thus 'policy' object. There is a way to avoid heavy operations
> in a loop for each CPU by leveraging this knowledge. Thus, simplify
> the looping code in the sugov_next_freq_shared() and drop heavy
> multiplications. Instead, use simple max() to get the highest utilization
> from these CPUs. This is useful for platforms with many (4 or 6) little
> CPUs.
>
> The max CPU capacity must be fetched every time we are called, due to
> difficulties during the policy setup, where we are not able to get the
> normalized CPU capacity at the right time.
>
> The stored value in sugov_policy::max is also than used in
> sugov_iowait_apply() to calculate the right boost. Thus, that field is
> useful to have in that sugov_policy struct.
>
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
>  kernel/sched/cpufreq_schedutil.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index c19d6de67b7a..f9881f3d9488 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -158,10 +158,8 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
>
>  static void sugov_get_util(struct sugov_cpu *sg_cpu)
>  {
> -       struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>         struct rq *rq = cpu_rq(sg_cpu->cpu);
>
> -       sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
>         sg_cpu->bw_dl = cpu_bw_dl(rq);
>         sg_cpu->util = effective_cpu_util(sg_cpu->cpu, cpu_util_cfs(sg_cpu->cpu),
>                                           FREQUENCY_UTIL, NULL);
> @@ -317,6 +315,8 @@ static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu)
>  static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu,
>                                               u64 time, unsigned int flags)
>  {
> +       struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> +
>         sugov_iowait_boost(sg_cpu, time, flags);
>         sg_cpu->last_update = time;
>
> @@ -325,6 +325,9 @@ static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu,
>         if (!sugov_should_update_freq(sg_cpu->sg_policy, time))
>                 return false;
>
> +       /* Fetch the latest CPU capcity to avoid stale data */
> +       sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
> +
>         sugov_get_util(sg_cpu);
>         sugov_iowait_apply(sg_cpu, time);
>
> @@ -414,25 +417,22 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>  {
>         struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>         struct cpufreq_policy *policy = sg_policy->policy;
> -       unsigned long util = 0, max = 1;
> +       unsigned long util = 0;
>         unsigned int j;
>
> +       /* Fetch the latest CPU capcity to avoid stale data */
> +       sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
> +
>         for_each_cpu(j, policy->cpus) {
>                 struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
> -               unsigned long j_util, j_max;
>
>                 sugov_get_util(j_sg_cpu);
>                 sugov_iowait_apply(j_sg_cpu, time);
> -               j_util = j_sg_cpu->util;
> -               j_max = j_sg_cpu->max;
>
> -               if (j_util * max > j_max * util) {
> -                       util = j_util;
> -                       max = j_max;
> -               }

With the code removed above, max is only used in 2 places:
- sugov_iowait_apply
- map_util_freq

I wonder if it would be better to just call arch_scale_cpu_capacity()
in these 2 places instead of saving a copy in sg_policy and then
reading it twice.

arch_scaleu_cpu_capacity is already a per_cpu variable so accessing it
should be pretty cheap.

Thought ?

> +               util = max(j_sg_cpu->util, util);
>         }
>
> -       return get_next_freq(sg_policy, util, max);
> +       return get_next_freq(sg_policy, util, sg_policy->max);
>  }
>
>  static void
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] cpufreq: schedutil: Optimize operations with single max CPU capacity
  2022-12-08  4:09   ` Viresh Kumar
@ 2022-12-08  8:43     ` Lukasz Luba
  0 siblings, 0 replies; 11+ messages in thread
From: Lukasz Luba @ 2022-12-08  8:43 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linux-kernel, linux-pm, rafael, dietmar.eggemann,
	vincent.guittot, saravanak, wusamuel, isaacmanjarres,
	kernel-team, juri.lelli, peterz, mingo, rostedt, bsegall,
	mgorman



On 12/8/22 04:09, Viresh Kumar wrote:
> On 07-12-22, 10:17, Lukasz Luba wrote:
>> The max CPU capacity is the same for all CPUs sharing frequency domain
>> and thus 'policy' object. There is a way to avoid heavy operations
>> in a loop for each CPU by leveraging this knowledge. Thus, simplify
>> the looping code in the sugov_next_freq_shared() and drop heavy
>> multiplications. Instead, use simple max() to get the highest utilization
>> from these CPUs. This is useful for platforms with many (4 or 6) little
>> CPUs.
>>
>> The max CPU capacity must be fetched every time we are called, due to
>> difficulties during the policy setup, where we are not able to get the
>> normalized CPU capacity at the right time.
>>
>> The stored value in sugov_policy::max is also than used in
>> sugov_iowait_apply() to calculate the right boost. Thus, that field is
>> useful to have in that sugov_policy struct.
>>
>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
>> ---
>>   kernel/sched/cpufreq_schedutil.c | 22 +++++++++++-----------
>>   1 file changed, 11 insertions(+), 11 deletions(-)
> 
> Looks okay to me.
> 
> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
> 

Thank you Viresh!

As Rafael said, it will wait after 6.2-rc1 is out.

Regards,
Lukasz

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] cpufreq: schedutil: Optimize operations with single max CPU capacity
  2022-12-08  8:37   ` Vincent Guittot
@ 2022-12-08 10:06     ` Lukasz Luba
  2022-12-08 10:31       ` Vincent Guittot
  0 siblings, 1 reply; 11+ messages in thread
From: Lukasz Luba @ 2022-12-08 10:06 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, linux-pm, rafael, dietmar.eggemann, saravanak,
	wusamuel, isaacmanjarres, kernel-team, juri.lelli, peterz, mingo,
	rostedt, bsegall, mgorman, viresh.kumar



On 12/8/22 08:37, Vincent Guittot wrote:
> On Wed, 7 Dec 2022 at 11:17, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>> The max CPU capacity is the same for all CPUs sharing frequency domain
>> and thus 'policy' object. There is a way to avoid heavy operations
>> in a loop for each CPU by leveraging this knowledge. Thus, simplify
>> the looping code in the sugov_next_freq_shared() and drop heavy
>> multiplications. Instead, use simple max() to get the highest utilization
>> from these CPUs. This is useful for platforms with many (4 or 6) little
>> CPUs.
>>
>> The max CPU capacity must be fetched every time we are called, due to
>> difficulties during the policy setup, where we are not able to get the
>> normalized CPU capacity at the right time.
>>
>> The stored value in sugov_policy::max is also than used in
>> sugov_iowait_apply() to calculate the right boost. Thus, that field is
>> useful to have in that sugov_policy struct.
>>
>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
>> ---
>>   kernel/sched/cpufreq_schedutil.c | 22 +++++++++++-----------
>>   1 file changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> index c19d6de67b7a..f9881f3d9488 100644
>> --- a/kernel/sched/cpufreq_schedutil.c
>> +++ b/kernel/sched/cpufreq_schedutil.c
>> @@ -158,10 +158,8 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
>>
>>   static void sugov_get_util(struct sugov_cpu *sg_cpu)
>>   {
>> -       struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>>          struct rq *rq = cpu_rq(sg_cpu->cpu);
>>
>> -       sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
>>          sg_cpu->bw_dl = cpu_bw_dl(rq);
>>          sg_cpu->util = effective_cpu_util(sg_cpu->cpu, cpu_util_cfs(sg_cpu->cpu),
>>                                            FREQUENCY_UTIL, NULL);
>> @@ -317,6 +315,8 @@ static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu)
>>   static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu,
>>                                                u64 time, unsigned int flags)
>>   {
>> +       struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>> +
>>          sugov_iowait_boost(sg_cpu, time, flags);
>>          sg_cpu->last_update = time;
>>
>> @@ -325,6 +325,9 @@ static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu,
>>          if (!sugov_should_update_freq(sg_cpu->sg_policy, time))
>>                  return false;
>>
>> +       /* Fetch the latest CPU capcity to avoid stale data */
>> +       sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
>> +
>>          sugov_get_util(sg_cpu);
>>          sugov_iowait_apply(sg_cpu, time);
>>
>> @@ -414,25 +417,22 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>>   {
>>          struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>>          struct cpufreq_policy *policy = sg_policy->policy;
>> -       unsigned long util = 0, max = 1;
>> +       unsigned long util = 0;
>>          unsigned int j;
>>
>> +       /* Fetch the latest CPU capcity to avoid stale data */
>> +       sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
>> +
>>          for_each_cpu(j, policy->cpus) {
>>                  struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
>> -               unsigned long j_util, j_max;
>>
>>                  sugov_get_util(j_sg_cpu);
>>                  sugov_iowait_apply(j_sg_cpu, time);
>> -               j_util = j_sg_cpu->util;
>> -               j_max = j_sg_cpu->max;
>>
>> -               if (j_util * max > j_max * util) {
>> -                       util = j_util;
>> -                       max = j_max;
>> -               }
> 
> With the code removed above, max is only used in 2 places:
> - sugov_iowait_apply
> - map_util_freq
> 
> I wonder if it would be better to just call arch_scale_cpu_capacity()
> in these 2 places instead of saving a copy in sg_policy and then
> reading it twice.

The sugov_iowait_apply() is called in that loop, so probably I will
add a new argument to that call and just feed it with the capacity value
from one CPU, which was read before the loop. So, similarly what is in
this patch. Otherwise, all of those per-cpu capacity vars would be
accessed inside the sugov_iowait_apply() with sg_cpu->cpu.

> 
> arch_scaleu_cpu_capacity is already a per_cpu variable so accessing it
> should be pretty cheap.

Yes and no, as you said this is per-cpu variable and would access them
from one CPU, which is running that loop. They will have different pages
and addresses so cache lines on that CPU. to avoiding trashing a cache
lines on this running CPU let's read that capacity once, before the
loop. Let's use the new arg to pass that value via one of the
registers. In such, only one cache line would have to fetch that data
into.

So I thought this simple sg_policy->max would do the trick w/o a lot
of hassle.
> 
> Thought ?
> 

I can change that and drop the sg_policy->max and call differently
those capacity values. I will have to unfortunately drop Viresh's ACKs,
since this will be a way different code.

Thanks Vincent for the suggestion. Do you want me to go further with
such approach and send a v3?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] cpufreq: schedutil: Optimize operations with single max CPU capacity
  2022-12-08 10:06     ` Lukasz Luba
@ 2022-12-08 10:31       ` Vincent Guittot
  2022-12-08 10:56         ` Lukasz Luba
  0 siblings, 1 reply; 11+ messages in thread
From: Vincent Guittot @ 2022-12-08 10:31 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, linux-pm, rafael, dietmar.eggemann, saravanak,
	wusamuel, isaacmanjarres, kernel-team, juri.lelli, peterz, mingo,
	rostedt, bsegall, mgorman, viresh.kumar

On Thu, 8 Dec 2022 at 11:06, Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 12/8/22 08:37, Vincent Guittot wrote:
> > On Wed, 7 Dec 2022 at 11:17, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>
> >> The max CPU capacity is the same for all CPUs sharing frequency domain
> >> and thus 'policy' object. There is a way to avoid heavy operations
> >> in a loop for each CPU by leveraging this knowledge. Thus, simplify
> >> the looping code in the sugov_next_freq_shared() and drop heavy
> >> multiplications. Instead, use simple max() to get the highest utilization
> >> from these CPUs. This is useful for platforms with many (4 or 6) little
> >> CPUs.
> >>
> >> The max CPU capacity must be fetched every time we are called, due to
> >> difficulties during the policy setup, where we are not able to get the
> >> normalized CPU capacity at the right time.
> >>
> >> The stored value in sugov_policy::max is also than used in
> >> sugov_iowait_apply() to calculate the right boost. Thus, that field is
> >> useful to have in that sugov_policy struct.
> >>
> >> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> >> ---
> >>   kernel/sched/cpufreq_schedutil.c | 22 +++++++++++-----------
> >>   1 file changed, 11 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> >> index c19d6de67b7a..f9881f3d9488 100644
> >> --- a/kernel/sched/cpufreq_schedutil.c
> >> +++ b/kernel/sched/cpufreq_schedutil.c
> >> @@ -158,10 +158,8 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> >>
> >>   static void sugov_get_util(struct sugov_cpu *sg_cpu)
> >>   {
> >> -       struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> >>          struct rq *rq = cpu_rq(sg_cpu->cpu);
> >>
> >> -       sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
> >>          sg_cpu->bw_dl = cpu_bw_dl(rq);
> >>          sg_cpu->util = effective_cpu_util(sg_cpu->cpu, cpu_util_cfs(sg_cpu->cpu),
> >>                                            FREQUENCY_UTIL, NULL);
> >> @@ -317,6 +315,8 @@ static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu)
> >>   static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu,
> >>                                                u64 time, unsigned int flags)
> >>   {
> >> +       struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> >> +
> >>          sugov_iowait_boost(sg_cpu, time, flags);
> >>          sg_cpu->last_update = time;
> >>
> >> @@ -325,6 +325,9 @@ static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu,
> >>          if (!sugov_should_update_freq(sg_cpu->sg_policy, time))
> >>                  return false;
> >>
> >> +       /* Fetch the latest CPU capcity to avoid stale data */
> >> +       sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
> >> +
> >>          sugov_get_util(sg_cpu);
> >>          sugov_iowait_apply(sg_cpu, time);
> >>
> >> @@ -414,25 +417,22 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> >>   {
> >>          struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> >>          struct cpufreq_policy *policy = sg_policy->policy;
> >> -       unsigned long util = 0, max = 1;
> >> +       unsigned long util = 0;
> >>          unsigned int j;
> >>
> >> +       /* Fetch the latest CPU capcity to avoid stale data */
> >> +       sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
> >> +
> >>          for_each_cpu(j, policy->cpus) {
> >>                  struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
> >> -               unsigned long j_util, j_max;
> >>
> >>                  sugov_get_util(j_sg_cpu);
> >>                  sugov_iowait_apply(j_sg_cpu, time);
> >> -               j_util = j_sg_cpu->util;
> >> -               j_max = j_sg_cpu->max;
> >>
> >> -               if (j_util * max > j_max * util) {
> >> -                       util = j_util;
> >> -                       max = j_max;
> >> -               }
> >
> > With the code removed above, max is only used in 2 places:
> > - sugov_iowait_apply
> > - map_util_freq
> >
> > I wonder if it would be better to just call arch_scale_cpu_capacity()
> > in these 2 places instead of saving a copy in sg_policy and then
> > reading it twice.
>
> The sugov_iowait_apply() is called in that loop, so probably I will
> add a new argument to that call and just feed it with the capacity value
> from one CPU, which was read before the loop. So, similarly what is in
> this patch. Otherwise, all of those per-cpu capacity vars would be
> accessed inside the sugov_iowait_apply() with sg_cpu->cpu.

Yes make sense

>
> >
> > arch_scaleu_cpu_capacity is already a per_cpu variable so accessing it
> > should be pretty cheap.
>
> Yes and no, as you said this is per-cpu variable and would access them
> from one CPU, which is running that loop. They will have different pages
> and addresses so cache lines on that CPU. to avoiding trashing a cache
> lines on this running CPU let's read that capacity once, before the
> loop. Let's use the new arg to pass that value via one of the
> registers. In such, only one cache line would have to fetch that data
> into.
>
> So I thought this simple sg_policy->max would do the trick w/o a lot
> of hassle.

For the shared mode, everything is located in sugov_next_freq_shared
so you don't need to save the max value with your proposal above to
change sugov_iowait_apply interface.

This should be doable as well for single mode

> >
> > Thought ?
> >
>
> I can change that and drop the sg_policy->max and call differently
> those capacity values. I will have to unfortunately drop Viresh's ACKs,
> since this will be a way different code.
>
> Thanks Vincent for the suggestion. Do you want me to go further with
> such approach and send a v3?

Don't know what Rafael and Viresh think but it seems that we don't
need to save the return of arch_scale_cpu_capacity in ->max field but
directly use it

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] cpufreq: schedutil: Optimize operations with single max CPU capacity
  2022-12-08 10:31       ` Vincent Guittot
@ 2022-12-08 10:56         ` Lukasz Luba
  2022-12-08 13:20           ` Vincent Guittot
  0 siblings, 1 reply; 11+ messages in thread
From: Lukasz Luba @ 2022-12-08 10:56 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, linux-pm, rafael, dietmar.eggemann, saravanak,
	wusamuel, isaacmanjarres, kernel-team, juri.lelli, peterz, mingo,
	rostedt, bsegall, mgorman, viresh.kumar



On 12/8/22 10:31, Vincent Guittot wrote:
> On Thu, 8 Dec 2022 at 11:06, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>>
>>
>> On 12/8/22 08:37, Vincent Guittot wrote:
>>> On Wed, 7 Dec 2022 at 11:17, Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>
>>>> The max CPU capacity is the same for all CPUs sharing frequency domain
>>>> and thus 'policy' object. There is a way to avoid heavy operations
>>>> in a loop for each CPU by leveraging this knowledge. Thus, simplify
>>>> the looping code in the sugov_next_freq_shared() and drop heavy
>>>> multiplications. Instead, use simple max() to get the highest utilization
>>>> from these CPUs. This is useful for platforms with many (4 or 6) little
>>>> CPUs.
>>>>
>>>> The max CPU capacity must be fetched every time we are called, due to
>>>> difficulties during the policy setup, where we are not able to get the
>>>> normalized CPU capacity at the right time.
>>>>
>>>> The stored value in sugov_policy::max is also than used in
>>>> sugov_iowait_apply() to calculate the right boost. Thus, that field is
>>>> useful to have in that sugov_policy struct.
>>>>
>>>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
>>>> ---
>>>>    kernel/sched/cpufreq_schedutil.c | 22 +++++++++++-----------
>>>>    1 file changed, 11 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>>>> index c19d6de67b7a..f9881f3d9488 100644
>>>> --- a/kernel/sched/cpufreq_schedutil.c
>>>> +++ b/kernel/sched/cpufreq_schedutil.c
>>>> @@ -158,10 +158,8 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
>>>>
>>>>    static void sugov_get_util(struct sugov_cpu *sg_cpu)
>>>>    {
>>>> -       struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>>>>           struct rq *rq = cpu_rq(sg_cpu->cpu);
>>>>
>>>> -       sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
>>>>           sg_cpu->bw_dl = cpu_bw_dl(rq);
>>>>           sg_cpu->util = effective_cpu_util(sg_cpu->cpu, cpu_util_cfs(sg_cpu->cpu),
>>>>                                             FREQUENCY_UTIL, NULL);
>>>> @@ -317,6 +315,8 @@ static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu)
>>>>    static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu,
>>>>                                                 u64 time, unsigned int flags)
>>>>    {
>>>> +       struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>>>> +
>>>>           sugov_iowait_boost(sg_cpu, time, flags);
>>>>           sg_cpu->last_update = time;
>>>>
>>>> @@ -325,6 +325,9 @@ static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu,
>>>>           if (!sugov_should_update_freq(sg_cpu->sg_policy, time))
>>>>                   return false;
>>>>
>>>> +       /* Fetch the latest CPU capcity to avoid stale data */
>>>> +       sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
>>>> +
>>>>           sugov_get_util(sg_cpu);
>>>>           sugov_iowait_apply(sg_cpu, time);
>>>>
>>>> @@ -414,25 +417,22 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>>>>    {
>>>>           struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>>>>           struct cpufreq_policy *policy = sg_policy->policy;
>>>> -       unsigned long util = 0, max = 1;
>>>> +       unsigned long util = 0;
>>>>           unsigned int j;
>>>>
>>>> +       /* Fetch the latest CPU capcity to avoid stale data */
>>>> +       sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
>>>> +
>>>>           for_each_cpu(j, policy->cpus) {
>>>>                   struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
>>>> -               unsigned long j_util, j_max;
>>>>
>>>>                   sugov_get_util(j_sg_cpu);
>>>>                   sugov_iowait_apply(j_sg_cpu, time);
>>>> -               j_util = j_sg_cpu->util;
>>>> -               j_max = j_sg_cpu->max;
>>>>
>>>> -               if (j_util * max > j_max * util) {
>>>> -                       util = j_util;
>>>> -                       max = j_max;
>>>> -               }
>>>
>>> With the code removed above, max is only used in 2 places:
>>> - sugov_iowait_apply
>>> - map_util_freq
>>>
>>> I wonder if it would be better to just call arch_scale_cpu_capacity()
>>> in these 2 places instead of saving a copy in sg_policy and then
>>> reading it twice.
>>
>> The sugov_iowait_apply() is called in that loop, so probably I will
>> add a new argument to that call and just feed it with the capacity value
>> from one CPU, which was read before the loop. So, similarly what is in
>> this patch. Otherwise, all of those per-cpu capacity vars would be
>> accessed inside the sugov_iowait_apply() with sg_cpu->cpu.
> 
> Yes make sense
> 
>>
>>>
>>> arch_scaleu_cpu_capacity is already a per_cpu variable so accessing it
>>> should be pretty cheap.
>>
>> Yes and no, as you said this is per-cpu variable and would access them
>> from one CPU, which is running that loop. They will have different pages
>> and addresses so cache lines on that CPU. to avoiding trashing a cache
>> lines on this running CPU let's read that capacity once, before the
>> loop. Let's use the new arg to pass that value via one of the
>> registers. In such, only one cache line would have to fetch that data
>> into.
>>
>> So I thought this simple sg_policy->max would do the trick w/o a lot
>> of hassle.
> 
> For the shared mode, everything is located in sugov_next_freq_shared
> so you don't need to save the max value with your proposal above to
> change sugov_iowait_apply interface.
> 
> This should be doable as well for single mode
> 
>>>
>>> Thought ?
>>>
>>
>> I can change that and drop the sg_policy->max and call differently
>> those capacity values. I will have to unfortunately drop Viresh's ACKs,
>> since this will be a way different code.
>>
>> Thanks Vincent for the suggestion. Do you want me to go further with
>> such approach and send a v3?
> 
> Don't know what Rafael and Viresh think but it seems that we don't
> need to save the return of arch_scale_cpu_capacity in ->max field but
> directly use it

Yes I agree, we don't need to, but I will have to modify a few function
calls and args.

So IMO we have agreed. I won't call the call arch_scale_cpu_capacity()
in these 2 places, but I will make it with the local var and data
fetched as little as possible.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] cpufreq: schedutil: Optimize operations with single max CPU capacity
  2022-12-08 10:56         ` Lukasz Luba
@ 2022-12-08 13:20           ` Vincent Guittot
  0 siblings, 0 replies; 11+ messages in thread
From: Vincent Guittot @ 2022-12-08 13:20 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, linux-pm, rafael, dietmar.eggemann, saravanak,
	wusamuel, isaacmanjarres, kernel-team, juri.lelli, peterz, mingo,
	rostedt, bsegall, mgorman, viresh.kumar

On Thu, 8 Dec 2022 at 11:56, Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 12/8/22 10:31, Vincent Guittot wrote:
> > On Thu, 8 Dec 2022 at 11:06, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>
> >>
> >>
> >> On 12/8/22 08:37, Vincent Guittot wrote:
> >>> On Wed, 7 Dec 2022 at 11:17, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>>>
> >>>> The max CPU capacity is the same for all CPUs sharing frequency domain
> >>>> and thus 'policy' object. There is a way to avoid heavy operations
> >>>> in a loop for each CPU by leveraging this knowledge. Thus, simplify
> >>>> the looping code in the sugov_next_freq_shared() and drop heavy
> >>>> multiplications. Instead, use simple max() to get the highest utilization
> >>>> from these CPUs. This is useful for platforms with many (4 or 6) little
> >>>> CPUs.
> >>>>
> >>>> The max CPU capacity must be fetched every time we are called, due to
> >>>> difficulties during the policy setup, where we are not able to get the
> >>>> normalized CPU capacity at the right time.
> >>>>
> >>>> The stored value in sugov_policy::max is also than used in
> >>>> sugov_iowait_apply() to calculate the right boost. Thus, that field is
> >>>> useful to have in that sugov_policy struct.
> >>>>
> >>>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> >>>> ---
> >>>>    kernel/sched/cpufreq_schedutil.c | 22 +++++++++++-----------
> >>>>    1 file changed, 11 insertions(+), 11 deletions(-)
> >>>>
> >>>> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> >>>> index c19d6de67b7a..f9881f3d9488 100644
> >>>> --- a/kernel/sched/cpufreq_schedutil.c
> >>>> +++ b/kernel/sched/cpufreq_schedutil.c
> >>>> @@ -158,10 +158,8 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> >>>>
> >>>>    static void sugov_get_util(struct sugov_cpu *sg_cpu)
> >>>>    {
> >>>> -       struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> >>>>           struct rq *rq = cpu_rq(sg_cpu->cpu);
> >>>>
> >>>> -       sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
> >>>>           sg_cpu->bw_dl = cpu_bw_dl(rq);
> >>>>           sg_cpu->util = effective_cpu_util(sg_cpu->cpu, cpu_util_cfs(sg_cpu->cpu),
> >>>>                                             FREQUENCY_UTIL, NULL);
> >>>> @@ -317,6 +315,8 @@ static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu)
> >>>>    static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu,
> >>>>                                                 u64 time, unsigned int flags)
> >>>>    {
> >>>> +       struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> >>>> +
> >>>>           sugov_iowait_boost(sg_cpu, time, flags);
> >>>>           sg_cpu->last_update = time;
> >>>>
> >>>> @@ -325,6 +325,9 @@ static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu,
> >>>>           if (!sugov_should_update_freq(sg_cpu->sg_policy, time))
> >>>>                   return false;
> >>>>
> >>>> +       /* Fetch the latest CPU capcity to avoid stale data */
> >>>> +       sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
> >>>> +
> >>>>           sugov_get_util(sg_cpu);
> >>>>           sugov_iowait_apply(sg_cpu, time);
> >>>>
> >>>> @@ -414,25 +417,22 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> >>>>    {
> >>>>           struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> >>>>           struct cpufreq_policy *policy = sg_policy->policy;
> >>>> -       unsigned long util = 0, max = 1;
> >>>> +       unsigned long util = 0;
> >>>>           unsigned int j;
> >>>>
> >>>> +       /* Fetch the latest CPU capcity to avoid stale data */
> >>>> +       sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu);
> >>>> +
> >>>>           for_each_cpu(j, policy->cpus) {
> >>>>                   struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
> >>>> -               unsigned long j_util, j_max;
> >>>>
> >>>>                   sugov_get_util(j_sg_cpu);
> >>>>                   sugov_iowait_apply(j_sg_cpu, time);
> >>>> -               j_util = j_sg_cpu->util;
> >>>> -               j_max = j_sg_cpu->max;
> >>>>
> >>>> -               if (j_util * max > j_max * util) {
> >>>> -                       util = j_util;
> >>>> -                       max = j_max;
> >>>> -               }
> >>>
> >>> With the code removed above, max is only used in 2 places:
> >>> - sugov_iowait_apply
> >>> - map_util_freq
> >>>
> >>> I wonder if it would be better to just call arch_scale_cpu_capacity()
> >>> in these 2 places instead of saving a copy in sg_policy and then
> >>> reading it twice.
> >>
> >> The sugov_iowait_apply() is called in that loop, so probably I will
> >> add a new argument to that call and just feed it with the capacity value
> >> from one CPU, which was read before the loop. So, similarly what is in
> >> this patch. Otherwise, all of those per-cpu capacity vars would be
> >> accessed inside the sugov_iowait_apply() with sg_cpu->cpu.
> >
> > Yes make sense
> >
> >>
> >>>
> >>> arch_scaleu_cpu_capacity is already a per_cpu variable so accessing it
> >>> should be pretty cheap.
> >>
> >> Yes and no, as you said this is per-cpu variable and would access them
> >> from one CPU, which is running that loop. They will have different pages
> >> and addresses so cache lines on that CPU. to avoiding trashing a cache
> >> lines on this running CPU let's read that capacity once, before the
> >> loop. Let's use the new arg to pass that value via one of the
> >> registers. In such, only one cache line would have to fetch that data
> >> into.
> >>
> >> So I thought this simple sg_policy->max would do the trick w/o a lot
> >> of hassle.
> >
> > For the shared mode, everything is located in sugov_next_freq_shared
> > so you don't need to save the max value with your proposal above to
> > change sugov_iowait_apply interface.
> >
> > This should be doable as well for single mode
> >
> >>>
> >>> Thought ?
> >>>
> >>
> >> I can change that and drop the sg_policy->max and call differently
> >> those capacity values. I will have to unfortunately drop Viresh's ACKs,
> >> since this will be a way different code.
> >>
> >> Thanks Vincent for the suggestion. Do you want me to go further with
> >> such approach and send a v3?
> >
> > Don't know what Rafael and Viresh think but it seems that we don't
> > need to save the return of arch_scale_cpu_capacity in ->max field but
> > directly use it
>
> Yes I agree, we don't need to, but I will have to modify a few function
> calls and args.
>
> So IMO we have agreed. I won't call the call arch_scale_cpu_capacity()
> in these 2 places, but I will make it with the local var and data
> fetched as little as possible.

yes

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-12-08 13:21 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-07 10:17 [PATCH v2 0/2] cpufreq: schedutil: Optimize operations in hot path frequency switch Lukasz Luba
2022-12-07 10:17 ` [PATCH v2 1/2] cpufreq: schedutil: Introduce single max CPU capacity for freqency domain Lukasz Luba
2022-12-08  3:57   ` Viresh Kumar
2022-12-07 10:17 ` [PATCH v2 2/2] cpufreq: schedutil: Optimize operations with single max CPU capacity Lukasz Luba
2022-12-08  4:09   ` Viresh Kumar
2022-12-08  8:43     ` Lukasz Luba
2022-12-08  8:37   ` Vincent Guittot
2022-12-08 10:06     ` Lukasz Luba
2022-12-08 10:31       ` Vincent Guittot
2022-12-08 10:56         ` Lukasz Luba
2022-12-08 13:20           ` Vincent Guittot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).