* [PATCH 0/2] uclamp_max vs schedutil fixes
@ 2021-12-16 22:53 Qais Yousef
  2021-12-16 22:53 ` [PATCH 1/2] sched/sugov: Ignore 'busy' filter when rq is capped by uclamp_max Qais Yousef
  2021-12-16 22:53 ` [PATCH 2/2] sched/uclamp: Fix iowait boost escaping uclamp restriction Qais Yousef
  0 siblings, 2 replies; 8+ messages in thread
From: Qais Yousef @ 2021-12-16 22:53 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar, Peter Zijlstra (Intel), Ingo Molnar
  Cc: Dietmar Eggemann, Vincent Guittot, Beata Michalska,
	Ionela Voinescu, linux-pm, linux-kernel, Qais Yousef

On systems that use sugov_update_single_{freq, perf}(), uclamp_max was
ineffective because of the 'busy' filter, which ignores requests to lower
the frequency if there's no idle time. That assumption no longer holds
when the rq is being capped by uclamp_max.

Similarly, IO-heavy tasks that are capped by uclamp_max can still obtain higher
frequencies because sugov_iowait_apply() doesn't clamp the boost with
uclamp_rq_util_with().
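
For context, here is a minimal sketch of the clamp semantics that
uclamp_rq_util_with() provides (simplified from kernel/sched/sched.h; the
real helper also folds in the task's own clamps and the UCLAMP_FLAG_IDLE
case, and the standalone function name below is only for illustration):

	/* Simplified illustration, not the kernel implementation. */
	static unsigned long clamp_util(unsigned long util,
					unsigned long rq_min,
					unsigned long rq_max)
	{
		/*
		 * rq-level clamps are max-aggregated over RUNNABLE tasks,
		 * so a min > max inversion is possible; min wins in that
		 * case, matching uclamp_rq_util_with().
		 */
		if (rq_min >= rq_max)
			return rq_min;

		if (util < rq_min)
			return rq_min;
		if (util > rq_max)
			return rq_max;
		return util;
	}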

Patches 1 and 2 address these 2 problems.

Thanks!

--
Qais Yousef

Qais Yousef (2):
  sched/sugov: Ignore 'busy' filter when rq is capped by uclamp_max
  sched/uclamp: Fix iowait boost escaping uclamp restriction

 kernel/sched/cpufreq_schedutil.c |  11 ++-
 kernel/sched/sched.h             | 139 +++++++++++++++++--------------
 2 files changed, 87 insertions(+), 63 deletions(-)

-- 
2.25.1



* [PATCH 1/2] sched/sugov: Ignore 'busy' filter when rq is capped by uclamp_max
  2021-12-16 22:53 [PATCH 0/2] uclamp_max vs schedutil fixes Qais Yousef
@ 2021-12-16 22:53 ` Qais Yousef
  2021-12-17 15:54   ` Rafael J. Wysocki
  2022-01-28  7:40   ` [tip: sched/core] " tip-bot2 for Qais Yousef
  2021-12-16 22:53 ` [PATCH 2/2] sched/uclamp: Fix iowait boost escaping uclamp restriction Qais Yousef
  1 sibling, 2 replies; 8+ messages in thread
From: Qais Yousef @ 2021-12-16 22:53 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar, Peter Zijlstra (Intel), Ingo Molnar
  Cc: Dietmar Eggemann, Vincent Guittot, Beata Michalska,
	Ionela Voinescu, linux-pm, linux-kernel, Qais Yousef

sugov_update_single_{freq, perf}() contain a 'busy' filter that ensures
we don't bring the frequency down if there's no idle time (CPU is busy).

The problem is that with uclamp_max we will have scenarios where a busy
task is capped to run at a lower frequency and this filter prevents
applying the capping when this task starts running.

We handle this by skipping the filter when uclamp is enabled and the rq
is being capped by uclamp_max.

We introduce a new function uclamp_rq_is_capped() to help detect when
this capping is taking effect. Some code shuffling was required to allow
using cpu_util_{cfs, rt}() in this new function.

On a 2-core SMT2 Intel laptop I see:

Without this patch:

	uclampset -M 0 sysbench --test=cpu --threads=4 run

produces a score of ~3200 consistently, which is the highest possible.

Compiling the kernel also results in the frequency running at the
maximum 3.1GHz all the time - running uclampset -M 400 to cap it has no
effect without this patch.

With this patch:

	uclampset -M 0 sysbench --test=cpu --threads=4 run

produces a score of ~1100 with some outliers around ~1700. uclamp_max
aggregates the performance requirements, so occasional high values are
expected if some other task that requires a higher frequency happens to
start running at the same time.

When compiling the kernel with uclampset -M 400 I can see the
frequencies mostly in the ~2GHz region. This helps conserve power and
prevent overheating when not plugged in.

Fixes: 982d9cdc22c9 ("sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
---

I haven't dug much into the busy filter, but I assume it is something that is
still required right? If there's a better alternative we can take to make this
filter better instead, I'm happy to hear ideas. Otherwise hopefully this
proposal is logical too.

uclampset is a tool available in util-linux v2.37.2
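
If uclampset isn't available, the same per-task cap can be set with the
sched_setattr() syscall directly. A minimal sketch follows (there is no
glibc wrapper, so it goes through syscall(); the struct layout and the
SCHED_FLAG_UTIL_CLAMP_* values mirror the uapi headers, and the helper
name is purely illustrative):

	#define _GNU_SOURCE
	#include <stdint.h>
	#include <stdio.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	#ifndef SCHED_FLAG_UTIL_CLAMP_MAX
	# define SCHED_FLAG_KEEP_ALL		0x18	/* keep policy and params */
	# define SCHED_FLAG_UTIL_CLAMP_MIN	0x20
	# define SCHED_FLAG_UTIL_CLAMP_MAX	0x40
	#endif

	/* Mirrors struct sched_attr from the uapi linux/sched/types.h. */
	struct sched_attr {
		uint32_t size;
		uint32_t sched_policy;
		uint64_t sched_flags;
		int32_t  sched_nice;
		uint32_t sched_priority;
		uint64_t sched_runtime;
		uint64_t sched_deadline;
		uint64_t sched_period;
		uint32_t sched_util_min;
		uint32_t sched_util_max;
	};

	/* Cap the calling task's uclamp_max; range is 0..1024. */
	static int set_uclamp_max(uint32_t util_max)
	{
		struct sched_attr attr = {
			.size		= sizeof(attr),
			.sched_flags	= SCHED_FLAG_KEEP_ALL |
					  SCHED_FLAG_UTIL_CLAMP_MAX,
			.sched_util_max	= util_max,
		};

		return syscall(SYS_sched_setattr, 0 /* self */, &attr, 0);
	}

	int main(void)
	{
		if (set_uclamp_max(400))
			perror("sched_setattr");

		/* ... exec the workload to be capped, e.g. sysbench ... */
		return 0;
	}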

 kernel/sched/cpufreq_schedutil.c |  10 ++-
 kernel/sched/sched.h             | 139 +++++++++++++++++--------------
 2 files changed, 86 insertions(+), 63 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index e7af18857371..48327970552a 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -348,8 +348,11 @@ static void sugov_update_single_freq(struct update_util_data *hook, u64 time,
 	/*
 	 * Do not reduce the frequency if the CPU has not been idle
 	 * recently, as the reduction is likely to be premature then.
+	 *
+	 * Except when the rq is capped by uclamp_max.
 	 */
-	if (sugov_cpu_is_busy(sg_cpu) && next_f < sg_policy->next_freq) {
+	if (!uclamp_rq_is_capped(cpu_rq(sg_cpu->cpu)) &&
+	    sugov_cpu_is_busy(sg_cpu) && next_f < sg_policy->next_freq) {
 		next_f = sg_policy->next_freq;
 
 		/* Restore cached freq as next_freq has changed */
@@ -395,8 +398,11 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
 	/*
 	 * Do not reduce the target performance level if the CPU has not been
 	 * idle recently, as the reduction is likely to be premature then.
+	 *
+	 * Except when the rq is capped by uclamp_max.
 	 */
-	if (sugov_cpu_is_busy(sg_cpu) && sg_cpu->util < prev_util)
+	if (!uclamp_rq_is_capped(cpu_rq(sg_cpu->cpu)) &&
+	    sugov_cpu_is_busy(sg_cpu) && sg_cpu->util < prev_util)
 		sg_cpu->util = prev_util;
 
 	cpufreq_driver_adjust_perf(sg_cpu->cpu, map_util_perf(sg_cpu->bw_dl),
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index eb971151e7e4..294ebc22413c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2841,6 +2841,67 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
 static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
 #endif /* CONFIG_CPU_FREQ */
 
+#ifdef arch_scale_freq_capacity
+# ifndef arch_scale_freq_invariant
+#  define arch_scale_freq_invariant()	true
+# endif
+#else
+# define arch_scale_freq_invariant()	false
+#endif
+
+#ifdef CONFIG_SMP
+static inline unsigned long capacity_orig_of(int cpu)
+{
+	return cpu_rq(cpu)->cpu_capacity_orig;
+}
+
+/**
+ * enum cpu_util_type - CPU utilization type
+ * @FREQUENCY_UTIL:	Utilization used to select frequency
+ * @ENERGY_UTIL:	Utilization used during energy calculation
+ *
+ * The utilization signals of all scheduling classes (CFS/RT/DL) and IRQ time
+ * need to be aggregated differently depending on the usage made of them. This
+ * enum is used within effective_cpu_util() to differentiate the types of
+ * utilization expected by the callers, and adjust the aggregation accordingly.
+ */
+enum cpu_util_type {
+	FREQUENCY_UTIL,
+	ENERGY_UTIL,
+};
+
+unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
+				 unsigned long max, enum cpu_util_type type,
+				 struct task_struct *p);
+
+static inline unsigned long cpu_bw_dl(struct rq *rq)
+{
+	return (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
+}
+
+static inline unsigned long cpu_util_dl(struct rq *rq)
+{
+	return READ_ONCE(rq->avg_dl.util_avg);
+}
+
+static inline unsigned long cpu_util_cfs(struct rq *rq)
+{
+	unsigned long util = READ_ONCE(rq->cfs.avg.util_avg);
+
+	if (sched_feat(UTIL_EST)) {
+		util = max_t(unsigned long, util,
+			     READ_ONCE(rq->cfs.avg.util_est.enqueued));
+	}
+
+	return util;
+}
+
+static inline unsigned long cpu_util_rt(struct rq *rq)
+{
+	return READ_ONCE(rq->avg_rt.util_avg);
+}
+#endif
+
 #ifdef CONFIG_UCLAMP_TASK
 unsigned long uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
 
@@ -2897,6 +2958,21 @@ unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
 	return clamp(util, min_util, max_util);
 }
 
+/* Is the rq being capped/throttled by uclamp_max? */
+static inline bool uclamp_rq_is_capped(struct rq *rq)
+{
+	unsigned long rq_util;
+	unsigned long max_util;
+
+	if (!static_branch_likely(&sched_uclamp_used))
+		return false;
+
+	rq_util = cpu_util_cfs(rq) + cpu_util_rt(rq);
+	max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
+
+	return max_util != SCHED_CAPACITY_SCALE && rq_util >= max_util;
+}
+
 /*
  * When uclamp is compiled in, the aggregation at rq level is 'turned off'
  * by default in the fast path and only gets turned on once userspace performs
@@ -2917,73 +2993,14 @@ unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
 	return util;
 }
 
+static inline bool uclamp_rq_is_capped(struct rq *rq) { return false; }
+
 static inline bool uclamp_is_used(void)
 {
 	return false;
 }
 #endif /* CONFIG_UCLAMP_TASK */
 
-#ifdef arch_scale_freq_capacity
-# ifndef arch_scale_freq_invariant
-#  define arch_scale_freq_invariant()	true
-# endif
-#else
-# define arch_scale_freq_invariant()	false
-#endif
-
-#ifdef CONFIG_SMP
-static inline unsigned long capacity_orig_of(int cpu)
-{
-	return cpu_rq(cpu)->cpu_capacity_orig;
-}
-
-/**
- * enum cpu_util_type - CPU utilization type
- * @FREQUENCY_UTIL:	Utilization used to select frequency
- * @ENERGY_UTIL:	Utilization used during energy calculation
- *
- * The utilization signals of all scheduling classes (CFS/RT/DL) and IRQ time
- * need to be aggregated differently depending on the usage made of them. This
- * enum is used within effective_cpu_util() to differentiate the types of
- * utilization expected by the callers, and adjust the aggregation accordingly.
- */
-enum cpu_util_type {
-	FREQUENCY_UTIL,
-	ENERGY_UTIL,
-};
-
-unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
-				 unsigned long max, enum cpu_util_type type,
-				 struct task_struct *p);
-
-static inline unsigned long cpu_bw_dl(struct rq *rq)
-{
-	return (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
-}
-
-static inline unsigned long cpu_util_dl(struct rq *rq)
-{
-	return READ_ONCE(rq->avg_dl.util_avg);
-}
-
-static inline unsigned long cpu_util_cfs(struct rq *rq)
-{
-	unsigned long util = READ_ONCE(rq->cfs.avg.util_avg);
-
-	if (sched_feat(UTIL_EST)) {
-		util = max_t(unsigned long, util,
-			     READ_ONCE(rq->cfs.avg.util_est.enqueued));
-	}
-
-	return util;
-}
-
-static inline unsigned long cpu_util_rt(struct rq *rq)
-{
-	return READ_ONCE(rq->avg_rt.util_avg);
-}
-#endif
-
 #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
 static inline unsigned long cpu_util_irq(struct rq *rq)
 {
-- 
2.25.1



* [PATCH 2/2] sched/uclamp: Fix iowait boost escaping uclamp restriction
  2021-12-16 22:53 [PATCH 0/2] uclamp_max vs schedutil fixes Qais Yousef
  2021-12-16 22:53 ` [PATCH 1/2] sched/sugov: Ignore 'busy' filter when rq is capped by uclamp_max Qais Yousef
@ 2021-12-16 22:53 ` Qais Yousef
  2021-12-17 15:56   ` Rafael J. Wysocki
  2022-01-28  7:40   ` [tip: sched/core] " tip-bot2 for Qais Yousef
  1 sibling, 2 replies; 8+ messages in thread
From: Qais Yousef @ 2021-12-16 22:53 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar, Peter Zijlstra (Intel), Ingo Molnar
  Cc: Dietmar Eggemann, Vincent Guittot, Beata Michalska,
	Ionela Voinescu, linux-pm, linux-kernel, Qais Yousef

The iowait_boost signal is applied independently of util and doesn't
take into account the uclamp settings of the rq. An IO-heavy task that
is capped by uclamp_max could still request a higher frequency because
sugov_iowait_apply() doesn't clamp the boost via uclamp_rq_util_with()
like effective_cpu_util() does.

Make sure that iowait_boost honours uclamp requests by calling
uclamp_rq_util_with() when applying the boost.

Fixes: 982d9cdc22c9 ("sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
---
 kernel/sched/cpufreq_schedutil.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 48327970552a..93dcea233c65 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -289,6 +289,7 @@ static void sugov_iowait_apply(struct sugov_cpu *sg_cpu, u64 time)
 	 * into the same scale so we can compare.
 	 */
 	boost = (sg_cpu->iowait_boost * sg_cpu->max) >> SCHED_CAPACITY_SHIFT;
+	boost = uclamp_rq_util_with(cpu_rq(sg_cpu->cpu), boost, NULL);
 	if (sg_cpu->util < boost)
 		sg_cpu->util = boost;
 }
-- 
2.25.1



* Re: [PATCH 1/2] sched/sugov: Ignore 'busy' filter when rq is capped by uclamp_max
  2021-12-16 22:53 ` [PATCH 1/2] sched/sugov: Ignore 'busy' filter when rq is capped by uclamp_max Qais Yousef
@ 2021-12-17 15:54   ` Rafael J. Wysocki
  2021-12-20 10:48     ` Qais Yousef
  2022-01-28  7:40   ` [tip: sched/core] " tip-bot2 for Qais Yousef
  1 sibling, 1 reply; 8+ messages in thread
From: Rafael J. Wysocki @ 2021-12-17 15:54 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Rafael J. Wysocki, Viresh Kumar, Peter Zijlstra (Intel),
	Ingo Molnar, Dietmar Eggemann, Vincent Guittot, Beata Michalska,
	Ionela Voinescu, Linux PM, Linux Kernel Mailing List

On Thu, Dec 16, 2021 at 11:53 PM Qais Yousef <qais.yousef@arm.com> wrote:
>
> sugov_update_single_{freq, perf}() contain a 'busy' filter that ensures
> we don't bring the frequency down if there's no idle time (CPU is busy).
>
> The problem is that with uclamp_max we will have scenarios where a busy
> task is capped to run at a lower frequency and this filter prevents
> applying the capping when this task starts running.
>
> We handle this by skipping the filter when uclamp is enabled and the rq
> is being capped by uclamp_max.
>
> We introduce a new function uclamp_rq_is_capped() to help detect when
> this capping is taking effect. Some code shuffling was required to allow
> using cpu_util_{cfs, rt}() in this new function.
>
> On a 2-core SMT2 Intel laptop I see:
>
> Without this patch:
>
>         uclampset -M 0 sysbench --test=cpu --threads=4 run
>
> produces a score of ~3200 consistently, which is the highest possible.
>
> Compiling the kernel also results in the frequency running at the
> maximum 3.1GHz all the time - running uclampset -M 400 to cap it has no
> effect without this patch.
>
> With this patch:
>
>         uclampset -M 0 sysbench --test=cpu --threads=4 run
>
> produces a score of ~1100 with some outliers around ~1700. uclamp_max
> aggregates the performance requirements, so occasional high values are
> expected if some other task that requires a higher frequency happens to
> start running at the same time.
>
> When compiling the kernel with uclampset -M 400 I can see the
> frequencies mostly in the ~2GHz region. This helps conserve power and
> prevent overheating when not plugged in.
>
> Fixes: 982d9cdc22c9 ("sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks")
> Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> ---
>
> I haven't dug much into the busy filter, but I assume it is something that is
> still required right?

It is AFAICS.

> If there's a better alternative we can take to make this
> filter better instead, I'm happy to hear ideas. Otherwise hopefully this
> proposal is logical too.

It looks reasonable to me.

For the schedutil changes:

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> uclampset is a tool available in util-linux v2.37.2
>
>  kernel/sched/cpufreq_schedutil.c |  10 ++-
>  kernel/sched/sched.h             | 139 +++++++++++++++++--------------
>  2 files changed, 86 insertions(+), 63 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index e7af18857371..48327970552a 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -348,8 +348,11 @@ static void sugov_update_single_freq(struct update_util_data *hook, u64 time,
>         /*
>          * Do not reduce the frequency if the CPU has not been idle
>          * recently, as the reduction is likely to be premature then.
> +        *
> +        * Except when the rq is capped by uclamp_max.
>          */
> -       if (sugov_cpu_is_busy(sg_cpu) && next_f < sg_policy->next_freq) {
> +       if (!uclamp_rq_is_capped(cpu_rq(sg_cpu->cpu)) &&
> +           sugov_cpu_is_busy(sg_cpu) && next_f < sg_policy->next_freq) {
>                 next_f = sg_policy->next_freq;
>
>                 /* Restore cached freq as next_freq has changed */
> @@ -395,8 +398,11 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
>         /*
>          * Do not reduce the target performance level if the CPU has not been
>          * idle recently, as the reduction is likely to be premature then.
> +        *
> +        * Except when the rq is capped by uclamp_max.
>          */
> -       if (sugov_cpu_is_busy(sg_cpu) && sg_cpu->util < prev_util)
> +       if (!uclamp_rq_is_capped(cpu_rq(sg_cpu->cpu)) &&
> +           sugov_cpu_is_busy(sg_cpu) && sg_cpu->util < prev_util)
>                 sg_cpu->util = prev_util;
>
>         cpufreq_driver_adjust_perf(sg_cpu->cpu, map_util_perf(sg_cpu->bw_dl),
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index eb971151e7e4..294ebc22413c 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2841,6 +2841,67 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
>  static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
>  #endif /* CONFIG_CPU_FREQ */
>
> +#ifdef arch_scale_freq_capacity
> +# ifndef arch_scale_freq_invariant
> +#  define arch_scale_freq_invariant()  true
> +# endif
> +#else
> +# define arch_scale_freq_invariant()   false
> +#endif
> +
> +#ifdef CONFIG_SMP
> +static inline unsigned long capacity_orig_of(int cpu)
> +{
> +       return cpu_rq(cpu)->cpu_capacity_orig;
> +}
> +
> +/**
> + * enum cpu_util_type - CPU utilization type
> + * @FREQUENCY_UTIL:    Utilization used to select frequency
> + * @ENERGY_UTIL:       Utilization used during energy calculation
> + *
> + * The utilization signals of all scheduling classes (CFS/RT/DL) and IRQ time
> + * need to be aggregated differently depending on the usage made of them. This
> + * enum is used within effective_cpu_util() to differentiate the types of
> + * utilization expected by the callers, and adjust the aggregation accordingly.
> + */
> +enum cpu_util_type {
> +       FREQUENCY_UTIL,
> +       ENERGY_UTIL,
> +};
> +
> +unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> +                                unsigned long max, enum cpu_util_type type,
> +                                struct task_struct *p);
> +
> +static inline unsigned long cpu_bw_dl(struct rq *rq)
> +{
> +       return (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
> +}
> +
> +static inline unsigned long cpu_util_dl(struct rq *rq)
> +{
> +       return READ_ONCE(rq->avg_dl.util_avg);
> +}
> +
> +static inline unsigned long cpu_util_cfs(struct rq *rq)
> +{
> +       unsigned long util = READ_ONCE(rq->cfs.avg.util_avg);
> +
> +       if (sched_feat(UTIL_EST)) {
> +               util = max_t(unsigned long, util,
> +                            READ_ONCE(rq->cfs.avg.util_est.enqueued));
> +       }
> +
> +       return util;
> +}
> +
> +static inline unsigned long cpu_util_rt(struct rq *rq)
> +{
> +       return READ_ONCE(rq->avg_rt.util_avg);
> +}
> +#endif
> +
>  #ifdef CONFIG_UCLAMP_TASK
>  unsigned long uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
>
> @@ -2897,6 +2958,21 @@ unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
>         return clamp(util, min_util, max_util);
>  }
>
> +/* Is the rq being capped/throttled by uclamp_max? */
> +static inline bool uclamp_rq_is_capped(struct rq *rq)
> +{
> +       unsigned long rq_util;
> +       unsigned long max_util;
> +
> +       if (!static_branch_likely(&sched_uclamp_used))
> +               return false;
> +
> +       rq_util = cpu_util_cfs(rq) + cpu_util_rt(rq);
> +       max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
> +
> +       return max_util != SCHED_CAPACITY_SCALE && rq_util >= max_util;
> +}
> +
>  /*
>   * When uclamp is compiled in, the aggregation at rq level is 'turned off'
>   * by default in the fast path and only gets turned on once userspace performs
> @@ -2917,73 +2993,14 @@ unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
>         return util;
>  }
>
> +static inline bool uclamp_rq_is_capped(struct rq *rq) { return false; }
> +
>  static inline bool uclamp_is_used(void)
>  {
>         return false;
>  }
>  #endif /* CONFIG_UCLAMP_TASK */
>
> -#ifdef arch_scale_freq_capacity
> -# ifndef arch_scale_freq_invariant
> -#  define arch_scale_freq_invariant()  true
> -# endif
> -#else
> -# define arch_scale_freq_invariant()   false
> -#endif
> -
> -#ifdef CONFIG_SMP
> -static inline unsigned long capacity_orig_of(int cpu)
> -{
> -       return cpu_rq(cpu)->cpu_capacity_orig;
> -}
> -
> -/**
> - * enum cpu_util_type - CPU utilization type
> - * @FREQUENCY_UTIL:    Utilization used to select frequency
> - * @ENERGY_UTIL:       Utilization used during energy calculation
> - *
> - * The utilization signals of all scheduling classes (CFS/RT/DL) and IRQ time
> - * need to be aggregated differently depending on the usage made of them. This
> - * enum is used within effective_cpu_util() to differentiate the types of
> - * utilization expected by the callers, and adjust the aggregation accordingly.
> - */
> -enum cpu_util_type {
> -       FREQUENCY_UTIL,
> -       ENERGY_UTIL,
> -};
> -
> -unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> -                                unsigned long max, enum cpu_util_type type,
> -                                struct task_struct *p);
> -
> -static inline unsigned long cpu_bw_dl(struct rq *rq)
> -{
> -       return (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
> -}
> -
> -static inline unsigned long cpu_util_dl(struct rq *rq)
> -{
> -       return READ_ONCE(rq->avg_dl.util_avg);
> -}
> -
> -static inline unsigned long cpu_util_cfs(struct rq *rq)
> -{
> -       unsigned long util = READ_ONCE(rq->cfs.avg.util_avg);
> -
> -       if (sched_feat(UTIL_EST)) {
> -               util = max_t(unsigned long, util,
> -                            READ_ONCE(rq->cfs.avg.util_est.enqueued));
> -       }
> -
> -       return util;
> -}
> -
> -static inline unsigned long cpu_util_rt(struct rq *rq)
> -{
> -       return READ_ONCE(rq->avg_rt.util_avg);
> -}
> -#endif
> -
>  #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
>  static inline unsigned long cpu_util_irq(struct rq *rq)
>  {
> --
> 2.25.1
>


* Re: [PATCH 2/2] sched/uclamp: Fix iowait boost escaping uclamp restriction
  2021-12-16 22:53 ` [PATCH 2/2] sched/uclamp: Fix iowait boost escaping uclamp restriction Qais Yousef
@ 2021-12-17 15:56   ` Rafael J. Wysocki
  2022-01-28  7:40   ` [tip: sched/core] " tip-bot2 for Qais Yousef
  1 sibling, 0 replies; 8+ messages in thread
From: Rafael J. Wysocki @ 2021-12-17 15:56 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Rafael J. Wysocki, Viresh Kumar, Peter Zijlstra (Intel),
	Ingo Molnar, Dietmar Eggemann, Vincent Guittot, Beata Michalska,
	Ionela Voinescu, Linux PM, Linux Kernel Mailing List

On Thu, Dec 16, 2021 at 11:53 PM Qais Yousef <qais.yousef@arm.com> wrote:
>
> The iowait_boost signal is applied independently of util and doesn't
> take into account the uclamp settings of the rq. An IO-heavy task that
> is capped by uclamp_max could still request a higher frequency because
> sugov_iowait_apply() doesn't clamp the boost via uclamp_rq_util_with()
> like effective_cpu_util() does.
>
> Make sure that iowait_boost honours uclamp requests by calling
> uclamp_rq_util_with() when applying the boost.
>
> Fixes: 982d9cdc22c9 ("sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks")
> Signed-off-by: Qais Yousef <qais.yousef@arm.com>

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  kernel/sched/cpufreq_schedutil.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 48327970552a..93dcea233c65 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -289,6 +289,7 @@ static void sugov_iowait_apply(struct sugov_cpu *sg_cpu, u64 time)
>          * into the same scale so we can compare.
>          */
>         boost = (sg_cpu->iowait_boost * sg_cpu->max) >> SCHED_CAPACITY_SHIFT;
> +       boost = uclamp_rq_util_with(cpu_rq(sg_cpu->cpu), boost, NULL);
>         if (sg_cpu->util < boost)
>                 sg_cpu->util = boost;
>  }
> --
> 2.25.1
>


* Re: [PATCH 1/2] sched/sugov: Ignore 'busy' filter when rq is capped by uclamp_max
  2021-12-17 15:54   ` Rafael J. Wysocki
@ 2021-12-20 10:48     ` Qais Yousef
  0 siblings, 0 replies; 8+ messages in thread
From: Qais Yousef @ 2021-12-20 10:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Viresh Kumar, Peter Zijlstra (Intel),
	Ingo Molnar, Dietmar Eggemann, Vincent Guittot, Beata Michalska,
	Ionela Voinescu, Linux PM, Linux Kernel Mailing List

On 12/17/21 16:54, Rafael J. Wysocki wrote:
> On Thu, Dec 16, 2021 at 11:53 PM Qais Yousef <qais.yousef@arm.com> wrote:
> >
> > sugov_update_single_{freq, perf}() contain a 'busy' filter that ensures
> > we don't bring the frequency down if there's no idle time (CPU is busy).
> >
> > The problem is that with uclamp_max we will have scenarios where a busy
> > task is capped to run at a lower frequency and this filter prevents
> > applying the capping when this task starts running.
> >
> > We handle this by skipping the filter when uclamp is enabled and the rq
> > is being capped by uclamp_max.
> >
> > We introduce a new function uclamp_rq_is_capped() to help detect when
> > this capping is taking effect. Some code shuffling was required to allow
> > using cpu_util_{cfs, rt}() in this new function.
> >
> > On a 2-core SMT2 Intel laptop I see:
> >
> > Without this patch:
> >
> >         uclampset -M 0 sysbench --test=cpu --threads=4 run
> >
> > produces a score of ~3200 consistently, which is the highest possible.
> >
> > Compiling the kernel also results in the frequency running at the
> > maximum 3.1GHz all the time - running uclampset -M 400 to cap it has no
> > effect without this patch.
> >
> > With this patch:
> >
> >         uclampset -M 0 sysbench --test=cpu --threads=4 run
> >
> > produces a score of ~1100 with some outliers around ~1700. uclamp_max
> > aggregates the performance requirements, so occasional high values are
> > expected if some other task that requires a higher frequency happens to
> > start running at the same time.
> >
> > When compiling the kernel with uclampset -M 400 I can see the
> > frequencies mostly in the ~2GHz region. This helps conserve power and
> > prevent overheating when not plugged in.
> >
> > Fixes: 982d9cdc22c9 ("sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks")
> > Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> > ---
> >
> > I haven't dug much into the busy filter, but I assume it is something that is
> > still required right?
> 
> It is AFAICS.
> 
> > If there's a better alternative we can take to make this
> > filter better instead, I'm happy to hear ideas. Otherwise hopefully this
> > proposal is logical too.
> 
> It looks reasonable to me.
> 
> For the schedutil changes:
> 
> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Thanks for having a look!

Cheers

--
Qais Yousef


* [tip: sched/core] sched/uclamp: Fix iowait boost escaping uclamp restriction
  2021-12-16 22:53 ` [PATCH 2/2] sched/uclamp: Fix iowait boost escaping uclamp restriction Qais Yousef
  2021-12-17 15:56   ` Rafael J. Wysocki
@ 2022-01-28  7:40   ` tip-bot2 for Qais Yousef
  1 sibling, 0 replies; 8+ messages in thread
From: tip-bot2 for Qais Yousef @ 2022-01-28  7:40 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Qais Yousef, Peter Zijlstra (Intel),
	Rafael J. Wysocki, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     d37aee9018e68b0d356195caefbb651910e0bbfa
Gitweb:        https://git.kernel.org/tip/d37aee9018e68b0d356195caefbb651910e0bbfa
Author:        Qais Yousef <qais.yousef@arm.com>
AuthorDate:    Thu, 16 Dec 2021 22:53:20 
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 27 Jan 2022 12:57:19 +01:00

sched/uclamp: Fix iowait boost escaping uclamp restriction

The iowait_boost signal is applied independently of util and doesn't
take into account the uclamp settings of the rq. An IO-heavy task that
is capped by uclamp_max could still request a higher frequency because
sugov_iowait_apply() doesn't clamp the boost via uclamp_rq_util_with()
like effective_cpu_util() does.

Make sure that iowait_boost honours uclamp requests by calling
uclamp_rq_util_with() when applying the boost.

Fixes: 982d9cdc22c9 ("sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20211216225320.2957053-3-qais.yousef@arm.com
---
 kernel/sched/cpufreq_schedutil.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 62d98b0..6d65ab6 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -289,6 +289,7 @@ static void sugov_iowait_apply(struct sugov_cpu *sg_cpu, u64 time)
 	 * into the same scale so we can compare.
 	 */
 	boost = (sg_cpu->iowait_boost * sg_cpu->max) >> SCHED_CAPACITY_SHIFT;
+	boost = uclamp_rq_util_with(cpu_rq(sg_cpu->cpu), boost, NULL);
 	if (sg_cpu->util < boost)
 		sg_cpu->util = boost;
 }


* [tip: sched/core] sched/sugov: Ignore 'busy' filter when rq is capped by uclamp_max
  2021-12-16 22:53 ` [PATCH 1/2] sched/sugov: Ignore 'busy' filter when rq is capped by uclamp_max Qais Yousef
  2021-12-17 15:54   ` Rafael J. Wysocki
@ 2022-01-28  7:40   ` tip-bot2 for Qais Yousef
  1 sibling, 0 replies; 8+ messages in thread
From: tip-bot2 for Qais Yousef @ 2022-01-28  7:40 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Qais Yousef, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     7a17e1db1265471f7718af100cfc5e41280d53a7
Gitweb:        https://git.kernel.org/tip/7a17e1db1265471f7718af100cfc5e41280d53a7
Author:        Qais Yousef <qais.yousef@arm.com>
AuthorDate:    Thu, 16 Dec 2021 22:53:19 
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 27 Jan 2022 12:57:19 +01:00

sched/sugov: Ignore 'busy' filter when rq is capped by uclamp_max

sugov_update_single_{freq, perf}() contain a 'busy' filter that ensures
we don't bring the frequency down if there's no idle time (CPU is busy).

The problem is that with uclamp_max we will have scenarios where a busy
task is capped to run at a lower frequency and this filter prevents
applying the capping when this task starts running.

We handle this by skipping the filter when uclamp is enabled and the rq
is being capped by uclamp_max.

We introduce a new function uclamp_rq_is_capped() to help detect when
this capping is taking effect. Some code shuffling was required to allow
using cpu_util_{cfs, rt}() in this new function.

On a 2-core SMT2 Intel laptop I see:

Without this patch:

	uclampset -M 0 sysbench --test=cpu --threads=4 run

produces a score of ~3200 consistently, which is the highest possible.

Compiling the kernel also results in the frequency running at the
maximum 3.1GHz all the time - running uclampset -M 400 to cap it has no
effect without this patch.

With this patch:

	uclampset -M 0 sysbench --test=cpu --threads=4 run

produces a score of ~1100 with some outliers around ~1700. uclamp_max
aggregates the performance requirements, so occasional high values are
expected if some other task that requires a higher frequency happens to
start running at the same time.

When compiling the kernel with uclampset -M 400 I can see the
frequencies mostly in the ~2GHz region. This helps conserve power and
prevent overheating when not plugged in.

Fixes: 982d9cdc22c9 ("sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211216225320.2957053-2-qais.yousef@arm.com
---
 kernel/sched/cpufreq_schedutil.c |  10 +-
 kernel/sched/sched.h             | 181 ++++++++++++++++--------------
 2 files changed, 107 insertions(+), 84 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 2677888..62d98b0 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -348,8 +348,11 @@ static void sugov_update_single_freq(struct update_util_data *hook, u64 time,
 	/*
 	 * Do not reduce the frequency if the CPU has not been idle
 	 * recently, as the reduction is likely to be premature then.
+	 *
+	 * Except when the rq is capped by uclamp_max.
 	 */
-	if (sugov_cpu_is_busy(sg_cpu) && next_f < sg_policy->next_freq) {
+	if (!uclamp_rq_is_capped(cpu_rq(sg_cpu->cpu)) &&
+	    sugov_cpu_is_busy(sg_cpu) && next_f < sg_policy->next_freq) {
 		next_f = sg_policy->next_freq;
 
 		/* Restore cached freq as next_freq has changed */
@@ -395,8 +398,11 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
 	/*
 	 * Do not reduce the target performance level if the CPU has not been
 	 * idle recently, as the reduction is likely to be premature then.
+	 *
+	 * Except when the rq is capped by uclamp_max.
 	 */
-	if (sugov_cpu_is_busy(sg_cpu) && sg_cpu->util < prev_util)
+	if (!uclamp_rq_is_capped(cpu_rq(sg_cpu->cpu)) &&
+	    sugov_cpu_is_busy(sg_cpu) && sg_cpu->util < prev_util)
 		sg_cpu->util = prev_util;
 
 	cpufreq_driver_adjust_perf(sg_cpu->cpu, map_util_perf(sg_cpu->bw_dl),
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index de53be9..9b33ba9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2841,88 +2841,6 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
 static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
 #endif /* CONFIG_CPU_FREQ */
 
-#ifdef CONFIG_UCLAMP_TASK
-unsigned long uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
-
-/**
- * uclamp_rq_util_with - clamp @util with @rq and @p effective uclamp values.
- * @rq:		The rq to clamp against. Must not be NULL.
- * @util:	The util value to clamp.
- * @p:		The task to clamp against. Can be NULL if you want to clamp
- *		against @rq only.
- *
- * Clamps the passed @util to the max(@rq, @p) effective uclamp values.
- *
- * If sched_uclamp_used static key is disabled, then just return the util
- * without any clamping since uclamp aggregation at the rq level in the fast
- * path is disabled, rendering this operation a NOP.
- *
- * Use uclamp_eff_value() if you don't care about uclamp values at rq level. It
- * will return the correct effective uclamp value of the task even if the
- * static key is disabled.
- */
-static __always_inline
-unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
-				  struct task_struct *p)
-{
-	unsigned long min_util = 0;
-	unsigned long max_util = 0;
-
-	if (!static_branch_likely(&sched_uclamp_used))
-		return util;
-
-	if (p) {
-		min_util = uclamp_eff_value(p, UCLAMP_MIN);
-		max_util = uclamp_eff_value(p, UCLAMP_MAX);
-
-		/*
-		 * Ignore last runnable task's max clamp, as this task will
-		 * reset it. Similarly, no need to read the rq's min clamp.
-		 */
-		if (rq->uclamp_flags & UCLAMP_FLAG_IDLE)
-			goto out;
-	}
-
-	min_util = max_t(unsigned long, min_util, READ_ONCE(rq->uclamp[UCLAMP_MIN].value));
-	max_util = max_t(unsigned long, max_util, READ_ONCE(rq->uclamp[UCLAMP_MAX].value));
-out:
-	/*
-	 * Since CPU's {min,max}_util clamps are MAX aggregated considering
-	 * RUNNABLE tasks with _different_ clamps, we can end up with an
-	 * inversion. Fix it now when the clamps are applied.
-	 */
-	if (unlikely(min_util >= max_util))
-		return min_util;
-
-	return clamp(util, min_util, max_util);
-}
-
-/*
- * When uclamp is compiled in, the aggregation at rq level is 'turned off'
- * by default in the fast path and only gets turned on once userspace performs
- * an operation that requires it.
- *
- * Returns true if userspace opted-in to use uclamp and aggregation at rq level
- * hence is active.
- */
-static inline bool uclamp_is_used(void)
-{
-	return static_branch_likely(&sched_uclamp_used);
-}
-#else /* CONFIG_UCLAMP_TASK */
-static inline
-unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
-				  struct task_struct *p)
-{
-	return util;
-}
-
-static inline bool uclamp_is_used(void)
-{
-	return false;
-}
-#endif /* CONFIG_UCLAMP_TASK */
-
 #ifdef arch_scale_freq_capacity
 # ifndef arch_scale_freq_invariant
 #  define arch_scale_freq_invariant()	true
@@ -3020,6 +2938,105 @@ static inline unsigned long cpu_util_rt(struct rq *rq)
 }
 #endif
 
+#ifdef CONFIG_UCLAMP_TASK
+unsigned long uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
+
+/**
+ * uclamp_rq_util_with - clamp @util with @rq and @p effective uclamp values.
+ * @rq:		The rq to clamp against. Must not be NULL.
+ * @util:	The util value to clamp.
+ * @p:		The task to clamp against. Can be NULL if you want to clamp
+ *		against @rq only.
+ *
+ * Clamps the passed @util to the max(@rq, @p) effective uclamp values.
+ *
+ * If sched_uclamp_used static key is disabled, then just return the util
+ * without any clamping since uclamp aggregation at the rq level in the fast
+ * path is disabled, rendering this operation a NOP.
+ *
+ * Use uclamp_eff_value() if you don't care about uclamp values at rq level. It
+ * will return the correct effective uclamp value of the task even if the
+ * static key is disabled.
+ */
+static __always_inline
+unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
+				  struct task_struct *p)
+{
+	unsigned long min_util = 0;
+	unsigned long max_util = 0;
+
+	if (!static_branch_likely(&sched_uclamp_used))
+		return util;
+
+	if (p) {
+		min_util = uclamp_eff_value(p, UCLAMP_MIN);
+		max_util = uclamp_eff_value(p, UCLAMP_MAX);
+
+		/*
+		 * Ignore last runnable task's max clamp, as this task will
+		 * reset it. Similarly, no need to read the rq's min clamp.
+		 */
+		if (rq->uclamp_flags & UCLAMP_FLAG_IDLE)
+			goto out;
+	}
+
+	min_util = max_t(unsigned long, min_util, READ_ONCE(rq->uclamp[UCLAMP_MIN].value));
+	max_util = max_t(unsigned long, max_util, READ_ONCE(rq->uclamp[UCLAMP_MAX].value));
+out:
+	/*
+	 * Since CPU's {min,max}_util clamps are MAX aggregated considering
+	 * RUNNABLE tasks with _different_ clamps, we can end up with an
+	 * inversion. Fix it now when the clamps are applied.
+	 */
+	if (unlikely(min_util >= max_util))
+		return min_util;
+
+	return clamp(util, min_util, max_util);
+}
+
+/* Is the rq being capped/throttled by uclamp_max? */
+static inline bool uclamp_rq_is_capped(struct rq *rq)
+{
+	unsigned long rq_util;
+	unsigned long max_util;
+
+	if (!static_branch_likely(&sched_uclamp_used))
+		return false;
+
+	rq_util = cpu_util_cfs(cpu_of(rq)) + cpu_util_rt(rq);
+	max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
+
+	return max_util != SCHED_CAPACITY_SCALE && rq_util >= max_util;
+}
+
+/*
+ * When uclamp is compiled in, the aggregation at rq level is 'turned off'
+ * by default in the fast path and only gets turned on once userspace performs
+ * an operation that requires it.
+ *
+ * Returns true if userspace opted-in to use uclamp and aggregation at rq level
+ * hence is active.
+ */
+static inline bool uclamp_is_used(void)
+{
+	return static_branch_likely(&sched_uclamp_used);
+}
+#else /* CONFIG_UCLAMP_TASK */
+static inline
+unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
+				  struct task_struct *p)
+{
+	return util;
+}
+
+static inline bool uclamp_rq_is_capped(struct rq *rq) { return false; }
+
+static inline bool uclamp_is_used(void)
+{
+	return false;
+}
+#endif /* CONFIG_UCLAMP_TASK */
+
 #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
 static inline unsigned long cpu_util_irq(struct rq *rq)
 {

