linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] sched/fair: Task placement biasing using uclamp
@ 2019-11-20 17:55 Valentin Schneider
  2019-11-20 17:55 ` [PATCH 1/3] sched/uclamp: Make uclamp_util_*() helpers use and return UL values Valentin Schneider
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Valentin Schneider @ 2019-11-20 17:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, mingo, vincent.guittot, dietmar.eggemann,
	patrick.bellasi, qperret, qais.yousef, morten.rasmussen

Hi,

While uclamp restrictions currently only impact schedutil's frequency
selection, it would make sense to also let it impact CPU selection in
asymmetric topologies. This would let us steer specific tasks towards
certain CPU capacities regardless of their actual utilization - I give a
few examples in patch 3.

The first two patches are just paving the way for the meat of the thing
which is in patch 3.

Note that this is in the same spirit as what Patrick had proposed for EAS
on Android [1].

[1]: https://android.googlesource.com/kernel/common/+/b61876ed122f816660fe49e0de1b7ee4891deaa2%5E%21

Cheers,
Valentin

Valentin Schneider (3):
  sched/uclamp: Make uclamp_util_*() helpers use and return UL values
  sched/uclamp: Rename uclamp_util_*() into uclamp_rq_util_*()
  sched/fair: Consider uclamp for "task fits capacity" checks

 kernel/sched/cpufreq_schedutil.c |  2 +-
 kernel/sched/fair.c              | 11 +++++++++-
 kernel/sched/sched.h             | 35 ++++++++++++++++++++++----------
 3 files changed, 35 insertions(+), 13 deletions(-)

--
2.22.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] sched/uclamp: Make uclamp_util_*() helpers use and return UL values
  2019-11-20 17:55 [PATCH 0/3] sched/fair: Task placement biasing using uclamp Valentin Schneider
@ 2019-11-20 17:55 ` Valentin Schneider
  2019-11-20 17:55 ` [PATCH 2/3] sched/uclamp: Rename uclamp_util_*() into uclamp_rq_util_*() Valentin Schneider
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Valentin Schneider @ 2019-11-20 17:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, mingo, vincent.guittot, dietmar.eggemann,
	patrick.bellasi, qperret, qais.yousef, morten.rasmussen

Vincent pointed out recently that the canonical type for utilization
values is 'unsigned long'. Internally uclamp uses 'unsigned int' values for
cache optimization, but this doesn't have to be exported to its users.

Make the uclamp helpers that deal with utilization use and return unsigned
long values.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/sched.h | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 280a3c735935..f1d035e5df7e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2303,15 +2303,15 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
 unsigned int uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
 
 static __always_inline
-unsigned int uclamp_util_with(struct rq *rq, unsigned int util,
-			      struct task_struct *p)
+unsigned long uclamp_util_with(struct rq *rq, unsigned long util,
+			       struct task_struct *p)
 {
-	unsigned int min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
-	unsigned int max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
+	unsigned long min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
+	unsigned long max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
 
 	if (p) {
-		min_util = max(min_util, uclamp_eff_value(p, UCLAMP_MIN));
-		max_util = max(max_util, uclamp_eff_value(p, UCLAMP_MAX));
+		min_util = max_t(unsigned long, min_util, uclamp_eff_value(p, UCLAMP_MIN));
+		max_util = max_t(unsigned long, max_util, uclamp_eff_value(p, UCLAMP_MAX));
 	}
 
 	/*
@@ -2325,17 +2325,17 @@ unsigned int uclamp_util_with(struct rq *rq, unsigned int util,
 	return clamp(util, min_util, max_util);
 }
 
-static inline unsigned int uclamp_util(struct rq *rq, unsigned int util)
+static inline unsigned long uclamp_util(struct rq *rq, unsigned long util)
 {
 	return uclamp_util_with(rq, util, NULL);
 }
 #else /* CONFIG_UCLAMP_TASK */
-static inline unsigned int uclamp_util_with(struct rq *rq, unsigned int util,
-					    struct task_struct *p)
+static inline unsigned long uclamp_util_with(struct rq *rq, unsigned long util,
+					     struct task_struct *p)
 {
 	return util;
 }
-static inline unsigned int uclamp_util(struct rq *rq, unsigned int util)
+static inline unsigned long uclamp_util(struct rq *rq, unsigned long util)
 {
 	return util;
 }
-- 
2.22.0



* [PATCH 2/3] sched/uclamp: Rename uclamp_util_*() into uclamp_rq_util_*()
  2019-11-20 17:55 [PATCH 0/3] sched/fair: Task placement biasing using uclamp Valentin Schneider
  2019-11-20 17:55 ` [PATCH 1/3] sched/uclamp: Make uclamp_util_*() helpers use and return UL values Valentin Schneider
@ 2019-11-20 17:55 ` Valentin Schneider
  2019-11-20 17:55 ` [PATCH 3/3] sched/fair: Consider uclamp for "task fits capacity" checks Valentin Schneider
  2019-11-21 12:00 ` [PATCH 0/3] sched/fair: Task placement biasing using uclamp Quentin Perret
  3 siblings, 0 replies; 14+ messages in thread
From: Valentin Schneider @ 2019-11-20 17:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, mingo, vincent.guittot, dietmar.eggemann,
	patrick.bellasi, qperret, qais.yousef, morten.rasmussen

The current helpers return (CPU) rq utilization with uclamp restrictions
taken into account. As I'm about to introduce a task clamped utilization
variant, make it explicit that those deal with runqueues.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/cpufreq_schedutil.c |  2 +-
 kernel/sched/sched.h             | 14 +++++++-------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 86800b4d5453..9deb3a13416d 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -240,7 +240,7 @@ unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs,
 	 */
 	util = util_cfs + cpu_util_rt(rq);
 	if (type == FREQUENCY_UTIL)
-		util = uclamp_util_with(rq, util, p);
+		util = uclamp_rq_util_with(rq, util, p);
 
 	dl_util = cpu_util_dl(rq);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f1d035e5df7e..900328c4eeef 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2303,8 +2303,8 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
 unsigned int uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
 
 static __always_inline
-unsigned long uclamp_util_with(struct rq *rq, unsigned long util,
-			       struct task_struct *p)
+unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
+				  struct task_struct *p)
 {
 	unsigned long min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
 	unsigned long max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
@@ -2325,17 +2325,17 @@ unsigned long uclamp_util_with(struct rq *rq, unsigned long util,
 	return clamp(util, min_util, max_util);
 }
 
-static inline unsigned long uclamp_util(struct rq *rq, unsigned long util)
+static inline unsigned long uclamp_rq_util(struct rq *rq, unsigned long util)
 {
-	return uclamp_util_with(rq, util, NULL);
+	return uclamp_rq_util_with(rq, util, NULL);
 }
 #else /* CONFIG_UCLAMP_TASK */
-static inline unsigned long uclamp_util_with(struct rq *rq, unsigned long util,
-					     struct task_struct *p)
+static inline unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
+						struct task_struct *p)
 {
 	return util;
 }
-static inline unsigned long uclamp_util(struct rq *rq, unsigned long util)
+static inline unsigned long uclamp_rq_util(struct rq *rq, unsigned long util)
 {
 	return util;
 }
-- 
2.22.0



* [PATCH 3/3] sched/fair: Consider uclamp for "task fits capacity" checks
  2019-11-20 17:55 [PATCH 0/3] sched/fair: Task placement biasing using uclamp Valentin Schneider
  2019-11-20 17:55 ` [PATCH 1/3] sched/uclamp: Make uclamp_util_*() helpers use and return UL values Valentin Schneider
  2019-11-20 17:55 ` [PATCH 2/3] sched/uclamp: Rename uclamp_util_*() into uclamp_rq_util_*() Valentin Schneider
@ 2019-11-20 17:55 ` Valentin Schneider
  2019-11-21 11:56   ` Quentin Perret
  2019-11-24 22:20   ` Qais Yousef
  2019-11-21 12:00 ` [PATCH 0/3] sched/fair: Task placement biasing using uclamp Quentin Perret
  3 siblings, 2 replies; 14+ messages in thread
From: Valentin Schneider @ 2019-11-20 17:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, mingo, vincent.guittot, dietmar.eggemann,
	patrick.bellasi, qperret, qais.yousef, morten.rasmussen

task_fits_capacity() drives CPU selection at wakeup time, and is also used
to detect misfit tasks. Right now it does so by comparing task_util_est()
with a CPU's capacity, but doesn't take into account uclamp restrictions.

There are a few interesting uses that can come out of doing this. For
instance, a low uclamp.max value could prevent certain tasks from being
flagged as misfit tasks, so they could merrily remain on low-capacity CPUs.
Similarly, a high uclamp.min value would steer tasks towards high-capacity
CPUs at wakeup (and, should that fail, have them steered later via misfit
balancing), so such "boosted" tasks would favor CPUs of higher capacity.

Introduce uclamp_task_util() and make task_fits_capacity() use it. To keep
things consistent between overutilized wakeup CPU selection (wake_cap())
and !overutilized wakeup CPU selection (find_energy_efficient_cpu()),
add a task_fits_capacity() check to find_energy_efficient_cpu().

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/fair.c  | 11 ++++++++++-
 kernel/sched/sched.h | 13 +++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 08a233e97a01..446409252b23 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3822,7 +3822,7 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
 
 static inline int task_fits_capacity(struct task_struct *p, long capacity)
 {
-	return fits_capacity(task_util_est(p), capacity);
+	return fits_capacity(uclamp_task_util(p, task_util_est(p)), capacity);
 }
 
 static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
@@ -6274,6 +6274,15 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 			if (!fits_capacity(util, cpu_cap))
 				continue;
 
+			/*
+			 * Skip CPUs that don't satisfy uclamp requests. Note
+			 * that the above already ensures the CPU has enough
+			 * spare capacity for the task; this is only really for
+			 * uclamp restrictions.
+			 */
+			if (!task_fits_capacity(p, capacity_orig_of(cpu)))
+				continue;
+
 			/* Always use prev_cpu as a candidate. */
 			if (cpu == prev_cpu) {
 				prev_delta = compute_energy(p, prev_cpu, pd);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 900328c4eeef..74bf08ec1a01 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2329,6 +2329,14 @@ static inline unsigned long uclamp_rq_util(struct rq *rq, unsigned long util)
 {
 	return uclamp_rq_util_with(rq, util, NULL);
 }
+
+static inline
+unsigned long uclamp_task_util(struct task_struct *p, unsigned long util)
+{
+	return clamp(util,
+		     (unsigned long)uclamp_eff_value(p, UCLAMP_MIN),
+		     (unsigned long)uclamp_eff_value(p, UCLAMP_MAX));
+}
 #else /* CONFIG_UCLAMP_TASK */
 static inline unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
 						struct task_struct *p)
@@ -2339,6 +2347,11 @@ static inline unsigned long uclamp_rq_util(struct rq *rq, unsigned long util)
 {
 	return util;
 }
+static inline
+unsigned long uclamp_task_util(struct task_struct *p, unsigned long util)
+{
+	return util;
+}
 #endif /* CONFIG_UCLAMP_TASK */
 
 #ifdef arch_scale_freq_capacity
-- 
2.22.0



* Re: [PATCH 3/3] sched/fair: Consider uclamp for "task fits capacity" checks
  2019-11-20 17:55 ` [PATCH 3/3] sched/fair: Consider uclamp for "task fits capacity" checks Valentin Schneider
@ 2019-11-21 11:56   ` Quentin Perret
  2019-11-21 12:56     ` Valentin Schneider
  2019-11-24 22:20   ` Qais Yousef
  1 sibling, 1 reply; 14+ messages in thread
From: Quentin Perret @ 2019-11-21 11:56 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, peterz, mingo, vincent.guittot, dietmar.eggemann,
	patrick.bellasi, qais.yousef, morten.rasmussen

On Wednesday 20 Nov 2019 at 17:55:33 (+0000), Valentin Schneider wrote:
> +static inline
> +unsigned long uclamp_task_util(struct task_struct *p, unsigned long util)

This 'util' parameter is always task_util_est(p) right ? You might want
to remove it.

> +{
> +	return clamp(util,
> +		     (unsigned long)uclamp_eff_value(p, UCLAMP_MIN),
> +		     (unsigned long)uclamp_eff_value(p, UCLAMP_MAX));
> +}

Thanks,
Quentin


* Re: [PATCH 0/3] sched/fair: Task placement biasing using uclamp
  2019-11-20 17:55 [PATCH 0/3] sched/fair: Task placement biasing using uclamp Valentin Schneider
                   ` (2 preceding siblings ...)
  2019-11-20 17:55 ` [PATCH 3/3] sched/fair: Consider uclamp for "task fits capacity" checks Valentin Schneider
@ 2019-11-21 12:00 ` Quentin Perret
  3 siblings, 0 replies; 14+ messages in thread
From: Quentin Perret @ 2019-11-21 12:00 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, peterz, mingo, vincent.guittot, dietmar.eggemann,
	patrick.bellasi, qais.yousef, morten.rasmussen

On Wednesday 20 Nov 2019 at 17:55:30 (+0000), Valentin Schneider wrote:
> While uclamp restrictions currently only impact schedutil's frequency
> selection, it would make sense to also let it impact CPU selection in
> asymmetric topologies. This would let us steer specific tasks towards
> certain CPU capacities regardless of their actual utilization - I give a
> few examples in patch 3.
> 
> The first two patches are just paving the way for the meat of the thing
> which is in patch 3.

This makes sense, and is in line with what Qais proposed for RT I think.
So, with the nit applied to patch 3:

  Reviewed-by: Quentin Perret <qperret@google.com>

Thanks!
Quentin


* Re: [PATCH 3/3] sched/fair: Consider uclamp for "task fits capacity" checks
  2019-11-21 11:56   ` Quentin Perret
@ 2019-11-21 12:56     ` Valentin Schneider
  2019-11-21 13:30       ` Quentin Perret
  0 siblings, 1 reply; 14+ messages in thread
From: Valentin Schneider @ 2019-11-21 12:56 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-kernel, peterz, mingo, vincent.guittot, dietmar.eggemann,
	patrick.bellasi, qais.yousef, morten.rasmussen

On 21/11/2019 11:56, Quentin Perret wrote:
> On Wednesday 20 Nov 2019 at 17:55:33 (+0000), Valentin Schneider wrote:
>> +static inline
>> +unsigned long uclamp_task_util(struct task_struct *p, unsigned long util)
> 
> This 'util' parameter is always task_util_est(p) right ? You might want
> to remove it.
> 

I went with copying uclamp_rq_util()'s API, but you're right in that I don't
see what other value (than util_est) would make sense for this helper. If
there are no objections I'll kill the parameter for v2.

>> +{
>> +	return clamp(util,
>> +		     (unsigned long)uclamp_eff_value(p, UCLAMP_MIN),
>> +		     (unsigned long)uclamp_eff_value(p, UCLAMP_MAX));
>> +}
> 
> Thanks,
> Quentin
> 

Another thing I realized overnight; tell me what you think:

> @@ -6274,6 +6274,15 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>  			if (!fits_capacity(util, cpu_cap))
>  				continue;
>  
> +			/*
> +			 * Skip CPUs that don't satisfy uclamp requests. Note
> +			 * that the above already ensures the CPU has enough
> +			 * spare capacity for the task; this is only really for
> +			 * uclamp restrictions.
> +			 */
> +			if (!task_fits_capacity(p, capacity_orig_of(cpu)))
> +				continue;

This is partly redundant with the above, I think. What we really want here
is just

fits_capacity(uclamp_eff_value(p, UCLAMP_MIN), capacity_orig_of(cpu))

but this would require some inline #ifdeffery.

> +
>  			/* Always use prev_cpu as a candidate. */
>  			if (cpu == prev_cpu) {
>  				prev_delta = compute_energy(p, prev_cpu, pd);


* Re: [PATCH 3/3] sched/fair: Consider uclamp for "task fits capacity" checks
  2019-11-21 12:56     ` Valentin Schneider
@ 2019-11-21 13:30       ` Quentin Perret
  2019-11-21 14:51         ` Valentin Schneider
  0 siblings, 1 reply; 14+ messages in thread
From: Quentin Perret @ 2019-11-21 13:30 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, peterz, mingo, vincent.guittot, dietmar.eggemann,
	patrick.bellasi, qais.yousef, morten.rasmussen

On Thursday 21 Nov 2019 at 12:56:39 (+0000), Valentin Schneider wrote:
> > @@ -6274,6 +6274,15 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> >  			if (!fits_capacity(util, cpu_cap))
> >  				continue;
> >  
> > +			/*
> > +			 * Skip CPUs that don't satisfy uclamp requests. Note
> > +			 * that the above already ensures the CPU has enough
> > +			 * spare capacity for the task; this is only really for
> > +			 * uclamp restrictions.
> > +			 */
> > +			if (!task_fits_capacity(p, capacity_orig_of(cpu)))
> > +				continue;
> 
> This is partly redundant with the above, I think. What we really want here
> is just
> 
> fits_capacity(uclamp_eff_value(p, UCLAMP_MIN), capacity_orig_of(cpu))
> 
> but this would require some inline #ifdeffery.

This suggested change lacks the UCLAMP_MAX part, which is a shame
because this is precisely in the EAS path that we should try and
down-migrate tasks if they have an appropriate max_clamp. So, your first
proposal made sense, IMO.

Another option to avoid the redundancy would be to do something along
the lines of the totally untested diff below.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 69a81a5709ff..38cb5fe7ba65 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6372,9 +6372,12 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
                        if (!cpumask_test_cpu(cpu, p->cpus_ptr))
                                continue;
 
-                       /* Skip CPUs that will be overutilized. */
                        util = cpu_util_next(cpu, p, cpu);
                        cpu_cap = capacity_of(cpu);
+                       spare_cap = cpu_cap - util;
+                       util = uclamp_util_with(cpu_rq(cpu), util, p);
+
+                       /* Skip CPUs that will be overutilized. */
                        if (!fits_capacity(util, cpu_cap))
                                continue;
 
@@ -6389,7 +6392,6 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
                         * Find the CPU with the maximum spare capacity in
                         * the performance domain
                         */
-                       spare_cap = cpu_cap - util;
                        if (spare_cap > max_spare_cap) {
                                max_spare_cap = spare_cap;
                                max_spare_cap_cpu = cpu;

Thoughts ?

Thanks,
Quentin


* Re: [PATCH 3/3] sched/fair: Consider uclamp for "task fits capacity" checks
  2019-11-21 13:30       ` Quentin Perret
@ 2019-11-21 14:51         ` Valentin Schneider
  2019-11-21 15:30           ` Quentin Perret
  0 siblings, 1 reply; 14+ messages in thread
From: Valentin Schneider @ 2019-11-21 14:51 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-kernel, peterz, mingo, vincent.guittot, dietmar.eggemann,
	patrick.bellasi, qais.yousef, morten.rasmussen

On 21/11/2019 13:30, Quentin Perret wrote:
> On Thursday 21 Nov 2019 at 12:56:39 (+0000), Valentin Schneider wrote:
>>> @@ -6274,6 +6274,15 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>>>  			if (!fits_capacity(util, cpu_cap))
>>>  				continue;
>>>  
>>> +			/*
>>> +			 * Skip CPUs that don't satisfy uclamp requests. Note
>>> +			 * that the above already ensures the CPU has enough
>>> +			 * spare capacity for the task; this is only really for
>>> +			 * uclamp restrictions.
>>> +			 */
>>> +			if (!task_fits_capacity(p, capacity_orig_of(cpu)))
>>> +				continue;
>>
>> This is partly redundant with the above, I think. What we really want here
>> is just
>>
>> fits_capacity(uclamp_eff_value(p, UCLAMP_MIN), capacity_orig_of(cpu))
>>
>> but this would require some inline #ifdeffery.
> 
> This suggested change lacks the UCLAMP_MAX part, which is a shame
> because this is precisely in the EAS path that we should try and
> down-migrate tasks if they have an appropriate max_clamp. So, your first
> proposal made sense, IMO.
> 
 
Hm right, had to let that spin in my head for a while but I think I got it.

I was only really thinking of:

  (h960: LITTLE = 462 cap, big = 1024)
  p.uclamp.min = 512 -> skip LITTLEs regardless of the actual util_est

but your point is we also want stuff like:

  p.uclamp.max = 300 -> accept LITTLEs regardless of the actual util_est

I'll keep the feec() change as-is and add something like the above in the
changelog for v2.

> Another option to avoid the redundancy would be to do something along
> the lines of the totally untested diff below.
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 69a81a5709ff..38cb5fe7ba65 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6372,9 +6372,12 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>                         if (!cpumask_test_cpu(cpu, p->cpus_ptr))
>                                 continue;
>  
> -                       /* Skip CPUs that will be overutilized. */
>                         util = cpu_util_next(cpu, p, cpu);
>                         cpu_cap = capacity_of(cpu);
> +                       spare_cap = cpu_cap - util;
> +                       util = uclamp_util_with(cpu_rq(cpu), util, p);
> +
> +                       /* Skip CPUs that will be overutilized. */
>                         if (!fits_capacity(util, cpu_cap))
>                                 continue;
>  
> @@ -6389,7 +6392,6 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>                          * Find the CPU with the maximum spare capacity in
>                          * the performance domain
>                          */
> -                       spare_cap = cpu_cap - util;
>                         if (spare_cap > max_spare_cap) {
>                                 max_spare_cap = spare_cap;
>                                 max_spare_cap_cpu = cpu;
> 
> Thoughts ?
> 

uclamp_util_with() (or uclamp_rq_util_with() ;)) picks the max between the
rq-aggregated clamps and the task clamps, which isn't what we want. If the
task has a low-ish uclamp.max (e.g. the 300 example from above) but the
rq-wide max-aggregated uclamp.max is ~800, we'd clamp using that 800. It
makes sense for frequency selection, but not for task placement IMO.

> Thanks,
> Quentin
> 


* Re: [PATCH 3/3] sched/fair: Consider uclamp for "task fits capacity" checks
  2019-11-21 14:51         ` Valentin Schneider
@ 2019-11-21 15:30           ` Quentin Perret
  2019-11-21 17:22             ` Valentin Schneider
  0 siblings, 1 reply; 14+ messages in thread
From: Quentin Perret @ 2019-11-21 15:30 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, peterz, mingo, vincent.guittot, dietmar.eggemann,
	patrick.bellasi, qais.yousef, morten.rasmussen

On Thursday 21 Nov 2019 at 14:51:06 (+0000), Valentin Schneider wrote:
> On 21/11/2019 13:30, Quentin Perret wrote:
> > On Thursday 21 Nov 2019 at 12:56:39 (+0000), Valentin Schneider wrote:
> >>> @@ -6274,6 +6274,15 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> >>>  			if (!fits_capacity(util, cpu_cap))
> >>>  				continue;
> >>>  
> >>> +			/*
> >>> +			 * Skip CPUs that don't satisfy uclamp requests. Note
> >>> +			 * that the above already ensures the CPU has enough
> >>> +			 * spare capacity for the task; this is only really for
> >>> +			 * uclamp restrictions.
> >>> +			 */
> >>> +			if (!task_fits_capacity(p, capacity_orig_of(cpu)))
> >>> +				continue;
> >>
> >> This is partly redundant with the above, I think. What we really want here
> >> is just
> >>
> >> fits_capacity(uclamp_eff_value(p, UCLAMP_MIN), capacity_orig_of(cpu))
> >>
> >> but this would require some inline #ifdeffery.
> > 
> > This suggested change lacks the UCLAMP_MAX part, which is a shame
> > because this is precisely in the EAS path that we should try and
> > down-migrate tasks if they have an appropriate max_clamp. So, your first
> > proposal made sense, IMO.
> > 
>  
> Hm right, had to let that spin in my head for a while but I think I got it.
> 
> I was only really thinking of:
> 
>   (h960: LITTLE = 462 cap, big = 1024)
>   p.uclamp.min = 512 -> skip LITTLEs regardless of the actual util_est
> 
> but your point is we also want stuff like:
> 
>   p.uclamp.max = 300 -> accept LITTLEs regardless of the actual util_est

Right, sorry if my message wasn't clear.

> I'll keep the feec() change as-is and add something like the above in the
> changelog for v2.
> 
> > Another option to avoid the redundancy would be to do something along
> > the lines of the totally untested diff below.
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 69a81a5709ff..38cb5fe7ba65 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6372,9 +6372,12 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> >                         if (!cpumask_test_cpu(cpu, p->cpus_ptr))
> >                                 continue;
> >  
> > -                       /* Skip CPUs that will be overutilized. */
> >                         util = cpu_util_next(cpu, p, cpu);
> >                         cpu_cap = capacity_of(cpu);
> > +                       spare_cap = cpu_cap - util;
> > +                       util = uclamp_util_with(cpu_rq(cpu), util, p);
> > +
> > +                       /* Skip CPUs that will be overutilized. */
> >                         if (!fits_capacity(util, cpu_cap))
> >                                 continue;
> >  
> > @@ -6389,7 +6392,6 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> >                          * Find the CPU with the maximum spare capacity in
> >                          * the performance domain
> >                          */
> > -                       spare_cap = cpu_cap - util;
> >                         if (spare_cap > max_spare_cap) {
> >                                 max_spare_cap = spare_cap;
> >                                 max_spare_cap_cpu = cpu;
> > 
> > Thoughts ?
> > 
> 
> uclamp_util_with() (or uclamp_rq_util_with() ;)) picks the max between the
> rq-aggregated clamps and the task clamps, which isn't what we want. If the
> task has a low-ish uclamp.max (e.g. the 300 example from above) but the
> rq-wide max-aggregated uclamp.max is ~800, we'd clamp using that 800. It
> makes sense for frequency selection, but not for task placement IMO.

Right, but you could argue that this is in fact a correct behaviour.
What we want to know is 'is this CPU big enough to meet the capacity
request if I enqueue p there ?'. And the 'capacity request' is the
aggregated rq-wide clamped util, IMO.

If enqueuing 'p' on a given CPU will cause the rq-wide clamped util to
go above the CPU capacity, we want to skip that CPU.

The obvious case is if p's min_clamp is larger than the CPU capacity.

But similarly, if p's max_clamp is going to be ignored because of
another task with a larger max_clamp on the same rq, this is relevant
information too --  the resulting capacity request might be above the
CPU capacity if p's util_avg is large, so we should probably skip the
CPU too no ?

Are we gaining anything if we decide to not align the EAS path and the
sugov path ?

Thanks,
Quentin


* Re: [PATCH 3/3] sched/fair: Consider uclamp for "task fits capacity" checks
  2019-11-21 15:30           ` Quentin Perret
@ 2019-11-21 17:22             ` Valentin Schneider
  0 siblings, 0 replies; 14+ messages in thread
From: Valentin Schneider @ 2019-11-21 17:22 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-kernel, peterz, mingo, vincent.guittot, dietmar.eggemann,
	patrick.bellasi, qais.yousef, morten.rasmussen

On 21/11/2019 15:30, Quentin Perret wrote:
>> uclamp_util_with() (or uclamp_rq_util_with() ;)) picks the max between the
>> rq-aggregated clamps and the task clamps, which isn't what we want. If the
>> task has a low-ish uclamp.max (e.g. the 300 example from above) but the
>> rq-wide max-aggregated uclamp.max is ~800, we'd clamp using that 800. It
>> makes sense for frequency selection, but not for task placement IMO.
> 
> Right, but you could argue that this is in fact a correct behaviour.
> What we want to know is 'is this CPU big enough to meet the capacity
> request if I enqueue p there ?'. And the 'capacity request' is the
> aggregated rq-wide clamped util, IMO.
> 
> If enqueuing 'p' on a given CPU will cause the rq-wide clamped util to
> go above the CPU capacity, we want to skip that CPU.
> 
> The obvious case is if p's min_clamp is larger than the CPU capacity.

Right, so far we get that with task_fits_capacity().

> 
> But similarly, if p's max_clamp is going to be ignored because of
> another task with a larger max_clamp on the same rq, this is relevant
> information too --  the resulting capacity request might be above the
> CPU capacity if p's util_avg is large, so we should probably skip the
> CPU too no ?
> 

Hmph thinking is hard but I think I agree with you.


Say you have

  rq.cpu_capacity_orig = 1024
  rq.util_avg = 300
  rq.uclamp.max = 600

  p.util_est = 600
  p.uclamp.max = 300

If we enqueue p on that rq, we shouldn't go above 600 util (or something
close, depending on the frequency this lets us select). But, AFAICT,
cpu_util_next() will see this as going to 900 util and we'll thus skip this
CPU for this task (because that would make us overutilized). With your
suggested change, we wouldn't skip this CPU. Plus this is what we end up
using in compute_energy(), so this would keep both ends aligned.


I think we have a similar problem for downmigration (with the current patch),
say you have:

  rq.cpu_capacity_orig = 462
  rq.util_avg = 50
  rq.uclamp.max = 100

  p.util_est = 512
  p.uclamp.max = 200

In this case I think we should get 200 cpu util, but we first do

  /* Skip CPUs that will be overutilized. */                                                         
  util = cpu_util_next(cpu, p, cpu);                                                                 
  cpu_cap = capacity_of(cpu);                                                                        
  if (!fits_capacity(util, cpu_cap))                                                                 
          continue;                    

which *doesn't* look at the clamps, so we would see ~562 util, which doesn't
fit on that small CPU, so we'd skip it. With your approach we would correctly
clamp this to 200 and carry on.


One thing I'd like to point out is if we have tasks with default clamp values
(.min=0, .max=1024) enqueued, we won't "throttle" tasks with low uclamp.max.
So something to keep in mind.

One last thing: this makes CPU selection slightly different from what
wake_cap() does (the latter only uses uclamp_task_util()), but I think
that is fine.

> Are we gaining anything if we decide to not align the EAS path and the
> sugov path ?
> 
> Thanks,
> Quentin
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] sched/fair: Consider uclamp for "task fits capacity" checks
  2019-11-20 17:55 ` [PATCH 3/3] sched/fair: Consider uclamp for "task fits capacity" checks Valentin Schneider
  2019-11-21 11:56   ` Quentin Perret
@ 2019-11-24 22:20   ` Qais Yousef
  2019-11-25 17:33     ` Valentin Schneider
  1 sibling, 1 reply; 14+ messages in thread
From: Qais Yousef @ 2019-11-24 22:20 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, peterz, mingo, vincent.guittot, dietmar.eggemann,
	patrick.bellasi, qperret, morten.rasmussen

On 11/20/19 17:55, Valentin Schneider wrote:
> +static inline
> +unsigned long uclamp_task_util(struct task_struct *p, unsigned long util)
> +{
> +	return clamp(util,
> +		     (unsigned long)uclamp_eff_value(p, UCLAMP_MIN),
> +		     (unsigned long)uclamp_eff_value(p, UCLAMP_MAX));
> +}

uclamp_eff_value() will check if a task belongs to a cgroup, and if it does,
apply its uclamp. The funny thing about the cgroup settings is that they can
have uclamp_max < uclamp_min. uclamp_util_with() does check for this
condition, but this function doesn't.

I would prefer to teach uclamp_util_with() to accept a NULL rq argument; then
we can have two convenient functions, uclamp_rq_util() and uclamp_task_util(),
that are just simple wrappers around it. It would be a lot better to keep the
intelligence of dealing with the correct details of clamping in a single
function IMO.

Cheers

--
Qais Yousef


* Re: [PATCH 3/3] sched/fair: Consider uclamp for "task fits capacity" checks
  2019-11-24 22:20   ` Qais Yousef
@ 2019-11-25 17:33     ` Valentin Schneider
  2019-11-26 10:06       ` Qais Yousef
  0 siblings, 1 reply; 14+ messages in thread
From: Valentin Schneider @ 2019-11-25 17:33 UTC (permalink / raw)
  To: Qais Yousef
  Cc: linux-kernel, peterz, mingo, vincent.guittot, dietmar.eggemann,
	patrick.bellasi, qperret, morten.rasmussen

On 24/11/2019 22:20, Qais Yousef wrote:
> On 11/20/19 17:55, Valentin Schneider wrote:
>> +static inline
>> +unsigned long uclamp_task_util(struct task_struct *p, unsigned long util)
>> +{
>> +	return clamp(util,
>> +		     (unsigned long)uclamp_eff_value(p, UCLAMP_MIN),
>> +		     (unsigned long)uclamp_eff_value(p, UCLAMP_MAX));
>> +}
> 
> uclamp_eff_value() will check if a task belongs to a cgroup, and if it does
> apply its uclamp. The funny thing about the cgroup settings is that they can
> have a uclamp_max < uclamp_min. uclamp_util_with() does check for this
> condition but this function doesn't.
> 

So according to comments, uclamp_util_with() has to check for inversion
because it is comparing task clamps with rq-wide aggregated clamps.

However, I have to admit I'm still not clear on how a clamp inversion
can occur. Without cgroups, it all goes through

  __sched_setscheduler() -> uclamp_validate()

which enforces uclamp.min <= uclamp.max. So the rq-wide aggregation could
not lead to an inversion.

With cgroups, I do recall something about allowing the cgroup *knobs* to be
inverted, but AFAICT that gets sanitized when it trickles down to the
scheduler via cpu_util_update_eff():

    eff[UCLAMP_MIN] = min(eff[UCLAMP_MIN], eff[UCLAMP_MAX]);

So I don't see how inversion could happen within uclamp_task_util().
Patrick, any chance you could light up my torch?

FWIW I gave this a spin but couldn't create an inversion inside the
scheduler (debug prints appended at the end):

  $ cgcreate -g cpu:cgtest

  $ ./loop.sh &
  $ PID=$!

  $ echo $PID > /sys/fs/cgroup/cpu/cgtest/cgroup.procs

  $ echo 20 > /sys/fs/cgroup/cpu/cgtest/cpu.uclamp.max
  [   24.442204] tg=00000000b546653d clamp_id=1 value=205
  [   24.447169] css=00000000b546653d uclamp.min=0 uclamp.max=205
  [   24.452932] p=loop.sh tg.uclamp.min=0 tg.uclamp.max=205 p.uclamp.min=0 p.uclamp.max=1024
  [   24.461021] p=loop.sh uclamp.min=0 uclamp.max=205

  $ echo 80 > /sys/fs/cgroup/cpu/cgtest/cpu.uclamp.min
  [   25.472174] tg=00000000b546653d clamp_id=0 value=819
  [   25.477135] css=00000000b546653d uclamp.min=205 uclamp.max=205
  [   25.483066] p=loop.sh tg.uclamp.min=205 tg.uclamp.max=205 p.uclamp.min=0 p.uclamp.max=1024
  [   25.491311] p=loop.sh uclamp.min=205 uclamp.max=205

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2da6a005bb3f..7b31fdc7cc90 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1088,6 +1088,17 @@ uclamp_update_active_tasks(struct cgroup_subsys_state *css,
 			if ((0x1 << clamp_id) & clamps)
 				uclamp_update_active(p, clamp_id);
 		}
+		pr_info("p=%s tg.uclamp.min=%u tg.uclamp.max=%u "
+			"p.uclamp.min=%u p.uclamp.max=%u\n",
+			p->comm, task_group(p)->uclamp[UCLAMP_MIN].value,
+			task_group(p)->uclamp[UCLAMP_MAX].value,
+			p->uclamp_req[UCLAMP_MIN].value,
+			p->uclamp_req[UCLAMP_MAX].value);
+
+
+		pr_info("p=%s uclamp.min=%u uclamp.max=%u\n",
+			p->comm, uclamp_tg_restrict(p, UCLAMP_MIN).value,
+			uclamp_tg_restrict(p, UCLAMP_MAX).value);
 	}
 	css_task_iter_end(&it);
 }
@@ -7195,6 +7206,8 @@ static void cpu_util_update_eff(struct cgroup_subsys_state *css)
 		/* Ensure protection is always capped by limit */
 		eff[UCLAMP_MIN] = min(eff[UCLAMP_MIN], eff[UCLAMP_MAX]);
 
+		pr_info("css=%p uclamp.min=%i uclamp.max=%i\n", css, eff[UCLAMP_MIN], eff[UCLAMP_MAX]);
+
 		/* Propagate most restrictive effective clamps */
 		clamps = 0x0;
 		uc_se = css_tg(css)->uclamp;
@@ -7273,8 +7286,10 @@ static ssize_t cpu_uclamp_write(struct kernfs_open_file *of, char *buf,
 	rcu_read_lock();
 
 	tg = css_tg(of_css(of));
-	if (tg->uclamp_req[clamp_id].value != req.util)
+	if (tg->uclamp_req[clamp_id].value != req.util) {
+		pr_info("tg=%p clamp_id=%i value=%llu\n", tg, clamp_id, req.util);
 		uclamp_se_set(&tg->uclamp_req[clamp_id], req.util, false);
+	}
 
 	/*
 	 * Because of not recoverable conversion rounding we keep track of the


* Re: [PATCH 3/3] sched/fair: Consider uclamp for "task fits capacity" checks
  2019-11-25 17:33     ` Valentin Schneider
@ 2019-11-26 10:06       ` Qais Yousef
  0 siblings, 0 replies; 14+ messages in thread
From: Qais Yousef @ 2019-11-26 10:06 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, peterz, mingo, vincent.guittot, dietmar.eggemann,
	patrick.bellasi, qperret, morten.rasmussen

On 11/25/19 17:33, Valentin Schneider wrote:
> With cgroups, I do recall something about allowing the cgroup *knobs* to be
> inverted, but AFAICT that gets sanitized when it trickles down to the
> scheduler via cpu_util_update_eff():
> 
>     eff[UCLAMP_MIN] = min(eff[UCLAMP_MIN], eff[UCLAMP_MAX]);

I missed that we cap here too.

> 
> So I don't see how inversion could happen within uclamp_task_util().
> Patrick, any chance you could light up my torch?

I think I am equally confused now. A clarification that would help improve
the comment would be useful.

Thanks

--
Qais Yousef


end of thread, other threads:[~2019-11-26 10:06 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-20 17:55 [PATCH 0/3] sched/fair: Task placement biasing using uclamp Valentin Schneider
2019-11-20 17:55 ` [PATCH 1/3] sched/uclamp: Make uclamp_util_*() helpers use and return UL values Valentin Schneider
2019-11-20 17:55 ` [PATCH 2/3] sched/uclamp: Rename uclamp_util_*() into uclamp_rq_util_*() Valentin Schneider
2019-11-20 17:55 ` [PATCH 3/3] sched/fair: Consider uclamp for "task fits capacity" checks Valentin Schneider
2019-11-21 11:56   ` Quentin Perret
2019-11-21 12:56     ` Valentin Schneider
2019-11-21 13:30       ` Quentin Perret
2019-11-21 14:51         ` Valentin Schneider
2019-11-21 15:30           ` Quentin Perret
2019-11-21 17:22             ` Valentin Schneider
2019-11-24 22:20   ` Qais Yousef
2019-11-25 17:33     ` Valentin Schneider
2019-11-26 10:06       ` Qais Yousef
2019-11-21 12:00 ` [PATCH 0/3] sched/fair: Task placement biasing using uclamp Quentin Perret
