linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max
@ 2021-06-02 12:38 Xuewen Yan
  2021-06-02 13:22 ` Quentin Perret
  0 siblings, 1 reply; 15+ messages in thread
From: Xuewen Yan @ 2021-06-02 12:38 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann
  Cc: rostedt, bsegall, mgorman, bristot, linux-kernel, zhang.lyra, xuewyan

From: Xuewen Yan <xuewen.yan@unisoc.com>

When setting cpu.uclamp.min/max in cgroup, there is no validating
like uclamp_validate() in __sched_setscheduler(). It may cause the
cpu.uclamp.min is bigger than cpu.uclamp.max.

Although there is protection in cpu_util_update_eff():
“eff[UCLAMP_MIN] = min(eff[UCLAMP_MIN], eff[UCLAMP_MAX])”, it's better
not to let it happen.

Judging the uclamp value before setting uclamp_min/max, avoid
the cpu.uclamp.min is bigger than cpu.uclamp.max.

Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
---
 kernel/sched/core.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5226cc26a095..520a2da40dc9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8867,6 +8867,30 @@ static ssize_t cpu_uclamp_write(struct kernfs_open_file *of, char *buf,
 	rcu_read_lock();
 
 	tg = css_tg(of_css(of));
+
+	switch (clamp_id) {
+	case UCLAMP_MIN: {
+		unsigned int uc_req_max = tg->uclamp_req[UCLAMP_MAX].value;
+
+		if (req.util > uc_req_max) {
+			nbytes = -EINVAL;
+			goto unlock;
+		}
+		break;
+	}
+	case UCLAMP_MAX: {
+		unsigned int uc_req_min = tg->uclamp_req[UCLAMP_MIN].value;
+
+		if (req.util < uc_req_min) {
+			nbytes = -EINVAL;
+			goto unlock;
+		}
+		break;
+	}
+	default:
+		nbytes = -EINVAL;
+		goto unlock;
+	}
 	if (tg->uclamp_req[clamp_id].value != req.util)
 		uclamp_se_set(&tg->uclamp_req[clamp_id], req.util, false);
 
@@ -8878,7 +8902,7 @@ static ssize_t cpu_uclamp_write(struct kernfs_open_file *of, char *buf,
 
 	/* Update effective clamps to track the most restrictive value */
 	cpu_util_update_eff(of_css(of));
-
+unlock:
 	rcu_read_unlock();
 	mutex_unlock(&uclamp_mutex);
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max
  2021-06-02 12:38 [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max Xuewen Yan
@ 2021-06-02 13:22 ` Quentin Perret
  2021-06-03  2:24   ` Xuewen Yan
  0 siblings, 1 reply; 15+ messages in thread
From: Quentin Perret @ 2021-06-02 13:22 UTC (permalink / raw)
  To: Xuewen Yan
  Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, linux-kernel, zhang.lyra,
	xuewyan, patrick.bellasi, tj

+CC Patrick and Tejun

On Wednesday 02 Jun 2021 at 20:38:03 (+0800), Xuewen Yan wrote:
> From: Xuewen Yan <xuewen.yan@unisoc.com>
> 
> When setting cpu.uclamp.min/max in cgroup, there is no validating
> like uclamp_validate() in __sched_setscheduler(). It may cause the
> cpu.uclamp.min is bigger than cpu.uclamp.max.

ISTR this was intentional. We also allow child groups to ask for
whatever clamps they want, but that is always limited by the parent, and
reflected in the 'effective' values, as per the cgroup delegation model.

> Although there is protection in cpu_util_update_eff():
> “eff[UCLAMP_MIN] = min(eff[UCLAMP_MIN], eff[UCLAMP_MAX])”, it's better
> not to let it happen.
> 
> Judging the uclamp value before setting uclamp_min/max, avoid
> the cpu.uclamp.min is bigger than cpu.uclamp.max.
> 
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> ---
>  kernel/sched/core.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 5226cc26a095..520a2da40dc9 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8867,6 +8867,30 @@ static ssize_t cpu_uclamp_write(struct kernfs_open_file *of, char *buf,
>  	rcu_read_lock();
>  
>  	tg = css_tg(of_css(of));
> +
> +	switch (clamp_id) {
> +	case UCLAMP_MIN: {
> +		unsigned int uc_req_max = tg->uclamp_req[UCLAMP_MAX].value;
> +
> +		if (req.util > uc_req_max) {
> +			nbytes = -EINVAL;
> +			goto unlock;
> +		}
> +		break;
> +	}
> +	case UCLAMP_MAX: {
> +		unsigned int uc_req_min = tg->uclamp_req[UCLAMP_MIN].value;
> +
> +		if (req.util < uc_req_min) {
> +			nbytes = -EINVAL;
> +			goto unlock;
> +		}
> +		break;
> +	}
> +	default:
> +		nbytes = -EINVAL;
> +		goto unlock;
> +	}
>  	if (tg->uclamp_req[clamp_id].value != req.util)
>  		uclamp_se_set(&tg->uclamp_req[clamp_id], req.util, false);
>  
> @@ -8878,7 +8902,7 @@ static ssize_t cpu_uclamp_write(struct kernfs_open_file *of, char *buf,
>  
>  	/* Update effective clamps to track the most restrictive value */
>  	cpu_util_update_eff(of_css(of));
> -
> +unlock:
>  	rcu_read_unlock();
>  	mutex_unlock(&uclamp_mutex);
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max
  2021-06-02 13:22 ` Quentin Perret
@ 2021-06-03  2:24   ` Xuewen Yan
  2021-06-04 16:08     ` Qais Yousef
  0 siblings, 1 reply; 15+ messages in thread
From: Xuewen Yan @ 2021-06-03  2:24 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Benjamin Segall, Mel Gorman,
	Daniel Bristot de Oliveira, linux-kernel, Chunyan Zhang, Ryan Y,
	Patrick Bellasi, tj, qais.yousef

+CC Qais


Hi Quentin

On Wed, Jun 2, 2021 at 9:22 PM Quentin Perret <qperret@google.com> wrote:
>
> +CC Patrick and Tejun
>
> On Wednesday 02 Jun 2021 at 20:38:03 (+0800), Xuewen Yan wrote:
> > From: Xuewen Yan <xuewen.yan@unisoc.com>
> >
> > When setting cpu.uclamp.min/max in cgroup, there is no validating
> > like uclamp_validate() in __sched_setscheduler(). It may cause the
> > cpu.uclamp.min is bigger than cpu.uclamp.max.
>
> ISTR this was intentional. We also allow child groups to ask for
> whatever clamps they want, but that is always limited by the parent, and
> reflected in the 'effective' values, as per the cgroup delegation model.

It does not affect the 'effective' value. That because there is
protection in cpu_util_update_eff():
/* Ensure protection is always capped by limit */
eff[UCLAMP_MIN] = min(eff[UCLAMP_MIN], eff[UCLAMP_MAX]);

When users set the cpu.uclamp.min > cpu.uclamp.max:
cpu.uclamp.max = 50;
to set : cpu.uclamp.min = 60;
That would make the uclamp_req[UCLAMP_MIN].value = 1024* 60% = 614,
uclamp_req[UCLAMP_MAX].value = 1024* 50% = 512;
But finally, the  uclamp[UCLAMP_MIN].value = uclamp[UCLAMP_MAX].value
= 1024* 50% = 512;

Is it deliberately set not to validate because of the above?

>
> > Although there is protection in cpu_util_update_eff():
> > “eff[UCLAMP_MIN] = min(eff[UCLAMP_MIN], eff[UCLAMP_MAX])”, it's better
> > not to let it happen.
> >
> > Judging the uclamp value before setting uclamp_min/max, avoid
> > the cpu.uclamp.min is bigger than cpu.uclamp.max.
> >
> > Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> > ---
> >  kernel/sched/core.c | 26 +++++++++++++++++++++++++-
> >  1 file changed, 25 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 5226cc26a095..520a2da40dc9 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -8867,6 +8867,30 @@ static ssize_t cpu_uclamp_write(struct kernfs_open_file *of, char *buf,
> >       rcu_read_lock();
> >
> >       tg = css_tg(of_css(of));
> > +
> > +     switch (clamp_id) {
> > +     case UCLAMP_MIN: {
> > +             unsigned int uc_req_max = tg->uclamp_req[UCLAMP_MAX].value;
> > +
> > +             if (req.util > uc_req_max) {
> > +                     nbytes = -EINVAL;
> > +                     goto unlock;
> > +             }
> > +             break;
> > +     }
> > +     case UCLAMP_MAX: {
> > +             unsigned int uc_req_min = tg->uclamp_req[UCLAMP_MIN].value;
> > +
> > +             if (req.util < uc_req_min) {
> > +                     nbytes = -EINVAL;
> > +                     goto unlock;
> > +             }
> > +             break;
> > +     }
> > +     default:
> > +             nbytes = -EINVAL;
> > +             goto unlock;
> > +     }
> >       if (tg->uclamp_req[clamp_id].value != req.util)
> >               uclamp_se_set(&tg->uclamp_req[clamp_id], req.util, false);
> >
> > @@ -8878,7 +8902,7 @@ static ssize_t cpu_uclamp_write(struct kernfs_open_file *of, char *buf,
> >
> >       /* Update effective clamps to track the most restrictive value */
> >       cpu_util_update_eff(of_css(of));
> > -
> > +unlock:
> >       rcu_read_unlock();
> >       mutex_unlock(&uclamp_mutex);
> >
> > --
> > 2.25.1
> >

When I change the code,I found the patch:

6938840392c89 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@arm.com

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6a5124c..f97eb73 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1405,7 +1405,6 @@ uclamp_tg_restrict(struct task_struct *p, enum
uclamp_id clamp_id)
 {
  struct uclamp_se uc_req = p->uclamp_req[clamp_id];
 #ifdef CONFIG_UCLAMP_TASK_GROUP
- struct uclamp_se uc_max;

  /*
  * Tasks in autogroups or root task group will be
@@ -1416,9 +1415,23 @@ uclamp_tg_restrict(struct task_struct *p, enum
uclamp_id clamp_id)
  if (task_group(p) == &root_task_group)
  return uc_req;

- uc_max = task_group(p)->uclamp[clamp_id];
- if (uc_req.value > uc_max.value || !uc_req.user_defined)
- return uc_max;
+ switch (clamp_id) {
+ case UCLAMP_MIN: {
+ struct uclamp_se uc_min = task_group(p)->uclamp[clamp_id];
+ if (uc_req.value < uc_min.value)
+ return uc_min;
+ break;
+ }
+ case UCLAMP_MAX: {
+ struct uclamp_se uc_max = task_group(p)->uclamp[clamp_id];
+ if (uc_req.value > uc_max.value)
+ return uc_max;
+ break;
+ }
+ default:
+ WARN_ON_ONCE(1);
+ break;
+ }
 #endif

When the clamp_id = UCLAMP_MIN, why not judge the uc_req.value is
bigger than task_group(p)->uclamp[UCLAMP_MAX] ?
Because when the p->uclamp_req[UCLAMP_MIN] >  task_group(p)->uclamp[UCLAMP_MAX],
the patch can not clamp the p->uclamp_req[UCLAMP_MIN/MAX] into [
task_group(p)->uclamp[UCLAMP_MAX], task_group(p)->uclamp[UCLAMP_MAX]
].

Thanks

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max
  2021-06-03  2:24   ` Xuewen Yan
@ 2021-06-04 16:08     ` Qais Yousef
  2021-06-04 16:22       ` Dietmar Eggemann
  2021-06-05  2:12       ` Xuewen Yan
  0 siblings, 2 replies; 15+ messages in thread
From: Qais Yousef @ 2021-06-04 16:08 UTC (permalink / raw)
  To: Xuewen Yan
  Cc: Quentin Perret, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Benjamin Segall, Mel Gorman, Daniel Bristot de Oliveira,
	linux-kernel, Chunyan Zhang, Ryan Y, Patrick Bellasi, tj

On 06/03/21 10:24, Xuewen Yan wrote:
> +CC Qais

Thanks for the CC :)

> 
> 
> Hi Quentin
> 
> On Wed, Jun 2, 2021 at 9:22 PM Quentin Perret <qperret@google.com> wrote:
> >
> > +CC Patrick and Tejun
> >
> > On Wednesday 02 Jun 2021 at 20:38:03 (+0800), Xuewen Yan wrote:
> > > From: Xuewen Yan <xuewen.yan@unisoc.com>
> > >
> > > When setting cpu.uclamp.min/max in cgroup, there is no validating
> > > like uclamp_validate() in __sched_setscheduler(). It may cause the
> > > cpu.uclamp.min is bigger than cpu.uclamp.max.
> >
> > ISTR this was intentional. We also allow child groups to ask for
> > whatever clamps they want, but that is always limited by the parent, and
> > reflected in the 'effective' values, as per the cgroup delegation model.

As Quentin said. This intentional to comply with cgroup model.

See Limits and Protections sections in Documentation/admin-guide/cgroup-v2.rst

Specifically

	"all configuration combinations are valid"

So user can set cpu.uclamp.min higher than cpu.uclamp.max. But when we apply
the setting, cpu.uclamp.min will be capped by cpu.uclamp.max. I can see you
found the cpu_util_update_eff() logic.

> 
> It does not affect the 'effective' value. That because there is
> protection in cpu_util_update_eff():
> /* Ensure protection is always capped by limit */
> eff[UCLAMP_MIN] = min(eff[UCLAMP_MIN], eff[UCLAMP_MAX]);
> 
> When users set the cpu.uclamp.min > cpu.uclamp.max:
> cpu.uclamp.max = 50;
> to set : cpu.uclamp.min = 60;
> That would make the uclamp_req[UCLAMP_MIN].value = 1024* 60% = 614,
> uclamp_req[UCLAMP_MAX].value = 1024* 50% = 512;
> But finally, the  uclamp[UCLAMP_MIN].value = uclamp[UCLAMP_MAX].value
> = 1024* 50% = 512;
> 
> Is it deliberately set not to validate because of the above?

Sorry I'm not following you here. What code paths were you trying to explain
here?

Did you actually hit any problem here?

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max
  2021-06-04 16:08     ` Qais Yousef
@ 2021-06-04 16:22       ` Dietmar Eggemann
  2021-06-05  2:14         ` Xuewen Yan
  2021-06-05  2:12       ` Xuewen Yan
  1 sibling, 1 reply; 15+ messages in thread
From: Dietmar Eggemann @ 2021-06-04 16:22 UTC (permalink / raw)
  To: Qais Yousef, Xuewen Yan
  Cc: Quentin Perret, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Steven Rostedt, Benjamin Segall, Mel Gorman,
	Daniel Bristot de Oliveira, linux-kernel, Chunyan Zhang, Ryan Y,
	Patrick Bellasi, tj

On 04/06/2021 18:08, Qais Yousef wrote:
> On 06/03/21 10:24, Xuewen Yan wrote:
>> +CC Qais
> 
> Thanks for the CC :)
> 
>>
>>
>> Hi Quentin
>>
>> On Wed, Jun 2, 2021 at 9:22 PM Quentin Perret <qperret@google.com> wrote:
>>>
>>> +CC Patrick and Tejun
>>>
>>> On Wednesday 02 Jun 2021 at 20:38:03 (+0800), Xuewen Yan wrote:
>>>> From: Xuewen Yan <xuewen.yan@unisoc.com>
>>>>
>>>> When setting cpu.uclamp.min/max in cgroup, there is no validating
>>>> like uclamp_validate() in __sched_setscheduler(). It may cause the
>>>> cpu.uclamp.min is bigger than cpu.uclamp.max.
>>>
>>> ISTR this was intentional. We also allow child groups to ask for
>>> whatever clamps they want, but that is always limited by the parent, and
>>> reflected in the 'effective' values, as per the cgroup delegation model.
> 
> As Quentin said. This intentional to comply with cgroup model.
> 
> See Limits and Protections sections in Documentation/admin-guide/cgroup-v2.rst
> 
> Specifically
> 
> 	"all configuration combinations are valid"
> 
> So user can set cpu.uclamp.min higher than cpu.uclamp.max. But when we apply
> the setting, cpu.uclamp.min will be capped by cpu.uclamp.max. I can see you
> found the cpu_util_update_eff() logic.

To support this:

Patrick had appropriate checks in his `[PATCH v10 12/16] sched/core:
uclamp: Extend CPU's cgroup controller`.

https://lkml.kernel.org/r/20190621084217.8167-13-patrick.bellasi@arm.com

But is was discussed that cgroup v2 `resource distribution model`
configurations (here protection/limit: uclamp_min/uclamp_max) should not
be restricted.

Further down in this thread:

"... Limits always trump protection in effect of course but please don't
limit what can be configured..."

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max
  2021-06-04 16:08     ` Qais Yousef
  2021-06-04 16:22       ` Dietmar Eggemann
@ 2021-06-05  2:12       ` Xuewen Yan
  2021-06-05 11:49         ` Qais Yousef
  1 sibling, 1 reply; 15+ messages in thread
From: Xuewen Yan @ 2021-06-05  2:12 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Quentin Perret, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Benjamin Segall, Mel Gorman, Daniel Bristot de Oliveira,
	linux-kernel, Chunyan Zhang, Ryan Y, Patrick Bellasi, tj

Hi Qais,

On Sat, Jun 5, 2021 at 12:08 AM Qais Yousef <qais.yousef@arm.com> wrote:
>
> On 06/03/21 10:24, Xuewen Yan wrote:
> > +CC Qais
>
> Thanks for the CC :)
>
> >
> >
> > Hi Quentin
> >
> > On Wed, Jun 2, 2021 at 9:22 PM Quentin Perret <qperret@google.com> wrote:
> > >
> > > +CC Patrick and Tejun
> > >
> > > On Wednesday 02 Jun 2021 at 20:38:03 (+0800), Xuewen Yan wrote:
> > > > From: Xuewen Yan <xuewen.yan@unisoc.com>
> > > >
> > > > When setting cpu.uclamp.min/max in cgroup, there is no validating
> > > > like uclamp_validate() in __sched_setscheduler(). It may cause the
> > > > cpu.uclamp.min is bigger than cpu.uclamp.max.
> > >
> > > ISTR this was intentional. We also allow child groups to ask for
> > > whatever clamps they want, but that is always limited by the parent, and
> > > reflected in the 'effective' values, as per the cgroup delegation model.
>
> As Quentin said. This intentional to comply with cgroup model.
>
> See Limits and Protections sections in Documentation/admin-guide/cgroup-v2.rst
>
> Specifically
>
>         "all configuration combinations are valid"
>
> So user can set cpu.uclamp.min higher than cpu.uclamp.max. But when we apply
> the setting, cpu.uclamp.min will be capped by cpu.uclamp.max. I can see you
> found the cpu_util_update_eff() logic.
>

Thanks a lot for your patience to explain, sorry for my ignorance of
Documentation/admin-guide/cgroup-v2.rst.

> >
> > It does not affect the 'effective' value. That because there is
> > protection in cpu_util_update_eff():
> > /* Ensure protection is always capped by limit */
> > eff[UCLAMP_MIN] = min(eff[UCLAMP_MIN], eff[UCLAMP_MAX]);
> >
> > When users set the cpu.uclamp.min > cpu.uclamp.max:
> > cpu.uclamp.max = 50;
> > to set : cpu.uclamp.min = 60;
> > That would make the uclamp_req[UCLAMP_MIN].value = 1024* 60% = 614,
> > uclamp_req[UCLAMP_MAX].value = 1024* 50% = 512;
> > But finally, the  uclamp[UCLAMP_MIN].value = uclamp[UCLAMP_MAX].value
> > = 1024* 50% = 512;
> >
> > Is it deliberately set not to validate because of the above?
>
> Sorry I'm not following you here. What code paths were you trying to explain
> here?
>
> Did you actually hit any problem here?

I just gave an example of the difference of uclamp_req and uclamp
without my patch, and can ignore it.

>
In addition,In your patch:
6938840392c89 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@arm.com

+ switch (clamp_id) {
+ case UCLAMP_MIN: {
+ struct uclamp_se uc_min = task_group(p)->uclamp[clamp_id];
+ if (uc_req.value < uc_min.value)
+ return uc_min;
+ break;

When the clamp_id = UCLAMP_MIN, why not judge the uc_req.value is
bigger than task_group(p)->uclamp[UCLAMP_MAX] ?
Because when the p->uclamp_req[UCLAMP_MIN] >  task_group(p)->uclamp[UCLAMP_MAX],
the patch can not clamp the p->uclamp_req[UCLAMP_MIN/MAX] into
[ task_group(p)->uclamp[UCLAMP_MAX],  task_group(p)->uclamp[UCLAMP_MAX] ].

Is it necessary to fix it here?

Thanks
xuewen

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max
  2021-06-04 16:22       ` Dietmar Eggemann
@ 2021-06-05  2:14         ` Xuewen Yan
  0 siblings, 0 replies; 15+ messages in thread
From: Xuewen Yan @ 2021-06-05  2:14 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Qais Yousef, Quentin Perret, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Steven Rostedt, Benjamin Segall,
	Mel Gorman, Daniel Bristot de Oliveira, linux-kernel,
	Chunyan Zhang, Ryan Y, Patrick Bellasi, tj

On Sat, Jun 5, 2021 at 12:22 AM Dietmar Eggemann
<dietmar.eggemann@arm.com> wrote:
>
> On 04/06/2021 18:08, Qais Yousef wrote:
> > On 06/03/21 10:24, Xuewen Yan wrote:
> >> +CC Qais
> >
> > Thanks for the CC :)
> >
> >>
> >>
> >> Hi Quentin
> >>
> >> On Wed, Jun 2, 2021 at 9:22 PM Quentin Perret <qperret@google.com> wrote:
> >>>
> >>> +CC Patrick and Tejun
> >>>
> >>> On Wednesday 02 Jun 2021 at 20:38:03 (+0800), Xuewen Yan wrote:
> >>>> From: Xuewen Yan <xuewen.yan@unisoc.com>
> >>>>
> >>>> When setting cpu.uclamp.min/max in cgroup, there is no validating
> >>>> like uclamp_validate() in __sched_setscheduler(). It may cause the
> >>>> cpu.uclamp.min is bigger than cpu.uclamp.max.
> >>>
> >>> ISTR this was intentional. We also allow child groups to ask for
> >>> whatever clamps they want, but that is always limited by the parent, and
> >>> reflected in the 'effective' values, as per the cgroup delegation model.
> >
> > As Quentin said. This intentional to comply with cgroup model.
> >
> > See Limits and Protections sections in Documentation/admin-guide/cgroup-v2.rst
> >
> > Specifically
> >
> >       "all configuration combinations are valid"
> >
> > So user can set cpu.uclamp.min higher than cpu.uclamp.max. But when we apply
> > the setting, cpu.uclamp.min will be capped by cpu.uclamp.max. I can see you
> > found the cpu_util_update_eff() logic.
>
> To support this:
>
> Patrick had appropriate checks in his `[PATCH v10 12/16] sched/core:
> uclamp: Extend CPU's cgroup controller`.
>
> https://lkml.kernel.org/r/20190621084217.8167-13-patrick.bellasi@arm.com
>
> But is was discussed that cgroup v2 `resource distribution model`
> configurations (here protection/limit: uclamp_min/uclamp_max) should not
> be restricted.
>
> Further down in this thread:
>
> "... Limits always trump protection in effect of course but please don't
> limit what can be configured..."

Okay, I have got it. Thanks a lot!

BR
xuewen.yan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max
  2021-06-05  2:12       ` Xuewen Yan
@ 2021-06-05 11:49         ` Qais Yousef
  2021-06-05 13:24           ` Xuewen Yan
  0 siblings, 1 reply; 15+ messages in thread
From: Qais Yousef @ 2021-06-05 11:49 UTC (permalink / raw)
  To: Xuewen Yan
  Cc: Quentin Perret, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Benjamin Segall, Mel Gorman, Daniel Bristot de Oliveira,
	linux-kernel, Chunyan Zhang, Ryan Y, Patrick Bellasi, tj

On 06/05/21 10:12, Xuewen Yan wrote:
> Hi Qais,
> 
> On Sat, Jun 5, 2021 at 12:08 AM Qais Yousef <qais.yousef@arm.com> wrote:
> >
> > On 06/03/21 10:24, Xuewen Yan wrote:
> > > +CC Qais
> >
> > Thanks for the CC :)
> >
> > >
> > >
> > > Hi Quentin
> > >
> > > On Wed, Jun 2, 2021 at 9:22 PM Quentin Perret <qperret@google.com> wrote:
> > > >
> > > > +CC Patrick and Tejun
> > > >
> > > > On Wednesday 02 Jun 2021 at 20:38:03 (+0800), Xuewen Yan wrote:
> > > > > From: Xuewen Yan <xuewen.yan@unisoc.com>
> > > > >
> > > > > When setting cpu.uclamp.min/max in cgroup, there is no validating
> > > > > like uclamp_validate() in __sched_setscheduler(). It may cause the
> > > > > cpu.uclamp.min is bigger than cpu.uclamp.max.
> > > >
> > > > ISTR this was intentional. We also allow child groups to ask for
> > > > whatever clamps they want, but that is always limited by the parent, and
> > > > reflected in the 'effective' values, as per the cgroup delegation model.
> >
> > As Quentin said. This intentional to comply with cgroup model.
> >
> > See Limits and Protections sections in Documentation/admin-guide/cgroup-v2.rst
> >
> > Specifically
> >
> >         "all configuration combinations are valid"
> >
> > So user can set cpu.uclamp.min higher than cpu.uclamp.max. But when we apply
> > the setting, cpu.uclamp.min will be capped by cpu.uclamp.max. I can see you
> > found the cpu_util_update_eff() logic.
> >
> 
> Thanks a lot for your patience to explain, sorry for my ignorance of
> Documentation/admin-guide/cgroup-v2.rst.

No problem :)

> 
> > >
> > > It does not affect the 'effective' value. That because there is
> > > protection in cpu_util_update_eff():
> > > /* Ensure protection is always capped by limit */
> > > eff[UCLAMP_MIN] = min(eff[UCLAMP_MIN], eff[UCLAMP_MAX]);
> > >
> > > When users set the cpu.uclamp.min > cpu.uclamp.max:
> > > cpu.uclamp.max = 50;
> > > to set : cpu.uclamp.min = 60;
> > > That would make the uclamp_req[UCLAMP_MIN].value = 1024* 60% = 614,
> > > uclamp_req[UCLAMP_MAX].value = 1024* 50% = 512;
> > > But finally, the  uclamp[UCLAMP_MIN].value = uclamp[UCLAMP_MAX].value
> > > = 1024* 50% = 512;
> > >
> > > Is it deliberately set not to validate because of the above?
> >
> > Sorry I'm not following you here. What code paths were you trying to explain
> > here?
> >
> > Did you actually hit any problem here?
> 
> I just gave an example of the difference of uclamp_req and uclamp
> without my patch, and can ignore it.

Cool.

> 
> >
> In addition,In your patch:
> 6938840392c89 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
> https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@arm.com
> 
> + switch (clamp_id) {
> + case UCLAMP_MIN: {
> + struct uclamp_se uc_min = task_group(p)->uclamp[clamp_id];
> + if (uc_req.value < uc_min.value)
> + return uc_min;
> + break;
> 
> When the clamp_id = UCLAMP_MIN, why not judge the uc_req.value is
> bigger than task_group(p)->uclamp[UCLAMP_MAX] ?

Because of the requirement I pointed you to in cgroup-v2.rst. We must allow any
value to be requested.

Ultimately if we had

	cpu.uclamp.min = 80
	cpu.uclamp.max = 50

then we want to remember the original request but make sure the effective value
is capped.

For the user in the future modifies the values such that

	cpu.uclamp.max = max

Then we want to remember cpu.uclamp.min = 80 and apply it since now the
cpu.uclamp.max was relaxed to allow the boost value.

> Because when the p->uclamp_req[UCLAMP_MIN] >  task_group(p)->uclamp[UCLAMP_MAX],
> the patch can not clamp the p->uclamp_req[UCLAMP_MIN/MAX] into
> [ task_group(p)->uclamp[UCLAMP_MAX],  task_group(p)->uclamp[UCLAMP_MAX] ].
> 
> Is it necessary to fix it here?

Nope. We must allow any combination values to be accepted and remember them so
if one changes we ensure the new effective value is updated accordingly.
This is how cgroups API works.

Hope this makes sense.

Cheers

--
Qais Yousef

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max
  2021-06-05 11:49         ` Qais Yousef
@ 2021-06-05 13:24           ` Xuewen Yan
  2021-06-05 14:14             ` Qais Yousef
  2021-06-07 13:49             ` Qais Yousef
  0 siblings, 2 replies; 15+ messages in thread
From: Xuewen Yan @ 2021-06-05 13:24 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Quentin Perret, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Benjamin Segall, Mel Gorman, Daniel Bristot de Oliveira,
	linux-kernel, Chunyan Zhang, Ryan Y, Patrick Bellasi, tj

Hi Qais

On Sat, Jun 5, 2021 at 7:49 PM Qais Yousef <qais.yousef@arm.com> wrote:
> > >
> > In addition,In your patch:
> > 6938840392c89 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
> > https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@arm.com
> >
> > + switch (clamp_id) {
> > + case UCLAMP_MIN: {
> > + struct uclamp_se uc_min = task_group(p)->uclamp[clamp_id];
> > + if (uc_req.value < uc_min.value)
> > + return uc_min;
> > + break;
> >
> > When the clamp_id = UCLAMP_MIN, why not judge the uc_req.value is
> > bigger than task_group(p)->uclamp[UCLAMP_MAX] ?
>
> Because of the requirement I pointed you to in cgroup-v2.rst. We must allow any
> value to be requested.
>
> Ultimately if we had
>
>         cpu.uclamp.min = 80
>         cpu.uclamp.max = 50
>
> then we want to remember the original request but make sure the effective value
> is capped.
>
> For the user in the future modifies the values such that
>
>         cpu.uclamp.max = max
>
> Then we want to remember cpu.uclamp.min = 80 and apply it since now the
> cpu.uclamp.max was relaxed to allow the boost value.
>
> > Because when the p->uclamp_req[UCLAMP_MIN] >  task_group(p)->uclamp[UCLAMP_MAX],
> > the patch can not clamp the p->uclamp_req[UCLAMP_MIN/MAX] into
> > [ task_group(p)->uclamp[UCLAMP_MAX],  task_group(p)->uclamp[UCLAMP_MAX] ].
> >
> > Is it necessary to fix it here?
>
> Nope. We must allow any combination values to be accepted and remember them so
> if one changes we ensure the new effective value is updated accordingly.
> This is how cgroups API works.

Sorry. I may not have expressed it clearly. In your patch (which has
not yet merged into the mainline):

6938840392c89 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
 https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@arm.com

This patch will not affect p->uclamp_req, but consider the following situation:

tg->cpu.uclamp.min = 0
tg->cpu.uclamp.max = 50%

p->uclamp_req[UCLAMP_MIN] = 60%
p->uclamp_req[UCLAMP_MIN] = 80%

The function call process is as follows:
uclamp_eff_value() -> uclamp_eff_get() ->uclamp_tg_restrict()

with your patch, the result is:

p->effective_uclamp_min = 60%
p->effective_uclamp_max = 50%

It would not affect the uclamp_task_util(p), but affect the rq:
when p enqueued:
rq->uclamp[UCLAMP_MIN] = 60%
rq->uclamp[UCLAMP_MIN] = 50%

futher more,  in uclamp_rq_util_with() {
...

min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value); //60%
max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);//50%
...
if (unlikely(min_util >= max_util))
return min_util;

return clamp(util, min_util, max_util);
...
}
as a result, it would return 60%.

Thanks!
xuewen

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max
  2021-06-05 13:24           ` Xuewen Yan
@ 2021-06-05 14:14             ` Qais Yousef
  2021-06-07 13:49             ` Qais Yousef
  1 sibling, 0 replies; 15+ messages in thread
From: Qais Yousef @ 2021-06-05 14:14 UTC (permalink / raw)
  To: Xuewen Yan
  Cc: Quentin Perret, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Benjamin Segall, Mel Gorman, Daniel Bristot de Oliveira,
	linux-kernel, Chunyan Zhang, Ryan Y, Patrick Bellasi, tj

On 06/05/21 21:24, Xuewen Yan wrote:
> Hi Qais
> 
> On Sat, Jun 5, 2021 at 7:49 PM Qais Yousef <qais.yousef@arm.com> wrote:
> > > >
> > > In addition,In your patch:
> > > 6938840392c89 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
> > > https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@arm.com
> > >
> > > + switch (clamp_id) {
> > > + case UCLAMP_MIN: {
> > > + struct uclamp_se uc_min = task_group(p)->uclamp[clamp_id];
> > > + if (uc_req.value < uc_min.value)
> > > + return uc_min;
> > > + break;
> > >
> > > When the clamp_id = UCLAMP_MIN, why not judge the uc_req.value is
> > > bigger than task_group(p)->uclamp[UCLAMP_MAX] ?
> >
> > Because of the requirement I pointed you to in cgroup-v2.rst. We must allow any
> > value to be requested.
> >
> > Ultimately if we had
> >
> >         cpu.uclamp.min = 80
> >         cpu.uclamp.max = 50
> >
> > then we want to remember the original request but make sure the effective value
> > is capped.
> >
> > For the user in the future modifies the values such that
> >
> >         cpu.uclamp.max = max
> >
> > Then we want to remember cpu.uclamp.min = 80 and apply it since now the
> > cpu.uclamp.max was relaxed to allow the boost value.
> >
> > > Because when the p->uclamp_req[UCLAMP_MIN] >  task_group(p)->uclamp[UCLAMP_MAX],
> > > the patch can not clamp the p->uclamp_req[UCLAMP_MIN/MAX] into
> > > [ task_group(p)->uclamp[UCLAMP_MAX],  task_group(p)->uclamp[UCLAMP_MAX] ].
> > >
> > > Is it necessary to fix it here?
> >
> > Nope. We must allow any combination values to be accepted and remember them so
> > if one changes we ensure the new effective value is updated accordingly.
> > This is how cgroups API works.
> 
> Sorry. I may not have expressed it clearly. In your patch (which has
> not yet merged into the mainline):
> 
> 6938840392c89 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
>  https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@arm.com
> 
> This patch will not affect p->uclamp_req, but consider the following situation:
> 
> tg->cpu.uclamp.min = 0
> tg->cpu.uclamp.max = 50%
> 
> p->uclamp_req[UCLAMP_MIN] = 60%
> p->uclamp_req[UCLAMP_MIN] = 80%
> 
> The function call process is as follows:
> uclamp_eff_value() -> uclamp_eff_get() ->uclamp_tg_restrict()
> 
> with your patch, the result is:
> 
> p->effective_uclamp_min = 60%
> p->effective_uclamp_max = 50%

Are you saying my patch introduced a regression? If there's a bug I would not
expect my patch to have had an impact in this area.

uclamp_tg_restrict() uses taskgroup(p)->uclamp[] which is the effective uclamp
that is capped in cpu_util_update_eff().

Are you statically analyzing the code or this is the outcome of an experiment
you ran on hardware?

Cheers

--
Qais Yousef

> 
> It would not affect the uclamp_task_util(p), but affect the rq:
> when p enqueued:
> rq->uclamp[UCLAMP_MIN] = 60%
> rq->uclamp[UCLAMP_MIN] = 50%
> 
> futher more,  in uclamp_rq_util_with() {
> ...
> 
> min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value); //60%
> max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);//50%
> ...
> if (unlikely(min_util >= max_util))
> return min_util;
> 
> return clamp(util, min_util, max_util);
> ...
> }
> as a result, it would return 60%.
> 
> Thanks!
> xuewen

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max
  2021-06-05 13:24           ` Xuewen Yan
  2021-06-05 14:14             ` Qais Yousef
@ 2021-06-07 13:49             ` Qais Yousef
  2021-06-08 11:45               ` Xuewen Yan
  1 sibling, 1 reply; 15+ messages in thread
From: Qais Yousef @ 2021-06-07 13:49 UTC (permalink / raw)
  To: Xuewen Yan
  Cc: Quentin Perret, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Benjamin Segall, Mel Gorman, Daniel Bristot de Oliveira,
	linux-kernel, Chunyan Zhang, Ryan Y, Patrick Bellasi, tj

On 06/05/21 21:24, Xuewen Yan wrote:
> Hi Qais
> 
> On Sat, Jun 5, 2021 at 7:49 PM Qais Yousef <qais.yousef@arm.com> wrote:
> > > >
> > > In addition,In your patch:
> > > 6938840392c89 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
> > > https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@arm.com
> > >
> > > + switch (clamp_id) {
> > > + case UCLAMP_MIN: {
> > > + struct uclamp_se uc_min = task_group(p)->uclamp[clamp_id];
> > > + if (uc_req.value < uc_min.value)
> > > + return uc_min;
> > > + break;
> > >
> > > When the clamp_id = UCLAMP_MIN, why not judge the uc_req.value is
> > > bigger than task_group(p)->uclamp[UCLAMP_MAX] ?
> >
> > Because of the requirement I pointed you to in cgroup-v2.rst. We must allow any
> > value to be requested.
> >
> > Ultimately if we had
> >
> >         cpu.uclamp.min = 80
> >         cpu.uclamp.max = 50
> >
> > then we want to remember the original request but make sure the effective value
> > is capped.
> >
> > For the user in the future modifies the values such that
> >
> >         cpu.uclamp.max = max
> >
> > Then we want to remember cpu.uclamp.min = 80 and apply it since now the
> > cpu.uclamp.max was relaxed to allow the boost value.
> >
> > > Because when the p->uclamp_req[UCLAMP_MIN] >  task_group(p)->uclamp[UCLAMP_MAX],
> > > the patch can not clamp the p->uclamp_req[UCLAMP_MIN/MAX] into
> > > [ task_group(p)->uclamp[UCLAMP_MAX],  task_group(p)->uclamp[UCLAMP_MAX] ].
> > >
> > > Is it necessary to fix it here?
> >
> > Nope. We must allow any combination values to be accepted and remember them so
> > if one changes we ensure the new effective value is updated accordingly.
> > This is how cgroups API works.
> 
> Sorry. I may not have expressed it clearly. In your patch (which has
> not yet merged into the mainline):
> 
> 6938840392c89 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
>  https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@arm.com
> 
> This patch will not affect p->uclamp_req, but consider the following situation:
> 
> tg->cpu.uclamp.min = 0
> tg->cpu.uclamp.max = 50%
> 
> p->uclamp_req[UCLAMP_MIN] = 60%
> p->uclamp_req[UCLAMP_MIN] = 80%
> 
> The function call process is as follows:
> uclamp_eff_value() -> uclamp_eff_get() ->uclamp_tg_restrict()
> 
> with your patch, the result is:
> 
> p->effective_uclamp_min = 60%
> p->effective_uclamp_max = 50%
> 
> It would not affect the uclamp_task_util(p), but affect the rq:
> when p enqueued:
> rq->uclamp[UCLAMP_MIN] = 60%
> rq->uclamp[UCLAMP_MIN] = 50%
> 
> futher more,  in uclamp_rq_util_with() {
> ...
> 
> min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value); //60%
> max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);//50%
> ...
> if (unlikely(min_util >= max_util))
> return min_util;
> 
> return clamp(util, min_util, max_util);
> ...
> }
> as a result, it would return 60%.

Looking at this again now, I better understand what you were trying to say.
I got confused that you were still arguing about cgroup inverted
cpu.uclamp.min/max, but you're actually talking about something else.

It would be a lot easier to not cross talk threads and reply to my patch
directly with this remark.

Anyways, still well spotted!

What you're saying is we need something like the patch below to ensure that the
*task request* is within tg uclamp range, right? The worry is that the task's
uclamp_min is higher than the tg's uclamp_min, so we end up with the inversion
because of that which will not be corrected later.

Hmm I need to think a bit more about this..

Cheers

--
Qais Yousef

--->8---

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9e9a5be35cde..e867813b9d5e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1405,6 +1405,7 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
 {
        struct uclamp_se uc_req = p->uclamp_req[clamp_id];
 #ifdef CONFIG_UCLAMP_TASK_GROUP
+       unsigned long uc_min, uc_max, val;

        /*
         * Tasks in autogroups or root task group will be
@@ -1415,23 +1416,10 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
        if (task_group(p) == &root_task_group)
                return uc_req;

-       switch (clamp_id) {
-       case UCLAMP_MIN: {
-               struct uclamp_se uc_min = task_group(p)->uclamp[clamp_id];
-               if (uc_req.value < uc_min.value)
-                       return uc_min;
-               break;
-       }
-       case UCLAMP_MAX: {
-               struct uclamp_se uc_max = task_group(p)->uclamp[clamp_id];
-               if (uc_req.value > uc_max.value)
-                       return uc_max;
-               break;
-       }
-       default:
-               WARN_ON_ONCE(1);
-               break;
-       }
+       uc_min = task_group(p)->uclamp[UCLAMP_MIN].value;
+       uc_max = task_group(p)->uclamp[UCLAMP_MAX].value;
+       val = uc_req.value;
+       uc_req.value = clamp(val, uc_min, uc_max);
 #endif

        return uc_req;


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max
  2021-06-07 13:49             ` Qais Yousef
@ 2021-06-08 11:45               ` Xuewen Yan
  2021-06-08 14:25                 ` Qais Yousef
  0 siblings, 1 reply; 15+ messages in thread
From: Xuewen Yan @ 2021-06-08 11:45 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Quentin Perret, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Benjamin Segall, Mel Gorman, Daniel Bristot de Oliveira,
	linux-kernel, Chunyan Zhang, Ryan Y, Patrick Bellasi, tj

First of all, sorry for the late reply..

On Mon, Jun 7, 2021 at 9:49 PM Qais Yousef <qais.yousef@arm.com> wrote:
>
> On 06/05/21 21:24, Xuewen Yan wrote:
> > Hi Qais
> >
> > On Sat, Jun 5, 2021 at 7:49 PM Qais Yousef <qais.yousef@arm.com> wrote:
> > > > >
> > > > In addition,In your patch:
> > > > 6938840392c89 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
> > > > https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@arm.com
> > > >
> > > > + switch (clamp_id) {
> > > > + case UCLAMP_MIN: {
> > > > + struct uclamp_se uc_min = task_group(p)->uclamp[clamp_id];
> > > > + if (uc_req.value < uc_min.value)
> > > > + return uc_min;
> > > > + break;
> > > >
> > > > When the clamp_id = UCLAMP_MIN, why not judge the uc_req.value is
> > > > bigger than task_group(p)->uclamp[UCLAMP_MAX] ?
> > >
> > > Because of the requirement I pointed you to in cgroup-v2.rst. We must allow any
> > > value to be requested.
> > >
> > > Ultimately if we had
> > >
> > >         cpu.uclamp.min = 80
> > >         cpu.uclamp.max = 50
> > >
> > > then we want to remember the original request but make sure the effective value
> > > is capped.
> > >
> > > For the user in the future modifies the values such that
> > >
> > >         cpu.uclamp.max = max
> > >
> > > Then we want to remember cpu.uclamp.min = 80 and apply it since now the
> > > cpu.uclamp.max was relaxed to allow the boost value.
> > >
> > > > Because when the p->uclamp_req[UCLAMP_MIN] >  task_group(p)->uclamp[UCLAMP_MAX],
> > > > the patch can not clamp the p->uclamp_req[UCLAMP_MIN/MAX] into
> > > > [ task_group(p)->uclamp[UCLAMP_MAX],  task_group(p)->uclamp[UCLAMP_MAX] ].
> > > >
> > > > Is it necessary to fix it here?
> > >
> > > Nope. We must allow any combination values to be accepted and remember them so
> > > if one changes we ensure the new effective value is updated accordingly.
> > > This is how cgroups API works.
> >
> > Sorry. I may not have expressed it clearly. In your patch (which has
> > not yet merged into the mainline):
> >
> > 6938840392c89 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
> >  https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@arm.com
> >
> > This patch will not affect p->uclamp_req, but consider the following situation:
> >
> > tg->cpu.uclamp.min = 0
> > tg->cpu.uclamp.max = 50%
> >
> > p->uclamp_req[UCLAMP_MIN] = 60%
> > p->uclamp_req[UCLAMP_MIN] = 80%

sorry, here should be
p->uclamp_req[UCLAMP_MIN] = 60%
p->uclamp_req[UCLAMP_MAX] = 80%

> >
> > The function call process is as follows:
> > uclamp_eff_value() -> uclamp_eff_get() ->uclamp_tg_restrict()
> >
> > with your patch, the result is:
> >
> > p->effective_uclamp_min = 60%
> > p->effective_uclamp_max = 50%
> >
> > It would not affect the uclamp_task_util(p), but affect the rq:
> > when p enqueued:
> > rq->uclamp[UCLAMP_MIN] = 60%
> > rq->uclamp[UCLAMP_MIN] = 50%

sorry, here should be
rq->uclamp[UCLAMP_MIN] = 60%
rq->uclamp[UCLAMP_MAX] = 50%

> >
> > futher more,  in uclamp_rq_util_with() {
> > ...
> >
> > min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value); //60%
> > max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);//50%
> > ...
> > if (unlikely(min_util >= max_util))
> > return min_util;
> >
> > return clamp(util, min_util, max_util);
> > ...
> > }
> > as a result, it would return 60%.
>
> Looking at this again now, I better understand what you were trying to say.
> I got confused that you were still arguing about cgroup inverted
> cpu.uclamp.min/max, but you're actually talking about something else.

Generally speaking, this kind of situation does not basically exist,
but I just consider all the situations that can occur when users use
it.

>
> It would be a lot easier to not cross talk threads and reply to my patch
> directly with this remark.
Sorry for the trouble because of my unfamiliar with the maillist, I
will pay attention next time :)

>
> Anyways, still well spotted!
>
> What you're saying is we need something like the patch below to ensure that the
> *task request* is within tg uclamp range, right? The worry is that the task's
> uclamp_min is higher than the tg's uclamp_min, so we end up with the inversion
> because of that which will not be corrected later.

Yeah,  the task's uclamp_min is higher than the tg's uclamp_max.
>
> Hmm I need to think a bit more about this..
>
> Cheers
>
> --
> Qais Yousef
>
> --->8---
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9e9a5be35cde..e867813b9d5e 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1405,6 +1405,7 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
>  {
>         struct uclamp_se uc_req = p->uclamp_req[clamp_id];
>  #ifdef CONFIG_UCLAMP_TASK_GROUP
> +       unsigned long uc_min, uc_max, val;
>
>         /*
>          * Tasks in autogroups or root task group will be
> @@ -1415,23 +1416,10 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
>         if (task_group(p) == &root_task_group)
>                 return uc_req;
>
> -       switch (clamp_id) {
> -       case UCLAMP_MIN: {
> -               struct uclamp_se uc_min = task_group(p)->uclamp[clamp_id];
> -               if (uc_req.value < uc_min.value)
> -                       return uc_min;
> -               break;
> -       }
> -       case UCLAMP_MAX: {
> -               struct uclamp_se uc_max = task_group(p)->uclamp[clamp_id];
> -               if (uc_req.value > uc_max.value)
> -                       return uc_max;
> -               break;
> -       }
> -       default:
> -               WARN_ON_ONCE(1);
> -               break;
> -       }
> +       uc_min = task_group(p)->uclamp[UCLAMP_MIN].value;
> +       uc_max = task_group(p)->uclamp[UCLAMP_MAX].value;
> +       val = uc_req.value;
> +       uc_req.value = clamp(val, uc_min, uc_max);

This is not a good solution, because it just clamp the uc_req.value,
but the  uc_req.bucket_id is not changed.

>  #endif
>
>         return uc_req;
>

Thanks!
xuewen

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max
  2021-06-08 11:45               ` Xuewen Yan
@ 2021-06-08 14:25                 ` Qais Yousef
  2021-06-08 15:01                   ` Xuewen Yan
  0 siblings, 1 reply; 15+ messages in thread
From: Qais Yousef @ 2021-06-08 14:25 UTC (permalink / raw)
  To: Xuewen Yan
  Cc: Quentin Perret, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Benjamin Segall, Mel Gorman, Daniel Bristot de Oliveira,
	linux-kernel, Chunyan Zhang, Ryan Y, Patrick Bellasi, tj

On 06/08/21 19:45, Xuewen Yan wrote:
> > Looking at this again now, I better understand what you were trying to say.
> > I got confused that you were still arguing about cgroup inverted
> > cpu.uclamp.min/max, but you're actually talking about something else.
> 
> Generally speaking, this kind of situation does not basically exist,
> but I just consider all the situations that can occur when users use
> it.

+1

> 
> >
> > It would be a lot easier to not cross talk threads and reply to my patch
> > directly with this remark.
> Sorry for the trouble because of my unfamiliar with the maillist, I
> will pay attention next time :)

Not really a problem, it was just a bit confusing to get the right context :)

> > +       uc_min = task_group(p)->uclamp[UCLAMP_MIN].value;
> > +       uc_max = task_group(p)->uclamp[UCLAMP_MAX].value;
> > +       val = uc_req.value;
> > +       uc_req.value = clamp(val, uc_min, uc_max);
> 
> This is not a good solution, because it just clamp the uc_req.value,
> but the  uc_req.bucket_id is not changed.


This is what I actually have now. I did move to using uclamp_se_set().
I also needed to modify uclamp_update_active_tasks() so that both
uclamp_min/max unconditionally.

I still need to sleep on it to make sure I haven't missed something else, but
it looks fine so far.

Thanks!

--
Qais Yousef


--->8---

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9e9a5be35cde..1d2d3e6648a6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1403,38 +1403,28 @@ static void uclamp_sync_util_min_rt_default(void)
 static inline struct uclamp_se
 uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
 {
-       struct uclamp_se uc_req = p->uclamp_req[clamp_id];
+       /* Copy by value as we could modify it */
+       struct uclamp_se uc_eff = p->uclamp_req[clamp_id];
 #ifdef CONFIG_UCLAMP_TASK_GROUP
+       unsigned long tg_min, tg_max, value;

        /*
         * Tasks in autogroups or root task group will be
         * restricted by system defaults.
         */
        if (task_group_is_autogroup(task_group(p)))
-               return uc_req;
+               return uc_eff;
        if (task_group(p) == &root_task_group)
-               return uc_req;
+               return uc_eff;

-       switch (clamp_id) {
-       case UCLAMP_MIN: {
-               struct uclamp_se uc_min = task_group(p)->uclamp[clamp_id];
-               if (uc_req.value < uc_min.value)
-                       return uc_min;
-               break;
-       }
-       case UCLAMP_MAX: {
-               struct uclamp_se uc_max = task_group(p)->uclamp[clamp_id];
-               if (uc_req.value > uc_max.value)
-                       return uc_max;
-               break;
-       }
-       default:
-               WARN_ON_ONCE(1);
-               break;
-       }
+       tg_min = task_group(p)->uclamp[UCLAMP_MIN].value;
+       tg_max = task_group(p)->uclamp[UCLAMP_MAX].value;
+       value = uc_eff.value;
+       value = clamp(value, tg_min, tg_max);
+       uclamp_se_set(&uc_eff, value, false);
 #endif

-       return uc_req;
+       return uc_eff;
 }
 
 /*
@@ -1661,8 +1651,7 @@ uclamp_update_active(struct task_struct *p, enum uclamp_id clamp_id)
 
 #ifdef CONFIG_UCLAMP_TASK_GROUP
 static inline void
-uclamp_update_active_tasks(struct cgroup_subsys_state *css,
-                          unsigned int clamps)
+uclamp_update_active_tasks(struct cgroup_subsys_state *css)
 {
        enum uclamp_id clamp_id;
        struct css_task_iter it;
@@ -1670,10 +1659,8 @@ uclamp_update_active_tasks(struct cgroup_subsys_state *css,
 
        css_task_iter_start(css, 0, &it);
        while ((p = css_task_iter_next(&it))) {
-               for_each_clamp_id(clamp_id) {
-                       if ((0x1 << clamp_id) & clamps)
-                               uclamp_update_active(p, clamp_id);
-               }
+               for_each_clamp_id(clamp_id)
+                       uclamp_update_active(p, clamp_id);
        }
        css_task_iter_end(&it);
 }
@@ -9626,7 +9613,7 @@ static void cpu_util_update_eff(struct cgroup_subsys_state *css)
                }
 
                /* Immediately update descendants RUNNABLE tasks */
-               uclamp_update_active_tasks(css, clamps);
+               uclamp_update_active_tasks(css);
        }
 }

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max
  2021-06-08 14:25                 ` Qais Yousef
@ 2021-06-08 15:01                   ` Xuewen Yan
  2021-06-08 18:21                     ` Qais Yousef
  0 siblings, 1 reply; 15+ messages in thread
From: Xuewen Yan @ 2021-06-08 15:01 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Quentin Perret, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Benjamin Segall, Mel Gorman, Daniel Bristot de Oliveira,
	linux-kernel, Chunyan Zhang, Ryan Y, Patrick Bellasi, tj

Hi

On Tue, Jun 8, 2021 at 10:25 PM Qais Yousef <qais.yousef@arm.com> wrote:
>
> --->8---
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9e9a5be35cde..1d2d3e6648a6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1403,38 +1403,28 @@ static void uclamp_sync_util_min_rt_default(void)
>  static inline struct uclamp_se
>  uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
>  {
> -       struct uclamp_se uc_req = p->uclamp_req[clamp_id];
> +       /* Copy by value as we could modify it */
> +       struct uclamp_se uc_eff = p->uclamp_req[clamp_id];
>  #ifdef CONFIG_UCLAMP_TASK_GROUP
> +       unsigned long tg_min, tg_max, value;
>
>         /*
>          * Tasks in autogroups or root task group will be
>          * restricted by system defaults.
>          */
>         if (task_group_is_autogroup(task_group(p)))
> -               return uc_req;
> +               return uc_eff;
>         if (task_group(p) == &root_task_group)
> -               return uc_req;
> +               return uc_eff;
>
> -       switch (clamp_id) {
> -       case UCLAMP_MIN: {
> -               struct uclamp_se uc_min = task_group(p)->uclamp[clamp_id];
> -               if (uc_req.value < uc_min.value)
> -                       return uc_min;
> -               break;
> -       }
> -       case UCLAMP_MAX: {
> -               struct uclamp_se uc_max = task_group(p)->uclamp[clamp_id];
> -               if (uc_req.value > uc_max.value)
> -                       return uc_max;
> -               break;
> -       }
> -       default:
> -               WARN_ON_ONCE(1);
> -               break;
> -       }
> +       tg_min = task_group(p)->uclamp[UCLAMP_MIN].value;
> +       tg_max = task_group(p)->uclamp[UCLAMP_MAX].value;
> +       value = uc_eff.value;
> +       value = clamp(value, tg_min, tg_max);
> +       uclamp_se_set(&uc_eff, value, false);

Is it reasonable to set user_defined to be false here?

>  #endif
>
> -       return uc_req;
> +       return uc_eff;
>  }
>
>  /*
> @@ -1661,8 +1651,7 @@ uclamp_update_active(struct task_struct *p, enum uclamp_id clamp_id)
>
>  #ifdef CONFIG_UCLAMP_TASK_GROUP
>  static inline void
> -uclamp_update_active_tasks(struct cgroup_subsys_state *css,
> -                          unsigned int clamps)
> +uclamp_update_active_tasks(struct cgroup_subsys_state *css)
>  {
>         enum uclamp_id clamp_id;
>         struct css_task_iter it;
> @@ -1670,10 +1659,8 @@ uclamp_update_active_tasks(struct cgroup_subsys_state *css,
>
>         css_task_iter_start(css, 0, &it);
>         while ((p = css_task_iter_next(&it))) {
> -               for_each_clamp_id(clamp_id) {
> -                       if ((0x1 << clamp_id) & clamps)
> -                               uclamp_update_active(p, clamp_id);
> -               }
> +               for_each_clamp_id(clamp_id)
> +                       uclamp_update_active(p, clamp_id);
>         }
>         css_task_iter_end(&it);
>  }
> @@ -9626,7 +9613,7 @@ static void cpu_util_update_eff(struct cgroup_subsys_state *css)
>                 }
>
>                 /* Immediately update descendants RUNNABLE tasks */
> -               uclamp_update_active_tasks(css, clamps);
> +               uclamp_update_active_tasks(css);
>         }
>  }

Would you resend another email? maybe it would be better to resend an
email with a new subject?

BR
xuewen

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max
  2021-06-08 15:01                   ` Xuewen Yan
@ 2021-06-08 18:21                     ` Qais Yousef
  0 siblings, 0 replies; 15+ messages in thread
From: Qais Yousef @ 2021-06-08 18:21 UTC (permalink / raw)
  To: Xuewen Yan
  Cc: Quentin Perret, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Benjamin Segall, Mel Gorman, Daniel Bristot de Oliveira,
	linux-kernel, Chunyan Zhang, Ryan Y, Patrick Bellasi, tj

On 06/08/21 23:01, Xuewen Yan wrote:
> Hi
> 
> On Tue, Jun 8, 2021 at 10:25 PM Qais Yousef <qais.yousef@arm.com> wrote:
> >
> > --->8---
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 9e9a5be35cde..1d2d3e6648a6 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1403,38 +1403,28 @@ static void uclamp_sync_util_min_rt_default(void)
> >  static inline struct uclamp_se
> >  uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> >  {
> > -       struct uclamp_se uc_req = p->uclamp_req[clamp_id];
> > +       /* Copy by value as we could modify it */
> > +       struct uclamp_se uc_eff = p->uclamp_req[clamp_id];
> >  #ifdef CONFIG_UCLAMP_TASK_GROUP
> > +       unsigned long tg_min, tg_max, value;
> >
> >         /*
> >          * Tasks in autogroups or root task group will be
> >          * restricted by system defaults.
> >          */
> >         if (task_group_is_autogroup(task_group(p)))
> > -               return uc_req;
> > +               return uc_eff;
> >         if (task_group(p) == &root_task_group)
> > -               return uc_req;
> > +               return uc_eff;
> >
> > -       switch (clamp_id) {
> > -       case UCLAMP_MIN: {
> > -               struct uclamp_se uc_min = task_group(p)->uclamp[clamp_id];
> > -               if (uc_req.value < uc_min.value)
> > -                       return uc_min;
> > -               break;
> > -       }
> > -       case UCLAMP_MAX: {
> > -               struct uclamp_se uc_max = task_group(p)->uclamp[clamp_id];
> > -               if (uc_req.value > uc_max.value)
> > -                       return uc_max;
> > -               break;
> > -       }
> > -       default:
> > -               WARN_ON_ONCE(1);
> > -               break;
> > -       }
> > +       tg_min = task_group(p)->uclamp[UCLAMP_MIN].value;
> > +       tg_max = task_group(p)->uclamp[UCLAMP_MAX].value;
> > +       value = uc_eff.value;
> > +       value = clamp(value, tg_min, tg_max);
> > +       uclamp_se_set(&uc_eff, value, false);
> 
> Is it reasonable to set user_defined to be false here?

Yep, it doesn't really matter for the effective value. It matters for the
actual task request.

> 
> >  #endif
> >
> > -       return uc_req;
> > +       return uc_eff;
> >  }
> >
> >  /*
> > @@ -1661,8 +1651,7 @@ uclamp_update_active(struct task_struct *p, enum uclamp_id clamp_id)
> >
> >  #ifdef CONFIG_UCLAMP_TASK_GROUP
> >  static inline void
> > -uclamp_update_active_tasks(struct cgroup_subsys_state *css,
> > -                          unsigned int clamps)
> > +uclamp_update_active_tasks(struct cgroup_subsys_state *css)
> >  {
> >         enum uclamp_id clamp_id;
> >         struct css_task_iter it;
> > @@ -1670,10 +1659,8 @@ uclamp_update_active_tasks(struct cgroup_subsys_state *css,
> >
> >         css_task_iter_start(css, 0, &it);
> >         while ((p = css_task_iter_next(&it))) {
> > -               for_each_clamp_id(clamp_id) {
> > -                       if ((0x1 << clamp_id) & clamps)
> > -                               uclamp_update_active(p, clamp_id);
> > -               }
> > +               for_each_clamp_id(clamp_id)
> > +                       uclamp_update_active(p, clamp_id);
> >         }
> >         css_task_iter_end(&it);
> >  }
> > @@ -9626,7 +9613,7 @@ static void cpu_util_update_eff(struct cgroup_subsys_state *css)
> >                 }
> >
> >                 /* Immediately update descendants RUNNABLE tasks */
> > -               uclamp_update_active_tasks(css, clamps);
> > +               uclamp_update_active_tasks(css);
> >         }
> >  }
> 
> Would you resend another email? maybe it would be better to resend an
> email with a new subject?

Yeah I will do a proper posting. But I need to stare at this a bit more then
will do.

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-06-08 18:22 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-02 12:38 [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max Xuewen Yan
2021-06-02 13:22 ` Quentin Perret
2021-06-03  2:24   ` Xuewen Yan
2021-06-04 16:08     ` Qais Yousef
2021-06-04 16:22       ` Dietmar Eggemann
2021-06-05  2:14         ` Xuewen Yan
2021-06-05  2:12       ` Xuewen Yan
2021-06-05 11:49         ` Qais Yousef
2021-06-05 13:24           ` Xuewen Yan
2021-06-05 14:14             ` Qais Yousef
2021-06-07 13:49             ` Qais Yousef
2021-06-08 11:45               ` Xuewen Yan
2021-06-08 14:25                 ` Qais Yousef
2021-06-08 15:01                   ` Xuewen Yan
2021-06-08 18:21                     ` Qais Yousef

This is a public inbox, see mirroring instructions
on how to clone and mirror all data and code used for this inbox