Re: [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max

From: Xuewen Yan <xuewen.yan94@gmail.com>
To: Qais Yousef <qais.yousef@arm.com>
Cc: Quentin Perret <qperret@google.com>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Benjamin Segall <bsegall@google.com>,
	Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Chunyan Zhang <zhang.lyra@gmail.com>,
	Ryan Y <xuewyan@foxmail.com>,
	Patrick Bellasi <patrick.bellasi@matbug.net>,
	tj@kernel.org
Subject: Re: [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max
Date: Tue, 8 Jun 2021 19:45:32 +0800	[thread overview]
Message-ID: <CAB8ipk8FaovUYY8ncDgLHO7k_EoEHtsfm+1QYsFTMf4fb7ix_A@mail.gmail.com> (raw)
In-Reply-To: <20210607134902.nlgvqtzj35rhjg7x@e107158-lin.cambridge.arm.com>

First of all, sorry for the late reply..

On Mon, Jun 7, 2021 at 9:49 PM Qais Yousef <qais.yousef@arm.com> wrote:
>
> On 06/05/21 21:24, Xuewen Yan wrote:
> > Hi Qais
> >
> > On Sat, Jun 5, 2021 at 7:49 PM Qais Yousef <qais.yousef@arm.com> wrote:
> > > > >
> > > > In addition，In your patch:
> > > > 6938840392c89 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
> > > > https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@arm.com
> > > >
> > > > + switch (clamp_id) {
> > > > + case UCLAMP_MIN: {
> > > > + struct uclamp_se uc_min = task_group(p)->uclamp[clamp_id];
> > > > + if (uc_req.value < uc_min.value)
> > > > + return uc_min;
> > > > + break;
> > > >
> > > > When the clamp_id = UCLAMP_MIN, why not judge the uc_req.value is
> > > > bigger than task_group(p)->uclamp[UCLAMP_MAX] ?
> > >
> > > Because of the requirement I pointed you to in cgroup-v2.rst. We must allow any
> > > value to be requested.
> > >
> > > Ultimately if we had
> > >
> > >         cpu.uclamp.min = 80
> > >         cpu.uclamp.max = 50
> > >
> > > then we want to remember the original request but make sure the effective value
> > > is capped.
> > >
> > > For the user in the future modifies the values such that
> > >
> > >         cpu.uclamp.max = max
> > >
> > > Then we want to remember cpu.uclamp.min = 80 and apply it since now the
> > > cpu.uclamp.max was relaxed to allow the boost value.
> > >
> > > > Because when the p->uclamp_req[UCLAMP_MIN] >  task_group(p)->uclamp[UCLAMP_MAX],
> > > > the patch can not clamp the p->uclamp_req[UCLAMP_MIN/MAX] into
> > > > [ task_group(p)->uclamp[UCLAMP_MAX],  task_group(p)->uclamp[UCLAMP_MAX] ].
> > > >
> > > > Is it necessary to fix it here？
> > >
> > > Nope. We must allow any combination values to be accepted and remember them so
> > > if one changes we ensure the new effective value is updated accordingly.
> > > This is how cgroups API works.
> >
> > Sorry. I may not have expressed it clearly. In your patch (which has
> > not yet merged into the mainline):
> >
> > 6938840392c89 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
> >  https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@arm.com
> >
> > This patch will not affect p->uclamp_req, but consider the following situation:
> >
> > tg->cpu.uclamp.min = 0
> > tg->cpu.uclamp.max = 50%
> >
> > p->uclamp_req[UCLAMP_MIN] = 60%
> > p->uclamp_req[UCLAMP_MIN] = 80%

sorry, here should be
p->uclamp_req[UCLAMP_MIN] = 60%
p->uclamp_req[UCLAMP_MAX] = 80%

> >
> > The function call process is as follows：
> > uclamp_eff_value() -> uclamp_eff_get() ->uclamp_tg_restrict()
> >
> > with your patch, the result is:
> >
> > p->effective_uclamp_min = 60%
> > p->effective_uclamp_max = 50%
> >
> > It would not affect the uclamp_task_util(p), but affect the rq:
> > when p enqueued:
> > rq->uclamp[UCLAMP_MIN] = 60%
> > rq->uclamp[UCLAMP_MIN] = 50%

sorry, here should be
rq->uclamp[UCLAMP_MIN] = 60%
rq->uclamp[UCLAMP_MAX] = 50%

> >
> > futher more,  in uclamp_rq_util_with() {
> > ...
> >
> > min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value); //60%
> > max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);//50%
> > ...
> > if (unlikely(min_util >= max_util))
> > return min_util;
> >
> > return clamp(util, min_util, max_util);
> > ...
> > }
> > as a result, it would return 60%.
>
> Looking at this again now, I better understand what you were trying to say.
> I got confused that you were still arguing about cgroup inverted
> cpu.uclamp.min/max, but you're actually talking about something else.

Generally speaking, this kind of situation does not basically exist,
but I just consider all the situations that can occur when users use
it.

>
> It would be a lot easier to not cross talk threads and reply to my patch
> directly with this remark.
Sorry for the trouble because of my unfamiliar with the maillist, I
will pay attention next time :）

>
> Anyways, still well spotted!
>
> What you're saying is we need something like the patch below to ensure that the
> *task request* is within tg uclamp range, right? The worry is that the task's
> uclamp_min is higher than the tg's uclamp_min, so we end up with the inversion
> because of that which will not be corrected later.

Yeah,  the task's uclamp_min is higher than the tg's uclamp_max.
>
> Hmm I need to think a bit more about this..
>
> Cheers
>
> --
> Qais Yousef
>
> --->8---
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9e9a5be35cde..e867813b9d5e 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1405,6 +1405,7 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
>  {
>         struct uclamp_se uc_req = p->uclamp_req[clamp_id];
>  #ifdef CONFIG_UCLAMP_TASK_GROUP
> +       unsigned long uc_min, uc_max, val;
>
>         /*
>          * Tasks in autogroups or root task group will be
> @@ -1415,23 +1416,10 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
>         if (task_group(p) == &root_task_group)
>                 return uc_req;
>
> -       switch (clamp_id) {
> -       case UCLAMP_MIN: {
> -               struct uclamp_se uc_min = task_group(p)->uclamp[clamp_id];
> -               if (uc_req.value < uc_min.value)
> -                       return uc_min;
> -               break;
> -       }
> -       case UCLAMP_MAX: {
> -               struct uclamp_se uc_max = task_group(p)->uclamp[clamp_id];
> -               if (uc_req.value > uc_max.value)
> -                       return uc_max;
> -               break;
> -       }
> -       default:
> -               WARN_ON_ONCE(1);
> -               break;
> -       }
> +       uc_min = task_group(p)->uclamp[UCLAMP_MIN].value;
> +       uc_max = task_group(p)->uclamp[UCLAMP_MAX].value;
> +       val = uc_req.value;
> +       uc_req.value = clamp(val, uc_min, uc_max);

This is not a good solution, because it just clamp the uc_req.value,
but the  uc_req.bucket_id is not changed.

>  #endif
>
>         return uc_req;
>

Thanks!
xuewen