linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: Qais Yousef <qais.yousef@arm.com>,
	Xuewen Yan <xuewen.yan@unisoc.com>,
	dietmar.eggemann@arm.com, lukasz.luba@arm.com, rafael@kernel.org,
	viresh.kumar@linaro.org, mingo@redhat.com, peterz@infradead.org,
	rostedt@goodmis.org, linux-kernel@vger.kernel.org,
	di.shen@unisoc.com
Subject: Re: [PATCH] sched: Take thermal pressure into account when determine rt fits capacity
Date: Tue, 26 Apr 2022 10:09:35 +0200	[thread overview]
Message-ID: <CAKfTPtBswQ6bk8MrvcLmqc-9V2-SeWn3H_-gRvF5isdjL_acqA@mail.gmail.com> (raw)
In-Reply-To: <CAB8ipk9hZXDcTV3hakRV+dE5dwKtg-Ka93WZ60ds0=4ErN1-0w@mail.gmail.com>

On Tue, 26 Apr 2022 at 04:07, Xuewen Yan <xuewen.yan94@gmail.com> wrote:
>
> On Tue, Apr 26, 2022 at 12:12 AM Qais Yousef <qais.yousef@arm.com> wrote:
> >
> > On 04/25/22 09:31, Xuewen Yan wrote:
> > > On Fri, Apr 22, 2022 at 12:15 AM Qais Yousef <qais.yousef@arm.com> wrote:
> > > > Is it okay to share what the capacities of the littles, mediums and bigs on
> > > > your system? And how they change under worst case scenario thermal pressure?
> > > > Only IF you have these numbers handy :-)
> > >
> > > Okay, the little/mid/big cpu scale capacity is 350/930/1024, but the
> > > cpu frequency point is discrete, the big core's high freq point may is
> > > just a few more than the mid core's highest.
> > > In this case, once the thermal decrease the scaling_max_freq, the
> > > maximum frequency of the large core is easily lower than that of the
> > > medium core.
> > > Of course, the corner case is due to the frequency design of the soc
> > > and  our thermal algorithm.
> >
> > Okay, thanks for the info!
> >
> > >
> > > >
> > > > Is it actually an indication of a potential other problem if you swing into
> > > > capacity inversion in the bigs that often? I've seen a lot of systems where the
> > > > difference between the meds and bigs is small. But frequent inversion could be
> > > > suspicious still.
> > > >
> > > > Do the littles and the mediums experience any significant thermal pressure too?
> > >
> > > In our platform, it's not.
> >
> > Good.
> >
> > > > It doesn't seem it'll cause a significant error, but still it seems to me this
> > > > function wants the original capacity passed to it.
> > > >
> > > > There are similar questions to be asked since you modify sg_cpu->max. Every
> > > > user needs to be audited if they're fine with this change or not.
> > > >
> > > > I'm not sure still what we are achieving here. You want to force schedutil not
> > > > to request higher frequencies if thermal pressure is high? Should schedutil
> > > > actually care? Shouldn't the cpufreq driver reject this request and pick the
> > > > next best thing if it can't satisfy it? I could be missing something, I haven't
> > > > looked that hard tbh :-)
> > >
> > > I changed this just want to make it more responsive to the real
> > > capacity of the cpu, if it will cause other problems, maybe it would
> > > be better not to change it.:)
> >
> > There are others who can give you a better opinion. But AFAICS we're not fixing
> > anything but risking breaking other things. So I vote for not to change it :)
> >
> > > > It depends on the severity of the problem. The simplest thing I can suggest is
> > > > to check if the cpu is in capacity inversion state, and if it is, then make
> > > > rt_task_fits_capacity() return false always.
> > > >
> > > > If we need a generic solution to handle thermal pressure omitting OPPs, then
> > > > the search needs to become more complex. The proposal in this patch is not
> > > > adequate because tasks that want to run at capacity_orig_of(cpu) will wrongly
> > > > omit some cpus because of any tiny thermal pressure. For example if the
> > > > capacity_orig_of(medium_cpu) = 700, and uclamp_min for RT is set to 700, then
> > > > any small thermal pressure on mediums will cause these tasks to run on big cpus
> > > > only, which is not what we want. Especially if these big cpus can end up in
> > > > capacity inversion later ;-)
> > > >
> > > > So if we want to handle this case, then we need to ensure the search returns
> > > > false only if
> > > >
> > > >         1. Thermal pressure results in real OPP to be omitted.
> > > >         2. Another CPU that can provide this performance level is available.
> > > >
> > > > Otherwise we should still fit it on this CPU because it'll give us the closest
> > > > thing to what was requested.
> > > >
> > > > I can think of 2 ways to implement this, but none of them seem particularly
> > > > pretty :-/
> > >
> > > Maybe as Lukasz Luba said:
> > >
> > > https://lore.kernel.org/all/ae98a861-8945-e630-8d4c-8112723d1007@arm.com/
> > >
> > > > Let's meet in the middle:
> > > > 1) use the thermal PELT signal in RT:
> > > > capacity = capacity_orig_of(cpu) - thermal_load_avg(cpu_rq(cpu))
> > > > 2) introduce a more configurable thermal_pressure shifter instead
> > > > 'sched_thermal_decay_shift', which would allow not only to make the
> > > > decaying longer, but also shorter when the platform already might do
> > > > that, to not cause too much traffic.
> > >
> > > But even if this is changed, there will still be the same problem, I
> > > look forward to Lukasz's patch:)
> >
> > This will not address my concern unless I missed something.
> >
> > The best (simplest) way forward IMHO is to introduce a new function
> >
> >         bool cpu_in_capacity_inversion(int cpu);
> >
> > (feel free to pick another name) which will detect the scenario you're in. You
> > can use this function then in rt_task_fits_capacity()
> >
> >         diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> >         index a32c46889af8..d48811a7e956 100644
> >         --- a/kernel/sched/rt.c
> >         +++ b/kernel/sched/rt.c
> >         @@ -462,6 +462,9 @@ static inline bool rt_task_fits_capacity(struct task_struct *p, int cpu)
> >                 if (!static_branch_unlikely(&sched_asym_cpucapacity))
> >                         return true;
> >
> >         +       if (cpu_in_capacity_inversion(cpu))
> >         +               return false;
> >         +
> >                 min_cap = uclamp_eff_value(p, UCLAMP_MIN);
> >                 max_cap = uclamp_eff_value(p, UCLAMP_MAX);
> >
> > You'll probably need to do something similar in dl_task_fits_capacity().
> >
> > This might be a bit aggressive though as we'll steer away all RT tasks from
> > this CPU (as long as there's another CPU that can fit it). I need to think more
> > about it. But we could do something like this too
> >
> >         diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> >         index a32c46889af8..f2a34946a7ab 100644
> >         --- a/kernel/sched/rt.c
> >         +++ b/kernel/sched/rt.c
> >         @@ -462,11 +462,14 @@ static inline bool rt_task_fits_capacity(struct task_struct *p, int cpu)
> >                 if (!static_branch_unlikely(&sched_asym_cpucapacity))
> >                         return true;
> >
> >         +       cpu_cap = capacity_orig_of(cpu);
> >         +
> >         +       if (cpu_in_capacity_inversion(cpu))
>
> It's  a good idea, but as you said, in mainline, the
> sysctl_sched_uclamp_util_min_rt_default is always 1024,
> Maybe it's better to add it to the judgment?
>
>  +       if (sysctl_sched_uclamp_util_min_rt_default !=
> SCHED_CAPACITY_SCALE && cpu_in_capacity_inversion(cpu))
>
> >         +               cpu_cap -= thermal_load_avg(cpu_rq(cpu));
>
> Why use thermal_load_avg? If thermal is always in effect,the
> thermal_load_avg would get bigger and bigger, as a result, the cpu_cap
> maybe smaller than (capacity_orig - thermal_pressure).

For a fixed thermal_pressure(), thermal_load_avg() will not be higher
than thermal_pressure() but will increase to reach thermal_pressure()

In the current implementation for sched_asym_cpucapacity topology, we
do a 1st iteration trying to find a cpu that fits a task's capacity
but if it fails, we run a normal cpupri_find that doesn't care about
capacity.

Do I understand correctly that in your case you would like to run
a 1st iteration that takes into account capacity_orig_of(cpu) -
thermal_load_avg(cpu_rq(cpu))
If it fails run another iteration only with capacity_orig_of(cpu)
and finally tries without capacity constraint

>
> Thanks!
>
> >         +
> >                 min_cap = uclamp_eff_value(p, UCLAMP_MIN);
> >                 max_cap = uclamp_eff_value(p, UCLAMP_MAX);
> >
> >         -       cpu_cap = capacity_orig_of(cpu);
> >         -
> >                 return cpu_cap >= min(min_cap, max_cap);
> >          }
> >          #else
> >
> > Thoughts?
>
> >
> >
> > Thanks!
> >
> > --
> > Qais Yousef

  reply	other threads:[~2022-04-26  8:09 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-07  5:19 [PATCH] sched: Take thermal pressure into account when determine rt fits capacity Xuewen Yan
2022-04-11  8:21 ` Dietmar Eggemann
2022-04-11  8:52   ` Xuewen Yan
2022-04-11 14:07     ` Dietmar Eggemann
2022-04-13 13:25       ` Lukasz Luba
2022-04-16  2:47         ` Xuewen Yan
2022-04-19  7:14           ` Vincent Guittot
2022-04-19 12:01             ` Lukasz Luba
2022-04-19 12:51               ` Vincent Guittot
2022-04-19 14:13                 ` Lukasz Luba
2022-04-21  8:29                   ` Vincent Guittot
2022-04-21 10:57                     ` Lukasz Luba
2022-04-26  7:39                       ` Vincent Guittot
2022-04-29  9:27                         ` Lukasz Luba
2022-04-20 13:51 ` Qais Yousef
2022-04-21  8:07   ` Xuewen Yan
2022-04-21 16:15     ` Qais Yousef
2022-04-25  1:31       ` Xuewen Yan
2022-04-25 16:12         ` Qais Yousef
2022-04-26  2:07           ` Xuewen Yan
2022-04-26  8:09             ` Vincent Guittot [this message]
2022-04-26  9:30               ` Qais Yousef
2022-04-26 10:06                 ` Vincent Guittot
2022-04-26 13:06                   ` Qais Yousef
2022-04-26  9:21             ` Qais Yousef
2022-04-27  1:38               ` Xuewen Yan
2022-04-27 10:58                 ` Qais Yousef
2022-05-01  3:20                   ` Xuewen Yan
2022-05-03 14:43                     ` Qais Yousef
2022-05-09  2:29                       ` Xuewen Yan
2022-05-10 14:56                         ` Qais Yousef
2022-05-10 17:44                           ` Lukasz Luba
2022-05-10 18:44                             ` Qais Yousef
2022-05-10 22:03                               ` Lukasz Luba
2022-05-14 15:01                                 ` Xuewen Yan
2022-05-14 23:55                                   ` Qais Yousef
2022-05-15  0:53                                     ` [PATCH] sched/rt: Support multi-criterion fitness search for kernel test robot
2022-05-15  1:43                                     ` kernel test robot
2022-05-19 14:16                                     ` [sched] 0eee64011b: canonical_address#:#[##] kernel test robot
2022-06-15 10:13                                     ` [PATCH] sched: Take thermal pressure into account when determine rt fits capacity Qais Yousef
2022-06-15 11:17                                       ` Xuewen Yan
2022-06-15 13:54                                         ` Qais Yousef

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKfTPtBswQ6bk8MrvcLmqc-9V2-SeWn3H_-gRvF5isdjL_acqA@mail.gmail.com \
    --to=vincent.guittot@linaro.org \
    --cc=di.shen@unisoc.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lukasz.luba@arm.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=qais.yousef@arm.com \
    --cc=rafael@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=viresh.kumar@linaro.org \
    --cc=xuewen.yan94@gmail.com \
    --cc=xuewen.yan@unisoc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).