All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qais Yousef <qais.yousef@arm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>,
	Xuewen Yan <xuewen.yan@unisoc.com>,
	dietmar.eggemann@arm.com, lukasz.luba@arm.com, rafael@kernel.org,
	viresh.kumar@linaro.org, mingo@redhat.com, peterz@infradead.org,
	rostedt@goodmis.org, linux-kernel@vger.kernel.org,
	di.shen@unisoc.com
Subject: Re: [PATCH] sched: Take thermal pressure into account when determine rt fits capacity
Date: Tue, 26 Apr 2022 10:30:56 +0100	[thread overview]
Message-ID: <20220426093056.uxnsz4tv4vhvbipe@airbuntu> (raw)
In-Reply-To: <CAKfTPtBswQ6bk8MrvcLmqc-9V2-SeWn3H_-gRvF5isdjL_acqA@mail.gmail.com>

On 04/26/22 10:09, Vincent Guittot wrote:
> On Tue, 26 Apr 2022 at 04:07, Xuewen Yan <xuewen.yan94@gmail.com> wrote:
> >
> > On Tue, Apr 26, 2022 at 12:12 AM Qais Yousef <qais.yousef@arm.com> wrote:
> > >
> > > On 04/25/22 09:31, Xuewen Yan wrote:
> > > > On Fri, Apr 22, 2022 at 12:15 AM Qais Yousef <qais.yousef@arm.com> wrote:
> > > > > Is it okay to share what the capacities of the littles, mediums and bigs on
> > > > > your system? And how they change under worst case scenario thermal pressure?
> > > > > Only IF you have these numbers handy :-)
> > > >
> > > > Okay, the little/mid/big cpu scale capacity is 350/930/1024, but the
> > > > cpu frequency point is discrete, the big core's high freq point may is
> > > > just a few more than the mid core's highest.
> > > > In this case, once the thermal decrease the scaling_max_freq, the
> > > > maximum frequency of the large core is easily lower than that of the
> > > > medium core.
> > > > Of course, the corner case is due to the frequency design of the soc
> > > > and  our thermal algorithm.
> > >
> > > Okay, thanks for the info!
> > >
> > > >
> > > > >
> > > > > Is it actually an indication of a potential other problem if you swing into
> > > > > capacity inversion in the bigs that often? I've seen a lot of systems where the
> > > > > difference between the meds and bigs is small. But frequent inversion could be
> > > > > suspicious still.
> > > > >
> > > > > Do the littles and the mediums experience any significant thermal pressure too?
> > > >
> > > > In our platform, it's not.
> > >
> > > Good.
> > >
> > > > > It doesn't seem it'll cause a significant error, but still it seems to me this
> > > > > function wants the original capacity passed to it.
> > > > >
> > > > > There are similar questions to be asked since you modify sg_cpu->max. Every
> > > > > user needs to be audited if they're fine with this change or not.
> > > > >
> > > > > I'm not sure still what we are achieving here. You want to force schedutil not
> > > > > to request higher frequencies if thermal pressure is high? Should schedutil
> > > > > actually care? Shouldn't the cpufreq driver reject this request and pick the
> > > > > next best thing if it can't satisfy it? I could be missing something, I haven't
> > > > > looked that hard tbh :-)
> > > >
> > > > I changed this just want to make it more responsive to the real
> > > > capacity of the cpu, if it will cause other problems, maybe it would
> > > > be better not to change it.:)
> > >
> > > There are others who can give you a better opinion. But AFAICS we're not fixing
> > > anything but risking breaking other things. So I vote for not to change it :)
> > >
> > > > > It depends on the severity of the problem. The simplest thing I can suggest is
> > > > > to check if the cpu is in capacity inversion state, and if it is, then make
> > > > > rt_task_fits_capacity() return false always.
> > > > >
> > > > > If we need a generic solution to handle thermal pressure omitting OPPs, then
> > > > > the search needs to become more complex. The proposal in this patch is not
> > > > > adequate because tasks that want to run at capacity_orig_of(cpu) will wrongly
> > > > > omit some cpus because of any tiny thermal pressure. For example if the
> > > > > capacity_orig_of(medium_cpu) = 700, and uclamp_min for RT is set to 700, then
> > > > > any small thermal pressure on mediums will cause these tasks to run on big cpus
> > > > > only, which is not what we want. Especially if these big cpus can end up in
> > > > > capacity inversion later ;-)
> > > > >
> > > > > So if we want to handle this case, then we need to ensure the search returns
> > > > > false only if
> > > > >
> > > > >         1. Thermal pressure results in real OPP to be omitted.
> > > > >         2. Another CPU that can provide this performance level is available.
> > > > >
> > > > > Otherwise we should still fit it on this CPU because it'll give us the closest
> > > > > thing to what was requested.
> > > > >
> > > > > I can think of 2 ways to implement this, but none of them seem particularly
> > > > > pretty :-/
> > > >
> > > > Maybe as Lukasz Luba said:
> > > >
> > > > https://lore.kernel.org/all/ae98a861-8945-e630-8d4c-8112723d1007@arm.com/
> > > >
> > > > > Let's meet in the middle:
> > > > > 1) use the thermal PELT signal in RT:
> > > > > capacity = capacity_orig_of(cpu) - thermal_load_avg(cpu_rq(cpu))
> > > > > 2) introduce a more configurable thermal_pressure shifter instead
> > > > > 'sched_thermal_decay_shift', which would allow not only to make the
> > > > > decaying longer, but also shorter when the platform already might do
> > > > > that, to not cause too much traffic.
> > > >
> > > > But even if this is changed, there will still be the same problem, I
> > > > look forward to Lukasz's patch:)
> > >
> > > This will not address my concern unless I missed something.
> > >
> > > The best (simplest) way forward IMHO is to introduce a new function
> > >
> > >         bool cpu_in_capacity_inversion(int cpu);
> > >
> > > (feel free to pick another name) which will detect the scenario you're in. You
> > > can use this function then in rt_task_fits_capacity()
> > >
> > >         diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > >         index a32c46889af8..d48811a7e956 100644
> > >         --- a/kernel/sched/rt.c
> > >         +++ b/kernel/sched/rt.c
> > >         @@ -462,6 +462,9 @@ static inline bool rt_task_fits_capacity(struct task_struct *p, int cpu)
> > >                 if (!static_branch_unlikely(&sched_asym_cpucapacity))
> > >                         return true;
> > >
> > >         +       if (cpu_in_capacity_inversion(cpu))
> > >         +               return false;
> > >         +
> > >                 min_cap = uclamp_eff_value(p, UCLAMP_MIN);
> > >                 max_cap = uclamp_eff_value(p, UCLAMP_MAX);
> > >
> > > You'll probably need to do something similar in dl_task_fits_capacity().
> > >
> > > This might be a bit aggressive though as we'll steer away all RT tasks from
> > > this CPU (as long as there's another CPU that can fit it). I need to think more
> > > about it. But we could do something like this too
> > >
> > >         diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > >         index a32c46889af8..f2a34946a7ab 100644
> > >         --- a/kernel/sched/rt.c
> > >         +++ b/kernel/sched/rt.c
> > >         @@ -462,11 +462,14 @@ static inline bool rt_task_fits_capacity(struct task_struct *p, int cpu)
> > >                 if (!static_branch_unlikely(&sched_asym_cpucapacity))
> > >                         return true;
> > >
> > >         +       cpu_cap = capacity_orig_of(cpu);
> > >         +
> > >         +       if (cpu_in_capacity_inversion(cpu))
> >
> > It's  a good idea, but as you said, in mainline, the
> > sysctl_sched_uclamp_util_min_rt_default is always 1024,
> > Maybe it's better to add it to the judgment?
> >
> >  +       if (sysctl_sched_uclamp_util_min_rt_default !=
> > SCHED_CAPACITY_SCALE && cpu_in_capacity_inversion(cpu))
> >
> > >         +               cpu_cap -= thermal_load_avg(cpu_rq(cpu));
> >
> > Why use thermal_load_avg? If thermal is always in effect,the
> > thermal_load_avg would get bigger and bigger, as a result, the cpu_cap
> > maybe smaller than (capacity_orig - thermal_pressure).
> 
> For a fixed thermal_pressure(), thermal_load_avg() will not be higher
> than thermal_pressure() but will increase to reach thermal_pressure()
> 
> In the current implementation for sched_asym_cpucapacity topology, we
> do a 1st iteration trying to find a cpu that fits a task's capacity
> but if it fails, we run a normal cpupri_find that doesn't care about
> capacity.
> 
> Do I understand correctly that in your case you would like to run
> a 1st iteration that takes into account capacity_orig_of(cpu) -
> thermal_load_avg(cpu_rq(cpu))
> If it fails run another iteration only with capacity_orig_of(cpu)
> and finally tries without capacity constraint

Wouldn't this be expensive to have 3 loops? That was my other suggestion but
wasn't sure the complexity was worth it. So I suggested handling the capacity
inversion case only.


Thanks

--
Qais Yousef

  reply	other threads:[~2022-04-26 10:07 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-07  5:19 [PATCH] sched: Take thermal pressure into account when determine rt fits capacity Xuewen Yan
2022-04-11  8:21 ` Dietmar Eggemann
2022-04-11  8:52   ` Xuewen Yan
2022-04-11 14:07     ` Dietmar Eggemann
2022-04-13 13:25       ` Lukasz Luba
2022-04-16  2:47         ` Xuewen Yan
2022-04-19  7:14           ` Vincent Guittot
2022-04-19 12:01             ` Lukasz Luba
2022-04-19 12:51               ` Vincent Guittot
2022-04-19 14:13                 ` Lukasz Luba
2022-04-21  8:29                   ` Vincent Guittot
2022-04-21 10:57                     ` Lukasz Luba
2022-04-26  7:39                       ` Vincent Guittot
2022-04-29  9:27                         ` Lukasz Luba
2022-04-20 13:51 ` Qais Yousef
2022-04-21  8:07   ` Xuewen Yan
2022-04-21 16:15     ` Qais Yousef
2022-04-25  1:31       ` Xuewen Yan
2022-04-25 16:12         ` Qais Yousef
2022-04-26  2:07           ` Xuewen Yan
2022-04-26  8:09             ` Vincent Guittot
2022-04-26  9:30               ` Qais Yousef [this message]
2022-04-26 10:06                 ` Vincent Guittot
2022-04-26 13:06                   ` Qais Yousef
2022-04-26  9:21             ` Qais Yousef
2022-04-27  1:38               ` Xuewen Yan
2022-04-27 10:58                 ` Qais Yousef
2022-05-01  3:20                   ` Xuewen Yan
2022-05-03 14:43                     ` Qais Yousef
2022-05-09  2:29                       ` Xuewen Yan
2022-05-10 14:56                         ` Qais Yousef
2022-05-10 17:44                           ` Lukasz Luba
2022-05-10 18:44                             ` Qais Yousef
2022-05-10 22:03                               ` Lukasz Luba
2022-05-14 15:01                                 ` Xuewen Yan
2022-05-14 23:55                                   ` Qais Yousef
2022-05-15  0:53                                     ` [PATCH] sched/rt: Support multi-criterion fitness search for kernel test robot
2022-05-15  1:43                                     ` kernel test robot
2022-05-19 14:16                                     ` [sched] 0eee64011b: canonical_address#:#[##] kernel test robot
2022-05-19 14:16                                       ` kernel test robot
2022-06-15 10:13                                     ` [PATCH] sched: Take thermal pressure into account when determine rt fits capacity Qais Yousef
2022-06-15 11:17                                       ` Xuewen Yan
2022-06-15 13:54                                         ` Qais Yousef

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220426093056.uxnsz4tv4vhvbipe@airbuntu \
    --to=qais.yousef@arm.com \
    --cc=di.shen@unisoc.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lukasz.luba@arm.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    --cc=xuewen.yan94@gmail.com \
    --cc=xuewen.yan@unisoc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.