From: Vincent Guittot <vincent.guittot@linaro.org>
To: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: Lukasz Luba <Lukasz.Luba@arm.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Xuewen Yan <xuewen.yan@unisoc.com>,
rafael@kernel.org, viresh.kumar@linaro.org, mingo@redhat.com,
peterz@infradead.org, rostedt@goodmis.org,
linux-kernel@vger.kernel.org, di.shen@unisoc.com
Subject: Re: [PATCH] sched: Take thermal pressure into account when determine rt fits capacity
Date: Tue, 19 Apr 2022 09:14:29 +0200 [thread overview]
Message-ID: <CAKfTPtDsNSW0JH4phAtZB7ackJFKuJfAGkhQ7JjWG_C2tzQYSw@mail.gmail.com> (raw)
In-Reply-To: <CAB8ipk8v1-jyU0Q0k-EXCArsS6Sh=U+fW6NutW+6m+kQ=LKrJA@mail.gmail.com>
On Sat, 16 Apr 2022 at 04:47, Xuewen Yan <xuewen.yan94@gmail.com> wrote:
>
> Hi Luba / Dietmar
>
> On Wed, Apr 13, 2022 at 9:26 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
> >
> >
> >
> > On 4/11/22 15:07, Dietmar Eggemann wrote:
> > > On 11/04/2022 10:52, Xuewen Yan wrote:
> > >> HI Dietmar
> > >>
> > >> On Mon, Apr 11, 2022 at 4:21 PM Dietmar Eggemann
> > >> <dietmar.eggemann@arm.com> wrote:
> > >>>
> > >>> On 07/04/2022 07:19, Xuewen Yan wrote:
> > >>>> There are cases when the cpu max capacity might be reduced due to thermal.
> > >>>> Take into the thermal pressure into account when judge whether the rt task
> > >>>> fits the cpu. And when schedutil govnor get cpu util, the thermal pressure
> > >>>> also should be considered.
> > >>>>
> > >>>> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> > >>>> ---
> > >>>> kernel/sched/cpufreq_schedutil.c | 1 +
> > >>>> kernel/sched/rt.c | 1 +
> > >>>> 2 files changed, 2 insertions(+)
> > >>>>
> > >>>> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > >>>> index 3dbf351d12d5..285ad51caf0f 100644
> > >>>> --- a/kernel/sched/cpufreq_schedutil.c
> > >>>> +++ b/kernel/sched/cpufreq_schedutil.c
> > >>>> @@ -159,6 +159,7 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
> > >>>> struct rq *rq = cpu_rq(sg_cpu->cpu);
> > >>>> unsigned long max = arch_scale_cpu_capacity(sg_cpu->cpu);
> > >>>>
> > >>>> + max -= arch_scale_thermal_pressure(sg_cpu->cpu);
> > >>>
> > >>> max' = arch_scale_cpu_capacity() - arch_scale_thermal_pressure()
> > >>>
> > >>> For the energy part (A) we use max' in compute_energy() to cap sum_util
> > >>> and max_util at max' and to call em_cpu_energy(..., max_util, sum_util,
> > >>> max'). This was done to match (B)'s `policy->max` capping.
> > >>>
> > >>> For the frequency part (B) we have freq_qos_update_request() in:
> > >>>
> > >>> power_actor_set_power()
> > >>> ...
> > >>> cdev->ops->set_cur_state()
> > >>>
> > >>> cpufreq_set_cur_state()
> > >>> freq_qos_update_request() <-- !
> > >>> arch_update_thermal_pressure()
> > >>>
> > >>> restricting `policy->max` which then clamps `target_freq` in:
> > >>>
> > >>> cpufreq_update_util()
> > >>> ...
> > >>> get_next_freq()
> > >>> cpufreq_driver_resolve_freq()
> > >>> __resolve_freq()
> > >>>
> > >>
> > >> Do you mean that the "max" here will not affect the frequency
> > >> conversion, so there is no need to change it?
> > >> But is it better to reflect the influence of thermal here?
> > >
> > > I guess your point is that even though max' has no effect on frequency
> > > since QOS caps policy->max anyway, it is still easier to understand the
> > > dependency between schedutil and EAS/EM when it comes to the use of
> > > thermal pressure.
> > >
> > > I agree. The way how the "hidden" policy->max capping guarantees that
> > > schedutil and EAS/EM are doing the same even when only the latter uses
> > > max' is not obvious.
> >
> > +1 here, IMO we shouldn't rely on hidden stuff. There are two which set
> > the thermal pressure, but one is not setting the freq_qos which causes
> > the update of the 'policy->max'. So the schedutil will send that high
> > frequency but that driver would just ignore and clamp internally. In the
> > end we might argue it still works, but is it clean and visible from the
> > code? Funny thing might start to happen then the driver, which might be
> > the last safety net cannot capture this.
schedutil reflects what scheduler wants not what HW can do. If you
start to cap the freq with arch_scale_thermal_pressure() in schedutil,
you will loose some opportunity to run at higher frequency because
arch_scale_thermal_pressure() is transient and might change just after
using it. This means that you will stay at lower freq after mitigation
stops until a new cpufreq_update_util() happens. ANd I don't vene
mentioned when thermal mitigation is managed by HW at a much higher
frequency than what Linux can handle
arch_scale_thermal_pressure() must not be used but thermal_load_avg()
like scale_rt_capacity() what Dietmar suggested
> >
> > We also should be OK with energy estimation and the CPU capacity vs.
> > task utilization comparisons, since the thermal pressure is accounted
> > there* (until the thermal is controlled in kernel not in FW, which is
> > something where we are heading with scmi-cpufreq mentioned in this
> > cover letter [1]).
>
> IMO, If so, we don't want to modify the original code, but also need to
> consider the impact of thermal, maybe it is possible to add a new
> macro definition
> like this:
>
> #define arch_scale_cpu_capacity_except_thermal()
> (arch_scale_cpu_capacity() - arch_scale_thermal_pressure())
>
> >
> > >
> > > I just wanted to mention the historical reason why the code looks like
> > > it does today.
> > >
> > >>> [...]
> > >>>
> > >>>> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > >>>> index a32c46889af8..d9982ebd4821 100644
> > >>>> --- a/kernel/sched/rt.c
> > >>>> +++ b/kernel/sched/rt.c
> > >>>> @@ -466,6 +466,7 @@ static inline bool rt_task_fits_capacity(struct task_struct *p, int cpu)
> > >>>> max_cap = uclamp_eff_value(p, UCLAMP_MAX);
> > >>>>
> > >>>> cpu_cap = capacity_orig_of(cpu);
> > >>>> + cpu_cap -= arch_scale_thermal_pressure(cpu);
> > >>>>
> > >>>> return cpu_cap >= min(min_cap, max_cap);
> > >>>> }
> > >>>
> > >>> IMHO, this should follow what we do with rq->cpu_capacity
> > >>> (capacity_of(), the remaining capacity for CFS). E.g. we use
> > >>> capacity_of() in find_energy_efficient_cpu() and select_idle_capacity()
> > >>> to compare capacities. So we would need a function like
> > >>> scale_rt_capacity() for RT (minus the rq->avg_rt.util_avg) but then also
> > >>> one for DL (minus rq->avg_dl.util_avg and rq->avg_rt.util_avg).
> > >>
> > >> It's a really good idea. And do you already have the corresponding patch?
> > >> If there is, can you tell me the corresponding link?
> > >
> > > No, I don't have any code for this. Should be trivial though. But the
> > > issue is that the update would probably have to be decoupled from CFS
> > > load_balance (update_group_capacity()) and the use cases in RT/DL are
> > > only valid for asymmetric CPU capacity systems.
> >
> > Having in mind those I would vote for fixing it incrementally.
> > Thus, IMHO this patch is good for taking it. Later we might think how
> > to better estimate the real CPU capacity visible from RT and DL classes.
> > In this shape it is good for many systems which only use RT,
> > but not DL class. Those systems w/ RT and w/o DL might suffer on some
> > asymmetric CPU platforms where medium cores have capacity e.g. 850 and
> > thermal pressure reduced the big cores capacity by 250 making them 774.
> >
>
> Your mean is that before there is better way to handle RT capacity, we
> can take this patch temporarily?
> If so, I can update the patch which will just fix the rt.c.
>
> In fact, in the mobile phone usage scenario where the cpu contains 3
> clusters (small/middle/big),
> the capacity of the big core's capacity will be smaller than that of
> the middle core due to the thermal effect.
> At this time, we do not want the big core CPU to be used as an RT
> task's alternative cpu.
>
> > Regards,
> > Lukasz
> >
> > [1]
> > https://lore.kernel.org/linux-pm/20211007080729.8262-1-lukasz.luba@arm.com/
>
>
> BR
> xuewen.yan
next prev parent reply other threads:[~2022-04-19 7:15 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-07 5:19 [PATCH] sched: Take thermal pressure into account when determine rt fits capacity Xuewen Yan
2022-04-11 8:21 ` Dietmar Eggemann
2022-04-11 8:52 ` Xuewen Yan
2022-04-11 14:07 ` Dietmar Eggemann
2022-04-13 13:25 ` Lukasz Luba
2022-04-16 2:47 ` Xuewen Yan
2022-04-19 7:14 ` Vincent Guittot [this message]
2022-04-19 12:01 ` Lukasz Luba
2022-04-19 12:51 ` Vincent Guittot
2022-04-19 14:13 ` Lukasz Luba
2022-04-21 8:29 ` Vincent Guittot
2022-04-21 10:57 ` Lukasz Luba
2022-04-26 7:39 ` Vincent Guittot
2022-04-29 9:27 ` Lukasz Luba
2022-04-20 13:51 ` Qais Yousef
2022-04-21 8:07 ` Xuewen Yan
2022-04-21 16:15 ` Qais Yousef
2022-04-25 1:31 ` Xuewen Yan
2022-04-25 16:12 ` Qais Yousef
2022-04-26 2:07 ` Xuewen Yan
2022-04-26 8:09 ` Vincent Guittot
2022-04-26 9:30 ` Qais Yousef
2022-04-26 10:06 ` Vincent Guittot
2022-04-26 13:06 ` Qais Yousef
2022-04-26 9:21 ` Qais Yousef
2022-04-27 1:38 ` Xuewen Yan
2022-04-27 10:58 ` Qais Yousef
2022-05-01 3:20 ` Xuewen Yan
2022-05-03 14:43 ` Qais Yousef
2022-05-09 2:29 ` Xuewen Yan
2022-05-10 14:56 ` Qais Yousef
2022-05-10 17:44 ` Lukasz Luba
2022-05-10 18:44 ` Qais Yousef
2022-05-10 22:03 ` Lukasz Luba
2022-05-14 15:01 ` Xuewen Yan
2022-05-14 23:55 ` Qais Yousef
2022-05-15 0:53 ` [PATCH] sched/rt: Support multi-criterion fitness search for kernel test robot
2022-05-15 1:43 ` kernel test robot
2022-05-19 14:16 ` [sched] 0eee64011b: canonical_address#:#[##] kernel test robot
2022-06-15 10:13 ` [PATCH] sched: Take thermal pressure into account when determine rt fits capacity Qais Yousef
2022-06-15 11:17 ` Xuewen Yan
2022-06-15 13:54 ` Qais Yousef
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAKfTPtDsNSW0JH4phAtZB7ackJFKuJfAGkhQ7JjWG_C2tzQYSw@mail.gmail.com \
--to=vincent.guittot@linaro.org \
--cc=Lukasz.Luba@arm.com \
--cc=di.shen@unisoc.com \
--cc=dietmar.eggemann@arm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rafael@kernel.org \
--cc=rostedt@goodmis.org \
--cc=viresh.kumar@linaro.org \
--cc=xuewen.yan94@gmail.com \
--cc=xuewen.yan@unisoc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).