All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Lukasz Luba <lukasz.luba@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Xuewen Yan <xuewen.yan@unisoc.com>,
	rafael@kernel.org, viresh.kumar@linaro.org, mingo@redhat.com,
	peterz@infradead.org, rostedt@goodmis.org,
	linux-kernel@vger.kernel.org, di.shen@unisoc.com,
	Xuewen Yan <xuewen.yan94@gmail.com>
Subject: Re: [PATCH] sched: Take thermal pressure into account when determine rt fits capacity
Date: Tue, 26 Apr 2022 09:39:32 +0200	[thread overview]
Message-ID: <CAKfTPtCu4zYchv1d4g-ztw=qR639BS2ncapQxfcwZqkSSQPY0w@mail.gmail.com> (raw)
In-Reply-To: <ae98a861-8945-e630-8d4c-8112723d1007@arm.com>

On Thu, 21 Apr 2022 at 12:57, Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 4/21/22 09:29, Vincent Guittot wrote:
> > On Tue, 19 Apr 2022 at 16:13, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>
> >>
> >>
> >> On 4/19/22 13:51, Vincent Guittot wrote:
> >>> On Tue, 19 Apr 2022 at 14:01, Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 4/19/22 08:14, Vincent Guittot wrote:
> >>>>> On Sat, 16 Apr 2022 at 04:47, Xuewen Yan <xuewen.yan94@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi Luba  / Dietmar
> >>>>>>
> >>>>>> On Wed, Apr 13, 2022 at 9:26 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>

[...]

> >>>> To be precised and maybe fix some potential design issues. We are
> >>>> talking here about utilization and set max capacity in function:
> >>>> sugov_get_util()
> >>>> so fields:
> >>>>
> >>>> sugov_cpu::util
> >>>> sugov_cpu::max /* max capacity */
> >>>
> >>> Yes. With this patch ,util will be lower than current thermal
> >>> mitigation whereas util normally reflects what we need  not what can
> >>> be provided
> >>
> >> This is a different requirements: util has to be max capacity and
> >> max capacity has to be original max CPU capacity - for the SchedUtil.
> >> OK, why? What this requirement adds in the design and final values?
> >
> > Because the calculation you are proposing is wrong and doesn't make
> > sense. Util is the average utilization of the cpu that has to be
> > compared with its original capacity max in order to get the freq that
> > matches with this utilization.
> >
> > We have freq = util / max * max_freq and cpufreq will then capp freq
> > if mitigation is applied. Once the mitigation disappear, the request
> > will be back to targeted freq.
> >
> > If you replace max by max' = max - arch_scale_thermal_pressure then :
> >
> > - by the time you do the calculation, arch_scale_thermal_pressure can
> > have changed and the result is meaningless. This is true whatever the
> > pace of updating arch_scale_thermal_pressure
>
> The sudden change of the value taken from arch_scale_thermal_pressure
> I can understand, but there are similar and we live with it. Look at
> the whole EAS estimations done in a one CPU waku-up event or the uclamp
> stuff. As far this is not too frequently occurring - we live wit it.
>
> I can see your concern here, since you mentioned below that you expect
> some platforms to hit it in 'khz' rate. This is probably not good, to
> trigger the kernel so often from HW/FW.
>
> That's why I have been struggling to find a 'good' design on this
> glue layer for Arm FW+kernel. Our FW would probably won't cause such
> huge notification traffic. A rate e.g. 50-100ms would be enough,
> especially if we have the per-CPU cpufreq policy. So we might have
> this 'PELT-like filter or signal' in FW, and just update kernel

In this case arch_scale_thermal_pressure() doesn't reflect the actual
thermal pressure but an average which is what thermal_load_avg() is
also doing.

> less often. But then there is an issue with the rising/decaying
> penalty of the kernel thermal pressure signal.
>
> We cannot assume that some SoCs don't do this already.
>
> Let's meet in the middle:
> 1) use the thermal PELT signal in RT:
> capacity = capacity_orig_of(cpu) - thermal_load_avg(cpu_rq(cpu))

This is what Dietmar and I have been suggested

> 2) introduce a more configurable thermal_pressure shifter instead
> 'sched_thermal_decay_shift', which would allow not only to make the
> decaying longer, but also shorter when the platform already might do
> that, to not cause too much traffic.
>
> >
> > - you change the range of capacity to max'= max -
> > arch_scale_thermal_pressure and you scale it to max_freq. if util >
> > max', then you will ask for max_freq whatever the util being really
> > close to max or not. Also you will ask for max freq even if util is
> > close but below max' whereas the mitigation doesn't impact utilization
>
> It's already there, even w/o patch. That's why I gave you the examples.

Not sure how to understand this above.

utilization can already goes above but this reflects a reality that
the task could need more capacity than the current cpu capacity

>
> BTW, isn't true that the utilization of the Little CPU rq can reach
> 1024 today after your change to the PELT when there is no idle time,
> even when cpu max capacity is e.g. 300?

yes

> Before that change the utilization of a throttled CPU rq would converge
> to the current capacity of the CPU, am I right?

yes

>
> Is it this commit:
> 23127296889fe84b0762b191
>
> >
> >>
> >>>
> >>>>

[...]

> >>>
> >>>> but then in both cases are multiplied by 'max_freq' in (2)
> >>>>
> >>>> As you can see this is not the situation that you have described, is it?
> >>>> And the transient or non-transient is minor here IMO.
> >>>
> >>> If max is 512 then util = 640 which is much lower than 1024.
> >>
> >> What scenario is this?
> >> Is 1024 the utilization that we might have from the CPU rq?
> >> What is the original CPU capacity, 1024?
>
> Is this 1024 the utilization of the CPU runqueue because since
> the new PELT we can have it bigger than CPU capacity?
>
> >>
> >>>
> >>>>
> >>>> Secondly, you have mentioned the mitigation in HW and issue between
> >>>> instantaneous vs. PELT-one thermal pressure information. This is
> >>>> something that I'm stretching my head for long. I'm trying to
> >>>> develop this for new Arm FW thermal. You have mentioned:
> >>>> 'thermal mitigation is managed by HW at a much higher
> >>>> frequency than what Linux can handle' - I would be also more
> >>>> precised here: HW or FW? How often the HW can change max freq or
> >>>> how often FW can change that? If we don't have those numbers
> >>>> than statement: 'a much higher' doesn't help in solving this
> >>>
> >>> By much higher means that Linux can't react fast enough and should not
> >>> try to sync because it's a lost game
> >>
> >> As I said, 'much higher' is not a number to base a design on it.
> >
> > But that gives you the constraint that you can't expect to be always
> > synced with up to date value which is the most important here. This
> > means that  cpu_cap -= arch_scale_thermal_pressure(cpu) can be wrong
> > just after you computed it and your decision is wrong.
>
> This is hypothetical situation when the value can change in such
> noisy way on some platform. But I understand your concern.
>
> >
> >
> >> We need real numbers from real platforms. Currently we have two
> >> places where the thermal pressure is set:
> >> 1) cpufreq_cooling.c [1]
> >> 2) Qcom driver [2]
> >> (we might have 3rd soon for Arm SCMI+FW)
> >
> > I don't have details but i have khz in mind
>
> If such traffic of interrupts in khz is true for driver in 2)
> then it's a bit concerning.
>
> Although, smarter platforms shouldn't suffer due to design forced to one
> corner case platform.
>
> >
> >>
> >> For the 2nd I would like to see numbers. For the 1st one when
> >> kernel thermal is used (which supports higher number of platforms
> >> comparing to Qcom driver) as it's by design kernel tries to control
> >> thermal, so changes are not that frequent.
> >>
> >> As for now, I can see in experiments the 1st is suffering long decay
> >> delays and also corner cases with long idle CPUs.
> >>
> >>>

[...]

> >> I'm trying to help Xuewen to solve his/her issues with the RT class
> >> incrementally. I don't want to push him/her into a deep dark water
> >> of PELT signals, to what variable compare them, corner cases when they
> >> are (or not) updated or completely not implemented. I'm not even sure
> >> if those extra complexities make sense for the RT/DL (since they
> >> make some difference on big.mid.little specific platforms but not for
> >> the rest).
>
> As I said we need a way forward, this issue of capacity inversion
> on big.mid.little is there. It was for ~2-3years and is going to be
> even bigger in future. So please don't block it and prepare/share the
> numbers for the corner case platforms.

I don't want to block anything but just want a solution that is
coherent with the whole design and not just a fix for one UC

>
> I have proposed the where we can meet in the middle, consider it.
> I will prepare a patch for that shifter.

  reply	other threads:[~2022-04-26  7:40 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-07  5:19 [PATCH] sched: Take thermal pressure into account when determine rt fits capacity Xuewen Yan
2022-04-11  8:21 ` Dietmar Eggemann
2022-04-11  8:52   ` Xuewen Yan
2022-04-11 14:07     ` Dietmar Eggemann
2022-04-13 13:25       ` Lukasz Luba
2022-04-16  2:47         ` Xuewen Yan
2022-04-19  7:14           ` Vincent Guittot
2022-04-19 12:01             ` Lukasz Luba
2022-04-19 12:51               ` Vincent Guittot
2022-04-19 14:13                 ` Lukasz Luba
2022-04-21  8:29                   ` Vincent Guittot
2022-04-21 10:57                     ` Lukasz Luba
2022-04-26  7:39                       ` Vincent Guittot [this message]
2022-04-29  9:27                         ` Lukasz Luba
2022-04-20 13:51 ` Qais Yousef
2022-04-21  8:07   ` Xuewen Yan
2022-04-21 16:15     ` Qais Yousef
2022-04-25  1:31       ` Xuewen Yan
2022-04-25 16:12         ` Qais Yousef
2022-04-26  2:07           ` Xuewen Yan
2022-04-26  8:09             ` Vincent Guittot
2022-04-26  9:30               ` Qais Yousef
2022-04-26 10:06                 ` Vincent Guittot
2022-04-26 13:06                   ` Qais Yousef
2022-04-26  9:21             ` Qais Yousef
2022-04-27  1:38               ` Xuewen Yan
2022-04-27 10:58                 ` Qais Yousef
2022-05-01  3:20                   ` Xuewen Yan
2022-05-03 14:43                     ` Qais Yousef
2022-05-09  2:29                       ` Xuewen Yan
2022-05-10 14:56                         ` Qais Yousef
2022-05-10 17:44                           ` Lukasz Luba
2022-05-10 18:44                             ` Qais Yousef
2022-05-10 22:03                               ` Lukasz Luba
2022-05-14 15:01                                 ` Xuewen Yan
2022-05-14 23:55                                   ` Qais Yousef
2022-05-15  0:53                                     ` [PATCH] sched/rt: Support multi-criterion fitness search for kernel test robot
2022-05-15  1:43                                     ` kernel test robot
2022-05-19 14:16                                     ` [sched] 0eee64011b: canonical_address#:#[##] kernel test robot
2022-05-19 14:16                                       ` kernel test robot
2022-06-15 10:13                                     ` [PATCH] sched: Take thermal pressure into account when determine rt fits capacity Qais Yousef
2022-06-15 11:17                                       ` Xuewen Yan
2022-06-15 13:54                                         ` Qais Yousef

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKfTPtCu4zYchv1d4g-ztw=qR639BS2ncapQxfcwZqkSSQPY0w@mail.gmail.com' \
    --to=vincent.guittot@linaro.org \
    --cc=di.shen@unisoc.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lukasz.luba@arm.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=viresh.kumar@linaro.org \
    --cc=xuewen.yan94@gmail.com \
    --cc=xuewen.yan@unisoc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.