linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Xuewen Yan <xuewen.yan94@gmail.com>
To: Lukasz Luba <lukasz.luba@arm.com>
Cc: "Qais Yousef" <qais.yousef@arm.com>,
	"Xuewen Yan" <xuewen.yan@unisoc.com>,
	dietmar.eggemann@arm.com, rafael@kernel.org,
	viresh.kumar@linaro.org, mingo@redhat.com, peterz@infradead.org,
	vincent.guittot@linaro.org, rostedt@goodmis.org,
	linux-kernel@vger.kernel.org, di.shen@unisoc.com,
	"王科 (Ke Wang)" <Ke.Wang@unisoc.com>
Subject: Re: [PATCH] sched: Take thermal pressure into account when determine rt fits capacity
Date: Sat, 14 May 2022 23:01:39 +0800	[thread overview]
Message-ID: <CAB8ipk8RUsxYs6JdeMeUPADQRNqT=Jqz15ecwbzbf0RHUDqOXg@mail.gmail.com> (raw)
In-Reply-To: <0505936e-3746-4623-a967-103a0158bfbd@arm.com>

On Wed, May 11, 2022 at 6:03 AM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 5/10/22 19:44, Qais Yousef wrote:
> > On 05/10/22 18:44, Lukasz Luba wrote:
> >
> > [...]
> >
> >> To properly answer this question we probably have to analyze the timings
> >> and this update path - how often it is actually called. Keep in mind
> >> we are going to solve CPU capacity inversion for RT class, which
> >> contains latency sensitive tasks. In this approach the information
> >
> > This was an attempt for a generic inversion detection. We update
> > rq->cpu_capacity which is used by capacity_of() in the same path.
>
> True, but this is a CFS 'world' and the update path is part of load
> balance. Your proposed code which sets the new
> 'rq->cpu_capacity_inverted' is run there, which might have some
> delays.

Yes, that's exactly what I'm worried about.

>
> >
> > I didn't feel brave to write a quick patch in the topology code, but we can
> > certainly do the detection there in topology_update_thermal_pressure().
>
> Looks better, since that code path is called when we get instantaneous
> information about CPU freq reduction. I'm afraid that again this
> approach might be blocked due to 'khz' calling ratio of that code and we
> 'must not' use this.
>
> >
> >> about HW status is coming from this CFS load balance path.
> >> What if that load balance is not called that often as RT might require?
> >> What if there is a light load on CPUs, but GPU caused them to throttle,
> >> reducing capacity by a decent chunk e.g. 50%?
> >> That would translate to some RT periodic task which takes 2ms every
> >> 8ms to take 4ms, while maybe on other less power hungry CPU it could
> >> take 3ms.
> >>
> >> The usage of thermal_load_avg() in the scale_rt_capacity() looks OK
> >> for the CFS, but might not be from the RT class point of view.
> >> The RT class might want to realize faster that CPUs have changed the
> >> capacity.
> >> Maybe it's OK with that patch [1] and boot config shifter=-5, but in
> >> default boot config for shifter=0 we can suffer for hundreds of ms
> >> running on lower capacity cpu (which is quite high number of frames
> >> nowadays).
> >>
> >> Without a research and experiments data I'm afraid this is too
> >> big step to make, with this CFS load balance path.
> >
> > I think Xuewen didn't want to use thermal_load_avg(), and that's the question
> > I deferred.
>
> Your code snipped might have similar penalty, since you populate
> information about that CPU inversion at 'some point in time'.
> My point is: that 'point in time' is not well defined, since it's
> CFS load balance. I'm afraid that RT class deserves something better
> defined (predictable, repeatable, reliable, short, etc.)
>
> >
> >>
> >>>
> >>>>
> >>>>> +
> >>>>> +               rq->cpu_capacity_inverted = 0;
> >>>>> +
> >>>>> +               for_each_possible_cpu(cpu) {
> >>>>> +                       unsigned long cap = arch_scale_cpu_capacity(cpu);
> >>>>> +
> >>>>> +                       if (capacity_orig <= cap)
> >>>>> +                               continue;
> >>
> >> The search loop here assumes that other CPUs (fortunately not in the
> >> same freq domain) don't suffer due to reduced capacity. This might be
> >> not true - when we have ~1 Watt budget for all CPUs in the system and
> >> single big core can use 3-4W at max or single mid core ~1.2W.
>
> s/1.2W/1-2W
>
> >
> > I defined capacity inversion against capacity_orig. IMHO that's the sensible
> > definition to make.
> >
> > Would be good to hear more/other suggestions.
>
> Capacity of other CPU might also be reduced and capacity_orig is not
> reflecting that. My gut feeling tells me that this capacity_orig
> assumption might be too optimistic for some platforms.

In unisoc platform with 3 clusters(little/mid/big), there are cases
that middle core and big core are throttled at the same time.

>
> It's the same old question: how good the model should be.
> We want to 'model' the reality (CPUs slows down), how good the
> model should be in this RT world use case - I don't know w/o
> experiments.
>
> I don't even know how often this new variable
> 'rq->cpu_capacity_inverted' gets updated and what is the time diff to
> the last update of the raw thermal pressure variable. You said that code
> is 'completely untested'. So it's unknown delay for now - but belongs to
> similar class as thermal_load_avg(), but the 2nd is known. I have
> shared plots with raw signal vs. PELT-like delays. We at least know
> the delays, e.g. ~200ms to reach raw value, but how that impacts RT
> world - I have no experiment results from real apps (i.e. w/ audio or
> display threads).
>
> >
> >>
> >>>>> +
> >>>>> +                       if (cap > inv_cap) {
> >>>>> +                               rq->cpu_capacity_inverted = inv_cap;
> >>>>> +                               break;
> >>>>> +                       }
> >>>>> +               }
> >>>>> +
> >>>>> +       }
> >>>>>
> >>>>>           sdg->sgc->capacity = capacity;
> >>>>>           sdg->sgc->min_capacity = capacity;
> >>>>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> >>>>> index 8dccb34eb190..bfe84c870bf9 100644
> >>>>> --- a/kernel/sched/sched.h
> >>>>> +++ b/kernel/sched/sched.h
> >>>>> @@ -992,6 +992,7 @@ struct rq {
> >>>>>
> >>>>>           unsigned long           cpu_capacity;
> >>>>>           unsigned long           cpu_capacity_orig;
> >>>>> +       unsigned long           cpu_capacity_inverted;
> >>>>>
> >>>>>           struct callback_head    *balance_callback;
> >>>>>
> >>>>> @@ -2807,6 +2808,11 @@ static inline unsigned long capacity_orig_of(int cpu)
> >>>>>           return cpu_rq(cpu)->cpu_capacity_orig;
> >>>>>    }
> >>>>>
> >>>>> +static inline unsigned long cpu_in_capacity_inversion(int cpu)
> >>>>> +{
> >>>>> +       return cpu_rq(cpu)->cpu_capacity_inverted;
> >>>>> +}
> >>>>> +
> >>>>>    /**
> >>>>>     * enum cpu_util_type - CPU utilization type
> >>>>>     * @FREQUENCY_UTIL:    Utilization used to select frequency
> >>>>>
> >>>>>
> >>>>> --->8---
> >>>>
> >>>> The patch is amazing for me, and the complexity is not too high. Would
> >>>> you please push the patch?
> >>>> I think the idea is yours, I don't want to use it as my patch v2.
> >>>
> >>> I'd be happy to add a commit message so that you can include it in your v2.
> >>>
> >>> First, I'd like to hear from Vincent and Lukasz they're happy with this
> >>> approach.
> >>>
> >>> I've been trying to think how we can do this generically but can't find an
> >>> alternative to the extra loop or additional fallback_cpu_mask. Maybe the mask
> >>> is okay if we protect it with sched_asymmetric_cpucapacity static key..
> >>>
> >>
> >> I'm sorry Qais, I see that you are trying to bring this
> >> real-CPU-capacity information into RT, but the source and quality of
> >> this information IMO might matter. I cannot help you w/o experiment
> >> results of your proposed approach.
> >
> > The question I was posing here is whether to handle thermal only in inversion
> > case as I was suggesting or do better. We are still trickling through the
> > details, but first, I wanted to make sure there's no objection to this
> > direction (detect inversion and bail out in rt_task_fits_capacity() for cpus in
> > capacity inversion).
>
> IMO how you detect that inversion and at which point in time is part of
> the scope.
>
> I would vote for using that thermal update code path + compare other
> CPUs real capacity not capacity_orig to detect inversion.

Okay, I could push patch v2 later. Maybe we can continue to discuss
this topic based on v2.

Thanks!
---
xuewen.yan

  reply	other threads:[~2022-05-14 15:01 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-07  5:19 [PATCH] sched: Take thermal pressure into account when determine rt fits capacity Xuewen Yan
2022-04-11  8:21 ` Dietmar Eggemann
2022-04-11  8:52   ` Xuewen Yan
2022-04-11 14:07     ` Dietmar Eggemann
2022-04-13 13:25       ` Lukasz Luba
2022-04-16  2:47         ` Xuewen Yan
2022-04-19  7:14           ` Vincent Guittot
2022-04-19 12:01             ` Lukasz Luba
2022-04-19 12:51               ` Vincent Guittot
2022-04-19 14:13                 ` Lukasz Luba
2022-04-21  8:29                   ` Vincent Guittot
2022-04-21 10:57                     ` Lukasz Luba
2022-04-26  7:39                       ` Vincent Guittot
2022-04-29  9:27                         ` Lukasz Luba
2022-04-20 13:51 ` Qais Yousef
2022-04-21  8:07   ` Xuewen Yan
2022-04-21 16:15     ` Qais Yousef
2022-04-25  1:31       ` Xuewen Yan
2022-04-25 16:12         ` Qais Yousef
2022-04-26  2:07           ` Xuewen Yan
2022-04-26  8:09             ` Vincent Guittot
2022-04-26  9:30               ` Qais Yousef
2022-04-26 10:06                 ` Vincent Guittot
2022-04-26 13:06                   ` Qais Yousef
2022-04-26  9:21             ` Qais Yousef
2022-04-27  1:38               ` Xuewen Yan
2022-04-27 10:58                 ` Qais Yousef
2022-05-01  3:20                   ` Xuewen Yan
2022-05-03 14:43                     ` Qais Yousef
2022-05-09  2:29                       ` Xuewen Yan
2022-05-10 14:56                         ` Qais Yousef
2022-05-10 17:44                           ` Lukasz Luba
2022-05-10 18:44                             ` Qais Yousef
2022-05-10 22:03                               ` Lukasz Luba
2022-05-14 15:01                                 ` Xuewen Yan [this message]
2022-05-14 23:55                                   ` Qais Yousef
2022-05-15  0:53                                     ` [PATCH] sched/rt: Support multi-criterion fitness search for kernel test robot
2022-05-15  1:43                                     ` kernel test robot
2022-05-19 14:16                                     ` [sched] 0eee64011b: canonical_address#:#[##] kernel test robot
2022-06-15 10:13                                     ` [PATCH] sched: Take thermal pressure into account when determine rt fits capacity Qais Yousef
2022-06-15 11:17                                       ` Xuewen Yan
2022-06-15 13:54                                         ` Qais Yousef

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAB8ipk8RUsxYs6JdeMeUPADQRNqT=Jqz15ecwbzbf0RHUDqOXg@mail.gmail.com' \
    --to=xuewen.yan94@gmail.com \
    --cc=Ke.Wang@unisoc.com \
    --cc=di.shen@unisoc.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lukasz.luba@arm.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=qais.yousef@arm.com \
    --cc=rafael@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    --cc=xuewen.yan@unisoc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).