From: Qais Yousef <qyousef@layalina.io>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
linux-kernel@vger.kernel.org, Xuewen Yan <xuewen.yan94@gmail.com>,
Lukasz Luba <lukasz.luba@arm.com>, Wei Wang <wvw@google.com>,
Jonathan JMChen <Jonathan.JMChen@mediatek.com>,
Hank <han.lin@mediatek.com>
Subject: Re: [PATCH v2 8/9] sched/fair: Detect capacity inversion
Date: Sat, 12 Nov 2022 19:35:22 +0000 [thread overview]
Message-ID: <20221112193522.g4hhpdlywndvik7r@airbuntu> (raw)
In-Reply-To: <68f22089-b3bb-f1da-1fd8-d8a1be34654a@arm.com>
On 11/09/22 11:42, Dietmar Eggemann wrote:
[...]
> > + /*
> > + * Detect if the performance domain is in capacity inversion state.
> > + *
> > + * Capacity inversion happens when another perf domain with equal or
> > + * lower capacity_orig_of() ends up having higher capacity than this
> > + * domain after subtracting thermal pressure.
> > + *
> > + * We only take into account thermal pressure in this detection as it's
> > + * the only metric that actually results in *real* reduction of
> > + * capacity due to performance points (OPPs) being dropped/become
> > + * unreachable due to thermal throttling.
> > + *
> > + * We assume:
> > + * * That all cpus in a perf domain have the same capacity_orig
> > + * (same uArch).
> > + * * Thermal pressure will impact all cpus in this perf domain
> > + * equally.
> > + */
> > + if (static_branch_unlikely(&sched_asym_cpucapacity)) {
>
> This should be sched_energy_enabled(). Performance Domains (PDs) are an
> EAS thing.
Bummer. I had a version that used cpumasks only, but I thought using pds is
cleaner and will save unnecessarily extra traversing. But I missed that it's
conditional on sched_energy_enabled().
This is not good news for CAS.
>
> > + unsigned long inv_cap = capacity_orig - thermal_load_avg(rq);
>
> rcu_read_lock()
>
> > + struct perf_domain *pd = rcu_dereference(rq->rd->pd);
>
> rcu_read_unlock()
Shouldn't we continue to hold it while traversing the pd too?
>
> It's called from build_sched_domains() too. I assume
> static_branch_unlikely(&sched_asym_cpucapacity) hides this issue so far.
>
> > +
> > + rq->cpu_capacity_inverted = 0;
> > +
> > + for (; pd; pd = pd->next) {
> > + struct cpumask *pd_span = perf_domain_span(pd);
> > + unsigned long pd_cap_orig, pd_cap;
> > +
> > + cpu = cpumask_any(pd_span);
> > + pd_cap_orig = arch_scale_cpu_capacity(cpu);
> > +
> > + if (capacity_orig < pd_cap_orig)
> > + continue;
> > +
> > + /*
> > + * handle the case of multiple perf domains have the
> > + * same capacity_orig but one of them is under higher
>
> Like I said above, I'm not aware of such an EAS system.
I did argue against that. But Vincent's PoV was that we shouldn't make
assumptions and handle the case where we have big cores each on its own domain.
>
> > + * thermal pressure. We record it as capacity
> > + * inversion.
> > + */
> > + if (capacity_orig == pd_cap_orig) {
> > + pd_cap = pd_cap_orig - thermal_load_avg(cpu_rq(cpu));
> > +
> > + if (pd_cap > inv_cap) {
> > + rq->cpu_capacity_inverted = inv_cap;
> > + break;
> > + }
>
> In case `capacity_orig == pd_cap_orig` and cpumask_test_cpu(cpu_of(rq),
> pd_span) the code can set rq->cpu_capacity_inverted = inv_cap
> erroneously since thermal_load_avg(rq) can return different values for
> inv_cap and pd_cap.
Good catch!
>
> So even on a classical big little system, this condition can set
> rq->cpu_capacity_inverted for a CPU in the little or big cluster.
>
> thermal_load_avg(rq) would have to stay constant for all CPUs within the
> PD to avoid this.
>
> This is one example of the `thermal pressure` is per PD (or Frequency
> Domain) in Thermal but per-CPU in the task scheduler.
Only compile tested so far, does this patch address all your points? I should
get hardware soon to run some tests and send the patch. I might re-write it to
avoid using pds; though it seems cleaner this way but we miss CAS support.
Thoughts?
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 89dadaafc1ec..b01854984994 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8856,16 +8856,24 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu)
* * Thermal pressure will impact all cpus in this perf domain
* equally.
*/
- if (static_branch_unlikely(&sched_asym_cpucapacity)) {
- unsigned long inv_cap = capacity_orig - thermal_load_avg(rq);
- struct perf_domain *pd = rcu_dereference(rq->rd->pd);
+ if (sched_energy_enabled()) {
+ struct perf_domain *pd;
+ unsigned long inv_cap;
+
+ rcu_read_lock();
+ inv_cap = capacity_orig - thermal_load_avg(rq);
+ pd = rcu_dereference(rq->rd->pd);
rq->cpu_capacity_inverted = 0;
for (; pd; pd = pd->next) {
struct cpumask *pd_span = perf_domain_span(pd);
unsigned long pd_cap_orig, pd_cap;
+ /* We can't be inverted against our own pd */
+ if (cpumask_test_cpu(cpu_of(rq), pd_span))
+ continue;
+
cpu = cpumask_any(pd_span);
pd_cap_orig = arch_scale_cpu_capacity(cpu);
@@ -8890,6 +8898,8 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu)
break;
}
}
+
+ rcu_read_unlock();
}
Thanks!
--
Qais Yousef
next prev parent reply other threads:[~2022-11-12 19:35 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-04 14:36 [PATCH v2 0/9] Fix relationship between uclamp and fits_capacity() Qais Yousef
2022-08-04 14:36 ` [PATCH v2 1/9] sched/uclamp: Fix relationship between uclamp and migration margin Qais Yousef
2022-10-28 6:42 ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-11-04 17:35 ` [PATCH v2 1/9] " Valentin Schneider
2022-11-05 19:24 ` Qais Yousef
2022-11-07 18:58 ` Valentin Schneider
2022-11-08 11:33 ` Qais Yousef
2022-11-09 10:33 ` Dietmar Eggemann
2022-08-04 14:36 ` [PATCH v2 2/9] sched/uclamp: Make task_fits_capacity() use util_fits_cpu() Qais Yousef
2022-10-28 6:42 ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-08-04 14:36 ` [PATCH v2 3/9] sched/uclamp: Fix fits_capacity() check in feec() Qais Yousef
2022-10-28 6:42 ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-08-04 14:36 ` [PATCH v2 4/9] sched/uclamp: Make select_idle_capacity() use util_fits_cpu() Qais Yousef
2022-10-28 6:42 ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-08-04 14:36 ` [PATCH v2 5/9] sched/uclamp: Make asym_fits_capacity() " Qais Yousef
2022-10-28 6:42 ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-08-04 14:36 ` [PATCH v2 6/9] sched/uclamp: Make cpu_overutilized() " Qais Yousef
2022-10-28 6:42 ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-08-04 14:36 ` [PATCH v2 7/9] sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition Qais Yousef
2022-10-28 6:42 ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-08-04 14:36 ` [PATCH v2 8/9] sched/fair: Detect capacity inversion Qais Yousef
2022-10-28 6:42 ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-11-09 10:42 ` [PATCH v2 8/9] " Dietmar Eggemann
2022-11-12 19:35 ` Qais Yousef [this message]
2022-11-16 17:45 ` Dietmar Eggemann
2022-11-20 21:30 ` Qais Yousef
2022-08-04 14:36 ` [PATCH v2 9/9] sched/fair: Consider capacity inversion in util_fits_cpu() Qais Yousef
2022-10-28 6:42 ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-11-04 17:35 ` [PATCH v2 9/9] " Valentin Schneider
2022-11-05 20:41 ` Qais Yousef
2022-11-07 18:58 ` Valentin Schneider
2022-11-08 11:51 ` Qais Yousef
2022-11-09 10:43 ` Dietmar Eggemann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20221112193522.g4hhpdlywndvik7r@airbuntu \
--to=qyousef@layalina.io \
--cc=Jonathan.JMChen@mediatek.com \
--cc=dietmar.eggemann@arm.com \
--cc=han.lin@mediatek.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lukasz.luba@arm.com \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=vincent.guittot@linaro.org \
--cc=wvw@google.com \
--cc=xuewen.yan94@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).