linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Qais Yousef <qyousef@layalina.io>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	linux-kernel@vger.kernel.org, Xuewen Yan <xuewen.yan94@gmail.com>,
	Lukasz Luba <lukasz.luba@arm.com>, Wei Wang <wvw@google.com>,
	Jonathan JMChen <Jonathan.JMChen@mediatek.com>,
	Hank <han.lin@mediatek.com>
Subject: Re: [PATCH v2 8/9] sched/fair: Detect capacity inversion
Date: Sat, 12 Nov 2022 19:35:22 +0000	[thread overview]
Message-ID: <20221112193522.g4hhpdlywndvik7r@airbuntu> (raw)
In-Reply-To: <68f22089-b3bb-f1da-1fd8-d8a1be34654a@arm.com>

On 11/09/22 11:42, Dietmar Eggemann wrote:

[...]

> > +	/*
> > +	 * Detect if the performance domain is in capacity inversion state.
> > +	 *
> > +	 * Capacity inversion happens when another perf domain with equal or
> > +	 * lower capacity_orig_of() ends up having higher capacity than this
> > +	 * domain after subtracting thermal pressure.
> > +	 *
> > +	 * We only take into account thermal pressure in this detection as it's
> > +	 * the only metric that actually results in *real* reduction of
> > +	 * capacity due to performance points (OPPs) being dropped/become
> > +	 * unreachable due to thermal throttling.
> > +	 *
> > +	 * We assume:
> > +	 *   * That all cpus in a perf domain have the same capacity_orig
> > +	 *     (same uArch).
> > +	 *   * Thermal pressure will impact all cpus in this perf domain
> > +	 *     equally.
> > +	 */
> > +	if (static_branch_unlikely(&sched_asym_cpucapacity)) {
> 
> This should be sched_energy_enabled(). Performance Domains (PDs) are an
> EAS thing.

Bummer. I had a version that used cpumasks only, but I thought using pds is
cleaner and will save unnecessarily extra traversing. But I missed that it's
conditional on sched_energy_enabled().

This is not good news for CAS.

> 
> > +		unsigned long inv_cap = capacity_orig - thermal_load_avg(rq);
> 
> rcu_read_lock()
> 
> > +		struct perf_domain *pd = rcu_dereference(rq->rd->pd);
> 
> rcu_read_unlock()

Shouldn't we continue to hold it while traversing the pd too?

> 
> It's called from build_sched_domains() too. I assume
> static_branch_unlikely(&sched_asym_cpucapacity) hides this issue so far.
> 
> > +
> > +		rq->cpu_capacity_inverted = 0;
> > +
> > +		for (; pd; pd = pd->next) {
> > +			struct cpumask *pd_span = perf_domain_span(pd);
> > +			unsigned long pd_cap_orig, pd_cap;
> > +
> > +			cpu = cpumask_any(pd_span);
> > +			pd_cap_orig = arch_scale_cpu_capacity(cpu);
> > +
> > +			if (capacity_orig < pd_cap_orig)
> > +				continue;
> > +
> > +			/*
> > +			 * handle the case of multiple perf domains have the
> > +			 * same capacity_orig but one of them is under higher
> 
> Like I said above, I'm not aware of such an EAS system.

I did argue against that. But Vincent's PoV was that we shouldn't make
assumptions and handle the case where we have big cores each on its own domain.

> 
> > +			 * thermal pressure. We record it as capacity
> > +			 * inversion.
> > +			 */
> > +			if (capacity_orig == pd_cap_orig) {
> > +				pd_cap = pd_cap_orig - thermal_load_avg(cpu_rq(cpu));
> > +
> > +				if (pd_cap > inv_cap) {
> > +					rq->cpu_capacity_inverted = inv_cap;
> > +					break;
> > +				}
> 
> In case `capacity_orig == pd_cap_orig` and cpumask_test_cpu(cpu_of(rq),
> pd_span) the code can set rq->cpu_capacity_inverted = inv_cap
> erroneously since thermal_load_avg(rq) can return different values for
> inv_cap and pd_cap.

Good catch!

> 
> So even on a classical big little system, this condition can set
> rq->cpu_capacity_inverted for a CPU in the little or big cluster.
> 
> thermal_load_avg(rq) would have to stay constant for all CPUs within the
> PD to avoid this.
> 
> This is one example of the `thermal pressure` is per PD (or Frequency
> Domain) in Thermal but per-CPU in the task scheduler.

Only compile tested so far, does this patch address all your points? I should
get hardware soon to run some tests and send the patch. I might re-write it to
avoid using pds; though it seems cleaner this way but we miss CAS support.

Thoughts?


diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 89dadaafc1ec..b01854984994 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8856,16 +8856,24 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu)
         *   * Thermal pressure will impact all cpus in this perf domain
         *     equally.
         */
-       if (static_branch_unlikely(&sched_asym_cpucapacity)) {
-               unsigned long inv_cap = capacity_orig - thermal_load_avg(rq);
-               struct perf_domain *pd = rcu_dereference(rq->rd->pd);
+       if (sched_energy_enabled()) {
+               struct perf_domain *pd;
+               unsigned long inv_cap;
+
+               rcu_read_lock();

+               inv_cap = capacity_orig - thermal_load_avg(rq);
+               pd = rcu_dereference(rq->rd->pd);
                rq->cpu_capacity_inverted = 0;

                for (; pd; pd = pd->next) {
                        struct cpumask *pd_span = perf_domain_span(pd);
                        unsigned long pd_cap_orig, pd_cap;

+                       /* We can't be inverted against our own pd */
+                       if (cpumask_test_cpu(cpu_of(rq), pd_span))
+                               continue;
+
                        cpu = cpumask_any(pd_span);
                        pd_cap_orig = arch_scale_cpu_capacity(cpu);

@@ -8890,6 +8898,8 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu)
                                break;
                        }
                }
+
+               rcu_read_unlock();
        }


Thanks!

--
Qais Yousef

  reply	other threads:[~2022-11-12 19:35 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-04 14:36 [PATCH v2 0/9] Fix relationship between uclamp and fits_capacity() Qais Yousef
2022-08-04 14:36 ` [PATCH v2 1/9] sched/uclamp: Fix relationship between uclamp and migration margin Qais Yousef
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-11-04 17:35   ` [PATCH v2 1/9] " Valentin Schneider
2022-11-05 19:24     ` Qais Yousef
2022-11-07 18:58       ` Valentin Schneider
2022-11-08 11:33         ` Qais Yousef
2022-11-09 10:33   ` Dietmar Eggemann
2022-08-04 14:36 ` [PATCH v2 2/9] sched/uclamp: Make task_fits_capacity() use util_fits_cpu() Qais Yousef
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-08-04 14:36 ` [PATCH v2 3/9] sched/uclamp: Fix fits_capacity() check in feec() Qais Yousef
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-08-04 14:36 ` [PATCH v2 4/9] sched/uclamp: Make select_idle_capacity() use util_fits_cpu() Qais Yousef
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-08-04 14:36 ` [PATCH v2 5/9] sched/uclamp: Make asym_fits_capacity() " Qais Yousef
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-08-04 14:36 ` [PATCH v2 6/9] sched/uclamp: Make cpu_overutilized() " Qais Yousef
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-08-04 14:36 ` [PATCH v2 7/9] sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition Qais Yousef
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-08-04 14:36 ` [PATCH v2 8/9] sched/fair: Detect capacity inversion Qais Yousef
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-11-09 10:42   ` [PATCH v2 8/9] " Dietmar Eggemann
2022-11-12 19:35     ` Qais Yousef [this message]
2022-11-16 17:45       ` Dietmar Eggemann
2022-11-20 21:30         ` Qais Yousef
2022-08-04 14:36 ` [PATCH v2 9/9] sched/fair: Consider capacity inversion in util_fits_cpu() Qais Yousef
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-11-04 17:35   ` [PATCH v2 9/9] " Valentin Schneider
2022-11-05 20:41     ` Qais Yousef
2022-11-07 18:58       ` Valentin Schneider
2022-11-08 11:51         ` Qais Yousef
2022-11-09 10:43       ` Dietmar Eggemann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221112193522.g4hhpdlywndvik7r@airbuntu \
    --to=qyousef@layalina.io \
    --cc=Jonathan.JMChen@mediatek.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=han.lin@mediatek.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lukasz.luba@arm.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=vincent.guittot@linaro.org \
    --cc=wvw@google.com \
    --cc=xuewen.yan94@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).