linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@kernel.org>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	linux-kernel@vger.kernel.org, Xuewen Yan <xuewen.yan94@gmail.com>,
	Lukasz Luba <lukasz.luba@arm.com>, Wei Wang <wvw@google.com>,
	Jonathan JMChen <Jonathan.JMChen@mediatek.com>,
	Hank <han.lin@mediatek.com>
Subject: Re: [PATCH v2 8/9] sched/fair: Detect capacity inversion
Date: Wed, 16 Nov 2022 18:45:46 +0100	[thread overview]
Message-ID: <5ec773d4-0a19-cc04-e3bc-4cb57cf55b09@arm.com> (raw)
In-Reply-To: <20221112193522.g4hhpdlywndvik7r@airbuntu>

On 12/11/2022 20:35, Qais Yousef wrote:
> On 11/09/22 11:42, Dietmar Eggemann wrote:
> 

[...]

>>> +			 * thermal pressure. We record it as capacity
>>> +			 * inversion.
>>> +			 */
>>> +			if (capacity_orig == pd_cap_orig) {
>>> +				pd_cap = pd_cap_orig - thermal_load_avg(cpu_rq(cpu));
>>> +
>>> +				if (pd_cap > inv_cap) {
>>> +					rq->cpu_capacity_inverted = inv_cap;
>>> +					break;
>>> +				}
>>
>> In case `capacity_orig == pd_cap_orig` and cpumask_test_cpu(cpu_of(rq),
>> pd_span) the code can set rq->cpu_capacity_inverted = inv_cap
>> erroneously since thermal_load_avg(rq) can return different values for
>> inv_cap and pd_cap.
> 
> Good catch!
> 
>>
>> So even on a classical big little system, this condition can set
>> rq->cpu_capacity_inverted for a CPU in the little or big cluster.
>>
>> thermal_load_avg(rq) would have to stay constant for all CPUs within the
>> PD to avoid this.
>>
>> This is one example of the `thermal pressure` is per PD (or Frequency
>> Domain) in Thermal but per-CPU in the task scheduler.
> 
> Only compile tested so far, does this patch address all your points? I should
> get hardware soon to run some tests and send the patch. I might re-write it to
> avoid using pds; though it seems cleaner this way but we miss CAS support.
> 
> Thoughts?

I still don't think that the `CPU capacity inversion` implementation
which uses `cap_orig' = cap_orig - thermal load avg (2)` instead of
`cap_orig'' = cap_orig - thermal pressure (1)` for inverted CPUs (i.e.
other PD exists w/ cap_orig > cap_orig') is the right answer, besides
the EAS vs. CAS coverage.

The basic question for me is why do we have to switch between (1) and
(2)? IIRC we introduced (1) in feec() to cater for the CPUfreq policy
min/max capping between schedutil and the CPUfreq driver
__resolve_freq() [drivers/cpufreq/cpufreq.c] (3).

The policy caps are set together with thermal pressure in
cpufreq_set_cur_state() [drivers/thermal/cpufreq_cooling.c] via
freq_qos_update_request().

I think we should only use (2) in the task scheduler even though the
EAS-schedutil machinery would be not 100% in sync in some cases due to (3).
Thermal load avg has similar properties like all the other EWMA-based
signals we use and we have to live with a certain degree of inaccuracy
anyway (e.g. also because of lock-less CPU statistic fetching when
selecting CPU).

And in this case we wouldn't have to have infrastructure to switch
between (1) and (2) at all.

To illustrate the problem I ran 2 workloads (hackbench/sleep) on a H960
board with step-wise thermal governor tracing thermal load_avg
(sched_pelt_thermal), thermal pressure (thermal_pressure_update) and CPU
capacity (sched_cpu_capacity). Would we really gain something
substantial reliably when we would know the diff between (1) and (2)?

https://nbviewer.org/github/deggeman/lisa/blob/ipynbs/ipynb/scratchpad/thermal_pressure.ipynb


  reply	other threads:[~2022-11-16 17:46 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-04 14:36 [PATCH v2 0/9] Fix relationship between uclamp and fits_capacity() Qais Yousef
2022-08-04 14:36 ` [PATCH v2 1/9] sched/uclamp: Fix relationship between uclamp and migration margin Qais Yousef
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-11-04 17:35   ` [PATCH v2 1/9] " Valentin Schneider
2022-11-05 19:24     ` Qais Yousef
2022-11-07 18:58       ` Valentin Schneider
2022-11-08 11:33         ` Qais Yousef
2022-11-09 10:33   ` Dietmar Eggemann
2022-08-04 14:36 ` [PATCH v2 2/9] sched/uclamp: Make task_fits_capacity() use util_fits_cpu() Qais Yousef
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-08-04 14:36 ` [PATCH v2 3/9] sched/uclamp: Fix fits_capacity() check in feec() Qais Yousef
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-08-04 14:36 ` [PATCH v2 4/9] sched/uclamp: Make select_idle_capacity() use util_fits_cpu() Qais Yousef
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-08-04 14:36 ` [PATCH v2 5/9] sched/uclamp: Make asym_fits_capacity() " Qais Yousef
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-08-04 14:36 ` [PATCH v2 6/9] sched/uclamp: Make cpu_overutilized() " Qais Yousef
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-08-04 14:36 ` [PATCH v2 7/9] sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition Qais Yousef
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-08-04 14:36 ` [PATCH v2 8/9] sched/fair: Detect capacity inversion Qais Yousef
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-11-09 10:42   ` [PATCH v2 8/9] " Dietmar Eggemann
2022-11-12 19:35     ` Qais Yousef
2022-11-16 17:45       ` Dietmar Eggemann [this message]
2022-11-20 21:30         ` Qais Yousef
2022-08-04 14:36 ` [PATCH v2 9/9] sched/fair: Consider capacity inversion in util_fits_cpu() Qais Yousef
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Qais Yousef
2022-11-04 17:35   ` [PATCH v2 9/9] " Valentin Schneider
2022-11-05 20:41     ` Qais Yousef
2022-11-07 18:58       ` Valentin Schneider
2022-11-08 11:51         ` Qais Yousef
2022-11-09 10:43       ` Dietmar Eggemann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5ec773d4-0a19-cc04-e3bc-4cb57cf55b09@arm.com \
    --to=dietmar.eggemann@arm.com \
    --cc=Jonathan.JMChen@mediatek.com \
    --cc=han.lin@mediatek.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lukasz.luba@arm.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=qyousef@layalina.io \
    --cc=vincent.guittot@linaro.org \
    --cc=wvw@google.com \
    --cc=xuewen.yan94@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).