From: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>,
peterz@infradead.org, mingo@kernel.org,
linux-kernel@vger.kernel.org
Cc: rjw@rjwysocki.net, Morten.Rasmussen@arm.com,
patrick.bellasi@arm.com, pjt@google.com, bsegall@google.com,
thara.gopinath@linaro.org
Subject: Re: [PATCH v4 2/2] sched/fair: update scale invariance of PELT
Date: Thu, 25 Oct 2018 12:35:56 +0200
Message-ID: <43b126ab-403b-3fb3-5951-45a107e4a14b@arm.com>
In-Reply-To: <1539965871-22410-3-git-send-email-vincent.guittot@linaro.org>

Hi Vincent,

On 10/19/18 6:17 PM, Vincent Guittot wrote:
> The current implementation of load tracking invariance scales the
> contribution with current frequency and uarch performance (only for
> utilization) of the CPU. One main result of this formula is that the
> figures are capped by current capacity of CPU. Another one is that the
> load_avg is not invariant because it is not scaled with uarch.
>
> The util_avg of a periodic task that runs r time slots every p time slots
> varies in the range:
>
> U * (1-y^r)/(1-y^p) * y^i < Utilization < U * (1-y^r)/(1-y^p)
>
> where U is the max util_avg value = SCHED_CAPACITY_SCALE
>
> At a lower capacity, the range becomes:
>
> U * C * (1-y^r')/(1-y^p) * y^i' < Utilization < U * C * (1-y^r')/(1-y^p)
>
> with C reflecting the compute capacity ratio between current capacity and
> max capacity.
>
> so C tries to compensate changes in (1-y^r') but it can't be accurate.
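
To make the quoted ranges concrete, here is a minimal Python sketch that evaluates both formulas. It assumes 1 ms time slots, the usual PELT decay y^32 = 0.5, i = p - r idle slots per period, and r' = r / C; none of these are spelled out in the quoted text, and util_range() is a hypothetical helper name, not kernel code.

```python
# Simplified PELT model: 1 ms time slots, half-life of 32 slots (Y**32 == 0.5).
Y = 0.5 ** (1 / 32)
U = 1024  # SCHED_CAPACITY_SCALE


def util_range(r, p, c=1.0):
    """Steady-state (lower, upper) util_avg bounds from the quoted formulas.

    r: slots of work per period when running at max capacity
    p: period in slots
    c: ratio between current capacity and max capacity (C in the quote)

    Assumption: at capacity ratio c the running time stretches to
    r' = r / c and the idle time shrinks to i' = p - r'.
    """
    r_scaled = r / c
    if r_scaled >= p:
        raise ValueError("task does not fit in its period at this capacity")
    idle = p - r_scaled
    upper = U * c * (1 - Y ** r_scaled) / (1 - Y ** p)
    lower = upper * Y ** idle
    return lower, upper


if __name__ == "__main__":
    for c in (1.0, 0.5):
        lower, upper = util_range(4, 16, c)
        print(f"C={c}: {lower:.1f} < Utilization < {upper:.1f}")
```

For r = 4 and p = 16 the upper bound comes out at roughly 290 at max capacity and roughly 278 at half capacity, which illustrates why C cannot exactly compensate the change in (1-y^r').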
>
> Instead of scaling the contribution value of PELT algo, we should scale the
> running time. The PELT signal aims to track the amount of computation of
> tasks and/or rq so it seems more correct to scale the running time to
> reflect the effective amount of computation done since the last update.
>
> In order to be fully invariant, we need to apply the same amount of
> running time and idle time whatever the current capacity. Because running
> at lower capacity implies that the task will run longer, we have to ensure
> that the same amount of idle time will be applied when system becomes idle
> and no idle time has been "stolen". But reaching the maximum utilization
> value (SCHED_CAPACITY_SCALE) means that the task is seen as an
> always-running task whatever the capacity of the CPU (even at max compute
> capacity). In this case, we can discard these "stolen" idle times, which
> become meaningless.
>
> In order to achieve this time scaling, a new clock_pelt is created per rq.
> The increase of this clock scales with current capacity when something
> is running on rq and synchronizes with clock_task when rq is idle. With
> this mechanism, we ensure the same running and idle time whatever the
> current capacity. This also makes it possible to simplify the pelt
> algorithm by removing all references to uarch and frequency and applying
> the same contribution to utilization and loads. Furthermore, the scaling
> is done only once per update of clock (update_rq_clock_task()) instead of
> during each update of sched_entities and cfs/rt/dl_rq of the rq like the
> current implementation. This is interesting when cgroups are involved, as
> shown in the results below:
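
For readers following along, the clock behaviour described in the quoted paragraph can be modelled roughly as below. This is only a toy Python sketch under my own assumptions: the class and method names are hypothetical, time is in microseconds, capacity is a single value out of 1024, and the handling of "stolen" idle time at maximum utilization is left out entirely. It is not the code from the patch.

```python
SCHED_CAPACITY_SCALE = 1024


class RqClocks:
    """Toy model of a per-rq clock_task / clock_pelt pair (not kernel code)."""

    def __init__(self):
        self.clock_task = 0  # task time, in us
        self.clock_pelt = 0  # time base that PELT would consume, in us

    def update(self, delta, capacity, idle):
        """Advance the clocks by `delta` us of task time.

        capacity: current capacity of the CPU, out of SCHED_CAPACITY_SCALE
        idle:     True if nothing is running on the rq
        """
        self.clock_task += delta
        if idle:
            # When the rq is idle, clock_pelt synchronizes with clock_task.
            self.clock_pelt = self.clock_task
        else:
            # While something runs, clock_pelt advances scaled by capacity.
            self.clock_pelt += delta * capacity // SCHED_CAPACITY_SCALE


if __name__ == "__main__":
    rq = RqClocks()
    rq.update(8000, 512, idle=False)  # 8 ms of running at half capacity
    print(rq.clock_task, rq.clock_pelt)
    rq.update(8000, 512, idle=True)   # followed by 8 ms of idle
    print(rq.clock_task, rq.clock_pelt)
```

In this model, 8 ms of running at half capacity followed by 8 ms of idle looks to PELT like 4 ms of running followed by 12 ms of idle, which is what the same work would produce at max capacity over a 16 ms period.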

I have a couple of questions related to the tests you ran.

> On a hikey (octo ARM platform).
> Performance cpufreq governor and only shallowest c-state to remove variance
> generated by those power features so we only track the impact of pelt algo.

So you disabled the 'cpu-sleep' and 'cluster-sleep' c-states?

I get 'hisi_thermal f7030700.tsensor: THERMAL ALARM: 66385 > 65000' on
my hikey620. Did you change the thermal configuration? Not sure if there
are any actions attached to this warning though.

> each test runs 16 times
>
> ./perf bench sched pipe
> (higher is better)
> kernel      tip/sched/core        + patch
>             ops/seconds           ops/seconds           diff
> cgroup
> root        59648(+/- 0.13%)      59785(+/- 0.24%)      +0.23%
> level1      55570(+/- 0.21%)      56003(+/- 0.24%)      +0.78%
> level2      52100(+/- 0.20%)      52788(+/- 0.22%)      +1.32%
>
> hackbench -l 1000

Shouldn't this be '-l 100'?

> (lower is better)
> kernel      tip/sched/core        + patch
>             duration(sec)         duration(sec)         diff
> cgroup
> root        4.472(+/- 1.86%)      4.346(+/- 2.74%)      -2.80%
> level1      5.039(+/- 11.05%)     4.662(+/- 7.57%)      -7.47%
> level2      5.195(+/- 10.66%)     4.877(+/- 8.90%)      -6.12%
>
> The responsiveness of PELT is improved when CPU is not running at max
> capacity with this new algorithm. I have put below some examples of
> duration to reach some typical load values according to the capacity of the
> CPU with current implementation and with this patch.
>
> Util (%)     max capacity  half capacity(mainline)  half capacity(w/ patch)
> 972 (95%)    138ms         not reachable            276ms
> 486 (47.5%)  30ms          138ms                    60ms
> 256 (25%)    13ms          32ms                     26ms

Could you describe these testcases in more detail?

So I assume you run one 100% task t1 (possibly pinned to one CPU) on your
hikey620 with userspace governor and for:

(1) max capacity:

  echo 1200000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed

(2) half capacity:

  echo 729000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed

and then you measure the time till t1 reaches 25%, 47.5% and 95%
utilization?

What's the initial utilization value of t1? I assume t1 starts with
utilization=512 (post_init_entity_util_avg()).
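
As a sanity check on my reading of the table, the quoted durations can be roughly reproduced with a closed-form model. The Python sketch below assumes an always-running task that starts from utilization 0 (not 512), a PELT half-life of 32 ms, a signal that converges to 512 at half capacity on mainline, and PELT time advancing at half speed with the patch. ramp_time_ms() and its parameters are hypothetical names for illustration only.

```python
import math

HALF_LIFE_MS = 32  # PELT half-life: y**32 == 0.5 with 1 ms periods
MAX_UTIL = 1024    # SCHED_CAPACITY_SCALE


def ramp_time_ms(target, ceiling=MAX_UTIL, time_scale=1.0):
    """Wall-clock ms for an always-running task to go from 0 to `target`.

    ceiling:    value the signal converges to (assumed 512 at half capacity
                with mainline's contribution scaling)
    time_scale: PELT time elapsed per ms of wall-clock time (assumed 0.5
                at half capacity with the patch)
    Returns None when the target cannot be reached.
    """
    if target >= ceiling:
        return None
    # util(t) = ceiling * (1 - y**t)  =>  t = 32 * log2(ceiling / (ceiling - target))
    pelt_ms = HALF_LIFE_MS * math.log2(ceiling / (ceiling - target))
    return pelt_ms / time_scale


if __name__ == "__main__":
    for target in (972, 486, 256):
        print(target,
              ramp_time_ms(target),                  # max capacity
              ramp_time_ms(target, ceiling=512),     # half capacity, mainline
              ramp_time_ms(target, time_scale=0.5))  # half capacity, w/ patch
```

Under these assumptions the results land within about 1 ms of the quoted figures, which suggests the table was computed from a zero initial utilization, but that is only my guess.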
> On my hikey (octo ARM platform) with schedutil governor, the time to reach
> max OPP when starting from a null utilization, decreases from 223ms with
> current scale invariance down to 121ms with the new algorithm. For this
> test, I have enabled arch_scale_freq for arm64.

Isn't the arch-specific arch_scale_freq_capacity() enabled by default on
arm64 with cpufreq support?

I would like to run the same tests so we can discuss results more easily.