From: Vincent Guittot <vincent.guittot@linaro.org>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Morten Rasmussen <Morten.Rasmussen@arm.com>,
	Patrick Bellasi <patrick.bellasi@arm.com>,
	Paul Turner <pjt@google.com>, Ben Segall <bsegall@google.com>,
	Thara Gopinath <thara.gopinath@linaro.org>,
	pkondeti@codeaurora.org, Quentin Perret <quentin.perret@arm.com>
Subject: Re: [PATCH v6 2/2] sched/fair: update scale invariance of PELT
Date: Wed, 14 Nov 2018 15:56:07 -0800
Message-ID: <CAKfTPtB6Fm2xJQ04GSs5o58Kh-GHxX_Wmn2bfLu=wJPmydijyw@mail.gmail.com>
In-Reply-To: <f7725461-bb0e-27fd-04cc-a225c97a9940@arm.com>

On Mon, 12 Nov 2018 at 18:53, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>
> On 11/9/18 8:20 AM, Vincent Guittot wrote:
>
> [...]
>
> > In order to achieve this time scaling, a new clock_pelt is created per rq.
> > The increase of this clock scales with current capacity when something
> > is running on rq and synchronizes with clock_task when rq is idle. With
> > this mecanism, we ensure the same running and idle time whatever the
>
> nitpick: s/mecanism/mechanism
>
> [...]
>
> > The responsivness of PELT is improved when CPU is not running at max
>
> nitpick: s/responsivness/responsiveness
>
> > capacity with this new algorithm. I have put below some examples of
> > duration to reach some typical load values according to the capacity of the
> > CPU with current implementation and with this patch. These values has been
> > computed based on the geometric serie and the half period value:
>
> nitpick: s/serie/series

ok for this and previous

>
> [...]
>
> > +/*
> > + * The clock_pelt scales the time to reflect the effective amount of
> > + * computation done during the running delta time but then sync back to
> > + * clock_task when rq is idle.
> > + *
> > + *
> > + * absolute time   | 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|16
> > + * @ max capacity  ------******---------------******---------------
> > + * @ half capacity ------************---------************---------
> > + * clock pelt      | 1| 2|    3|    4| 7| 8| 9|   10|   11|14|15|16
> > + *
> > + */
> > +static inline void update_rq_clock_pelt(struct rq *rq, s64 delta)
> > +{
> > +     if (unlikely(is_idle_task(rq->curr))) {
> > +             /* The rq is idle, we can sync to clock_task */
> > +             rq->clock_pelt  = rq_clock_task(rq);
> > +             return;
>
> I think the term (time) stretching was used to describe both what's
> happening to the clock_pelt values at lower capacity and this
> re-sync with clock_task. But IMHO, one has to be called stretching
> and the other compressing for the naming to make sense. I think it's
> a question of definition.
>
> > +     }
> > +
> > +     /*
> > +      * When a rq runs at a lower compute capacity, it will need
> > +      * more time to do the same amount of work than at max
> > +      * capacity: either because it takes more time to compute the
> > +      * same amount of work or because taking more time means
> > +      * sharing more often the CPU between entities.
>
> I wonder: since clock_pelt is related to the sched_avg(s) of the rq,
> isn't the first one ("it takes more time to do the same amount of
> work") the only reason? IMHO, the sharing of the CPU between sched
> entities shouldn't be visible here.

yes probably
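
To put rough numbers on the scaling that happens below (illustration
only; cap_scale(v, s) is the existing helper in fair.c, i.e.
((v) * (s) >> SCHED_CAPACITY_SHIFT)):

	s64 delta = 4000000;		/* 4ms of rq_clock_task */
	unsigned long cpu_cap = 512;	/* arch_scale_cpu_capacity(): half capacity */
	unsigned long freq_cap = 1024;	/* arch_scale_freq_capacity(): max freq */

	delta = cap_scale(delta, cpu_cap);	/* -> 2000000 */
	delta = cap_scale(delta, freq_cap);	/* unchanged: 2000000 */

	/* clock_pelt only advances 2ms for 4ms of wall clock */

so at half capacity the same wall-clock run contributes half the PELT
time, which is what makes the signal invariant.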

>
> > +      * In order to be invariant, we scale the delta to reflect how
> > +      * much work has been really done.
> > +      * Running at lower capacity also means running longer to do
> > +      * the same amount of work and this results in stealing some
>
> This is already mentioned above.
>
> > +      * idle time that will disturb the load signal compared to
> > +      * max capacity; This stolen idle time will be automaticcally
>
> nitpick: s/automaticcally/automatically
>
> > +      * reflected when the rq will be idle and the clock will be
> > +      * synced with rq_clock_task.
> > +      */
> > +
> > +     /*
> > +      * scale the elapsed time to reflect the real amount of
> > +      * computation
> > +      */
> > +     delta = cap_scale(delta, arch_scale_cpu_capacity(NULL, cpu_of(rq)));
> > +     delta = cap_scale(delta, arch_scale_freq_capacity(cpu_of(rq)));
> > +
> > +     rq->clock_pelt += delta;
> > +}
> > +
> > +/*
> > + * When rq becomes idle, we have to check if it has lost some idle time
> > + * because it was fully busy. A rq is fully used when the /Sum util_sum
> > + * is greater or equal to:
> > + * (LOAD_AVG_MAX - 1024 + rq->cfs.avg.period_contrib) << SCHED_CAPACITY_SHIFT;
> > + * For optimization and computing rounding purpose, we don't take into account
> > + * the position in the current window (period_contrib) and we use the maximum
> > + * util_avg value minus 1
> > + */
>
> In v4 you were using:
>
> u32 divider = (LOAD_AVG_MAX - 1024 + rq->cfs.avg.period_contrib) <<
> SCHED_CAPACITY_SHIFT;
>
> and switched in v5 to:
>
> u32 divider = ((LOAD_AVG_MAX - 1024) << SCHED_CAPACITY_SHIFT) -
> LOAD_AVG_MAX;
>
> The period_contrib of rq->cfs.avg, rq->avg_rt and rq->avg_dl are not
> necessarily aligned but for overload you sum up the util_sum values for
> cfs, rt and dl. Was this also a reason why you now assume max util_avg -
> 1 ?

The original reason is optimization:
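
The v5 form is a compile-time constant, while the v4 form has to read
rq->cfs.avg.period_contrib on every idle entry; side by side:

	/* v4: depends on the rq, recomputed each time */
	u32 divider = (LOAD_AVG_MAX - 1024 + rq->cfs.avg.period_contrib)
					<< SCHED_CAPACITY_SHIFT;

	/* v5: all constants, folded by the compiler */
	u32 divider = ((LOAD_AVG_MAX - 1024) << SCHED_CAPACITY_SHIFT)
					- LOAD_AVG_MAX;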

>
> > +static inline void update_idle_rq_clock_pelt(struct rq *rq)
> > +{
> > +     u32 divider = ((LOAD_AVG_MAX - 1024) << SCHED_CAPACITY_SHIFT) - LOAD_AVG_MAX;
>
> util_avg = util_sum / divider, with maximum util_avg = 1024
>
> 1024 = util_sum / (LOAD_AVG_MAX - 1024)   w/ period_contrib = 0
>
> util_sum >= (LOAD_AVG_MAX - 1024) * 1024
>
> util_sum >= (LOAD_AVG_MAX - 1024) << SCHED_CAPACITY_SHIFT;
>
> So you want to use 1024 - 1 = 1023 instead. Wouldn't you have to
> subtract (LOAD_AVG_MAX - 1024) from (LOAD_AVG_MAX - 1024) <<
> SCHED_CAPACITY_SHIFT in this case?
>
> util_sum >= (LOAD_AVG_MAX - 1024) << SCHED_CAPACITY_SHIFT -
> (LOAD_AVG_MAX - 1024)

(LOAD_AVG_MAX - 1024) is the lower bound of the max value (it's in
fact LOAD_AVG_MAX*y), so in order to be conservative and prevent any
rounding effect, I'm using the higher bound (LOAD_AVG_MAX*y + a full
new window (1024)) = LOAD_AVG_MAX on the sum-of-util_sum side.

And this is even more true now that we sum 3 util_sum values.
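
Putting rough numbers on it (LOAD_AVG_MAX = 47742, so
LOAD_AVG_MAX*y = 47742 - 1024 = 46718):

	divider = ((47742 - 1024) << 10) - 47742
	        = 47839232 - 47742
	        = 47791490

whereas subtracting (LOAD_AVG_MAX - 1024) as you suggest would give
47792514. Using the full LOAD_AVG_MAX makes the threshold slightly
lower, so any rounding in the summed util_sum values errs on the side
of treating the rq as fully busy.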

> > +     u32 overload = rq->cfs.avg.util_sum;
> > +     overload += rq->avg_rt.util_sum;
> > +     overload += rq->avg_dl.util_sum;
> > +
> > +     /*
> > +      * Reflecting some stolen time makes sense only if the idle
> > +      * phase would be present at max capacity. As soon as the
> > +      * utilization of a rq has reached the maximum value, it is
> > +      * considered as an always runnnig rq without idle time to
>
> nitpick: s/runnnig/running

ok

>
> > +      * steal. This potential idle time is considered as lost in
> > +      * this case. We keep track of this lost idle time compare to
> > +      * rq's clock_task.
> > +      */
> > +     if ((overload >= divider))
> > +             rq->lost_idle_time += rq_clock_task(rq) - rq->clock_pelt;
>
> Shouldn't overload still be called util_sum? Overload (or
> overutilized) is IMHO the state when util_sum >= divider.

yes I can rename it
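
For context, the lost idle time accumulated here is folded back out on
the read side, roughly like (sketch, not the exact hunk):

	static inline u64 rq_clock_pelt(struct rq *rq)
	{
		/* discard idle time that couldn't exist at max capacity */
		return rq->clock_pelt - rq->lost_idle_time;
	}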

>
> [...]
>
