linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: mingo@kernel.org, linux-kernel@vger.kernel.org,
	dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com,
	yuyang.du@intel.com, pjt@google.com, bsegall@google.com
Subject: Re: [PATCH v2] sched/fair: update scale invariance of PELT
Date: Wed, 12 Apr 2017 13:28:58 +0200	[thread overview]
Message-ID: <20170412112858.75hg75sd3clfxvvk@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20170411130920.GB22895@linaro.org>

On Tue, Apr 11, 2017 at 03:09:20PM +0200, Vincent Guittot wrote:
> Le Tuesday 11 Apr 2017 à 12:49:49 (+0200), Peter Zijlstra a écrit :
> > 
> > Lets go back to the unscaled version:
> > 
> >     running   idle
> >    |*********|---------|
> > 
> > With the current code, that would effectively end up like (again
> > assuming 50%):
> > 
> >     running   idle
> >    |*_*_*_*_*|---------|
> > 
> > 
> > Time stays the same, but we add extra idle cycles.
> 
> In fact it's not really like that because this doesn't reflect the impact of
> the decay window which is not linear with time
> 
> For a task that run at max freq
> 
> (1) |-------|-------|-------|----
>        ****            ****
>     ---****------------****------
> 
> The same task running at half frequency, will run twice longer
>     |-------|-------|-------|----
>        ********        ********                          
>     ---********--------********--
> 
> With the current implementation, we are not inserting idle cycle but
> dividing the contribution by half in order to compensate the fact that the task
> will run twice longer:
> 
> (2) |-------|-------|-------|----
> 
>     ---********--------********--  
> 
> But the final result is neither equal to
> 
>     |-------|-------|-------|----
>        * * * *         * * * *                
>     ---*_*_*_*---------*_*_*_*---
> 
> nor to (1) as described below:

I'm not sure I get what you say; dividing the contribution in half is
exactly equal to the above picture. Because as you can see, there's half
the number of * per window.

> Let assume all durations are aligned with decay window. This will simplify the
> formula for the explanation and will not change the demonstration. We can come back
> on the full equation later on.
> 
> For (1), we have the equation that you write previously:
> 
>      util_sum' = utilsum * y^p +
>                                   p-1
>                d1 * y^p + 1024 * \Sum y^n + d3 * y^0
>  	                              n=1
> 
> Which becomes like below when we are aligned to decay window (d1 and d3 are null)

> (A)  util_sum' = utilsum * y^p +
>                        p-1
> 			   1024 * \Sum y^n
>  	                   n=1
> 
> The running time at max frequency is d and p is the number of decay window period
> In this case p = d/1024us and l = 0
> 
> For (2), task runs at half frequency so the duration d' is twice longer than
> d and p'=2*p. In current implementation, we compensate the increase of running
> duration (and lost idle time) by dividing the contribution by 2:

>      util_sum' = utilsum * y^p' +
>                        p'-1
>                 512 * \Sum y^n
>  	                   n=1
> 
> (B)  util_sum' = utilsum * y^(2*p) +
>                        2*p-1
>                 512 * \Sum y^n
>                        n=1

Hmmm.. but 2p is the actual wall-time of the window period.

> With the new scale invariance, we have the following pattern before scaling
> time:
> 
>     |-------|-------|-------|--
>        ********        ********
>     ---********--------********
> 
> Task still runs at half frequency so the duration d' is twice longer than
> d and p': p'=2*p just like for (2).

> But before computing util_sum', we change the temporal referencial to reflect
> what would have been done at max frequency: we scale the running time (divide it by 2)
> in order to have something like:
> 
>     |-------|-------|-------|--
>        ****            ****
>     ---****____--------****____

So you're moving the window edges away from wall-time.

> so we now have a duration d'' that is half d'and a number of period p'' that
> is half p' so p'' = 1/2 * p' == p

>      util_sum' = utilsum * y^p'' +
>                        p''-1
>                1024 * \Sum y^n
>  	                   n=1
>      util_sum' = utilsum * y^p +
>                        p-1
>                 512 * \Sum y^n
>  	                   n=1


  |---------|---------|          (wall-time)
  ----****------------- F=100%
  ----******----------- F= 66%
  |--------------|----|          (fudge-time)

(explicitly not used 50%, because then the second window would have
collapsed to 0, imagine the joy if you go lower still)

So in fudge-time the first window has 6/15 == 4/10 for the max-freq /
wall-time combo.

> 
> Then l = p' - p''. The lost idle time is tracked to apply the same amount of decay
> window when the task is sleeping
> 
> so at the end we have a number of decay window of p''+l = p'' so we still have
> the same amount of decay window than previously.

Now, we have to stretch time back to equal window size, and while you do
that for the active windows, we have to do manual compensation for idle
windows (which is somewhat ugleh) and is where the lost-time comes from.

Also, this all feels entirely yucky, because as per the above, if we'd
ran at 33%, we'd have ended up with a negative time window.

Not to mention that this only seems to work for low utilization. Once
you hit higher utilization scenarios, where there isn't much idle time
to compensate for the stretching, things go wobbly. Although both
scenarios might end up being the same.

And instead of resurrecting 0 sized windows, you throw them out, which
results in max-util at low F being reported as max-util at F=1, which I
suppose is a useful property and results in the increased ramp-up (which
is a desired property).

So not only do you have non-linear time, you also have non-continuous
time.

I still wonder about the whole !running vs !weight thing.. this all
needs more thinking.

  reply	other threads:[~2017-04-12 11:29 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-10  9:18 [PATCH v2] sched/fair: update scale invariance of PELT Vincent Guittot
2017-04-10 17:38 ` Peter Zijlstra
2017-04-11  7:52   ` Vincent Guittot
2017-04-11  8:53     ` Peter Zijlstra
2017-04-11  9:40       ` Vincent Guittot
2017-04-11 10:41         ` Peter Zijlstra
2017-04-11 10:49           ` Peter Zijlstra
2017-04-11 13:09             ` Vincent Guittot
2017-04-12 11:28               ` Peter Zijlstra [this message]
2017-04-12 14:50                 ` Vincent Guittot
2017-04-12 15:44                   ` Peter Zijlstra
2017-04-13  9:42                     ` Vincent Guittot
2017-04-13 13:32                 ` Peter Zijlstra
2017-04-13 14:59                   ` Vincent Guittot
2017-04-13 18:06                     ` Peter Zijlstra
2017-04-14  8:47                       ` Vincent Guittot
2017-04-11 12:08           ` Vincent Guittot
2017-04-11  9:12     ` Peter Zijlstra
2017-04-11  9:46       ` Vincent Guittot
2017-04-13 13:39     ` Peter Zijlstra
2017-04-13 15:16       ` Vincent Guittot
2017-04-13 16:13         ` Peter Zijlstra
2017-04-14  8:49           ` Vincent Guittot
2017-04-19 16:31             ` Vincent Guittot
2017-04-28 15:52 ` Morten Rasmussen
2017-04-28 17:08   ` Dietmar Eggemann
2017-05-03 17:11   ` Vincent Guittot
2017-04-28 22:09 ` Peter Zijlstra
2017-05-01  9:00   ` Peter Zijlstra
2017-05-02 13:38     ` Vincent Guittot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170412112858.75hg75sd3clfxvvk@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=Morten.Rasmussen@arm.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=pjt@google.com \
    --cc=vincent.guittot@linaro.org \
    --cc=yuyang.du@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).