From: Vincent Guittot <vincent.guittot@linaro.org>
To: Douglas Raillard <douglas.raillard@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	"open list:THERMAL" <linux-pm@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	viresh kumar <viresh.kumar@linaro.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Quentin Perret <qperret@google.com>,
	Patrick Bellasi <patrick.bellasi@matbug.net>,
	dh.han@samsung.com
Subject: Re: [RFC PATCH v3 0/6] sched/cpufreq: Make schedutil energy aware
Date: Fri, 18 Oct 2019 17:15:53 +0200
Message-ID: <CAKfTPtATv+TaLus3ggijLWf0KAkexHgpHOTq++iqxaB4jeo-=w@mail.gmail.com>
In-Reply-To: <32d07c51-847d-9d51-480c-c8836f1aedc7@arm.com>

On Fri, 18 Oct 2019 at 16:44, Douglas Raillard <douglas.raillard@arm.com> wrote:
>
>
>
> On 10/18/19 1:07 PM, Peter Zijlstra wrote:
> > On Fri, Oct 18, 2019 at 12:46:25PM +0100, Douglas Raillard wrote:
> >
> >>> What I don't see is how that difference makes sense as input to:
> >>>
> >>>     cost(x) : (1 + x) * cost_j
> >>
> >> The actual input is:
> >> x = (EM_COST_MARGIN_SCALE/SCHED_CAPACITY_SCALE) * (util - util_est)
> >>
> >> Since EM_COST_MARGIN_SCALE == SCHED_CAPACITY_SCALE == 1024, this factor of 1
> >> is not directly reflected in the code but is important for units
> >> consistency.
> >
> > But completely irrelevant for the actual math and conceptual
> > understanding.
>
>  > how that difference makes sense as input to
> I was unsure if you referred to the units being inconsistent or the
> actual way of computing values being strange, so I provided some
> justification for both.
>
> > Just because computers suck at real numbers, and floats
> > are expensive, doesn't mean we have to burden ourselves with fixed point
> > when writing equations.
> >
> > Also, as a physicist I'm prone to normalizing everything to 1, because
> > that's lazy.
> >
> >>> I suppose that limits the additional OPP to twice the previously
> >>> selected cost / efficiency (see the confusion from that other email).
> >>> But given that efficiency drops (or costs rise) for higher OPPs that
> >>> still doesn't really make sense..
> >
> >> Yes, this current limit to +100% freq boosting is somewhat arbitrary and
> >> could probably benefit from being tunable in some way (Kconfig option
> >> maybe). When (margin > 0), we end up selecting an OPP that has a higher cost
> >> than the one strictly required, which is expected. The goal is to speed
> >> things up at the expense of more power consumed to achieve the same work,
> >> hence at a lower efficiency (== higher cost).
> >
> > No, no Kconfig knobs.
> >
> >> That's the main reason why this boosting applies a margin on the cost of the
> >> selected OPP rather than just inflating the util. This allows controlling
> >> directly how much more power (battery life) we are going to spend to achieve
> >> some work that we know could be achieved with less power.
> >
> > But you're not; the margin is relative to the OPP, it is not absolute.
>
> Considering a CPU with 1024 max capacity (since we are not talking about
> migrations here, we can ignore CPU invariance):
>
> work = normalized number of iterations of a given busy loop
> # Thanks to freq invariance
> work = util (between 0 and 1)
> util = f/f_max
>
> # f(work) is the min freq that is admissible for "work", which we will
> # abbreviate as "f"
> f(work) = work * f_max
>
> # from struct em_cap_state doc in energy_model.h
> cost(f) = power(f) * f_max / f
> cost(f) = power(f) / util
> cost(f) = power(f) / work
> power(f) = cost(f) * work
>
> boosted_cost(f) = cost(f) + x

In em_pd_get_higher_freq(), the boost is a percentage of cost(f), so it
should be:
boosted_cost(f) = cost(f) + cost(f) * x
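
To make the difference concrete, here is a minimal sketch comparing the two
forms. The function names and the per-OPP cost values are hypothetical and
only there for illustration; this is not the kernel implementation. In the
additive form x is in cost units, in the relative form x is a fraction of
cost(f).

```python
def boosted_cost_additive(cost, x):
    """Form used in the quoted derivation: x is an absolute cost margin."""
    return cost + x


def boosted_cost_relative(cost, x):
    """Form where x is a fraction of cost(f): cost(f) + cost(f) * x."""
    return cost + cost * x


# Hypothetical per-OPP costs in arbitrary units (assumption, not real data).
costs = [100.0, 150.0, 250.0]
x = 0.5

for cost in costs:
    # The additive margin is the same for every OPP, while the relative
    # margin grows with the cost of the OPP it is applied to.
    print(cost, boosted_cost_additive(cost, x), boosted_cost_relative(cost, x))
```

With the relative form, the extra cost depends on which OPP the margin is
applied to, so the rest of the derivation would need to carry cost(f) along
with x.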

> boosted_power(f) = boosted_cost(f) * work
> boosted_power(f) = (cost(f) + x) * work
>
> # Let's normalize cost() so we can forget about f and deal only with work.
> cost'(work) = cost(f)/cost(f_max)
> x' = x/cost(f_max)
> boosted_power'(work) = (cost'(work) + x') * work
> boosted_power'(work) = cost'(work) * work + x' * work
> boosted_power'(work) = power'(work) + x' * work
> boosted_power'(work) = power'(work) + A(work)
>
> # Over a duration T, spend an extra B unit of energy
> B(work) = A(work) * T
> lost_battery_percent(work) = 100 * B(work)/total_battery_energy
> lost_battery_percent(work) = 100 * T * x' * work /total_battery_energy
> lost_battery_percent(work) =
>   (100 * T / cost(f_max) / total_battery_energy) * x * work
>
> This means that the effect of boosting on battery life is proportional
> to "x" unless I made a mistake somewhere.
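
As a quick numeric check of the quoted derivation (additive margin only),
the sketch below follows the same steps: normalize x by cost(f_max), get
the extra power A(work), integrate over T. The function name and all the
numbers are made up for illustration.

```python
import math


def lost_battery_percent(x, work, duration, cost_fmax, total_battery_energy):
    """Follow the quoted steps for an additive cost margin x."""
    x_norm = x / cost_fmax                  # x' = x / cost(f_max)
    extra_power = x_norm * work             # A(work) = x' * work
    extra_energy = extra_power * duration   # B(work) = A(work) * T
    return 100 * extra_energy / total_battery_energy


# Hypothetical values: work is normalized to [0, 1], the rest is arbitrary.
base = lost_battery_percent(x=64, work=0.5, duration=10,
                            cost_fmax=1024, total_battery_energy=1000)
doubled = lost_battery_percent(x=128, work=0.5, duration=10,
                               cost_fmax=1024, total_battery_energy=1000)

# Under the additive assumption, doubling x doubles the battery cost.
print(math.isclose(doubled, 2 * base))
```

This only confirms the algebra under the assumption that the margin is
added to cost(f) as an absolute value.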
>
> >
> > Or rather, the only actual limit is in relation to the max OPP. So you
> > have very little actual control over how much more energy you're
> > spending.
> >
> >>> So while I agree that 2) is a reasonable signal to work from, everything
> >>> that comes after is still very confusing to me.
> >
> >> "When applying these boosting rules on the runqueue util signals ...":
> >> Assuming the set of enqueued tasks stays the same between 2 observations
> >> from schedutil, if we see the rq util_avg increase above its
> >> util_est.enqueued, that means that at least one task had its util_avg go
> >> above util_est.enqueued. We might miss some boosting opportunities if some
> >> (util - util_est) compensates:
> >> TASK_1(util - util_est) = - TASK_2(util - util_est)
> >> but working on the aggregated value is much easier in schedutil, to avoid
> >> crawling the list of entities.
> >
> > That still does not explain why 'util - util_est', when >0, makes for a
> > sensible input into an OPP relative function.
> >
> > I agree that 'util - util_est', when >0, indicates utilization is
> > increasing (for the aperiodic blah blah blah). But after that I'm still
> > confused.
>
> For the same reason PELT makes a sensible input for OPP selection.
> Currently, OPP selection is based on max(util_avg, util_est.enqueued)
> (from cpu_util_cfs in sched.h), so as soon as we have
> (util - util_est > 0), the OPP will be selected according to util_avg.
> In a way, using util_avg there is already some kind of boosting.
>
> Since the boosting is essentially (util - constant), it grows the same
> way as util. If we think of (util - util_est) as being some estimation
> of how wrong we were in the estimation of the task "true" utilization of
> the CPU, then it makes sense to feed that to the boost. The wronger we
> were, the more we want to boost, because the more time passes, the more
> the scheduler realizes it actually does not know what the task needs. In
> doubt, provide a higher freq than usual until we get to know this task
> better. When that happens (at the next period), boosting is disabled and
> we revert to the usual behavior (aka margin=0).
>
> Hope we are converging to some wording that makes sense.
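
For reference, the margin described above can be sketched as follows. This
is a simplified illustration based only on the formula quoted earlier in
this thread, x = (EM_COST_MARGIN_SCALE/SCHED_CAPACITY_SCALE) * (util - util_est),
with margin=0 when there is nothing to boost. The function name is
hypothetical and the conditions in the actual patch may differ.

```python
SCHED_CAPACITY_SCALE = 1024
EM_COST_MARGIN_SCALE = 1024  # equal to SCHED_CAPACITY_SCALE per the thread


def ramp_boost_margin(util_avg, util_est_enqueued):
    """Return the cost margin x for the given rq utilization signals.

    Simplified assumption: boost only while util_avg exceeds
    util_est.enqueued, otherwise fall back to margin=0.
    """
    delta = util_avg - util_est_enqueued
    if delta <= 0:
        return 0
    # The scale factor is 1 today, but is kept for units consistency.
    return delta * EM_COST_MARGIN_SCALE // SCHED_CAPACITY_SCALE


print(ramp_boost_margin(300, 200))  # util_avg above util_est: boost
print(ramp_boost_margin(200, 300))  # util_avg below util_est: no boost
```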
