linux-pm.vger.kernel.org archive mirror
From: Julia Lawall <julia.lawall@inria.fr>
To: Francisco Jerez <currojerez@riseup.net>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	Len Brown <lenb@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Linux PM <linux-pm@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: Re: cpufreq: intel_pstate: map utilization into the pstate range
Date: Thu, 6 Jan 2022 20:49:08 +0100 (CET)
Message-ID: <alpine.DEB.2.22.394.2201062044340.3098@hadrien>
In-Reply-To: <87a6g9rbje.fsf@riseup.net>



On Wed, 5 Jan 2022, Francisco Jerez wrote:

> Julia Lawall <julia.lawall@inria.fr> writes:
>
> > On Tue, 4 Jan 2022, Rafael J. Wysocki wrote:
> >
> >> On Tue, Jan 4, 2022 at 4:49 PM Julia Lawall <julia.lawall@inria.fr> wrote:
> >> >
> >> > I tried the whole experiment again on an Intel w2155 (one socket, 10
> >> > physical cores, pstates 12, 33, and 45).
> >> >
> >> > For the CPU there is a small jump between 32 and 33, smaller than for
> >> > the 6130.
> >> >
> >> > For the RAM, there is a big jump between 21 and 22.
> >> >
> >> > Combining them leaves a big jump between 21 and 22.
> >>
> >> These jumps are most likely related to voltage increases.
> >>
> >> > It seems that the definition of efficient is that there is no more cost
> >> > for the computation than the cost of simply having the machine doing any
> >> > computation at all.  It doesn't take into account the time and energy
> >> > required to do some actual amount of work.
> >>
> >> Well, that's not what I wanted to say.
> >
> > I was referring to Francisco's comment that the lowest indicated frequency
> > should be the most efficient one.  Turbostat also reports the lowest
> > frequency as the most efficient one.  In my graph, there are the pstates 7
> > and 10, which give exactly the same energy consumption as 12.  7 and 10
> > are certainly less efficient, because the energy consumption is the same,
> > but the execution speed is lower.
> >
> >> Of course, the configuration that requires less energy to be spent to
> >> do a given amount of work is more energy-efficient.  To measure this,
> >> the system needs to be given exactly the same amount of work for each
> >> run and the energy spent by it during each run needs to be compared.
>
> I disagree that the system needs to be given the exact same amount of
> work in order to measure differences in energy efficiency.  The average
> energy efficiency of Julia's 10s workloads can be calculated easily in
> both cases (e.g. as the W/E ratio below, W will just be a different
> value for each run), and the result will likely approximate the
> instantaneous energy efficiency of the fixed P-states we're comparing,
> since her workload seems to be fairly close to a steady state.
>
> >
> > This is basically my point of view, but there is a question about it.  If
> > over 10 seconds you consume 10J and by running twice as fast you would
> > consume only 6J, then how do you account for the next 5 seconds?  If the
> > machine is then idle for the next 5 seconds, maybe you would end up
> > consuming 8J in total over the 10 seconds.  But if you take advantage of
> > the free 5 seconds to pack in another job, then you end up consuming 12J.
> >
>
> Geometrically, such an oscillatory workload with periods of idling and
> periods of activity would give an average power consumption along the
> line that passes through the points corresponding to both states on the
> CPU's power curve -- IOW your average power consumption will just be the
> weighted average of the power consumption of each state (with the duty
> cycle t_i/t_total of each state being its weight):
>
> P_avg = t_0/t_total * P_0 + t_1/t_total * P_1
>
> Your energy usage would just be 10s times that P_avg, since you're
> assuming that the total runtime of the workload is fixed at 10s
> independent of how long the CPU actually takes to complete the
> computation.  In cases where the P-state during the period of activity
> t_1 is equal to or lower than the maximum efficiency P-state, that line
> segment is guaranteed to lie below the power curve, indicating that such
> oscillation is more efficient than running the workload fixed to its
> average P-state.
>
> That said, this scenario doesn't really seem very relevant to your case,
> since the last workload you've provided turbostat traces for seems to
> show almost no oscillation.  If there was such an oscillation, your
> total energy usage would still be greater for oscillations between idle
> and some P-state different from the most efficient one.  Such an
> oscillation doesn't explain the anomaly we're seeing on your traces,
> which show more energy-efficient instantaneous behavior for a P-state 2x
> the one reported by your processor as the most energy-efficient.
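
As a quick numeric check of the weighted average quoted above, here is a
minimal sketch.  The power figures are illustrative only: they are
back-derived from the hypothetical 10J / 6J / 8J example earlier in the
thread (6J over 5s of activity, hence 2J over 5s of idling) and are not
measurements.  The function name average_power is made up for this note.

```python
def average_power(states):
    """Duty-cycle-weighted average power.

    states: list of (duration_s, power_w) pairs, one per state.
    """
    total_time = sum(t for t, _ in states)
    return sum(t / total_time * p for t, p in states)


# Illustrative numbers only (assumption): 6 J in 5 s of activity is 1.2 W,
# and the remaining 2 J in 5 s of idling is 0.4 W.
busy = (5.0, 6.0 / 5.0)
idle = (5.0, 2.0 / 5.0)

p_avg = average_power([idle, busy])
energy = 10.0 * p_avg  # total runtime fixed at 10 s
print(f"P_avg = {p_avg:.2f} W, E = {energy:.1f} J")
```

With these inputs the average comes out at the 8J total of the
hypothetical example, as expected from P_avg = t_0/t_total * P_0 +
t_1/t_total * P_1.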

All the turbostat output and graphs I have sent recently were just for
continuous spinning:

for(;;);

Now I am trying to run for the fraction of the time corresponding to
10 / P for pstate P (i.e. 0.5 of the time for pstate 20), and then sleeping,
to see whether one can just add the sleeping power consumption of the
machine to compute the efficiency, as Rafael suggested.
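
A rough sketch of such a run-then-sleep workload follows.  It is not the
program used for the measurements: the period length, the total duration,
the split into many short periods, and the names busy_fraction and
run_duty_cycle are all assumptions, and the P-state itself is assumed to be
fixed by other means.  The spin loop stands in for the C for(;;); above.

```python
import time


def busy_fraction(pstate, base_pstate=10):
    """Fraction of each period to spend spinning: base_pstate / pstate, capped at 1."""
    return min(1.0, base_pstate / pstate)


def run_duty_cycle(pstate, total_s=10.0, period_s=0.1):
    """Spin for busy_fraction(pstate) of every period, then sleep for the rest.

    Returns the total time spent spinning, in seconds.
    """
    busy_s = busy_fraction(pstate) * period_s
    spun = 0.0
    end = time.monotonic() + total_s
    while time.monotonic() < end:
        start = time.monotonic()
        while time.monotonic() - start < busy_s:
            pass  # busy-wait, stand-in for the for(;;); spin
        spun += time.monotonic() - start
        time.sleep(max(0.0, period_s - busy_s))
    return spun


if __name__ == "__main__":
    # pstate 20 -> spin 0.5 of the time, sleep the other 0.5
    print(f"spun for {run_duty_cycle(20, total_s=1.0):.2f} s out of 1.0 s")
```

Whether one long busy phase followed by one long sleep behaves the same
as many short periods is a separate question that this sketch does not
address.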

julia

>
> >> However, I think that you are interested in answering a different
> >> question: Given a specific amount of time (say T) to run the workload,
> >> what frequency to run the CPUs doing the work at in order to get the
> >> maximum amount of work done per unit of energy spent by the system (as
> >> a whole)?  Or, given 2 different frequency levels, which of them to
> >> run the CPUs at to get more work done per energy unit?
> >
> > This is the approach where you assume that the machine will be idle in any
> > leftover time.  And it accounts for the energy consumed in that idle time.
> >
> >> The work / energy ratio can be estimated as
> >>
> >> W / E = C * f / P(f)
> >>
> >> where C is a constant and P(f) is the power drawn by the whole system
> >> while the CPUs doing the work are running at frequency f, and of
> >> course for the system discussed previously it is greater in the 2 GHz
> >> case.
> >>
> >> However P(f) can be divided into two parts, P_1(f) that really depends
> >> on the frequency and P_0 that does not depend on it.  If P_0 is large
> >> enough to dominate P(f), which is the case in the 10-20 range of
> >> P-states on the system in question, it is better to run the CPUs doing
> >> the work faster (as long as there is always enough work to do for
> >> them; see below).  This doesn't mean that P(f) is not a convex
> >> function of f, though.
> >>
> >> Moreover, this assumes that there will always be enough work for the
> >> system to do when running the busy CPUs at 2 GHz, or that it can go
> >> completely idle when it doesn't do any work, but let's see what
> >> happens if the amount of work to do is W_1 = C * 1 GHz * T and the
> >> system cannot go completely idle when the work is done.
> >>
> >> Then, nothing changes for the busy CPUs running at 1 GHz, but in the 2
> >> GHz case we get W = W_1 and E = P(2 GHz) * T/2 + P_0 * T/2, because
> >> the busy CPUs are only busy 1/2 of the time, but power P_0 is drawn by
> >> the system regardless.  Hence, in the 2 GHz case (assuming P(2 GHz) =
> >> 120 W and P_0 = 90 W), we get
> >>
> >> W / E = 2 * C * 1 GHz / (P(2 GHz) + P_0) = 0.0095 * C * 1 GHz
> >>
> >> which is slightly less than the W / E ratio at 1 GHz approximately
> >> equal to 0.01 * C * 1 GHz (assuming P(1 GHz) = 100 W), so in these
> >> conditions it would be better to run the busy CPUs at 1 GHz.
> >
> > OK, I'll try to measure this.
> >
> > thanks,
> > julia
>
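
The arithmetic in Rafael's estimate quoted above can be reproduced with
the short sketch below.  It uses only the power figures assumed in his
message (P(1 GHz) = 100 W, P(2 GHz) = 120 W, P_0 = 90 W); the function
name and parameters are made up for this note, and the results are in
units of C * 1 GHz per watt.

```python
def work_per_energy(freq_ghz, busy_power_w, busy_fraction=1.0, idle_power_w=0.0):
    """W / E for a window of fixed length T, in units of C * 1 GHz per watt.

    Work is proportional to freq * busy_fraction * T; energy is
    (busy_power * busy_fraction + idle_power * (1 - busy_fraction)) * T,
    so T cancels out of the ratio.
    """
    work = freq_ghz * busy_fraction
    power = busy_power_w * busy_fraction + idle_power_w * (1.0 - busy_fraction)
    return work / power


# Power figures assumed in the quoted message, in watts
P_1GHZ, P_2GHZ, P_0 = 100.0, 120.0, 90.0

always_busy_1ghz = work_per_energy(1.0, P_1GHZ)
always_busy_2ghz = work_per_energy(2.0, P_2GHZ)
# Work limited to W_1: at 2 GHz the CPUs are busy half the time and the
# system still draws P_0 for the other half.
half_busy_2ghz = work_per_energy(2.0, P_2GHZ, busy_fraction=0.5, idle_power_w=P_0)

print(f"1 GHz, always busy: {always_busy_1ghz:.4f}")
print(f"2 GHz, always busy: {always_busy_2ghz:.4f}")
print(f"2 GHz, busy 1/2:    {half_busy_2ghz:.4f}")
```

With unlimited work the 2 GHz case wins, while with the work capped at
W_1 it falls slightly below the 1 GHz case, which matches the 0.0095
versus 0.01 comparison in the quoted text.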
