From: "Rafael J. Wysocki" <rafael@kernel.org>
To: Julia Lawall <julia.lawall@inria.fr>
Cc: Francisco Jerez <currojerez@riseup.net>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	Len Brown <lenb@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Linux PM <linux-pm@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: Re: cpufreq: intel_pstate: map utilization into the pstate range
Date: Sun, 19 Dec 2021 15:19:07 +0100
Message-ID: <CAJZ5v0he+_p5qVkx+fGUg7BCBYmm5yRh4q-_9jgJoZLwDf1c2Q@mail.gmail.com>
In-Reply-To: <alpine.DEB.2.22.394.2112190734070.3181@hadrien>

On Sun, Dec 19, 2021 at 7:42 AM Julia Lawall <julia.lawall@inria.fr> wrote:
>
>
>
> On Sat, 18 Dec 2021, Francisco Jerez wrote:
>
> > Julia Lawall <julia.lawall@inria.fr> writes:
> >
> > > On Sat, 18 Dec 2021, Francisco Jerez wrote:
> > >
> > >> Julia Lawall <julia.lawall@inria.fr> writes:
> > >>
> > >> >> As you can see in intel_pstate.c, min_pstate is initialized on core
> > >> >> platforms from MSR_PLATFORM_INFO[47:40], which is "Maximum Efficiency
> > >> >> Ratio (R/O)".  However that seems to deviate massively from the most
> > >> >> efficient ratio on your system, which may indicate a firmware bug, some
> > >> >> sort of clock gating problem, or an issue with the way that
> > >> >> intel_pstate.c processes this information.
> > >> >
> > >> > I'm not sure I understand the bug part.  min_pstate gives the frequency
> > >> > that I find as the minimum frequency when I look for the specifications of
> > >> > the CPU.  Should one expect that it should be something different?
> > >> >
> > >>
> > >> I'd expect the minimum frequency on your processor specification to
> > >> roughly match the "Maximum Efficiency Ratio (R/O)" value from that MSR,
> > >> since there's little reason to claim your processor can be clocked down
> > >> to a frequency which is inherently inefficient /and/ slower than the
> > >> maximum efficiency ratio -- In fact they both seem to match in your
> > >> system, they're just nowhere close to the frequency which is actually
> > >> most efficient, which smells like a bug, like your processor
> > >> misreporting what the most efficient frequency is, or it deviating from
> > >> the expected one due to your CPU static power consumption being greater
> > >> than it would be expected to be under ideal conditions -- E.g. due to
> > >> some sort of clock gating issue, possibly due to a software bug, or due
> > >> to our scheduling of such workloads with a large amount of lightly
> > >> loaded threads being unnecessarily inefficient which could also be
> > >> preventing most of your CPU cores from ever being clock-gated even
> > >> though your processor may be sitting idle for a large fraction of their
> > >> runtime.
> > >
> > > The original mail has results from two different machines: Intel 6130
> > > (skylake) and Intel 5218 (cascade lake).  I have access to another cluster
> > > of 6130s and 5218s.  I can try them.
> > >
> > > I tried 5.9 in which I just commented out the schedutil code to make
> > > frequency requests.  I only tested avrora (tiny pauses) and h2 (longer
> > > pauses) and in both cases the execution is almost entirely in the turbo
> > > frequencies.
> > >
> > > I'm not sure I understand the term "clock-gated".  What C state does that
> > > correspond to?  The turbostat output for one run of avrora is below.
> > >
> >
> > I didn't have any specific C1+ state in mind, most of the deeper ones
> > implement some sort of clock gating among other optimizations, I was
> > just wondering whether some sort of software bug and/or the highly
> > intermittent CPU utilization pattern of these workloads are preventing
> > most of your CPU cores from entering deep sleep states.  See below.
> >
> > > julia
> > >
> > > 78.062895 sec
> > > Package Core  CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ     SMI     POLL    C1      C1E     C6      POLL%   C1%     C1E%    C6%     CPU%c1  CPU%c6  CoreTmp PkgTmp  Pkg%pc2 Pkg%pc6 Pkg_J   RAM_J   PKG_%   RAM_%
> > > -     -       -       31      2.95    1065    2096    156134  0       1971    155458  2956270 657130  0.00    0.20    4.78    92.26   14.75   82.31   40      41      45.14   0.04    4747.52 2509.05 0.00    0.00
> > > 0     0       0       13      1.15    1132    2095    11360   0       0       2       39      19209   0.00    0.00    0.01    99.01   8.02    90.83   39      41      90.24   0.04    2266.04 1346.09 0.00    0.00
> >
> > This seems suspicious:                                                                                                                                                          ^^^^    ^^^^^^^
> >
> > I hadn't understood that you're running this on a dual-socket system
> > until I looked at these results.
>
> Sorry not to have mentioned that.
>
> > It seems like package #0 is doing
> > pretty much nothing according to the stats below, but it's still
> > consuming nearly half of your energy, apparently because the idle
> > package #0 isn't entering deep sleep states (Pkg%pc6 above is close to
> > 0%).  That could explain your unexpectedly high static power consumption
> > and the deviation of the real maximum efficiency frequency from the one
> > reported by your processor, since the reported maximum efficiency ratio
> > cannot possibly take into account the existence of a second CPU package
> > with dysfunctional idle management.
>
> Our assumption was that if anything happens on any core, all of the
> packages remain in a state that allows them to react in a reasonable
> amount of time to any memory request.
>
> > I'm guessing that if you fully disable one of your CPU packages and
> > repeat the previous experiment forcing various P-states between 10 and
> > 37 you should get a maximum efficiency ratio closer to the theoretical
> > one for this CPU?
>
> OK, but that's not really a natural usage context...  I do have a
> one-socket Intel 5220.  I'll see what happens there.
>
> I did some experiments with forcing different frequencies.  I haven't
> finished processing the results, but I notice that as the frequency goes
> up, the utilization (specifically the value of
> map_util_perf(sg_cpu->util) at the point of the call to
> cpufreq_driver_adjust_perf in sugov_update_single_perf) goes up as well.
> Is this expected?

It isn't, as long as the scale-invariance mechanism mentioned in my
previous message works properly.
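
Illustrative sketch (not from the thread): a toy model of why a
frequency-invariant utilization should stay flat when the same workload is
run at different forced frequencies.  The workload size, the sampling
period and the three frequencies are assumptions picked for illustration,
and the function names other than map_util_perf are hypothetical.  The
model scales the busy fraction by cur_freq / max_freq directly, which is a
simplification of how the kernel obtains its scale factor.  The 25%
headroom in map_util_perf (util + util / 4) follows the kernel helper of
that name as of the kernels discussed here.

```python
SCHED_CAPACITY_SCALE = 1024  # fixed-point scale used for utilization values


def map_util_perf(util: int) -> int:
    """Add 25% headroom to a utilization value (util + util / 4)."""
    return util + (util >> 2)


def busy_fraction(work_cycles: float, freq_hz: float, period_s: float) -> float:
    """Fraction of the period the CPU is busy doing a fixed amount of work."""
    busy_s = work_cycles / freq_hz
    return min(busy_s / period_s, 1.0)


def invariant_util(work_cycles: float, freq_hz: float,
                   max_freq_hz: float, period_s: float) -> int:
    """Busy fraction scaled by freq / max_freq, in capacity units.

    A slower clock makes the CPU busy for longer, but each busy second
    counts for proportionally less, so the two effects cancel out.
    """
    frac = busy_fraction(work_cycles, freq_hz, period_s)
    return round(frac * (freq_hz / max_freq_hz) * SCHED_CAPACITY_SCALE)


# Assumed example values: 2.1e6 cycles of work every 10 ms, forced to
# 1.0, 2.1 and 3.7 GHz (P-states 10, 21 and 37 at 100 MHz per ratio step).
WORK_CYCLES = 2.1e6
PERIOD_S = 0.01
MAX_FREQ_HZ = 3.7e9
FREQS_HZ = [1.0e9, 2.1e9, 3.7e9]

if __name__ == "__main__":
    for f in FREQS_HZ:
        raw = busy_fraction(WORK_CYCLES, f, PERIOD_S)
        inv = invariant_util(WORK_CYCLES, f, MAX_FREQ_HZ, PERIOD_S)
        print(f"{f / 1e9:.1f} GHz: busy {raw:.3f}, "
              f"invariant util {inv}, with headroom {map_util_perf(inv)}")
```

In this model the raw busy fraction falls as the frequency rises, while
the invariant value is the same at all three frequencies.  A value that
instead grows with the forced frequency would suggest that the scale
factor is not tracking the frequency the CPU actually runs at, or that the
workload itself does more work per period when it runs faster.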

