linux-pm.vger.kernel.org archive mirror
From: "Rafael J. Wysocki" <rafael@kernel.org>
To: Artem Bityutskiy <dedekind1@gmail.com>,
	x86@kernel.org, Thomas Gleixner <tglx@linutronix.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	Linux PM Mailing List <linux-pm@vger.kernel.org>,
	Arjan van de Ven <arjan@linux.intel.com>
Subject: Re: [PATCH v4 0/4] Sapphire Rapids C0.x idle states support
Date: Thu, 20 Jul 2023 20:35:10 +0200	[thread overview]
Message-ID: <CAJZ5v0iLWvAwuma4P3hf_3i10qB9NNtZh9aMWw9M3VayRURpEw@mail.gmail.com> (raw)
In-Reply-To: <20230710093100.918337-1-dedekind1@gmail.com>

On Mon, Jul 10, 2023 at 11:31 AM Artem Bityutskiy <dedekind1@gmail.com> wrote:
>
> From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
>
> Background
> ----------
>
> Idle states reduce power consumption when a CPU has no work to do. The most
> shallow CPU idle state is "POLL". It has the lowest wake-up latency, but saves
> little power. The next idle state on Intel platforms is "C1". It has higher
> latency, but saves more power than "POLL".
>
> Sapphire Rapids Xeons add new C0.1 and C0.2 idle states (C0.x below) which
> conceptually sit between "POLL" and "C1". These provide a very attractive
> midpoint: near-POLL wake-up latency and power consumption halfway between
> "POLL" and "C1".
>
> In other words, the expectation is that most latency-sensitive users will
> prefer C0.x over POLL.
>
> Enable C0.2 idle state support on Sapphire Rapids Xeon (SPR below) by adding
> it between POLL and C1.
>
> Base commit
> -----------
>
> Based on the "linux-next" branch of "linux-pm" git tree.
>
> base-commit: bd9bb08847da3b1eba2ea8cebf514d9287e7f4fb
>
> Changelog
> ---------
>
> * v4:
>   - Address issues pointed out by Thomas Gleixner.
>     . mwait.h: use 'IS_ENABLED()' instead of '#ifdef'.
>     . mwait.h: use '__always_inline'.
>     . mwait.h: use an inline stub instead of a macro for the "!CONFIG_X86_64" case.
>     . mwait.h: use proper comments on '#endif' and '#else'.
>     . mwait.h: tested with llvm/clang.
>     . Use imperative form (removed "this patch").
>   - intel_idle: rename 'intel_idle_hlt_irq_on()' for consistency.

I'm wondering if the v4 looks better than the previous versions to the
x86 folks (and Thomas in particular).

Note that patch [3/4] will not be needed any more as it refines a
commit that is going to be reverted.

> * v3
>   - Dropped the patch 'x86/umwait: Increase tpause and umwait quanta', as
>     suggested by Andy Lutomirski.
>   - Followed Peter Zijlstra's suggestion and removed explicit 'umwait'
>     deadline. Rely on the global implicit deadline instead.
>   - Rebased on top of Arjan's patches.
>   - C0.2 was tested in a VM by Arjan van de Ven.
>   - Re-measured on 2S and 4S Sapphire Rapids Xeon.
> * v2
>   - Do not mix 'raw_local_irq_enable()' and 'local_irq_disable()'. I failed to
>     directly verify it, but I believe it'll address the '.noinstr.text' warning.
>   - Minor kerneldoc commentary fix.
>
> C0.2 vs POLL latency and power
> ------------------------------
>
> I compared POLL to C0.2 using the 'wult' tool (https://github.com/intel/wult),
> which measures idle states latency.
>
> * In "POLL" experiments, all C-states except for POLL were disabled.
> * In "C0.2" experiments, all C-states except for POLL and C0.2 were disabled.
>
> Here are the measurement results. The numbers are percent change from POLL to
> C0.2.
>
> -----------|-----------|----------|-----------
>  Median IR | 99th % IR | AC Power | RAPL power
> -----------|-----------|----------|-----------
>  24%       | 12%       | -13%     | -18%
> -----------|-----------|----------|-----------
>
> * IR stands for interrupt latency. The table provides the median and 99th
>   percentile. Wult measures it as the delay from the moment a timer
>   interrupt fires to the moment the CPU reaches the interrupt handler.
> * AC Power is the wall socket AC power.
> * RAPL power is the CPU package power, measured using the 'turbostat' tool.
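The two latency statistics quoted in the table (median and 99th percentile) can be reproduced from raw latency samples with a short sketch. The sample values below are made up for illustration, and the nearest-rank definition of a percentile is an assumption (wult's exact method may differ):

```python
# Sketch of the two statistics in the table: median and 99th percentile
# of interrupt-latency samples. Sample values are made up; nearest-rank
# is only one way to define a percentile.
import math
import statistics

def percentile(samples, pct):
    ordered = sorted(samples)
    # Nearest-rank: the smallest value such that at least pct% of the
    # samples are at or below it.
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

latencies_ns = [120, 135, 128, 500, 122, 131, 127, 125, 900, 126]
print("median:", statistics.median(latencies_ns))
print("99th percentile:", percentile(latencies_ns, 99))
```

Note how the 99th percentile is dominated by the rare slow wake-ups, which is exactly why the cover letter reports it alongside the median.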
>
> Hackbench measurements
> ----------------------
>
> I ran the 'hackbench' benchmark using the following commands:
>
> # 4 groups, 200 threads
> hackbench -s 128 -l 100000000 -g4 -f 25 -P
> # 8 groups, 400 threads.
> hackbench -s 128 -l 100000000 -g8 -f 25 -P
>
> My SPR system has 224 CPUs, so the first command did not use all of them, while
> the second command did. However, in both cases CPU power reached TDP.
>
> I ran hackbench 5 times for every configuration and compared hackbench "score"
> averages.
>
> In the case of 4 groups, C0.2 improved the score by about 4%, and in the case
> of 8 groups by about 0.6%.
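The averaging and comparison described above can be sketched as follows; the score values are hypothetical placeholders, not the measured data:

```python
# Hypothetical sketch of the score comparison: average five hackbench
# "score" values per configuration and report the percent change.
# The numbers below are made-up placeholders, not measured results.

def percent_change(baseline_scores, candidate_scores):
    base = sum(baseline_scores) / len(baseline_scores)
    cand = sum(candidate_scores) / len(candidate_scores)
    return (cand - base) / base * 100.0

poll_runs = [100.0, 101.0, 99.5, 100.5, 100.0]   # 5 runs, POLL only
c02_runs  = [104.0, 104.5, 103.5, 104.2, 103.8]  # 5 runs, POLL + C0.2

print(f"C0.2 vs POLL score change: {percent_change(poll_runs, c02_runs):+.1f}%")
```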
>
> Q&A
> ---
>
> 1. Can C0.2 be disabled?
>
> C0.2 can be disabled via sysfs and with the following kernel boot option:
>
>   intel_idle.states_off=2
>
> 2. Why C0.2, not C0.1?
>
> I measured both C0.1 and C0.2. I did not notice a clear C0.1 latency
> advantage, but did notice that C0.2 saves more power.
>
> But if users want to try using C0.1 instead of C0.2, they can do this:
>
> echo 0 > /sys/devices/system/cpu/umwait_control/enable_c02
>
> This will make sure that C0.2 requests from 'intel_idle' are automatically
> converted to C0.1 requests.
>
> 3. How did you verify that system enters C0.2?
>
> I used 'perf' to read the corresponding PMU counters:
>
> perf stat -e CPU_CLK_UNHALTED.C01,CPU_CLK_UNHALTED.C02,cycles -a sleep 1
>
> 4. How to change the global implicit 'umwait' deadline?
>
> Via '/sys/devices/system/cpu/umwait_control/max_time'
>
> Artem Bityutskiy (4):
>   x86/umwait: use 'IS_ENABLED()'
>   x86/mwait: Add support for idle via umwait
>   intel_idle: rename 'intel_idle_hlt_irq_on()'
>   intel_idle: add C0.2 state for Sapphire Rapids Xeon
>
>  arch/x86/include/asm/mwait.h | 85 ++++++++++++++++++++++++++++++++----
>  drivers/idle/intel_idle.c    | 52 +++++++++++++++++++---
>  2 files changed, 123 insertions(+), 14 deletions(-)
>
> --
> 2.40.1
>


Thread overview: 17+ messages
2023-07-10  9:30 [PATCH v4 0/4] Sapphire Rapids C0.x idle states support Artem Bityutskiy
2023-07-10  9:30 ` [PATCH v4 1/4] x86/umwait: use 'IS_ENABLED()' Artem Bityutskiy
2023-07-10  9:30 ` [PATCH v4 2/4] x86/mwait: Add support for idle via umwait Artem Bityutskiy
2023-07-10  9:30 ` [PATCH v4 3/4] intel_idle: rename 'intel_idle_hlt_irq_on()' Artem Bityutskiy
2023-07-14 15:34   ` Rafael J. Wysocki
2023-07-14 15:39     ` Arjan van de Ven
2023-07-14 18:11     ` Artem Bityutskiy
2023-07-14 21:01     ` Peter Zijlstra
2023-07-14 21:02       ` Arjan van de Ven
2023-07-14 21:12         ` Peter Zijlstra
2023-07-10  9:31 ` [PATCH v4 4/4] intel_idle: add C0.2 state for Sapphire Rapids Xeon Artem Bityutskiy
2023-07-20 18:35 ` Rafael J. Wysocki [this message]
2023-08-28 16:43 ` [PATCH v4 0/4] Sapphire Rapids C0.x idle states support Artem Bityutskiy
2023-09-13 11:37   ` Artem Bityutskiy
2023-09-13 12:34     ` Rafael J. Wysocki
2023-09-13 12:49       ` Artem Bityutskiy
2023-09-13 12:55         ` Rafael J. Wysocki
