linux-pm.vger.kernel.org archive mirror
From: Artem Bityutskiy <dedekind1@gmail.com>
To: x86@kernel.org, "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Linux PM Mailing List <linux-pm@vger.kernel.org>,
	Arjan van de Ven <arjan@linux.intel.com>,
	Artem Bityutskiy <dedekind1@gmail.com>
Subject: [PATCH v3 0/3] Sapphire Rapids C0.x idle states support
Date: Sat, 10 Jun 2023 21:35:16 +0300
Message-ID: <20230610183518.4061159-1-dedekind1@gmail.com>

From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

Background
----------

Idle states reduce power consumption when a CPU has no work to do. The
shallowest CPU idle state is "POLL". It has the lowest wake-up latency, but
saves little power. The next idle state on Intel platforms is "C1". It has a
higher latency, but saves more power than "POLL".

Sapphire Rapids Xeons add new C0.1 and C0.2 idle states which conceptually sit
between "POLL" and "C1". These provide a very attractive midpoint: near-POLL
wake-up latency and power consumption halfway between "POLL" and "C1".

In other words, we expect all but the most latency-sensitive users to prefer
these idle states over POLL.

This patch-set enables C0.2 idle state support on Sapphire Rapids Xeon
(hereafter, SPR). The new idle state is added between POLL and C1.

Patch-set overview
------------------

This patch-set is based on the "linux-next" branch of the "linux-pm" tree, plus
patches from Arjan van de Ven, submitted to the linux-pm mailing list
on Jun 5, 2023:
 * Cover letter: [PATCH 0/4 v2] Add support for running in VM guests to intel_idle
 * https://patchwork.kernel.org/project/linux-pm/patch/20230605154716.840930-2-arjan@linux.intel.com/

In other words, start from commit 'e8195eaff86fd2ddb5f00646b5f76e40cd1164a8',
apply Arjan's patches, and then apply these patches on top.

Patch #1 does not depend on Arjan's patches, but patch #2 requires the cleanups
from Arjan's patch-set.

Changelog
---------

* v3
  - Dropped patch 'x86/umwait: Increase tpause and umwait quanta', as
    suggested by Andy Lutomirski.
  - Followed Peter Zijlstra's suggestion and removed the explicit 'umwait'
    deadline, relying on the global implicit deadline instead.
  - Rebased on top of Arjan's patches.
  - C0.2 was tested in a VM by Arjan van de Ven.
  - Re-measured on 2S and 4S Sapphire Rapids Xeon.
* v2
  - Do not mix 'raw_local_irq_enable()' and 'local_irq_disable()'. I could not
    verify it directly, but I believe this addresses the '.noinstr.text' warning.
  - Minor kerneldoc commentary fix.

C0.2 vs POLL latency and power
------------------------------

I compared POLL to C0.2 using the 'wult' tool (https://github.com/intel/wult),
which measures idle state latency.

* In "POLL" experiments, all C-states except for POLL were disabled.
* In "C0.2" experiments, all C-states except for POLL and C0.2 were disabled.

Here are the measurement results. The numbers are the percent change from POLL
to C0.2 (a positive number means the C0.2 value is higher).

-----------|-----------|----------|-----------
 Median IR | 99th % IR | AC Power | RAPL power
-----------|-----------|----------|-----------
 +24%      | +12%      | -13%     | -18%
-----------|-----------|----------|-----------

* IR stands for interrupt latency. The table provides the median and the 99th
  percentile. Wult measures it as the delay from the moment a timer interrupt
  fires to the moment the CPU reaches the interrupt handler.
* AC Power is the wall socket AC power.
* RAPL power is the CPU package power, measured using the 'turbostat' tool.

Hackbench measurements
----------------------

I ran the 'hackbench' benchmark using the following commands:

# 4 groups, 200 tasks
hackbench -s 128 -l 100000000 -g4 -f 25 -P
# 8 groups, 400 tasks
hackbench -s 128 -l 100000000 -g8 -f 25 -P

My SPR system has 224 CPUs, so the first command did not use all CPUs, while
the second one did. However, in both cases the CPU power reached TDP.
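The task counts in the comments above follow from how hackbench builds its
groups. A minimal sketch of that arithmetic, assuming rt-tests hackbench
semantics where each group has '-f' sender tasks and '-f' receiver tasks; the
function names are hypothetical and not part of hackbench:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Number of tasks hackbench spawns, assuming each group consists of
 * 'fds' senders plus 'fds' receivers (rt-tests hackbench semantics).
 */
static inline unsigned int hackbench_tasks(unsigned int groups, unsigned int fds)
{
	return groups * 2 * fds;
}

/* True if there are at least as many tasks as CPUs (a rough indicator only). */
static inline bool hackbench_covers_all_cpus(unsigned int groups, unsigned int fds,
					     unsigned int ncpus)
{
	assert(ncpus > 0);
	return hackbench_tasks(groups, fds) >= ncpus;
}
```

With '-g4 -f 25' this gives 200 tasks, fewer than the 224 CPUs; with '-g8 -f 25'
it gives 400. Comparing the task count to the CPU count is only approximate,
since hackbench tasks also block on their sockets or pipes.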

I ran hackbench 5 times for every configuration and compared the average
hackbench "score".

With 4 groups, C0.2 improved the score by about 4%; with 8 groups, by about
0.6%.
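The comparison above reduces to averaging the runs of each configuration and
computing a relative difference. A minimal sketch, assuming the "score" is the
run time hackbench reports (lower is better), so a positive result means C0.2
finished faster; the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of 'n' run results. */
static inline double mean(const double *runs, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += runs[i];
	return sum / (double)n;
}

/*
 * Improvement of 'new_time' over 'base_time' in percent, for a metric
 * where lower is better (such as run time). Positive means faster.
 */
static inline double improvement_pct(double base_time, double new_time)
{
	assert(base_time > 0.0);
	return (base_time - new_time) / base_time * 100.0;
}
```

Here 'base_time' would be the POLL average and 'new_time' the C0.2 average.
If the score were a throughput metric where higher is better, the sign of the
difference would flip.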

Q&A
---

1. Can C0.2 be disabled?

C0.2 can be disabled via sysfs or with the following kernel boot option:

  intel_idle.states_off=2
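As we understand the 'intel_idle' parameter, 'states_off' is a bit mask in
which bit N disables idle state N, matching the
'/sys/devices/system/cpu/cpu<X>/cpuidle/state<N>' numbering with POLL at index
0 (the per-state 'disable' file in that directory is the sysfs route). With
C0.2 placed right after POLL it gets index 1, hence the value 2. A minimal
sketch of building such masks; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Mask for intel_idle.states_off that disables only the state at 'index'. */
static inline uint32_t states_off_one(unsigned int index)
{
	assert(index < 32);
	return UINT32_C(1) << index;
}

/*
 * Mask disabling every state in [0, nstates) that is not set in 'keep'.
 * Bit 0 (POLL) is always left clear, so POLL stays enabled.
 */
static inline uint32_t states_off_all_except(uint32_t keep, unsigned int nstates)
{
	uint32_t all;

	assert(nstates > 0 && nstates <= 32);
	all = (nstates == 32) ? UINT32_MAX : ((UINT32_C(1) << nstates) - 1);
	return all & ~keep & ~UINT32_C(1);
}
```

For example, 'states_off_all_except(0x3, nstates)' keeps only POLL and C0.2
enabled, which corresponds to the "C0.2" experiment configuration described
earlier; 'nstates' is the number of states 'intel_idle' registers.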

2. Why C0.2, not C0.1?

I measured both C0.1 and C0.2 and did not notice a clear C0.1 advantage in
terms of latency, while C0.2 saves more power.

Users who want to try C0.1 instead of C0.2 can do this:

echo 0 > /sys/devices/system/cpu/umwait_control/enable_c02

This will make sure that C0.2 requests from 'intel_idle' are automatically
converted to C0.1 requests.
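The conversion is done by the CPU, not by 'intel_idle'. As we understand the
architecture, bit 0 of the 'umwait' ECX operand selects the requested state
(0 for C0.2, 1 for C0.1), and setting bit 0 of the IA32_UMWAIT_CONTROL MSR,
which is what writing 0 to 'enable_c02' does, makes the CPU treat C0.2
requests as C0.1. A simplified model of that rule, with hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Encoding of the state hint in bit 0 of the umwait ECX operand. */
enum c0_substate { C0_2 = 0, C0_1 = 1 };

/* Bit 0 of IA32_UMWAIT_CONTROL: when set, C0.2 is disallowed. */
#define UMWAIT_CTRL_C02_DISABLE UINT32_C(0x1)

/* State the CPU effectively uses for a given umwait request (simplified). */
static inline enum c0_substate effective_c0_substate(uint32_t ecx,
						     uint32_t umwait_ctrl)
{
	enum c0_substate req = (ecx & 1) ? C0_1 : C0_2;

	if (req == C0_2 && (umwait_ctrl & UMWAIT_CTRL_C02_DISABLE))
		return C0_1;
	return req;
}

/* Map the sysfs 'enable_c02' value to the MSR disable bit. */
static inline uint32_t c02_disable_bit(bool enable_c02)
{
	return enable_c02 ? 0 : UMWAIT_CTRL_C02_DISABLE;
}
```

This ignores other factors that can end a C0.x residency early, such as
interrupts and the deadline; it only models which sub-state is entered.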

3. How did you verify that the system enters C0.2?

I used 'perf' to read the corresponding PMU counters:

perf stat -e CPU_CLK_UNHALTED.C01,CPU_CLK_UNHALTED.C02,cycles -a sleep 1
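To turn those counts into a residency figure, one can divide each C0.x count
by the 'cycles' count from the same run. This assumes, as we understand the
events, that time in C0.1 and C0.2 is unhalted and therefore also counted in
'cycles', so treat the result as an approximation. A sketch with a
hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Share of 'cycles' spent in a C0.x sub-state, in percent. Both counts
 * should come from the same 'perf stat -a' run, aggregated over all CPUs.
 */
static inline double c0x_residency_pct(uint64_t c0x_cycles, uint64_t cycles)
{
	assert(cycles > 0);
	return (double)c0x_cycles * 100.0 / (double)cycles;
}
```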

4. How do I change the global implicit 'umwait' deadline?

Via '/sys/devices/system/cpu/umwait_control/max_time'
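'max_time' ends up in bits 31:2 of the IA32_UMWAIT_CONTROL MSR, next to the
C0.2 disable bit in bit 0, and is expressed in TSC cycles. As far as we know,
the kernel rejects values with either of the two low bits set. A minimal
sketch of composing the register value under those assumptions; the names are
hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UMWAIT_CTRL_C02_DISABLE UINT32_C(0x1)     /* bit 0 */
#define UMWAIT_CTRL_TIME_MASK   (~UINT32_C(0x3))  /* bits 31:2 */

/*
 * Compose an IA32_UMWAIT_CONTROL value from 'max_time' (TSC cycles) and
 * the C0.2 setting. Returns false if 'max_time' has low bits set.
 */
static inline bool umwait_ctrl_val(uint32_t max_time, bool enable_c02,
				   uint32_t *out)
{
	assert(out != NULL);
	if (max_time & ~UMWAIT_CTRL_TIME_MASK)
		return false;
	*out = (max_time & UMWAIT_CTRL_TIME_MASK) |
	       (enable_c02 ? 0 : UMWAIT_CTRL_C02_DISABLE);
	return true;
}
```

Because the two fields share one register, changing 'max_time' via sysfs
keeps the current 'enable_c02' setting, and vice versa.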

Artem Bityutskiy (2):
  x86/mwait: Add support for idle via umwait
  intel_idle: add C0.2 state for Sapphire Rapids Xeon

 arch/x86/include/asm/mwait.h | 65 ++++++++++++++++++++++++++++++++++++
 drivers/idle/intel_idle.c    | 44 +++++++++++++++++++++++-
 2 files changed, 108 insertions(+), 1 deletion(-)

-- 
2.40.1

Thread overview: 9+ messages
2023-06-10 18:35 Artem Bityutskiy [this message]
2023-06-10 18:35 ` [PATCH v3 1/2] x86/mwait: Add support for idle via umwait Artem Bityutskiy
2023-06-29 19:03   ` Rafael J. Wysocki
2023-06-29 22:04   ` Thomas Gleixner
2023-06-29 22:36     ` Thomas Gleixner
2023-06-30 16:54     ` Artem Bityutskiy
2023-07-07 17:13     ` Artem Bityutskiy
2023-06-10 18:35 ` [PATCH v3 2/2] intel_idle: add C0.2 state for Sapphire Rapids Xeon Artem Bityutskiy
2023-06-29 22:05 ` [PATCH v3 0/3] Sapphire Rapids C0.x idle states support Thomas Gleixner