* [PATCH v4 0/3] x86: Prefer MWAIT over HLT on AMD processors
@ 2022-05-23 16:55 Wyes Karny
  2022-05-23 16:55 ` [PATCH v4 1/3] x86: Handle idle=nomwait cmdline properly for x86_idle Wyes Karny
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Wyes Karny @ 2022-05-23 16:55 UTC
  To: linux-kernel
  Cc: Lewis.Carroll, Mario.Limonciello, gautham.shenoy, Ananth.Narayan,
	bharata, len.brown, x86, tglx, mingo, bp, dave.hansen, hpa,
	peterz, chang.seok.bae, keescook, metze, zhengqi.arch,
	mark.rutland, rui.zhang, puwen, rafael.j.wysocki, andrew.cooper3,
	jing2.liu, jmattson, pawan.kumar.gupta

This is version 4 of the patchset "Prefer MWAIT over HLT on AMD
processors".

The previous versions are:
v3: https://lore.kernel.org/lkml/cover.fba143c82098dffab6bbf0a2f3c4be8bae07ccf1.1652176835.git-series.wyes.karny@amd.com/
v2: https://lore.kernel.org/lkml/20220505104856.452311-1-wyes.karny@amd.com/
v1: https://lore.kernel.org/lkml/20220405130021.557880-1-wyes.karny@amd.com/

Changes between v3 --> v4:
- Update documentation around idle=nomwait

Changes between v2 --> v3:
- Update some text in commit messages
- Update the documentation around idle=nomwait
- Remove unnecessary CPUID level check from prefer_mwait_c1_over_halt function

Background
==========

Currently, in the absence of the cpuidle driver (e.g. when global C-States are
disabled in the BIOS or when the cpuidle driver is not compiled in), the default
idle state on AMD Zen processors uses the HLT instruction, even though these
processors support the MWAIT instruction, which is more efficient than HLT.

HPC customers who want to optimize for lower latency are known to disable
Global C-States in the BIOS. Some vendors allow choosing a BIOS 'performance'
profile which explicitly disables C-States. In this scenario, the cpuidle
driver will not be loaded and the kernel will continue with the default idle
state chosen at boot time. On AMD systems currently the default idle state is
HLT which has a higher exit latency compared to MWAIT.

The reasons for the choice of HLT over MWAIT on AMD systems are:

1. Families prior to 10h didn't support MWAIT
2. Families 10h-15h supported MWAIT, but not MWAIT C1. Hence it was
   preferable to use HLT as the default state on these systems.

However, AMD Family 17h onwards supports MWAIT as well as MWAIT C1, so it is
preferable to use MWAIT as the default idle state on these systems, as it has
lower exit latencies.

The tables below show the exit latency for HLT and MWAIT on an AMD Zen 3
system. Exit latency is measured by issuing a wakeup (IPI) to another CPU and
measuring how many clock cycles the wakeup took; results are reported in ns.
Each iteration measures 10K wakeups with the source and destination CPUs pinned.

HLT:

25.0000th percentile  :      1900 ns
50.0000th percentile  :      2000 ns
75.0000th percentile  :      2300 ns
90.0000th percentile  :      2500 ns
95.0000th percentile  :      2600 ns
99.0000th percentile  :      2800 ns
99.5000th percentile  :      3000 ns
99.9000th percentile  :      3400 ns
99.9500th percentile  :      3600 ns
99.9900th percentile  :      5900 ns
  Min latency         :      1700 ns
  Max latency         :      5900 ns
Total Samples      9999

MWAIT:

25.0000th percentile  :      1400 ns
50.0000th percentile  :      1500 ns
75.0000th percentile  :      1700 ns
90.0000th percentile  :      1800 ns
95.0000th percentile  :      1900 ns
99.0000th percentile  :      2300 ns
99.5000th percentile  :      2500 ns
99.9000th percentile  :      3200 ns
99.9500th percentile  :      3500 ns
99.9900th percentile  :      4600 ns
  Min latency         :      1200 ns
  Max latency         :      4600 ns
Total Samples      9997

Improvement (99th percentile): 21.74%

Below is another result for context_switch2 micro-benchmark, which brings out
the impact of improved wakeup latency through increased context-switches per
second.

Link: https://ozlabs.org/~anton/junkcode/context_switch2.c

with HLT:
-------------------------------
50.0000th percentile  :  190184
75.0000th percentile  :  191032
90.0000th percentile  :  192314
95.0000th percentile  :  192520
99.0000th percentile  :  192844
MIN  :  190148
MAX  :  192852

with MWAIT:
-------------------------------
50.0000th percentile  :  277444
75.0000th percentile  :  278268
90.0000th percentile  :  278888
95.0000th percentile  :  279164
99.0000th percentile  :  280504
MIN  :  273278
MAX  :  281410

Improvement(99th percentile): ~ 45.46%
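
The two improvement figures can be reproduced from the 99th-percentile rows
above. The sketch below is only an illustration; the helper names are
hypothetical and are not part of the patchset. It assumes the latency figure
is the HLT value expressed relative to the MWAIT value, and the
context_switch2 figure is the MWAIT value expressed relative to the HLT
baseline, since those are the bases that match the quoted percentages.

```c
/* Relative change of `a` over `b`, in percent: (a - b) / b * 100. */
static double percent_over(double a, double b)
{
	return (a - b) / b * 100.0;
}

/* 99th-percentile wakeup latency: HLT 2800 ns versus MWAIT 2300 ns. */
static double latency_improvement(void)
{
	return percent_over(2800.0, 2300.0);
}

/* 99th-percentile context_switch2 result: MWAIT 280504 versus HLT 192844. */
static double ctxsw_improvement(void)
{
	return percent_over(280504.0, 192844.0);
}
```

Printed with "%.2f", the two helpers give 21.74 and 45.46 respectively.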

A similar trend is observed on older Zen processors as well.

Here we enable the MWAIT instruction as the default idle call for AMD Zen
processors which support MWAIT. We retain the existing behaviour for older
processors which depend on HLT.

This patchset restores the decision tree that was introduced by Thomas
Gleixner's commit 09fd4b4ef5bc ("x86: use cpuid to check MWAIT support for
C1") and later removed from the kernel.

NOTE: This change only impacts the default idle behaviour in the absence of
cpuidle driver. If the cpuidle driver is present, it controls the processor
idle behaviour.

Fixes: b253149b843f ("sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power savings and to improve performance")

Changelog:
v4:
- Update documentation around idle=nomwait
v3:
- Update documentation around idle=nomwait
- Remove unnecessary CPUID check from prefer_mwait_c1_over_halt function
v2:
- Remove vendor checks, fix idle=nomwait condition, fix documentation

Zhang Rui from Intel confirmed that this patchset has no impact on
modern Intel processors.

Wyes Karny (3):
  x86: Handle idle=nomwait cmdline properly for x86_idle
  x86: Remove vendor checks from prefer_mwait_c1_over_halt
  x86: Fix comment for X86_FEATURE_ZEN

 Documentation/admin-guide/pm/cpuidle.rst | 15 +++++----
 arch/x86/include/asm/cpufeatures.h       |  2 +-
 arch/x86/include/asm/mwait.h             |  1 +-
 arch/x86/kernel/process.c                | 41 ++++++++++++++++++-------
 4 files changed, 41 insertions(+), 18 deletions(-)

base-commit: 672c0c5173427e6b3e2a9bbb7be51ceeec78093a
-- 
git-series 0.9.1


* [PATCH v4 1/3] x86: Handle idle=nomwait cmdline properly for x86_idle
  2022-05-23 16:55 [PATCH v4 0/3] x86: Prefer MWAIT over HLT on AMD processors Wyes Karny
@ 2022-05-23 16:55 ` Wyes Karny
  2022-05-25  8:06   ` Zhang Rui
  2022-05-23 16:55 ` [PATCH v4 2/3] x86: Remove vendor checks from prefer_mwait_c1_over_halt Wyes Karny
  2022-05-23 16:55 ` [PATCH v4 3/3] x86: Fix comment for X86_FEATURE_ZEN Wyes Karny
  2 siblings, 1 reply; 12+ messages in thread
From: Wyes Karny @ 2022-05-23 16:55 UTC
  To: linux-kernel
  Cc: Lewis.Carroll, Mario.Limonciello, gautham.shenoy, Ananth.Narayan,
	bharata, len.brown, x86, tglx, mingo, bp, dave.hansen, hpa,
	peterz, chang.seok.bae, keescook, metze, zhengqi.arch,
	mark.rutland, rui.zhang, puwen, rafael.j.wysocki, andrew.cooper3,
	jing2.liu, jmattson, pawan.kumar.gupta

When the kernel is booted with idle=nomwait, do not use MWAIT as the
default idle state.

If the user boots the kernel with idle=nomwait, it is a clear
instruction not to use MWAIT as the default idle state.
However, the current code does not take this into account
when selecting the default idle state on x86.

This patch fixes it by checking for the idle=nomwait boot option in
prefer_mwait_c1_over_halt().

Also update the documentation around idle=nomwait appropriately.
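
The intended effect can be modelled in user space roughly as follows. This is
a minimal sketch, not the kernel code: the enum mirrors the idea of
boot_option_idle_override, but the type and function names here are
hypothetical, and the handling of idle=halt and idle=poll is an assumption
(both are taken to select a non-MWAIT idle routine).

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative stand-in for the kernel's idle override state. */
enum idle_override {
	IDLE_NO_OVERRIDE,
	IDLE_HALT,
	IDLE_NOMWAIT,
	IDLE_POLL,
};

/* Map the value of the idle= boot option to an override. */
static enum idle_override parse_idle_option(const char *str)
{
	if (!str)
		return IDLE_NO_OVERRIDE;
	if (!strcmp(str, "poll"))
		return IDLE_POLL;
	if (!strcmp(str, "halt"))
		return IDLE_HALT;
	if (!strcmp(str, "nomwait"))
		return IDLE_NOMWAIT;
	return IDLE_NO_OVERRIDE;
}

/*
 * MWAIT may become the default idle call only when the user has not
 * overridden the idle behaviour and the CPU checks prefer MWAIT.
 * idle=nomwait is the case this patch adds; halt/poll are assumed
 * to pick other idle routines.
 */
static bool default_idle_may_use_mwait(enum idle_override o,
				       bool cpu_prefers_mwait)
{
	if (o != IDLE_NO_OVERRIDE)
		return false;
	return cpu_prefers_mwait;
}
```

With this model, parse_idle_option("nomwait") followed by
default_idle_may_use_mwait() returns false regardless of CPU support.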

Signed-off-by: Wyes Karny <wyes.karny@amd.com>
---
Changes in v4:
- Update documentation around idle=nomwait
- Rename patch subject

 Documentation/admin-guide/pm/cpuidle.rst | 15 +++++++++------
 arch/x86/kernel/process.c                |  6 +++++-
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst
index aec2cd2aaea7..19754beb5a4e 100644
--- a/Documentation/admin-guide/pm/cpuidle.rst
+++ b/Documentation/admin-guide/pm/cpuidle.rst
@@ -612,8 +612,8 @@ the ``menu`` governor to be used on the systems that use the ``ladder`` governor
 by default this way, for example.
 
 The other kernel command line parameters controlling CPU idle time management
-described below are only relevant for the *x86* architecture and some of
-them affect Intel processors only.
+described below are only relevant for the *x86* architecture and references
+to ``intel_idle`` affect Intel processors only.
 
 The *x86* architecture support code recognizes three kernel command line
 options related to CPU idle time management: ``idle=poll``, ``idle=halt``,
@@ -635,10 +635,13 @@ idle, so it very well may hurt single-thread computations performance as well as
 energy-efficiency.  Thus using it for performance reasons may not be a good idea
 at all.]
 
-The ``idle=nomwait`` option disables the ``intel_idle`` driver and causes
-``acpi_idle`` to be used (as long as all of the information needed by it is
-there in the system's ACPI tables), but it is not allowed to use the
-``MWAIT`` instruction of the CPUs to ask the hardware to enter idle states.
+The ``idle=nomwait`` option prevents the use of ``MWAIT`` instruction of
+the CPU to enter idle states. When this option is used, the ``acpi_idle``
+driver will use the ``HLT`` instruction instead of ``MWAIT``. On systems
+running Intel processors, this option disables the ``intel_idle`` driver
+and forces the use of the ``acpi_idle`` driver instead. Note that in either
+case, ``acpi_idle`` driver will function only if all the information needed
+by it is in the system's ACPI tables.
 
 In addition to the architecture-level kernel command line options affecting CPU
 idle time management, there are parameters affecting individual ``CPUIdle``
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b370767f5b19..4e0178b066c5 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -824,6 +824,10 @@ static void amd_e400_idle(void)
  */
 static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
 {
+	/* User has disallowed the use of MWAIT. Fallback to HALT */
+	if (boot_option_idle_override == IDLE_NOMWAIT)
+		return 0;
+
 	if (c->x86_vendor != X86_VENDOR_INTEL)
 		return 0;
 
@@ -932,7 +936,7 @@ static int __init idle_setup(char *str)
 	} else if (!strcmp(str, "nomwait")) {
 		/*
 		 * If the boot option of "idle=nomwait" is added,
-		 * it means that mwait will be disabled for CPU C2/C3
+		 * it means that mwait will be disabled for CPU C1/C2/C3
 		 * states. In such case it won't touch the variable
 		 * of boot_option_idle_override.
 		 */
-- 
git-series 0.9.1


* [PATCH v4 2/3] x86: Remove vendor checks from prefer_mwait_c1_over_halt
  2022-05-23 16:55 [PATCH v4 0/3] x86: Prefer MWAIT over HLT on AMD processors Wyes Karny
  2022-05-23 16:55 ` [PATCH v4 1/3] x86: Handle idle=nomwait cmdline properly for x86_idle Wyes Karny
@ 2022-05-23 16:55 ` Wyes Karny
  2022-05-25 16:55   ` Peter Zijlstra
  2022-06-06 12:50   ` Zhang Rui
  2022-05-23 16:55 ` [PATCH v4 3/3] x86: Fix comment for X86_FEATURE_ZEN Wyes Karny
  2 siblings, 2 replies; 12+ messages in thread
From: Wyes Karny @ 2022-05-23 16:55 UTC
  To: linux-kernel
  Cc: Lewis.Carroll, Mario.Limonciello, gautham.shenoy, Ananth.Narayan,
	bharata, len.brown, x86, tglx, mingo, bp, dave.hansen, hpa,
	peterz, chang.seok.bae, keescook, metze, zhengqi.arch,
	mark.rutland, rui.zhang, puwen, rafael.j.wysocki, andrew.cooper3,
	jing2.liu, jmattson, pawan.kumar.gupta

Remove vendor checks from the prefer_mwait_c1_over_halt function. Restore
the decision tree to support MWAIT C1 as the default idle state based on
CPUID checks, as done by Thomas Gleixner in
commit 09fd4b4ef5bc ("x86: use cpuid to check MWAIT support for C1").

The decision tree was removed in
commit 69fb3676df33 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param").

Prefer MWAIT when the following conditions are satisfied:
    1. CPUID_Fn00000001_ECX [Monitor] should be set
    2. CPUID_Fn00000005 should be supported
    3. If CPUID_Fn00000005_ECX [EMX] is set then there should be
       at least one C1 substate available, indicated by
       CPUID_Fn00000005_EDX [MWaitC1SubStates] bits.

Otherwise use HLT for default_idle function.
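
The three conditions above can be written as a pure function over raw CPUID
register values. This is a hedged user-space sketch, not the kernel
implementation: the function name and its parameters are hypothetical, the
caller is assumed to have read the CPUID leaves already, and the MONITOR bit
position (Fn0000_0001 ECX bit 3) is taken from the CPUID documentation rather
than from this patch. The two leaf 5 masks match the values in asm/mwait.h as
shown in the diff below.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_MONITOR              (1u << 3) /* Fn0000_0001 ECX[3] */
#define CPUID_MWAIT_LEAF                5u
#define CPUID5_ECX_EXTENSIONS_SUPPORTED 0x1u      /* Fn0000_0005 ECX[0] */
#define MWAIT_C1_SUBSTATE_MASK          0xf0u     /* Fn0000_0005 EDX[7:4] */

/*
 * max_leaf:  highest supported standard CPUID leaf (Fn0000_0000 EAX)
 * leaf1_ecx: ECX returned by CPUID leaf 1
 * leaf5_ecx: ECX returned by CPUID leaf 5
 * leaf5_edx: EDX returned by CPUID leaf 5
 */
static bool mwait_c1_preferred(uint32_t max_leaf, uint32_t leaf1_ecx,
			       uint32_t leaf5_ecx, uint32_t leaf5_edx)
{
	/* 1. MONITOR/MWAIT must be advertised. */
	if (!(leaf1_ecx & CPUID1_ECX_MONITOR))
		return false;

	/* 2. The MWAIT leaf must exist. */
	if (max_leaf < CPUID_MWAIT_LEAF)
		return false;

	/* Without extensions, plain MWAIT (EAX=0, ECX=0) is usable. */
	if (!(leaf5_ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED))
		return true;

	/* 3. With extensions, require at least one C1 substate. */
	return (leaf5_edx & MWAIT_C1_SUBSTATE_MASK) != 0;
}
```

Keeping the decision separate from the CPUID reads makes each condition easy
to check with fixed register values.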

HPC customers who want to optimize for lower latency are known to
disable Global C-States in the BIOS. In fact, some vendors allow
choosing a BIOS 'performance' profile which explicitly disables
C-States.  In this scenario, the cpuidle driver will not be loaded and
the kernel will continue with the default idle state chosen at boot
time. On AMD systems currently the default idle state is HLT which has
a higher exit latency compared to MWAIT.

The reasons for the choice of HLT over MWAIT on AMD systems are:

1. Families prior to 10h didn't support MWAIT
2. Families 10h-15h supported MWAIT, but not MWAIT C1. Hence it was
   preferable to use HLT as the default state on these systems.

However, AMD Family 17h onwards supports MWAIT as well as MWAIT C1, so
it is preferable to use MWAIT as the default idle state on these
systems, as it has lower exit latencies.

The tables below show the exit latency for HLT and MWAIT on an AMD
Zen 3 system. Exit latency is measured by issuing a wakeup (IPI) to
another CPU and measuring how many clock cycles the wakeup took; results
are reported in ns. Each iteration measures 10K wakeups with the source
and destination CPUs pinned.

HLT:

25.0000th percentile  :      1900 ns
50.0000th percentile  :      2000 ns
75.0000th percentile  :      2300 ns
90.0000th percentile  :      2500 ns
95.0000th percentile  :      2600 ns
99.0000th percentile  :      2800 ns
99.5000th percentile  :      3000 ns
99.9000th percentile  :      3400 ns
99.9500th percentile  :      3600 ns
99.9900th percentile  :      5900 ns
  Min latency         :      1700 ns
  Max latency         :      5900 ns
Total Samples      9999

MWAIT:

25.0000th percentile  :      1400 ns
50.0000th percentile  :      1500 ns
75.0000th percentile  :      1700 ns
90.0000th percentile  :      1800 ns
95.0000th percentile  :      1900 ns
99.0000th percentile  :      2300 ns
99.5000th percentile  :      2500 ns
99.9000th percentile  :      3200 ns
99.9500th percentile  :      3500 ns
99.9900th percentile  :      4600 ns
  Min latency         :      1200 ns
  Max latency         :      4600 ns
Total Samples      9997

Improvement (99th percentile): 21.74%

Below is another result for context_switch2 micro-benchmark, which
brings out the impact of improved wakeup latency through increased
context-switches per second.

Link: https://ozlabs.org/~anton/junkcode/context_switch2.c

with HLT:
-------------------------------
50.0000th percentile  :  190184
75.0000th percentile  :  191032
90.0000th percentile  :  192314
95.0000th percentile  :  192520
99.0000th percentile  :  192844
MIN  :  190148
MAX  :  192852

with MWAIT:
-------------------------------
50.0000th percentile  :  277444
75.0000th percentile  :  278268
90.0000th percentile  :  278888
95.0000th percentile  :  279164
99.0000th percentile  :  280504
MIN  :  273278
MAX  :  281410

Improvement(99th percentile): ~ 45.46%

Signed-off-by: Wyes Karny <wyes.karny@amd.com>
---
 arch/x86/include/asm/mwait.h |  1 +
 arch/x86/kernel/process.c    | 35 +++++++++++++++++++++++++----------
 2 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 29dd27b5a339..3a8fdf881313 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -13,6 +13,7 @@
 #define MWAIT_SUBSTATE_SIZE		4
 #define MWAIT_HINT2CSTATE(hint)		(((hint) >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK)
 #define MWAIT_HINT2SUBSTATE(hint)	((hint) & MWAIT_CSTATE_MASK)
+#define MWAIT_C1_SUBSTATE_MASK  0xf0
 
 #define CPUID_MWAIT_LEAF		5
 #define CPUID5_ECX_EXTENSIONS_SUPPORTED 0x1
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 4e0178b066c5..7bf4d73c9522 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -813,28 +813,43 @@ static void amd_e400_idle(void)
 }
 
 /*
- * Intel Core2 and older machines prefer MWAIT over HALT for C1.
- * We can't rely on cpuidle installing MWAIT, because it will not load
- * on systems that support only C1 -- so the boot default must be MWAIT.
+ * Prefer MWAIT over HALT if MWAIT is supported, MWAIT_CPUID leaf
+ * exists and whenever MONITOR/MWAIT extensions are present there is at
+ * least one C1 substate.
  *
- * Some AMD machines are the opposite, they depend on using HALT.
- *
- * So for default C1, which is used during boot until cpuidle loads,
- * use MWAIT-C1 on Intel HW that has it, else use HALT.
+ * Do not prefer MWAIT if MONITOR instruction has a bug or idle=nomwait
+ * is passed to kernel commandline parameter.
  */
 static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
 {
+	u32 eax, ebx, ecx, edx;
+
 	/* User has disallowed the use of MWAIT. Fallback to HALT */
 	if (boot_option_idle_override == IDLE_NOMWAIT)
 		return 0;
 
-	if (c->x86_vendor != X86_VENDOR_INTEL)
+	/* MWAIT is not supported on this platform. Fallback to HALT */
+	if (!cpu_has(c, X86_FEATURE_MWAIT))
 		return 0;
 
-	if (!cpu_has(c, X86_FEATURE_MWAIT) || boot_cpu_has_bug(X86_BUG_MONITOR))
+	/* Monitor has a bug. Fallback to HALT */
+	if (boot_cpu_has_bug(X86_BUG_MONITOR))
 		return 0;
 
-	return 1;
+	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
+
+	/*
+	 * If MWAIT extensions are not available, it is safe to use MWAIT
+	 * with EAX=0, ECX=0.
+	 */
+	if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED))
+		return 1;
+
+	/*
+	 * If MWAIT extensions are available, there should be at least one
+	 * MWAIT C1 substate present.
+	 */
+	return (edx & MWAIT_C1_SUBSTATE_MASK);
 }
 
 /*
-- 
git-series 0.9.1


* [PATCH v4 3/3] x86: Fix comment for X86_FEATURE_ZEN
  2022-05-23 16:55 [PATCH v4 0/3] x86: Prefer MWAIT over HLT on AMD processors Wyes Karny
  2022-05-23 16:55 ` [PATCH v4 1/3] x86: Handle idle=nomwait cmdline properly for x86_idle Wyes Karny
  2022-05-23 16:55 ` [PATCH v4 2/3] x86: Remove vendor checks from prefer_mwait_c1_over_halt Wyes Karny
@ 2022-05-23 16:55 ` Wyes Karny
  2 siblings, 0 replies; 12+ messages in thread
From: Wyes Karny @ 2022-05-23 16:55 UTC
  To: linux-kernel
  Cc: Lewis.Carroll, Mario.Limonciello, gautham.shenoy, Ananth.Narayan,
	bharata, len.brown, x86, tglx, mingo, bp, dave.hansen, hpa,
	peterz, chang.seok.bae, keescook, metze, zhengqi.arch,
	mark.rutland, rui.zhang, puwen, rafael.j.wysocki, andrew.cooper3,
	jing2.liu, jmattson, pawan.kumar.gupta

The feature X86_FEATURE_ZEN implies that the CPU is based on the Zen
microarchitecture. Call this out explicitly in the comment.

Signed-off-by: Wyes Karny <wyes.karny@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 73e643ae94b6..6141457cda38 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -219,7 +219,7 @@
 #define X86_FEATURE_IBRS		( 7*32+25) /* Indirect Branch Restricted Speculation */
 #define X86_FEATURE_IBPB		( 7*32+26) /* Indirect Branch Prediction Barrier */
 #define X86_FEATURE_STIBP		( 7*32+27) /* Single Thread Indirect Branch Predictors */
-#define X86_FEATURE_ZEN			( 7*32+28) /* "" CPU is AMD family 0x17 or above (Zen) */
+#define X86_FEATURE_ZEN			(7*32+28) /* "" CPU based on Zen microarchitecture */
 #define X86_FEATURE_L1TF_PTEINV		( 7*32+29) /* "" L1TF workaround PTE inversion */
 #define X86_FEATURE_IBRS_ENHANCED	( 7*32+30) /* Enhanced IBRS */
 #define X86_FEATURE_MSR_IA32_FEAT_CTL	( 7*32+31) /* "" MSR IA32_FEAT_CTL configured */
-- 
git-series 0.9.1


* Re: [PATCH v4 1/3] x86: Handle idle=nomwait cmdline properly for x86_idle
  2022-05-23 16:55 ` [PATCH v4 1/3] x86: Handle idle=nomwait cmdline properly for x86_idle Wyes Karny
@ 2022-05-25  8:06   ` Zhang Rui
  2022-06-02 15:41     ` Wyes Karny
  0 siblings, 1 reply; 12+ messages in thread
From: Zhang Rui @ 2022-05-25  8:06 UTC
  To: Wyes Karny, linux-kernel
  Cc: Lewis.Carroll, Mario.Limonciello, gautham.shenoy, Ananth.Narayan,
	bharata, len.brown, x86, tglx, mingo, bp, dave.hansen, hpa,
	peterz, chang.seok.bae, keescook, metze, zhengqi.arch,
	mark.rutland, puwen, rafael.j.wysocki, andrew.cooper3, jing2.liu,
	jmattson, pawan.kumar.gupta

On Mon, 2022-05-23 at 22:25 +0530, Wyes Karny wrote:
> When kernel is booted with idle=nomwait do not use MWAIT as the
> default idle state.
> 
> If the user boots the kernel with idle=nomwait, it is a clear
> direction to not use mwait as the default idle state.
> However, the current code does not take this into consideration
> while selecting the default idle state on x86.
> 
> This patch fixes it by checking for the idle=nomwait boot option in
> prefer_mwait_c1_over_halt().
> 
> Also update the documentation around idle=nomwait appropriately.
> 
> Signed-off-by: Wyes Karny <wyes.karny@amd.com>
> ---
> Changes in v4:
> - Update documentation around idle=nomwait
> - Rename patch subject
> 
>  Documentation/admin-guide/pm/cpuidle.rst | 15 +++++++++------
>  arch/x86/kernel/process.c                |  6 +++++-
>  2 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/admin-guide/pm/cpuidle.rst
> b/Documentation/admin-guide/pm/cpuidle.rst
> index aec2cd2aaea7..19754beb5a4e 100644
> --- a/Documentation/admin-guide/pm/cpuidle.rst
> +++ b/Documentation/admin-guide/pm/cpuidle.rst
> @@ -612,8 +612,8 @@ the ``menu`` governor to be used on the systems
> that use the ``ladder`` governor
>  by default this way, for example.
>  
>  The other kernel command line parameters controlling CPU idle time
> management
> -described below are only relevant for the *x86* architecture and
> some of
> -them affect Intel processors only.
> +described below are only relevant for the *x86* architecture and
> references
> +to ``intel_idle`` affect Intel processors only.
>  
>  The *x86* architecture support code recognizes three kernel command
> line
>  options related to CPU idle time management: ``idle=poll``,
> ``idle=halt``,
> @@ -635,10 +635,13 @@ idle, so it very well may hurt single-thread
> computations performance as well as
>  energy-efficiency.  Thus using it for performance reasons may not be
> a good idea
>  at all.]
>  
> -The ``idle=nomwait`` option disables the ``intel_idle`` driver and
> causes
> -``acpi_idle`` to be used (as long as all of the information needed
> by it is
> -there in the system's ACPI tables), but it is not allowed to use the
> -``MWAIT`` instruction of the CPUs to ask the hardware to enter idle
> states.
> +The ``idle=nomwait`` option prevents the use of ``MWAIT``
> instruction of
> +the CPU to enter idle states. When this option is used, the
> ``acpi_idle``
> +driver will use the ``HLT`` instruction instead of ``MWAIT``. On
> systems
> +running Intel processors, this option disables the ``intel_idle``
> driver
> +and forces the use of the ``acpi_idle`` driver instead. Note that in
> either
> +case, ``acpi_idle`` driver will function only if all the information
> needed
> +by it is in the system's ACPI tables.
>  
>  In addition to the architecture-level kernel command line options
> affecting CPU
>  idle time management, there are parameters affecting individual
> ``CPUIdle``
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index b370767f5b19..4e0178b066c5 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -824,6 +824,10 @@ static void amd_e400_idle(void)
>   */
>  static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
>  {
> +	/* User has disallowed the use of MWAIT. Fallback to HALT */
> +	if (boot_option_idle_override == IDLE_NOMWAIT)
> +		return 0;
> +
>  	if (c->x86_vendor != X86_VENDOR_INTEL)
>  		return 0;
>  
> @@ -932,7 +936,7 @@ static int __init idle_setup(char *str)
>  	} else if (!strcmp(str, "nomwait")) {
>  		/*
>  		 * If the boot option of "idle=nomwait" is added,
> -		 * it means that mwait will be disabled for CPU C2/C3
> +		 * it means that mwait will be disabled for CPU
> C1/C2/C3
>  		 * states. In such case it won't touch the variable
>  		 * of boot_option_idle_override.

the code didn't change boot_option_idle_override when it was
introduced, but this has changed since commit d18960494f65 ("ACPI,
intel_idle: Cleanup idle= internal variables")

thanks,
rui



* Re: [PATCH v4 2/3] x86: Remove vendor checks from prefer_mwait_c1_over_halt
  2022-05-23 16:55 ` [PATCH v4 2/3] x86: Remove vendor checks from prefer_mwait_c1_over_halt Wyes Karny
@ 2022-05-25 16:55   ` Peter Zijlstra
  2022-06-06 12:50   ` Zhang Rui
  1 sibling, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2022-05-25 16:55 UTC
  To: Wyes Karny
  Cc: linux-kernel, Lewis.Carroll, Mario.Limonciello, gautham.shenoy,
	Ananth.Narayan, bharata, len.brown, x86, tglx, mingo, bp,
	dave.hansen, hpa, chang.seok.bae, keescook, metze, zhengqi.arch,
	mark.rutland, rui.zhang, puwen, rafael.j.wysocki, andrew.cooper3,
	jing2.liu, jmattson, pawan.kumar.gupta

On Mon, May 23, 2022 at 10:25:50PM +0530, Wyes Karny wrote:

> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 4e0178b066c5..7bf4d73c9522 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -813,28 +813,43 @@ static void amd_e400_idle(void)
>  }
>  
>  /*
> - * Intel Core2 and older machines prefer MWAIT over HALT for C1.
> - * We can't rely on cpuidle installing MWAIT, because it will not load
> - * on systems that support only C1 -- so the boot default must be MWAIT.
> + * Prefer MWAIT over HALT if MWAIT is supported, MWAIT_CPUID leaf
> + * exists and whenever MONITOR/MWAIT extensions are present there is at
> + * least one C1 substate.
>   *
> - * Some AMD machines are the opposite, they depend on using HALT.
> - *
> - * So for default C1, which is used during boot until cpuidle loads,
> - * use MWAIT-C1 on Intel HW that has it, else use HALT.
> + * Do not prefer MWAIT if MONITOR instruction has a bug or idle=nomwait
> + * is passed to kernel commandline parameter.
>   */
>  static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
>  {
> +	u32 eax, ebx, ecx, edx;
> +
>  	/* User has disallowed the use of MWAIT. Fallback to HALT */
>  	if (boot_option_idle_override == IDLE_NOMWAIT)
>  		return 0;
>  
> -	if (c->x86_vendor != X86_VENDOR_INTEL)
> +	/* MWAIT is not supported on this platform. Fallback to HALT */
> +	if (!cpu_has(c, X86_FEATURE_MWAIT))
>  		return 0;
>  
> -	if (!cpu_has(c, X86_FEATURE_MWAIT) || boot_cpu_has_bug(X86_BUG_MONITOR))
> +	/* Monitor has a bug. Fallback to HALT */
> +	if (boot_cpu_has_bug(X86_BUG_MONITOR))
>  		return 0;
>  
> -	return 1;
> +	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
> +
> +	/*
> +	 * If MWAIT extensions are not available, it is safe to use MWAIT
> +	 * with EAX=0, ECX=0.
> +	 */
> +	if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED))
> +		return 1;
> +
> +	/*
> +	 * If MWAIT extensions are available, there should be at least one
> +	 * MWAIT C1 substate present.
> +	 */
> +	return (edx & MWAIT_C1_SUBSTATE_MASK);
>  }

Seems reasonable enough to me,

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>


* Re: [PATCH v4 1/3] x86: Handle idle=nomwait cmdline properly for x86_idle
  2022-05-25  8:06   ` Zhang Rui
@ 2022-06-02 15:41     ` Wyes Karny
  2022-06-05 12:32       ` Zhang Rui
  0 siblings, 1 reply; 12+ messages in thread
From: Wyes Karny @ 2022-06-02 15:41 UTC
  To: Zhang Rui, linux-kernel
  Cc: Lewis.Carroll, Mario.Limonciello, gautham.shenoy, Ananth.Narayan,
	bharata, len.brown, x86, tglx, mingo, bp, dave.hansen, hpa,
	peterz, chang.seok.bae, keescook, metze, zhengqi.arch,
	mark.rutland, puwen, rafael.j.wysocki, andrew.cooper3, jing2.liu,
	jmattson, pawan.kumar.gupta

Hi Rui,

On 5/25/2022 1:36 PM, Zhang Rui wrote:
> On Mon, 2022-05-23 at 22:25 +0530, Wyes Karny wrote:
>> When kernel is booted with idle=nomwait do not use MWAIT as the
>> default idle state.
>>
>> If the user boots the kernel with idle=nomwait, it is a clear
>> direction to not use mwait as the default idle state.
>> However, the current code does not take this into consideration
>> while selecting the default idle state on x86.
>>
>> This patch fixes it by checking for the idle=nomwait boot option in
>> prefer_mwait_c1_over_halt().
>>
>> Also update the documentation around idle=nomwait appropriately.
>>
>> Signed-off-by: Wyes Karny <wyes.karny@amd.com>
>> ---
>> Changes in v4:
>> - Update documentation around idle=nomwait
>> - Rename patch subject
>>
>>  Documentation/admin-guide/pm/cpuidle.rst | 15 +++++++++------
>>  arch/x86/kernel/process.c                |  6 +++++-
>>  2 files changed, 14 insertions(+), 7 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/pm/cpuidle.rst
>> b/Documentation/admin-guide/pm/cpuidle.rst
>> index aec2cd2aaea7..19754beb5a4e 100644
>> --- a/Documentation/admin-guide/pm/cpuidle.rst
>> +++ b/Documentation/admin-guide/pm/cpuidle.rst
>> @@ -612,8 +612,8 @@ the ``menu`` governor to be used on the systems
>> that use the ``ladder`` governor
>>  by default this way, for example.
>>  
>>  The other kernel command line parameters controlling CPU idle time
>> management
>> -described below are only relevant for the *x86* architecture and
>> some of
>> -them affect Intel processors only.
>> +described below are only relevant for the *x86* architecture and
>> references
>> +to ``intel_idle`` affect Intel processors only.
>>  
>>  The *x86* architecture support code recognizes three kernel command
>> line
>>  options related to CPU idle time management: ``idle=poll``, ``idle=halt``,
>> @@ -635,10 +635,13 @@ idle, so it very well may hurt single-thread computations performance as well as
>>  energy-efficiency.  Thus using it for performance reasons may not be a good idea
>>  at all.]
>>  
>> -The ``idle=nomwait`` option disables the ``intel_idle`` driver and causes
>> -``acpi_idle`` to be used (as long as all of the information needed by it is
>> -there in the system's ACPI tables), but it is not allowed to use the
>> -``MWAIT`` instruction of the CPUs to ask the hardware to enter idle states.
>> +The ``idle=nomwait`` option prevents the use of ``MWAIT`` instruction of
>> +the CPU to enter idle states. When this option is used, the ``acpi_idle``
>> +driver will use the ``HLT`` instruction instead of ``MWAIT``. On systems
>> +running Intel processors, this option disables the ``intel_idle`` driver
>> +and forces the use of the ``acpi_idle`` driver instead. Note that in either
>> +case, ``acpi_idle`` driver will function only if all the information needed
>> +by it is in the system's ACPI tables.
>>  
>>  In addition to the architecture-level kernel command line options affecting CPU
>>  idle time management, there are parameters affecting individual ``CPUIdle``
>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>> index b370767f5b19..4e0178b066c5 100644
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>> @@ -824,6 +824,10 @@ static void amd_e400_idle(void)
>>   */
>>  static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
>>  {
>> +	/* User has disallowed the use of MWAIT. Fallback to HALT */
>> +	if (boot_option_idle_override == IDLE_NOMWAIT)
>> +		return 0;
>> +
>>  	if (c->x86_vendor != X86_VENDOR_INTEL)
>>  		return 0;
>>  
>> @@ -932,7 +936,7 @@ static int __init idle_setup(char *str)
>>  	} else if (!strcmp(str, "nomwait")) {
>>  		/*
>>  		 * If the boot option of "idle=nomwait" is added,
>> -		 * it means that mwait will be disabled for CPU C2/C3
>> +		 * it means that mwait will be disabled for CPU C1/C2/C3
>>  		 * states. In such case it won't touch the variable
>>  		 * of boot_option_idle_override.
> 
> the code didn't change boot_option_idle_override when it was
> introduced, but this has changed since commit d18960494f65 ("ACPI,
> intel_idle: Cleanup idle= internal variables")

Could you please clarify a bit more why the commit you mentioned is
related to this patch?

> 
> thanks,
> rui
> 



* Re: [PATCH v4 1/3] x86: Handle idle=nomwait cmdline properly for x86_idle
  2022-06-02 15:41     ` Wyes Karny
@ 2022-06-05 12:32       ` Zhang Rui
  2022-06-06  9:13         ` Wyes Karny
  0 siblings, 1 reply; 12+ messages in thread
From: Zhang Rui @ 2022-06-05 12:32 UTC (permalink / raw)
  To: Wyes Karny, linux-kernel
  Cc: Lewis.Carroll, Mario.Limonciello, gautham.shenoy, Ananth.Narayan,
	bharata, len.brown, x86, tglx, mingo, bp, dave.hansen, hpa,
	peterz, chang.seok.bae, keescook, metze, zhengqi.arch,
	mark.rutland, puwen, rafael.j.wysocki, andrew.cooper3, jing2.liu,
	jmattson, pawan.kumar.gupta

On Thu, 2022-06-02 at 21:11 +0530, Wyes Karny wrote:
> > 
> 
> Hi Rui,
> 
> On 5/25/2022 1:36 PM, Zhang Rui wrote:
> > On Mon, 2022-05-23 at 22:25 +0530, Wyes Karny wrote:
> > > When kernel is booted with idle=nomwait do not use MWAIT as the
> > > default idle state.
> > > 
> > > If the user boots the kernel with idle=nomwait, it is a clear
> > > direction to not use mwait as the default idle state.
> > > However, the current code does not take this into consideration
> > > while selecting the default idle state on x86.
> > > 
> > > This patch fixes it by checking for the idle=nomwait boot option
> > > in
> > > prefer_mwait_c1_over_halt().
> > > 
> > > Also update the documentation around idle=nomwait appropriately.
> > > 
> > > Signed-off-by: Wyes Karny <wyes.karny@amd.com>
> > > ---
> > > Changes in v4:
> > > - Update documentation around idle=nomwait
> > > - Rename patch subject
> > > 
> > >  Documentation/admin-guide/pm/cpuidle.rst | 15 +++++++++------
> > >  arch/x86/kernel/process.c                |  6 +++++-
> > >  2 files changed, 14 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/Documentation/admin-guide/pm/cpuidle.rst
> > > b/Documentation/admin-guide/pm/cpuidle.rst
> > > index aec2cd2aaea7..19754beb5a4e 100644
> > > --- a/Documentation/admin-guide/pm/cpuidle.rst
> > > +++ b/Documentation/admin-guide/pm/cpuidle.rst
> > > @@ -612,8 +612,8 @@ the ``menu`` governor to be used on the
> > > systems
> > > that use the ``ladder`` governor
> > >  by default this way, for example.
> > >  
> > >  The other kernel command line parameters controlling CPU idle
> > > time
> > > management
> > > -described below are only relevant for the *x86* architecture and
> > > some of
> > > -them affect Intel processors only.
> > > +described below are only relevant for the *x86* architecture and
> > > references
> > > +to ``intel_idle`` affect Intel processors only.
> > >  
> > >  The *x86* architecture support code recognizes three kernel
> > > command
> > > line
> > >  options related to CPU idle time management: ``idle=poll``,
> > > ``idle=halt``,
> > > @@ -635,10 +635,13 @@ idle, so it very well may hurt single-
> > > thread
> > > computations performance as well as
> > >  energy-efficiency.  Thus using it for performance reasons may
> > > not be
> > > a good idea
> > >  at all.]
> > >  
> > > -The ``idle=nomwait`` option disables the ``intel_idle`` driver
> > > and
> > > causes
> > > -``acpi_idle`` to be used (as long as all of the information
> > > needed
> > > by it is
> > > -there in the system's ACPI tables), but it is not allowed to use
> > > the
> > > -``MWAIT`` instruction of the CPUs to ask the hardware to enter
> > > idle
> > > states.
> > > +The ``idle=nomwait`` option prevents the use of ``MWAIT``
> > > instruction of
> > > +the CPU to enter idle states. When this option is used, the
> > > ``acpi_idle``
> > > +driver will use the ``HLT`` instruction instead of ``MWAIT``. On
> > > systems
> > > +running Intel processors, this option disables the
> > > ``intel_idle``
> > > driver
> > > +and forces the use of the ``acpi_idle`` driver instead. Note
> > > that in
> > > either
> > > +case, ``acpi_idle`` driver will function only if all the
> > > information
> > > needed
> > > +by it is in the system's ACPI tables.
> > >  
> > >  In addition to the architecture-level kernel command line
> > > options
> > > affecting CPU
> > >  idle time management, there are parameters affecting individual
> > > ``CPUIdle``
> > > diff --git a/arch/x86/kernel/process.c
> > > b/arch/x86/kernel/process.c
> > > index b370767f5b19..4e0178b066c5 100644
> > > --- a/arch/x86/kernel/process.c
> > > +++ b/arch/x86/kernel/process.c
> > > @@ -824,6 +824,10 @@ static void amd_e400_idle(void)
> > >   */
> > >  static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86
> > > *c)
> > >  {
> > > +	/* User has disallowed the use of MWAIT. Fallback to HALT */
> > > +	if (boot_option_idle_override == IDLE_NOMWAIT)
> > > +		return 0;
> > > +
> > >  	if (c->x86_vendor != X86_VENDOR_INTEL)
> > >  		return 0;
> > >  
> > > @@ -932,7 +936,7 @@ static int __init idle_setup(char *str)
> > >  	} else if (!strcmp(str, "nomwait")) {
> > >  		/*
> > >  		 * If the boot option of "idle=nomwait" is added,
> > > -		 * it means that mwait will be disabled for CPU C2/C3
> > > +		 * it means that mwait will be disabled for CPU
> > > C1/C2/C3
> > >  		 * states. In such case it won't touch the variable
> > >  		 * of boot_option_idle_override.
> > 
> > the code didn't change boot_option_idle_override when it was
> > introduced, but this has changed since commit d18960494f65 ("ACPI,
> > intel_idle: Cleanup idle= internal variables")
> 
> > Could you please clarify a bit more why the commit you mentioned is
> > related to this patch?
> 

The comment "In such case it won't touch the variable of
boot_option_idle_override." has been wrong for some time; that is not
caused by this patch. But since this patch says "Also update the
documentation around idle=nomwait appropriately", my suggestion is to
fix the comment as well, by deleting its last sentence.
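
For readers following along, here is a minimal user-space model of the
behaviour the stale comment misdescribes. It is only a sketch, not the
kernel source: model_idle_setup(), model_mwait_allowed() and the MODEL_
names are hypothetical, and the enum is assumed to mirror the kernel's
enum idle_boot_override. It shows that "idle=nomwait" does record the
override, which is what the new check in prefer_mwait_c1_over_halt()
relies on.

```c
#include <string.h>

/* Assumption: mirrors the kernel's enum idle_boot_override. */
enum model_idle_override {
	MODEL_IDLE_NO_OVERRIDE = 0,
	MODEL_IDLE_HALT,
	MODEL_IDLE_NOMWAIT,
	MODEL_IDLE_POLL,
};

static enum model_idle_override model_override = MODEL_IDLE_NO_OVERRIDE;

/* Hypothetical stand-in for idle_setup(): records the idle= choice. */
static int model_idle_setup(const char *str)
{
	if (!str)
		return -1;

	if (!strcmp(str, "poll"))
		model_override = MODEL_IDLE_POLL;
	else if (!strcmp(str, "halt"))
		model_override = MODEL_IDLE_HALT;
	else if (!strcmp(str, "nomwait"))
		model_override = MODEL_IDLE_NOMWAIT;	/* the variable IS touched */
	else
		return -1;

	return 0;
}

/* Model of the early-out that patch 1/3 adds: 0 means "fall back to HLT". */
static int model_mwait_allowed(void)
{
	return model_override != MODEL_IDLE_NOMWAIT;
}
```

Calling model_idle_setup("nomwait") and then model_mwait_allowed()
yields 0, so in this model the comment's last sentence cannot be right.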

thanks,
rui



* Re: [PATCH v4 1/3] x86: Handle idle=nomwait cmdline properly for x86_idle
  2022-06-05 12:32       ` Zhang Rui
@ 2022-06-06  9:13         ` Wyes Karny
  0 siblings, 0 replies; 12+ messages in thread
From: Wyes Karny @ 2022-06-06  9:13 UTC (permalink / raw)
  To: Zhang Rui, linux-kernel
  Cc: Lewis.Carroll, Mario.Limonciello, gautham.shenoy, Ananth.Narayan,
	bharata, len.brown, x86, tglx, mingo, bp, dave.hansen, hpa,
	peterz, chang.seok.bae, keescook, metze, zhengqi.arch,
	mark.rutland, puwen, rafael.j.wysocki, andrew.cooper3, jing2.liu,
	jmattson, pawan.kumar.gupta

Hello Rui,

On 6/5/2022 6:02 PM, Zhang Rui wrote:
> On Thu, 2022-06-02 at 21:11 +0530, Wyes Karny wrote:
>>>
>>
>> Hi Rui,
>>
>> On 5/25/2022 1:36 PM, Zhang Rui wrote:
>>> On Mon, 2022-05-23 at 22:25 +0530, Wyes Karny wrote:
>>>> When kernel is booted with idle=nomwait do not use MWAIT as the
>>>> default idle state.
>>>>
>>>> If the user boots the kernel with idle=nomwait, it is a clear
>>>> direction to not use mwait as the default idle state.
>>>> However, the current code does not take this into consideration
>>>> while selecting the default idle state on x86.
>>>>
>>>> This patch fixes it by checking for the idle=nomwait boot option
>>>> in
>>>> prefer_mwait_c1_over_halt().
>>>>
>>>> Also update the documentation around idle=nomwait appropriately.
>>>>
>>>> Signed-off-by: Wyes Karny <wyes.karny@amd.com>
>>>> ---
>>>> Changes in v4:
>>>> - Update documentation around idle=nomwait
>>>> - Rename patch subject
>>>>
>>>>  Documentation/admin-guide/pm/cpuidle.rst | 15 +++++++++------
>>>>  arch/x86/kernel/process.c                |  6 +++++-
>>>>  2 files changed, 14 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/Documentation/admin-guide/pm/cpuidle.rst
>>>> b/Documentation/admin-guide/pm/cpuidle.rst
>>>> index aec2cd2aaea7..19754beb5a4e 100644
>>>> --- a/Documentation/admin-guide/pm/cpuidle.rst
>>>> +++ b/Documentation/admin-guide/pm/cpuidle.rst
>>>> @@ -612,8 +612,8 @@ the ``menu`` governor to be used on the
>>>> systems
>>>> that use the ``ladder`` governor
>>>>  by default this way, for example.
>>>>  
>>>>  The other kernel command line parameters controlling CPU idle
>>>> time
>>>> management
>>>> -described below are only relevant for the *x86* architecture and
>>>> some of
>>>> -them affect Intel processors only.
>>>> +described below are only relevant for the *x86* architecture and
>>>> references
>>>> +to ``intel_idle`` affect Intel processors only.
>>>>  
>>>>  The *x86* architecture support code recognizes three kernel
>>>> command
>>>> line
>>>>  options related to CPU idle time management: ``idle=poll``,
>>>> ``idle=halt``,
>>>> @@ -635,10 +635,13 @@ idle, so it very well may hurt single-
>>>> thread
>>>> computations performance as well as
>>>>  energy-efficiency.  Thus using it for performance reasons may
>>>> not be
>>>> a good idea
>>>>  at all.]
>>>>  
>>>> -The ``idle=nomwait`` option disables the ``intel_idle`` driver
>>>> and
>>>> causes
>>>> -``acpi_idle`` to be used (as long as all of the information
>>>> needed
>>>> by it is
>>>> -there in the system's ACPI tables), but it is not allowed to use
>>>> the
>>>> -``MWAIT`` instruction of the CPUs to ask the hardware to enter
>>>> idle
>>>> states.
>>>> +The ``idle=nomwait`` option prevents the use of ``MWAIT``
>>>> instruction of
>>>> +the CPU to enter idle states. When this option is used, the
>>>> ``acpi_idle``
>>>> +driver will use the ``HLT`` instruction instead of ``MWAIT``. On
>>>> systems
>>>> +running Intel processors, this option disables the
>>>> ``intel_idle``
>>>> driver
>>>> +and forces the use of the ``acpi_idle`` driver instead. Note
>>>> that in
>>>> either
>>>> +case, ``acpi_idle`` driver will function only if all the
>>>> information
>>>> needed
>>>> +by it is in the system's ACPI tables.
>>>>  
>>>>  In addition to the architecture-level kernel command line
>>>> options
>>>> affecting CPU
>>>>  idle time management, there are parameters affecting individual
>>>> ``CPUIdle``
>>>> diff --git a/arch/x86/kernel/process.c
>>>> b/arch/x86/kernel/process.c
>>>> index b370767f5b19..4e0178b066c5 100644
>>>> --- a/arch/x86/kernel/process.c
>>>> +++ b/arch/x86/kernel/process.c
>>>> @@ -824,6 +824,10 @@ static void amd_e400_idle(void)
>>>>   */
>>>>  static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86
>>>> *c)
>>>>  {
>>>> +	/* User has disallowed the use of MWAIT. Fallback to HALT */
>>>> +	if (boot_option_idle_override == IDLE_NOMWAIT)
>>>> +		return 0;
>>>> +
>>>>  	if (c->x86_vendor != X86_VENDOR_INTEL)
>>>>  		return 0;
>>>>  
>>>> @@ -932,7 +936,7 @@ static int __init idle_setup(char *str)
>>>>  	} else if (!strcmp(str, "nomwait")) {
>>>>  		/*
>>>>  		 * If the boot option of "idle=nomwait" is added,
>>>> -		 * it means that mwait will be disabled for CPU C2/C3
>>>> +		 * it means that mwait will be disabled for CPU
>>>> C1/C2/C3
>>>>  		 * states. In such case it won't touch the variable
>>>>  		 * of boot_option_idle_override.
>>>
>>> the code didn't change boot_option_idle_override when it was
>>> introduced, but this has changed since commit d18960494f65 ("ACPI,
>>> intel_idle: Cleanup idle= internal variables")
>>
>> Could you please clarify a bit more why the commit you mentioned is
>> related to this patch?
>>
> 
> The comment "In such case it won't touch the variable of
> boot_option_idle_override." has been wrong for some time; that is not
> caused by this patch. But since this patch says "Also update the
> documentation around idle=nomwait appropriately", my suggestion is to
> fix the comment as well, by deleting its last sentence.

Sure, will do. Thanks!

> 
> thanks,
> rui
> 



* Re: [PATCH v4 2/3] x86: Remove vendor checks from prefer_mwait_c1_over_halt
  2022-05-23 16:55 ` [PATCH v4 2/3] x86: Remove vendor checks from prefer_mwait_c1_over_halt Wyes Karny
  2022-05-25 16:55   ` Peter Zijlstra
@ 2022-06-06 12:50   ` Zhang Rui
  2022-06-06 15:37     ` Dave Hansen
  1 sibling, 1 reply; 12+ messages in thread
From: Zhang Rui @ 2022-06-06 12:50 UTC (permalink / raw)
  To: Wyes Karny, linux-kernel
  Cc: Lewis.Carroll, Mario.Limonciello, gautham.shenoy, Ananth.Narayan,
	bharata, len.brown, x86, tglx, mingo, bp, dave.hansen, hpa,
	peterz, chang.seok.bae, keescook, metze, zhengqi.arch,
	mark.rutland, puwen, rafael.j.wysocki, andrew.cooper3, jing2.liu,
	jmattson, pawan.kumar.gupta

On Mon, 2022-05-23 at 22:25 +0530, Wyes Karny wrote:
> Remove vendor checks from prefer_mwait_c1_over_halt function. Restore
> the decision tree to support MWAIT C1 as the default idle state based on
> CPUID checks as done by Thomas Gleixner in
> commit 09fd4b4ef5bc ("x86: use cpuid to check MWAIT support for C1")
> 
> The decision tree is removed in
> commit 69fb3676df33 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")
> 
> Prefer MWAIT when the following conditions are satisfied:
>     1. CPUID_Fn00000001_ECX [Monitor] should be set
>     2. CPUID_Fn00000005 should be supported
>     3. If CPUID_Fn00000005_ECX [EMX] is set then there should be
>        at least one C1 substate available, indicated by
>        CPUID_Fn00000005_EDX [MWaitC1SubStates] bits.
> 
> Otherwise use HLT for default_idle function.
> 
> HPC customers who want to optimize for lower latency are known to
> disable Global C-States in the BIOS. In fact, some vendors allow
> choosing a BIOS 'performance' profile which explicitly disables
> C-States.  In this scenario, the cpuidle driver will not be loaded and
> the kernel will continue with the default idle state chosen at boot
> time. On AMD systems currently the default idle state is HLT which has
> a higher exit latency compared to MWAIT.
> 
> The reason for the choice of HLT over MWAIT on AMD systems is:
> 
> 1. Families prior to 10h didn't support MWAIT
> 2. Families 10h-15h supported MWAIT, but not MWAIT C1. Hence it was
>    preferable to use HLT as the default state on these systems.
> 
> However, AMD Family 17h onwards supports MWAIT as well as MWAIT C1. And
> it is preferable to use MWAIT as the default idle state on these
> systems, as it has lower exit latencies.
> 
> The below table represents the exit latency for HLT and MWAIT on AMD
> Zen 3 system. Exit latency is measured by issuing a wakeup (IPI) to
> other CPU and measuring how many clock cycles it took to wakeup.  Each
> iteration measures 10K wakeups by pinning source and destination.
> 
> HLT:
> 
> 25.0000th percentile  :      1900 ns
> 50.0000th percentile  :      2000 ns
> 75.0000th percentile  :      2300 ns
> 90.0000th percentile  :      2500 ns
> 95.0000th percentile  :      2600 ns
> 99.0000th percentile  :      2800 ns
> 99.5000th percentile  :      3000 ns
> 99.9000th percentile  :      3400 ns
> 99.9500th percentile  :      3600 ns
> 99.9900th percentile  :      5900 ns
>   Min latency         :      1700 ns
>   Max latency         :      5900 ns
> Total Samples      9999
> 
> MWAIT:
> 
> 25.0000th percentile  :      1400 ns
> 50.0000th percentile  :      1500 ns
> 75.0000th percentile  :      1700 ns
> 90.0000th percentile  :      1800 ns
> 95.0000th percentile  :      1900 ns
> 99.0000th percentile  :      2300 ns
> 99.5000th percentile  :      2500 ns
> 99.9000th percentile  :      3200 ns
> 99.9500th percentile  :      3500 ns
> 99.9900th percentile  :      4600 ns
>   Min latency         :      1200 ns
>   Max latency         :      4600 ns
> Total Samples      9997
> 
> Improvement (99th percentile): 21.74%
> 
> Below is another result for context_switch2 micro-benchmark, which
> brings out the impact of improved wakeup latency through increased
> context-switches per second.
> 
> Link: https://ozlabs.org/~anton/junkcode/context_switch2.c
> 
> with HLT:
> -------------------------------
> 50.0000th percentile  :  190184
> 75.0000th percentile  :  191032
> 90.0000th percentile  :  192314
> 95.0000th percentile  :  192520
> 99.0000th percentile  :  192844
> MIN  :  190148
> MAX  :  192852
> 
> with MWAIT:
> -------------------------------
> 50.0000th percentile  :  277444
> 75.0000th percentile  :  278268
> 90.0000th percentile  :  278888
> 95.0000th percentile  :  279164
> 99.0000th percentile  :  280504
> MIN  :  273278
> MAX  :  281410
> 
> Improvement(99th percentile): ~ 45.46%
> 
> Signed-off-by: Wyes Karny <wyes.karny@amd.com>
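
As a reading aid for the two "Improvement" figures quoted above: both
follow if the difference between the 99th-percentile values is divided
by the smaller of the two. That baseline is inferred from the numbers;
the commit message does not state it. A small sketch, with
relative_diff_percent() as a hypothetical helper name:

```c
/*
 * Percentage by which 'larger' exceeds 'smaller', relative to 'smaller'.
 * The choice of baseline is an assumption inferred from the quoted
 * results, not something the commit message spells out.
 */
static double relative_diff_percent(double larger, double smaller)
{
	return (larger - smaller) / smaller * 100.0;
}
```

With the quoted 99th-percentile latencies, relative_diff_percent(2800.0,
2300.0) gives about 21.74; with the context_switch2 figures,
relative_diff_percent(280504.0, 192844.0) gives about 45.46. Measured
against the HLT latency instead, the reduction would be 500/2800, or
roughly 17.9%.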

I couldn't evaluate the impact to other vendors, but at least for Intel
platforms,

Test-by: Zhang Rui <rui.zhang@intel.com>

> ---
>  arch/x86/include/asm/mwait.h |  1 +
>  arch/x86/kernel/process.c    | 35 +++++++++++++++++++++++++----------
>  2 files changed, 26 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 29dd27b5a339..3a8fdf881313 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -13,6 +13,7 @@
>  #define MWAIT_SUBSTATE_SIZE		4
>  #define MWAIT_HINT2CSTATE(hint)		(((hint) >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK)
>  #define MWAIT_HINT2SUBSTATE(hint)	((hint) & MWAIT_CSTATE_MASK)
> +#define MWAIT_C1_SUBSTATE_MASK  0xf0
>  
>  #define CPUID_MWAIT_LEAF		5
>  #define CPUID5_ECX_EXTENSIONS_SUPPORTED 0x1
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 4e0178b066c5..7bf4d73c9522 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -813,28 +813,43 @@ static void amd_e400_idle(void)
>  }
>  
>  /*
> - * Intel Core2 and older machines prefer MWAIT over HALT for C1.
> - * We can't rely on cpuidle installing MWAIT, because it will not load
> - * on systems that support only C1 -- so the boot default must be MWAIT.
> + * Prefer MWAIT over HALT if MWAIT is supported, MWAIT_CPUID leaf
> + * exists and whenever MONITOR/MWAIT extensions are present there is at
> + * least one C1 substate.
>   *
> - * Some AMD machines are the opposite, they depend on using HALT.
> - *
> - * So for default C1, which is used during boot until cpuidle loads,
> - * use MWAIT-C1 on Intel HW that has it, else use HALT.
> + * Do not prefer MWAIT if MONITOR instruction has a bug or idle=nomwait
> + * is passed to kernel commandline parameter.
>   */
>  static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
>  {
> +	u32 eax, ebx, ecx, edx;
> +
>  	/* User has disallowed the use of MWAIT. Fallback to HALT */
>  	if (boot_option_idle_override == IDLE_NOMWAIT)
>  		return 0;
>  
> -	if (c->x86_vendor != X86_VENDOR_INTEL)
> +	/* MWAIT is not supported on this platform. Fallback to HALT */
> +	if (!cpu_has(c, X86_FEATURE_MWAIT))
>  		return 0;
>  
> -	if (!cpu_has(c, X86_FEATURE_MWAIT) || boot_cpu_has_bug(X86_BUG_MONITOR))
> +	/* Monitor has a bug. Fallback to HALT */
> +	if (boot_cpu_has_bug(X86_BUG_MONITOR))
>  		return 0;
>  
> -	return 1;
> +	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
> +
> +	/*
> +	 * If MWAIT extensions are not available, it is safe to use MWAIT
> +	 * with EAX=0, ECX=0.
> +	 */
> +	if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED))
> +		return 1;
> +
> +	/*
> +	 * If MWAIT extensions are available, there should be at least one
> +	 * MWAIT C1 substate present.
> +	 */
> +	return (edx & MWAIT_C1_SUBSTATE_MASK);
>  }
>  
>  /*
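
For anyone who wants to reason about the decision tree outside the
kernel, the checks in the quoted hunk can be modelled as a pure function
over raw CPUID register values. This is a sketch under assumptions, not
kernel code: struct sk_cpu, sk_prefer_mwait() and the SK_ macros are
hypothetical names, and the two bool fields stand in for
boot_option_idle_override and X86_BUG_MONITOR. The mask values are the
ones shown in the patch; the MONITOR bit is CPUID leaf 1, ECX bit 3.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_CPUID1_ECX_MONITOR	(1u << 3)	/* CPUID leaf 1, ECX bit 3 */
#define SK_CPUID5_ECX_EXT	0x1u		/* value of CPUID5_ECX_EXTENSIONS_SUPPORTED */
#define SK_C1_SUBSTATE_MASK	0xf0u		/* value of MWAIT_C1_SUBSTATE_MASK */

/* Hypothetical container for the inputs; not a kernel structure. */
struct sk_cpu {
	bool nomwait;		/* idle=nomwait was given on the command line */
	bool monitor_bug;	/* stands in for X86_BUG_MONITOR */
	uint32_t leaf1_ecx;	/* ECX from CPUID leaf 1 */
	uint32_t leaf5_ecx;	/* ECX from CPUID leaf 5 */
	uint32_t leaf5_edx;	/* EDX from CPUID leaf 5 */
};

/* Returns true when MWAIT C1 would be preferred over HLT in this model. */
static bool sk_prefer_mwait(const struct sk_cpu *c)
{
	/* User has disallowed MWAIT. */
	if (c->nomwait)
		return false;

	/* MONITOR/MWAIT not advertised at all. */
	if (!(c->leaf1_ecx & SK_CPUID1_ECX_MONITOR))
		return false;

	/* MONITOR is known to be broken on this part. */
	if (c->monitor_bug)
		return false;

	/* No MWAIT extensions enumerated: plain MWAIT is still usable. */
	if (!(c->leaf5_ecx & SK_CPUID5_ECX_EXT))
		return true;

	/* Extensions enumerated: require at least one C1 substate. */
	return (c->leaf5_edx & SK_C1_SUBSTATE_MASK) != 0;
}
```

On GCC or Clang the register values could be filled in with
__get_cpuid() from <cpuid.h>. Unlike the kernel function, which returns
the masked EDX value as an int, the sketch normalises the result to a
bool.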



* Re: [PATCH v4 2/3] x86: Remove vendor checks from prefer_mwait_c1_over_halt
  2022-06-06 12:50   ` Zhang Rui
@ 2022-06-06 15:37     ` Dave Hansen
  2022-06-07  1:16       ` Zhang Rui
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2022-06-06 15:37 UTC (permalink / raw)
  To: Zhang Rui, Wyes Karny, linux-kernel
  Cc: Lewis.Carroll, Mario.Limonciello, gautham.shenoy, Ananth.Narayan,
	bharata, len.brown, x86, tglx, mingo, bp, dave.hansen, hpa,
	peterz, chang.seok.bae, keescook, metze, zhengqi.arch,
	mark.rutland, puwen, rafael.j.wysocki, andrew.cooper3, jing2.liu,
	jmattson, pawan.kumar.gupta

On 6/6/22 05:50, Zhang Rui wrote:
> I couldn't evaluate the impact to other vendors, but at least for
> Intel platforms,
> 
> Test-by: Zhang Rui <rui.zhang@intel.com>

I think you mean:

Tested-by: Zhang Rui <rui.zhang@intel.com>

right?


* Re: [PATCH v4 2/3] x86: Remove vendor checks from prefer_mwait_c1_over_halt
  2022-06-06 15:37     ` Dave Hansen
@ 2022-06-07  1:16       ` Zhang Rui
  0 siblings, 0 replies; 12+ messages in thread
From: Zhang Rui @ 2022-06-07  1:16 UTC (permalink / raw)
  To: Dave Hansen, Wyes Karny, linux-kernel
  Cc: Lewis.Carroll, Mario.Limonciello, gautham.shenoy, Ananth.Narayan,
	bharata, len.brown, x86, tglx, mingo, bp, dave.hansen, hpa,
	peterz, chang.seok.bae, keescook, metze, zhengqi.arch,
	mark.rutland, puwen, rafael.j.wysocki, andrew.cooper3, jing2.liu,
	jmattson, pawan.kumar.gupta

On Mon, 2022-06-06 at 08:37 -0700, Dave Hansen wrote:
> On 6/6/22 05:50, Zhang Rui wrote:
> > I couldn't evaluate the impact to other vendors, but at least for
> > Intel platforms,
> > 
> > Test-by: Zhang Rui <rui.zhang@intel.com>
> 
> I think you mean:
> 
> Tested-by: Zhang Rui <rui.zhang@intel.com>
> 
> right?

Oops, sorry.

Tested-by: Zhang Rui <rui.zhang@intel.com>

-rui



end of thread, other threads:[~2022-06-07  1:16 UTC | newest]

Thread overview: 12+ messages
2022-05-23 16:55 [PATCH v4 0/3] x86: Prefer MWAIT over HLT on AMD processors Wyes Karny
2022-05-23 16:55 ` [PATCH v4 1/3] x86: Handle idle=nomwait cmdline properly for x86_idle Wyes Karny
2022-05-25  8:06   ` Zhang Rui
2022-06-02 15:41     ` Wyes Karny
2022-06-05 12:32       ` Zhang Rui
2022-06-06  9:13         ` Wyes Karny
2022-05-23 16:55 ` [PATCH v4 2/3] x86: Remove vendor checks from prefer_mwait_c1_over_halt Wyes Karny
2022-05-25 16:55   ` Peter Zijlstra
2022-06-06 12:50   ` Zhang Rui
2022-06-06 15:37     ` Dave Hansen
2022-06-07  1:16       ` Zhang Rui
2022-05-23 16:55 ` [PATCH v4 3/3] x86: Fix comment for X86_FEATURE_ZEN Wyes Karny
