* [PATCH 00/7 Add support for running in VM guests to intel_idle
@ 2023-06-01 18:27 arjan
  2023-06-01 18:27 ` [PATCH 1/7] intel_idle: refactor state->enter manipulation into its own function arjan
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: arjan @ 2023-06-01 18:27 UTC (permalink / raw)
  To: linux-pm; +Cc: artem.bityutskiy, rafael, Arjan van de Ven

From: Arjan van de Ven <arjan@linux.intel.com>

intel_idle provides the CPU Idle states (for power saving in idle) to the
cpuidle framework, based on per-cpu tables combined with limited hardware
enumeration. This combination of cpuidle and intel_idle provides dynamic
behavior where power saving and performance impact are dynamically balanced
and where a set of generic knobs is provided in sysfs for users to tune
the heuristics (and get statistics, etc.).

However, intel_idle currently does not support running inside VM guests, and
the Linux kernel falls back to either ACPI based idle (if supported by the
hypervisor/virtual BIOS) or just the default x86 fallback "hlt" based idle
method, which was introduced in the 1.2 kernel series and lacks all the
dynamic behavior, user control and statistics that cpuidle brings.

While this is obviously functional, it's not great and we can do better
for the user by hooking up intel_idle into the cpuidle framework also
for the "in a guest" case.
Besides not being great for the user, the fallback is also not optimal and
lacks two key capabilities that are supported in the bare metal case:

1) The ability to flush the TLB for very long idle periods, to avoid
   a costly (and high latency) IPI wakeup of an idle vCPU later, when a
   process that used to run on the idle vCPU does an munmap or similar
   operation. Avoiding high latency IPIs helps avoid performance jitter.
2) The ability to use the new Intel C0.2 idle state instead of polling
   for very short duration idle periods to save power (and carbon footprint).
This patch series adds the basic support to run in a VM guest
to the intel_idle driver, and then addresses the first of these shortfalls.
The C0.2 gap will be fixed with a small additional patch after the
C0.2 support is merged separately.

Arjan van de Ven (7):
  intel_idle: refactor state->enter manipulation into its own function
  intel_idle: clean up the (new) state_update_enter_method function
  intel_idle: Add a sanity check in the new state_update_enter_method
    function
  intel_idle: Add helper functions to support 'hlt' as idle state
  intel_idle: Add a way to skip the mwait check on all states
  intel_idle: Add support for using intel_idle in a VM guest using just
    hlt
  intel_idle: Add a "Long HLT" C1 state for the VM guest mode

 drivers/idle/intel_idle.c | 216 ++++++++++++++++++++++++++++++++++----
 1 file changed, 194 insertions(+), 22 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/7] intel_idle: refactor state->enter manipulation into its own function
  2023-06-01 18:27 [PATCH 00/7 Add support for running in VM guests to intel_idle arjan
@ 2023-06-01 18:27 ` arjan
  2023-06-01 18:27 ` [PATCH 2/7] intel_idle: clean up the (new) state_update_enter_method function arjan
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: arjan @ 2023-06-01 18:27 UTC (permalink / raw)
  To: linux-pm; +Cc: artem.bityutskiy, rafael, Arjan van de Ven, Arjan van de Ven

From: Arjan van de Ven <arjan.van.de.ven@intel.com>

Since the 6.4 kernel, the logic for updating a state's enter method
based on "environmental conditions" (command line options, CPU sidechannel
workarounds, etc.) has gotten pretty complex.
This patch refactors this into a separate, small, self-contained function
(no behavior changes) for improved readability and to make future
changes to this logic easier to do and understand.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/idle/intel_idle.c | 50 ++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index aa2d19db2b1d..c351b21c0875 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1839,6 +1839,32 @@ static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
 	return true;
 }
 
+static void state_update_enter_method(struct cpuidle_state *state, int cstate)
+{
+	if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
+		/*
+		 * Combining with XSTATE with IBRS or IRQ_ENABLE flags
+		 * is not currently supported but this driver.
+		 */
+		WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IBRS);
+		WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
+		state->enter = intel_idle_xstate;
+	} else if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
+			   state->flags & CPUIDLE_FLAG_IBRS) {
+		/*
+		 * IBRS mitigation requires that C-states are entered
+		 * with interrupts disabled.
+		 */
+		WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
+		state->enter = intel_idle_ibrs;
+	} else if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE) {
+		state->enter = intel_idle_irq;
+	} else if (force_irq_on) {
+		pr_info("forced intel_idle_irq for state %d\n", cstate);
+		state->enter = intel_idle_irq;
+	}
+}
+
 static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
 {
 	int cstate;
@@ -1894,28 +1920,8 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
 		drv->states[drv->state_count] = cpuidle_state_table[cstate];
 		state = &drv->states[drv->state_count];
 
-		if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
-			/*
-			 * Combining with XSTATE with IBRS or IRQ_ENABLE flags
-			 * is not currently supported but this driver.
-			 */
-			WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IBRS);
-			WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
-			state->enter = intel_idle_xstate;
-		} else if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
-			   state->flags & CPUIDLE_FLAG_IBRS) {
-			/*
-			 * IBRS mitigation requires that C-states are entered
-			 * with interrupts disabled.
-			 */
-			WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
-			state->enter = intel_idle_ibrs;
-		} else if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE) {
-			state->enter = intel_idle_irq;
-		} else if (force_irq_on) {
-			pr_info("forced intel_idle_irq for state %d\n", cstate);
-			state->enter = intel_idle_irq;
-		}
+		state_update_enter_method(state, cstate);
+
 
 		if ((disabled_states_mask & BIT(drv->state_count)) ||
 		    ((icpu->use_acpi || force_use_acpi) &&
-- 
2.40.1



* [PATCH 2/7] intel_idle: clean up the (new) state_update_enter_method function
  2023-06-01 18:27 [PATCH 00/7 Add support for running in VM guests to intel_idle arjan
  2023-06-01 18:27 ` [PATCH 1/7] intel_idle: refactor state->enter manipulation into its own function arjan
@ 2023-06-01 18:27 ` arjan
  2023-06-04 15:34   ` Rafael J. Wysocki
  2023-06-01 18:27 ` [PATCH 3/7] intel_idle: Add a sanity check in the new " arjan
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: arjan @ 2023-06-01 18:27 UTC (permalink / raw)
  To: linux-pm; +Cc: artem.bityutskiy, rafael, Arjan van de Ven, Arjan van de Ven

From: Arjan van de Ven <arjan.van.de.ven@intel.com>

Now that the logic for state_update_enter_method() is in its own
function, the long if .. else if .. else if .. else if chain
can be simplified by just returning from the function
at the various places. This does not change functionality,
but it makes the logic much simpler to read or modify later.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/idle/intel_idle.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index c351b21c0875..256c2d42e350 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1849,7 +1849,10 @@ static void state_update_enter_method(struct cpuidle_state *state, int cstate)
 		WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IBRS);
 		WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
 		state->enter = intel_idle_xstate;
-	} else if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
+		return;
+	}
+
+	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
 			   state->flags & CPUIDLE_FLAG_IBRS) {
 		/*
 		 * IBRS mitigation requires that C-states are entered
@@ -1857,9 +1860,15 @@ static void state_update_enter_method(struct cpuidle_state *state, int cstate)
 		 */
 		WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
 		state->enter = intel_idle_ibrs;
-	} else if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE) {
+		return;
+	}
+
+	if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE) {
 		state->enter = intel_idle_irq;
-	} else if (force_irq_on) {
+		return;
+	}
+
+	if (force_irq_on) {
 		pr_info("forced intel_idle_irq for state %d\n", cstate);
 		state->enter = intel_idle_irq;
 	}
-- 
2.40.1



* [PATCH 3/7] intel_idle: Add a sanity check in the new state_update_enter_method function
  2023-06-01 18:27 [PATCH 00/7 Add support for running in VM guests to intel_idle arjan
  2023-06-01 18:27 ` [PATCH 1/7] intel_idle: refactor state->enter manipulation into its own function arjan
  2023-06-01 18:27 ` [PATCH 2/7] intel_idle: clean up the (new) state_update_enter_method function arjan
@ 2023-06-01 18:27 ` arjan
  2023-06-04 15:43   ` Rafael J. Wysocki
  2023-06-01 18:27 ` [PATCH 4/7] intel_idle: Add helper functions to support 'hlt' as idle state arjan
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: arjan @ 2023-06-01 18:27 UTC (permalink / raw)
  To: linux-pm; +Cc: artem.bityutskiy, rafael, Arjan van de Ven, Arjan van de Ven

From: Arjan van de Ven <arjan.van.de.ven@intel.com>

The state_update_enter_method function updates a state's enter function pointer,
but does so assuming that the current function is "intel_idle" or "intel_idle_irq".

In the code currently that's basically the case, but soon this will change.
Add a sanity check early in the function to make the assumption explicit,
and return early if the precondition is not met.
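
As an illustration only, the precondition can be modeled in user space.
The sketch below is not the driver code: the struct, the flag value and the
stub enter functions are stand-ins invented for this example, and only the
shape of the guard mirrors what this patch adds.

```c
#include <assert.h>

/* Stand-in for struct cpuidle_state; only the fields this sketch needs. */
struct model_state {
	unsigned int flags;
	int (*enter)(int index);
};

#define MODEL_FLAG_IRQ_ENABLE 0x1u	/* hypothetical value, not the kernel's */

/* Stub enter methods; each returns a distinct tag so they stay distinct. */
static int model_idle(int index)     { return index; }
static int model_idle_irq(int index) { return index + 100; }
static int model_idle_hlt(int index) { return index + 200; }

/* Rewrite ->enter only while it still points at one of the two defaults. */
static void model_update_enter_method(struct model_state *state)
{
	if (state->enter != model_idle && state->enter != model_idle_irq)
		return;

	if (state->flags & MODEL_FLAG_IRQ_ENABLE)
		state->enter = model_idle_irq;
}

/* A hlt-based state is left alone; a default one is still updated. */
static void model_guard_selfcheck(void)
{
	struct model_state hlt = { MODEL_FLAG_IRQ_ENABLE, model_idle_hlt };
	struct model_state def = { MODEL_FLAG_IRQ_ENABLE, model_idle };

	model_update_enter_method(&hlt);
	model_update_enter_method(&def);

	assert(hlt.enter == model_idle_hlt);
	assert(def.enter == model_idle_irq);
}
```

Calling model_guard_selfcheck() from a test program exercises both cases.
The point of the guard is that a table entry which already selected its
own enter method is never silently overwritten.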

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/idle/intel_idle.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 256c2d42e350..8415965372c7 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1841,6 +1841,14 @@ static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
 
 static void state_update_enter_method(struct cpuidle_state *state, int cstate)
 {
+	/*
+	 * The updates below are only valid if state->enter is actually the
+	 * 'intel_idle' or 'intel_idle_irq' functions; for all other cases
+	 * we just bow out early.
+	 */
+	if (state->enter != intel_idle && state->enter != intel_idle_irq)
+		return;
+
 	if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
 		/*
 		 * Combining with XSTATE with IBRS or IRQ_ENABLE flags
-- 
2.40.1



* [PATCH 4/7] intel_idle: Add helper functions to support 'hlt' as idle state
  2023-06-01 18:27 [PATCH 00/7 Add support for running in VM guests to intel_idle arjan
                   ` (2 preceding siblings ...)
  2023-06-01 18:27 ` [PATCH 3/7] intel_idle: Add a sanity check in the new " arjan
@ 2023-06-01 18:27 ` arjan
  2023-06-04 15:46   ` Rafael J. Wysocki
  2023-06-01 18:27 ` [PATCH 5/7] intel_idle: Add a way to skip the mwait check on all states arjan
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: arjan @ 2023-06-01 18:27 UTC (permalink / raw)
  To: linux-pm; +Cc: artem.bityutskiy, rafael, Arjan van de Ven, Arjan van de Ven

From: Arjan van de Ven <arjan.van.de.ven@intel.com>

Currently the intel_idle driver has a family of functions called
intel_idle/intel_idle_irq/intel_idle_xstate/... that use the
mwait instruction to enter a low power state.

x86 CPUs can also use the legacy "hlt" instruction for this,
and in some cases (VM guests for example) the mwait instruction
might not be available.

Because of this, add the basic helpers to allow 'hlt' to be used to enter
a low power state (will be used in later patches), both in the
regular and the _irq enabled variant.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/idle/intel_idle.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 8415965372c7..66d262fd267e 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -199,6 +199,43 @@ static __cpuidle int intel_idle_xstate(struct cpuidle_device *dev,
 	return __intel_idle(dev, drv, index);
 }
 
+static __always_inline int __intel_idle_hlt(struct cpuidle_device *dev,
+					struct cpuidle_driver *drv, int index)
+{
+	raw_safe_halt();
+	raw_local_irq_disable();
+	return index;
+}
+
+/**
+ * intel_idle_hlt - Ask the processor to enter the given idle state using hlt.
+ * @dev: cpuidle device of the target CPU.
+ * @drv: cpuidle driver (assumed to point to intel_idle_driver).
+ * @index: Target idle state index.
+ *
+ * Use the HLT instruction to notify the processor that the CPU represented by
+ * @dev is idle and it can try to enter the idle state corresponding to @index.
+ *
+ * Must be called under local_irq_disable().
+ */
+static __cpuidle int intel_idle_hlt(struct cpuidle_device *dev,
+				struct cpuidle_driver *drv, int index)
+{
+	return __intel_idle_hlt(dev, drv, index);
+}
+
+static __cpuidle int intel_idle_hlt_irq(struct cpuidle_device *dev,
+				    struct cpuidle_driver *drv, int index)
+{
+	int ret;
+
+	raw_local_irq_enable();
+	ret = __intel_idle_hlt(dev, drv, index);
+	raw_local_irq_disable();
+
+	return ret;
+}
+
 /**
  * intel_idle_s2idle - Ask the processor to enter the given idle state.
  * @dev: cpuidle device of the target CPU.
-- 
2.40.1



* [PATCH 5/7] intel_idle: Add a way to skip the mwait check on all states
  2023-06-01 18:27 [PATCH 00/7 Add support for running in VM guests to intel_idle arjan
                   ` (3 preceding siblings ...)
  2023-06-01 18:27 ` [PATCH 4/7] intel_idle: Add helper functions to support 'hlt' as idle state arjan
@ 2023-06-01 18:27 ` arjan
  2023-06-04 15:54   ` Rafael J. Wysocki
  2023-06-01 18:28 ` [PATCH 6/7] intel_idle: Add support for using intel_idle in a VM guest using just hlt arjan
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: arjan @ 2023-06-01 18:27 UTC (permalink / raw)
  To: linux-pm; +Cc: artem.bityutskiy, rafael, Arjan van de Ven, Arjan van de Ven

From: Arjan van de Ven <arjan.van.de.ven@intel.com>

Currently, intel_idle uses the cpuid instruction to verify that the
mwait hint for a state is actually supported by the CPU.

Going forward, when running in a VM guest, that check will not work
and we're going to need a way to turn it off.

Add a global bool for this, and use it in the check
function to short circuit the cpuid check.
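
For readers who want to poke at the check outside the kernel, here is a
rough user-space model. It assumes the usual layout of 4 bits of sub-state
count per C-state in the CPUID value; the names are made up for the sketch,
the CPUID value and the flag are passed in as parameters instead of being
globals, and the range guard is an addition of the sketch that the driver
does not have.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's MWAIT_* layout: 4 bits per C-state. */
#define MODEL_SUBSTATE_MASK 0xfu
#define MODEL_SUBSTATE_SIZE 4
#define MODEL_CSTATE_MASK   0xfu

/*
 * 'substates' stands in for the CPUID leaf 5 EDX value and 'skip' for the
 * new skip_mwait_check flag; both are parameters only to keep the sketch
 * free of globals.
 */
static bool model_verify_cstate(unsigned int substates, unsigned int mwait_hint,
				bool skip)
{
	unsigned int cstate = ((mwait_hint >> MODEL_SUBSTATE_SIZE) &
			       MODEL_CSTATE_MASK) + 1;

	if (skip)
		return true;	/* guest mode: accept every state in the table */

	if (cstate >= 8)
		return false;	/* sketch-only guard: no such field in 32 bits */

	return ((substates >> (cstate * 4)) & MODEL_SUBSTATE_MASK) != 0;
}

static void model_verify_selfcheck(void)
{
	/* 0x20: two sub-states reported for C1 (bits 7:4), none deeper. */
	assert(model_verify_cstate(0x20, 0x00, false));
	assert(!model_verify_cstate(0x20, 0x10, false));
	/* With the flag set, the CPUID contents no longer matter. */
	assert(model_verify_cstate(0x00, 0x10, true));
}
```

The model only covers the sub-state lookup; the TSC handling that follows
it in the real function is left out.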

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/idle/intel_idle.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 66d262fd267e..55c3e6ece3dd 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -69,6 +69,7 @@ static int max_cstate = CPUIDLE_STATE_MAX - 1;
 static unsigned int disabled_states_mask __read_mostly;
 static unsigned int preferred_states_mask __read_mostly;
 static bool force_irq_on __read_mostly;
+static bool skip_mwait_check __read_mostly;
 
 static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;
 
@@ -1866,6 +1867,9 @@ static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
 	unsigned int num_substates = (mwait_substates >> mwait_cstate * 4) &
 					MWAIT_SUBSTATE_MASK;
 
+	if (skip_mwait_check)
+		return true;
+
 	/* Ignore the C-state if there are NO sub-states in CPUID for it. */
 	if (num_substates == 0)
 		return false;
-- 
2.40.1



* [PATCH 6/7] intel_idle: Add support for using intel_idle in a VM guest using just hlt
  2023-06-01 18:27 [PATCH 00/7 Add support for running in VM guests to intel_idle arjan
                   ` (4 preceding siblings ...)
  2023-06-01 18:27 ` [PATCH 5/7] intel_idle: Add a way to skip the mwait check on all states arjan
@ 2023-06-01 18:28 ` arjan
  2023-06-04 15:59   ` Rafael J. Wysocki
  2023-06-01 18:28 ` [PATCH 7/7] intel_idle: Add a "Long HLT" C1 state for the VM guest mode arjan
  2023-06-04 15:01 ` [PATCH 00/7 Add support for running in VM guests to intel_idle Rafael J. Wysocki
  7 siblings, 1 reply; 17+ messages in thread
From: arjan @ 2023-06-01 18:28 UTC (permalink / raw)
  To: linux-pm; +Cc: artem.bityutskiy, rafael, Arjan van de Ven, Arjan van de Ven

From: Arjan van de Ven <arjan.van.de.ven@intel.com>

In a typical VM guest, the mwait instruction is not available, leaving only the
'hlt' instruction (which causes a VMEXIT to the host).

So currently, intel_idle will detect the lack of mwait, and fail
to initialize (after which another idle method would step in which will
just use hlt always).

By providing the capability to do this with the intel_idle driver, we can
do better than this fallback. While this current change only gets us parity
with the existing behavior, later patches in this series will add new capabilities.

In order to do this, a simplified version of the initialization function
for VM guests is created, and this will be called if the CPU is recognized,
but mwait is not supported, and we're in a VM guest.

One thing to note is that the latency (and break even) of this C1 state
is higher than the typical bare metal C1 state, because hlt causes a vmexit,
and the cost of vmexit + hypervisor overhead + vmenter is typically on the
order of up to 5 microseconds, even if the hypervisor does not actually
go into a hardware power saving state.
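
The resulting decision in the init path can be summarized with a small
user-space model. This is a sketch under assumptions, not the driver code:
the enum and function names are invented here, and it only covers the case
where the CPU model was already matched against the driver's table.

```c
#include <assert.h>
#include <stdbool.h>

enum model_init_path {
	MODEL_INIT_REGULAR,	/* regular mwait-based initialization */
	MODEL_INIT_VMGUEST,	/* simplified hlt-only initialization */
	MODEL_INIT_NODEV,	/* driver gives up, another idle method steps in */
};

/* Precondition: the CPU model was matched against the driver's ID table. */
static enum model_init_path model_pick_init_path(bool has_mwait,
						 bool in_hypervisor)
{
	if (has_mwait)
		return MODEL_INIT_REGULAR;
	if (in_hypervisor)
		return MODEL_INIT_VMGUEST;
	return MODEL_INIT_NODEV;
}

static void model_init_selfcheck(void)
{
	/* Bare metal with mwait disabled in the BIOS still fails as before. */
	assert(model_pick_init_path(false, false) == MODEL_INIT_NODEV);
	/* A guest without mwait now takes the new path. */
	assert(model_pick_init_path(false, true) == MODEL_INIT_VMGUEST);
	/* A guest that does see mwait keeps using the regular path. */
	assert(model_pick_init_path(true, true) == MODEL_INIT_REGULAR);
}
```

Note that the new path is only taken when mwait is missing, so a hypervisor
that exposes mwait to the guest sees no change from this patch.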

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/idle/intel_idle.c | 54 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 55c3e6ece3dd..c4929d8a35a4 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1280,6 +1280,18 @@ static struct cpuidle_state snr_cstates[] __initdata = {
 		.enter = NULL }
 };
 
+static struct cpuidle_state vmguest_cstates[] __initdata = {
+	{
+		.name = "C1",
+		.desc = "HLT",
+		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_IRQ_ENABLE,
+		.exit_latency = 5,
+		.target_residency = 10,
+		.enter = &intel_idle_hlt_irq, },
+	{
+		.enter = NULL }
+};
+
 static const struct idle_cpu idle_cpu_nehalem __initconst = {
 	.state_table = nehalem_cstates,
 	.auto_demotion_disable_flags = NHM_C1_AUTO_DEMOTE | NHM_C3_AUTO_DEMOTE,
@@ -2105,6 +2117,46 @@ static void __init intel_idle_cpuidle_devices_uninit(void)
 		cpuidle_unregister_device(per_cpu_ptr(intel_idle_cpuidle_devices, i));
 }
 
+static int __init intel_idle_vminit(const struct x86_cpu_id *id)
+{
+	int retval;
+
+	cpuidle_state_table = vmguest_cstates;
+	skip_mwait_check = true; /* hypervisor hides mwait from us normally */
+
+	icpu = (const struct idle_cpu *)id->driver_data;
+
+	pr_debug("v" INTEL_IDLE_VERSION " model 0x%X\n",
+		 boot_cpu_data.x86_model);
+
+	intel_idle_cpuidle_devices = alloc_percpu(struct cpuidle_device);
+	if (!intel_idle_cpuidle_devices)
+		return -ENOMEM;
+
+	intel_idle_cpuidle_driver_init(&intel_idle_driver);
+
+	retval = cpuidle_register_driver(&intel_idle_driver);
+	if (retval) {
+		struct cpuidle_driver *drv = cpuidle_get_driver();
+		printk(KERN_DEBUG pr_fmt("intel_idle yielding to %s\n"),
+		       drv ? drv->name : "none");
+		goto init_driver_fail;
+	}
+
+	retval = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "idle/intel:online",
+				   intel_idle_cpu_online, NULL);
+	if (retval < 0)
+		goto hp_setup_fail;
+
+	return 0;
+hp_setup_fail:
+	intel_idle_cpuidle_devices_uninit();
+	cpuidle_unregister_driver(&intel_idle_driver);
+init_driver_fail:
+	free_percpu(intel_idle_cpuidle_devices);
+	return retval;
+}
+
 static int __init intel_idle_init(void)
 {
 	const struct x86_cpu_id *id;
@@ -2123,6 +2175,8 @@ static int __init intel_idle_init(void)
 	id = x86_match_cpu(intel_idle_ids);
 	if (id) {
 		if (!boot_cpu_has(X86_FEATURE_MWAIT)) {
+			if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+				return intel_idle_vminit(id);
 			pr_debug("Please enable MWAIT in BIOS SETUP\n");
 			return -ENODEV;
 		}
-- 
2.40.1



* [PATCH 7/7] intel_idle: Add a "Long HLT" C1 state for the VM guest mode
  2023-06-01 18:27 [PATCH 00/7 Add support for running in VM guests to intel_idle arjan
                   ` (5 preceding siblings ...)
  2023-06-01 18:28 ` [PATCH 6/7] intel_idle: Add support for using intel_idle in a VM guest using just hlt arjan
@ 2023-06-01 18:28 ` arjan
  2023-06-04 15:01 ` [PATCH 00/7 Add support for running in VM guests to intel_idle Rafael J. Wysocki
  7 siblings, 0 replies; 17+ messages in thread
From: arjan @ 2023-06-01 18:28 UTC (permalink / raw)
  To: linux-pm; +Cc: artem.bityutskiy, rafael, Arjan van de Ven, Arjan van de Ven

From: Arjan van de Ven <arjan.van.de.ven@intel.com>

intel_idle will, for the bare metal case, usually have one or more deep
power states that have the CPUIDLE_FLAG_TLB_FLUSHED flag set. When
a state with this flag is selected by the cpuidle framework, it will also
flush the TLBs as part of entering this state. The benefit of doing this is
that the kernel does not need to wake the CPU out of this deep power state
just to flush the TLBs, an operation whose latency can be very high due to
the exit latency of deep power states.

In a VM guest currently, this benefit of avoiding the wakeup does not exist,
while the problem (long exit latency) is even more severe. Linux will need
to wake up a vCPU (causing the host to either come out of a deep C state,
or the VMM to have to deschedule something else to schedule the vCPU), which
can take a very long time, adding a lot of latency to TLB flush operations
(including munmap and others).

To solve this, add a "Long HLT" C state to the state table for the VM guest
case that has the CPUIDLE_FLAG_TLB_FLUSHED flag set.  The result of that is
that for long idle periods (where the VMM is likely to do things that cause
large latency) the cpuidle framework will flush the TLBs (and avoid the
wakeups), while for short/quick idle durations, the existing behavior is
retained.
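
To make the short versus long idle split concrete, here is a much
simplified user-space model of state selection. The rule used (deepest
state whose target residency and exit latency both fit) is an assumption
made for the sketch; the real cpuidle governors do considerably more. The
table values are the static ones from the vmguest_cstates table in this
patch, and all names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

struct model_sel_state {
	const char *name;
	unsigned int exit_latency;	/* microseconds */
	unsigned int target_residency;	/* microseconds */
	int flushes_tlb;
};

/* Static values from the guest table in this patch. */
static const struct model_sel_state model_vm_states[] = {
	{ "C1",  5,  10, 0 },
	{ "C1L", 5, 200, 1 },
};

/*
 * Pick the deepest state whose target residency fits the predicted idle
 * time and whose exit latency fits the latency limit.
 * Returns -1 when no state fits.
 */
static int model_select(unsigned int predicted_us, unsigned int latency_limit_us)
{
	int best = -1;
	size_t i;

	for (i = 0; i < sizeof(model_vm_states) / sizeof(model_vm_states[0]); i++) {
		if (model_vm_states[i].target_residency > predicted_us)
			continue;
		if (model_vm_states[i].exit_latency > latency_limit_us)
			continue;
		best = (int)i;
	}
	return best;
}

static void model_select_selfcheck(void)
{
	/* A 50 us idle period is too short for C1L, so plain C1 is used. */
	assert(model_select(50, 100) == 0);
	/* A 500 us idle period is long enough for the TLB-flushing state. */
	assert(model_select(500, 100) == 1);
	assert(model_vm_states[model_select(500, 100)].flushes_tlb);
}
```

In this model, anything shorter than the 200 us break even point of C1L
keeps the existing hlt behavior, which is the property described above.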

Now, there is still only "hlt" available in the guest, but for long idle,
the host can go to a deeper state (say C6).  There is a reasonable debate
one can have about what to set for the exit_latency and break even point for
this "Long HLT" state.  The good news is that intel_idle has these values
available for the underlying CPU (even when mwait is not exposed).  The
solution thus is to just use the latency and break even of the deepest state
from the bare metal CPU.  This is under the assumption that this is a pretty
reasonable estimate of what the VMM would do to cause latency.
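
The matching step can be sketched in user space as follows. This is a model
under assumptions rather than the code in this patch: the struct and
function names are invented, the bare metal numbers in the self-check are
made up and do not belong to any real CPU, and the loop stops at an
explicit end marker, whereas the patch iterates up to CPUIDLE_STATE_MAX.

```c
#include <assert.h>

struct model_lat_state {
	unsigned int exit_latency;	/* microseconds */
	unsigned int target_residency;	/* microseconds */
	int valid;			/* 0 marks the end of a table */
};

/*
 * Raise the guest state's exit latency to the largest one found in the
 * bare metal table, and take the target residency of that same entry.
 */
static void model_matchup(struct model_lat_state *guest,
			  const struct model_lat_state *baremetal)
{
	int i;

	for (i = 0; baremetal[i].valid; i++) {
		if (baremetal[i].exit_latency > guest->exit_latency) {
			guest->exit_latency = baremetal[i].exit_latency;
			guest->target_residency = baremetal[i].target_residency;
		}
	}
}

static void model_matchup_selfcheck(void)
{
	/* Made-up bare metal table; the last entry is the end marker. */
	const struct model_lat_state baremetal[] = {
		{   2,   2, 1 },
		{  10,  20, 1 },
		{ 133, 600, 1 },
		{   0,   0, 0 },
	};
	/* The "Long HLT" state starts with the static values from the table. */
	struct model_lat_state guest = { 5, 200, 1 };

	model_matchup(&guest, baremetal);

	assert(guest.exit_latency == 133);
	assert(guest.target_residency == 600);
}
```

One consequence worth noting: if no bare metal state has a higher exit
latency than the guest state already has, the guest values are left
untouched.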

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/idle/intel_idle.c | 54 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index c4929d8a35a4..e056e1ec64a9 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1288,6 +1288,13 @@ static struct cpuidle_state vmguest_cstates[] __initdata = {
 		.exit_latency = 5,
 		.target_residency = 10,
 		.enter = &intel_idle_hlt_irq, },
+	{
+		.name = "C1L",
+		.desc = "Long HLT",
+		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TLB_FLUSHED,
+		.exit_latency = 5,
+		.target_residency = 200,
+		.enter = &intel_idle_hlt, },
 	{
 		.enter = NULL }
 };
@@ -2117,6 +2124,44 @@ static void __init intel_idle_cpuidle_devices_uninit(void)
 		cpuidle_unregister_device(per_cpu_ptr(intel_idle_cpuidle_devices, i));
 }
 
+/*
+ * Match up the latency and break even point of the bare metal (cpu based)
+ * states with the deepest VM available state.
+ *
+ * We only want to do this for the deepest state, the one that has
+ * the TLB_FLUSHED flag set.
+ *
+ * All our short idle states are dominated by vmexit/vmenter latencies,
+ * not the underlying hardware latencies so we keep our values for these.
+ */
+static void matchup_vm_state_with_baremetal(void)
+{
+	int cstate;
+
+	for (cstate = 0; cstate < CPUIDLE_STATE_MAX; ++cstate) {
+		int matching_cstate;
+
+		if (intel_idle_max_cstate_reached(cstate))
+			break;
+
+		if (!cpuidle_state_table[cstate].enter &&
+		    !cpuidle_state_table[cstate].enter_s2idle)
+			break;
+
+		if (!(cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_TLB_FLUSHED))
+			continue;
+
+		for (matching_cstate = 0; matching_cstate < CPUIDLE_STATE_MAX; ++matching_cstate) {
+			if (icpu->state_table[matching_cstate].exit_latency > cpuidle_state_table[cstate].exit_latency) {
+				cpuidle_state_table[cstate].exit_latency = icpu->state_table[matching_cstate].exit_latency;
+				cpuidle_state_table[cstate].target_residency = icpu->state_table[matching_cstate].target_residency;
+			}
+		}
+
+	}
+}
+
+
 static int __init intel_idle_vminit(const struct x86_cpu_id *id)
 {
 	int retval;
@@ -2133,6 +2178,15 @@ static int __init intel_idle_vminit(const struct x86_cpu_id *id)
 	if (!intel_idle_cpuidle_devices)
 		return -ENOMEM;
 
+	/*
+	 * We don't know exactly what the host will do when we go idle, but as a worst-case estimate
+	 * we can assume that the exit latency of the deepest host state will be hit for our
+	 * deep (long duration) guest idle state.
+	 * The same logic applies to the break even point for the long duration guest idle state.
+	 * So let's copy these two properties from the table we found for the host CPU type.
+	 */
+	matchup_vm_state_with_baremetal();
+
 	intel_idle_cpuidle_driver_init(&intel_idle_driver);
 
 	retval = cpuidle_register_driver(&intel_idle_driver);
-- 
2.40.1



* Re: [PATCH 00/7 Add support for running in VM guests to intel_idle
  2023-06-01 18:27 [PATCH 00/7 Add support for running in VM guests to intel_idle arjan
                   ` (6 preceding siblings ...)
  2023-06-01 18:28 ` [PATCH 7/7] intel_idle: Add a "Long HLT" C1 state for the VM guest mode arjan
@ 2023-06-04 15:01 ` Rafael J. Wysocki
  7 siblings, 0 replies; 17+ messages in thread
From: Rafael J. Wysocki @ 2023-06-04 15:01 UTC (permalink / raw)
  To: arjan; +Cc: linux-pm, artem.bityutskiy, rafael

On Thu, Jun 1, 2023 at 8:28 PM <arjan@linux.intel.com> wrote:
>
> From: Arjan van de Ven <arjan@linux.intel.com>
>
> intel_idle provides the CPU Idle states (for power saving in idle) to the
> cpuidle framework, based on per-cpu tables combined with limited hardware
> enumeration. This combination of cpuidle and intel_idle provides dynamic
> behavior where power saving and performance impact are dynamically balanced
> and where a set of generic knobs is provided in sysfs for users to tune
> the heuristics (and get statistics, etc.).
>
> However, intel_idle currently does not support running inside VM guests, and
> the Linux kernel falls back to either ACPI based idle (if supported by the
> hypervisor/virtual BIOS) or just the default x86 fallback "hlt" based idle
> method, which was introduced in the 1.2 kernel series and lacks all the
> dynamic behavior, user control and statistics that cpuidle brings.
>
> While this is obviously functional, it's not great and we can do better
> for the user by hooking up intel_idle into the cpuidle framework also
> for the "in a guest" case.
> Besides not being great for the user, the fallback is also not optimal and
> lacks two key capabilities that are supported in the bare metal case:
>
> 1) The ability to flush the TLB for very long idle periods, to avoid
>    a costly (and high latency) IPI wakeup of an idle vCPU later, when a
>    process that used to run on the idle vCPU does an munmap or similar
>    operation. Avoiding high latency IPIs helps avoid performance jitter.
> 2) The ability to use the new Intel C0.2 idle state instead of polling
>    for very short duration idle periods to save power (and carbon footprint).
>
> This patch series adds the basic support to run in a VM guest
> to the intel_idle driver, and then addresses the first of these shortfalls.

Is intel_idle supposed to replace ACPI idle inside VM guests where it
is supported?

If not, how is it prevented from taking over in those cases?

> The C0.2 gap will be fixed with a small additional patch after the
> C0.2 support is merged separately.
>
> Arjan van de Ven (7):
>   intel_idle: refactor state->enter manipulation into its own function
>   intel_idle: clean up the (new) state_update_enter_method function
>   intel_idle: Add a sanity check in the new state_update_enter_method
>     function
>   intel_idle: Add helper functions to support 'hlt' as idle state
>   intel_idle: Add a way to skip the mwait check on all states
>   intel_idle: Add support for using intel_idle in a VM guest using just
>     hlt
>   intel_idle: Add a "Long HLT" C1 state for the VM guest mode
>
>  drivers/idle/intel_idle.c | 216 ++++++++++++++++++++++++++++++++++----
>  1 file changed, 194 insertions(+), 22 deletions(-)
>
> --


* Re: [PATCH 2/7] intel_idle: clean up the (new) state_update_enter_method function
  2023-06-01 18:27 ` [PATCH 2/7] intel_idle: clean up the (new) state_update_enter_method function arjan
@ 2023-06-04 15:34   ` Rafael J. Wysocki
  2023-06-04 22:35     ` Van De Ven, Arjan
  0 siblings, 1 reply; 17+ messages in thread
From: Rafael J. Wysocki @ 2023-06-04 15:34 UTC (permalink / raw)
  To: arjan; +Cc: linux-pm, artem.bityutskiy, rafael, Arjan van de Ven

On Thu, Jun 1, 2023 at 8:28 PM <arjan@linux.intel.com> wrote:
>
> From: Arjan van de Ven <arjan.van.de.ven@intel.com>
>
> Now that the logic for state_update_enter_method() is in its own
> function, the long if .. else if .. else if .. else if chain
> can be simplified by just returning from the function
> at the various places. This does not change functionality,
> but it makes the logic much simpler to read or modify later.
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

This and the [1/7] can be applied without the rest of the series, so
please let me know if you want me to do that.

> ---
>  drivers/idle/intel_idle.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index c351b21c0875..256c2d42e350 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -1849,7 +1849,10 @@ static void state_update_enter_method(struct cpuidle_state *state, int cstate)
>                 WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IBRS);
>                 WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
>                 state->enter = intel_idle_xstate;
> -       } else if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
> +               return;
> +       }
> +
> +       if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
>                            state->flags & CPUIDLE_FLAG_IBRS) {
>                 /*
>                  * IBRS mitigation requires that C-states are entered
> @@ -1857,9 +1860,15 @@ static void state_update_enter_method(struct cpuidle_state *state, int cstate)
>                  */
>                 WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
>                 state->enter = intel_idle_ibrs;
> -       } else if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE) {
> +               return;
> +       }
> +
> +       if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE) {
>                 state->enter = intel_idle_irq;
> -       } else if (force_irq_on) {
> +               return;
> +       }
> +
> +       if (force_irq_on) {
>                 pr_info("forced intel_idle_irq for state %d\n", cstate);
>                 state->enter = intel_idle_irq;
>         }
> --
> 2.40.1
>
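
For readers following along, the early-return shape of the patched
function can be modelled in plain user-space C. This is only a sketch:
the flag values and the enum below are hypothetical stand-ins for the
kernel's CPUIDLE_FLAG_* bits and function pointers, and the WARN_ON_ONCE
checks and the pr_info() call are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's CPUIDLE_FLAG_* bits. */
enum {
	FLAG_IRQ_ENABLE  = 1 << 0,
	FLAG_IBRS        = 1 << 1,
	FLAG_INIT_XSTATE = 1 << 2,
};

/* Which entry routine gets selected; ENTER_UNCHANGED means "leave as is". */
enum enter_method { ENTER_UNCHANGED, ENTER_XSTATE, ENTER_IBRS, ENTER_IRQ };

/*
 * Same priority order as the quoted diff: XSTATE first, then IBRS (only
 * when the kernel IBRS feature is enabled), then IRQ_ENABLE, then the
 * force_irq_on knob. Each case returns as soon as it matches.
 */
static enum enter_method pick_enter_method(unsigned int flags,
					   bool kernel_ibrs,
					   bool force_irq_on)
{
	if (flags & FLAG_INIT_XSTATE)
		return ENTER_XSTATE;

	if (kernel_ibrs && (flags & FLAG_IBRS))
		return ENTER_IBRS;

	if (flags & FLAG_IRQ_ENABLE)
		return ENTER_IRQ;

	if (force_irq_on)
		return ENTER_IRQ;

	return ENTER_UNCHANGED;
}
```

Because every branch returns, a new case can be added between two
existing ones without re-indenting or touching the neighbours, which is
the readability gain the commit message describes.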

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/7] intel_idle: Add a sanity check in the new state_update_enter_method function
  2023-06-01 18:27 ` [PATCH 3/7] intel_idle: Add a sanity check in the new " arjan
@ 2023-06-04 15:43   ` Rafael J. Wysocki
  0 siblings, 0 replies; 17+ messages in thread
From: Rafael J. Wysocki @ 2023-06-04 15:43 UTC (permalink / raw)
  To: arjan; +Cc: linux-pm, artem.bityutskiy, rafael, Arjan van de Ven

On Thu, Jun 1, 2023 at 8:28 PM <arjan@linux.intel.com> wrote:
>
> From: Arjan van de Ven <arjan.van.de.ven@intel.com>
>
> The state_update_enter_method function updates a state's enter function pointer,
> but does so assuming that the current function is "intel_idle" or "intel_idle_irq".
>
> In the code currently that's basically the case, but soon this will change.
> Add a sanity check early in the function to make the assumption explicit,
> and return early if the precondition is not met.
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  drivers/idle/intel_idle.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 256c2d42e350..8415965372c7 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -1841,6 +1841,14 @@ static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
>
>  static void state_update_enter_method(struct cpuidle_state *state, int cstate)
>  {
> +       /*
> +        * The updates below are only valid if state->enter is actually the
> +        * 'intel_idle' or 'intel_idle_irq' functions; for all other cases
> +        * we just bow out early.
> +        */
> +       if (state->enter != intel_idle && state->enter != intel_idle_irq)
> +               return;

Instead of doing this, I would add a check against intel_idle_hlt_irq
in patch [6/7] and extend it to cover intel_idle_hlt in patch [7/7],
because these are the cases to skip here.

> +
>         if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
>                 /*
>                  * Combining with XSTATE with IBRS or IRQ_ENABLE flags
> --
> 2.40.1
>
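
The difference between the two guard styles discussed above can be
illustrated with a small user-space C sketch. All function names here
are hypothetical stand-ins for the driver's entry routines, not the
kernel symbols themselves; idle_other represents any entry routine that
neither check names explicitly.

```c
#include <stdbool.h>

typedef int (*enter_fn)(int index);

/* Hypothetical stand-ins for the driver's entry routines. */
static int idle_mwait(int index)     { return index; }
static int idle_mwait_irq(int index) { return index + 1; }
static int idle_hlt(int index)       { return index + 2; }
static int idle_hlt_irq(int index)   { return index + 3; }
static int idle_other(int index)     { return index + 4; }

/* Style of the posted patch: proceed only for the known MWAIT routines. */
static bool may_update_allowlist(enter_fn enter)
{
	return enter == idle_mwait || enter == idle_mwait_irq;
}

/* Style suggested in review: skip only the HLT routines. */
static bool may_update_denylist(enter_fn enter)
{
	return enter != idle_hlt && enter != idle_hlt_irq;
}
```

Both checks agree on the four named routines. They differ only for a
routine neither of them names: the allow-list style leaves it alone,
while the deny-list style lets the update proceed.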

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/7] intel_idle: Add helper functions to support 'hlt' as idle state
  2023-06-01 18:27 ` [PATCH 4/7] intel_idle: Add helper functions to support 'hlt' as idle state arjan
@ 2023-06-04 15:46   ` Rafael J. Wysocki
  0 siblings, 0 replies; 17+ messages in thread
From: Rafael J. Wysocki @ 2023-06-04 15:46 UTC (permalink / raw)
  To: arjan; +Cc: linux-pm, artem.bityutskiy, rafael, Arjan van de Ven

On Thu, Jun 1, 2023 at 8:28 PM <arjan@linux.intel.com> wrote:
>
> From: Arjan van de Ven <arjan.van.de.ven@intel.com>
>
> Currently the intel_idle driver has a family of functions called
> intel_idle/intel_idle_irq/intel_idle_xstate/... that use the
> mwait instruction to enter into a low power state.
>
> x86 cpus can also use the legacy "hlt" instruction for this,
> and in some cases (VM guests for example) the mwait instruction
> might not be available.
>
> Because of this, add the basic helpers to allow 'hlt' to be used to enter
> a low power state (will be used in later patches), both in the
> regular and the _irq enabled variant.
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

I would prefer this to be combined with patch [6/7] by the rule that
it's better to add helpers along with their users.

> ---
>  drivers/idle/intel_idle.c | 37 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
>
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 8415965372c7..66d262fd267e 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -199,6 +199,43 @@ static __cpuidle int intel_idle_xstate(struct cpuidle_device *dev,
>         return __intel_idle(dev, drv, index);
>  }
>
> +static __always_inline int __intel_idle_hlt(struct cpuidle_device *dev,
> +                                       struct cpuidle_driver *drv, int index)
> +{
> +       raw_safe_halt();
> +       raw_local_irq_disable();
> +       return index;
> +}
> +
> +/**
> + * intel_idle_hlt - Ask the processor to enter the given idle state using hlt.
> + * @dev: cpuidle device of the target CPU.
> + * @drv: cpuidle driver (assumed to point to intel_idle_driver).
> + * @index: Target idle state index.
> + *
> + * Use the HLT instruction to notify the processor that the CPU represented by
> + * @dev is idle and it can try to enter the idle state corresponding to @index.
> + *
> + * Must be called under local_irq_disable().
> + */
> +static __cpuidle int intel_idle_hlt(struct cpuidle_device *dev,
> +                               struct cpuidle_driver *drv, int index)
> +{
> +       return __intel_idle_hlt(dev, drv, index);
> +}
> +
> +static __cpuidle int intel_idle_hlt_irq(struct cpuidle_device *dev,
> +                                   struct cpuidle_driver *drv, int index)
> +{
> +       int ret;
> +
> +       raw_local_irq_enable();
> +       ret = __intel_idle_hlt(dev, drv, index);
> +       raw_local_irq_disable();
> +
> +       return ret;
> +}
> +
>  /**
>   * intel_idle_s2idle - Ask the processor to enter the given idle state.
>   * @dev: cpuidle device of the target CPU.
> --
> 2.40.1
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/7] intel_idle: Add a way to skip the mwait check on all states
  2023-06-01 18:27 ` [PATCH 5/7] intel_idle: Add a way to skip the mwait check on all states arjan
@ 2023-06-04 15:54   ` Rafael J. Wysocki
  2023-06-05 15:24     ` Arjan van de Ven
  0 siblings, 1 reply; 17+ messages in thread
From: Rafael J. Wysocki @ 2023-06-04 15:54 UTC (permalink / raw)
  To: arjan; +Cc: linux-pm, artem.bityutskiy, rafael, Arjan van de Ven

On Thu, Jun 1, 2023 at 8:28 PM <arjan@linux.intel.com> wrote:
>
> From: Arjan van de Ven <arjan.van.de.ven@intel.com>
>
> Currently, intel_idle verifies that the cpuid instruction enumerates
> that the mwait value for a state is actually supported by the CPU.
>
> Going forward, when running in a VM guest, that check will not work
> and we're going to need a way to turn it off.
>
> Add a global bool for this, and use it in the check
> function to short-circuit the cpuid check.
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  drivers/idle/intel_idle.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 66d262fd267e..55c3e6ece3dd 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -69,6 +69,7 @@ static int max_cstate = CPUIDLE_STATE_MAX - 1;
>  static unsigned int disabled_states_mask __read_mostly;
>  static unsigned int preferred_states_mask __read_mostly;
>  static bool force_irq_on __read_mostly;
> +static bool skip_mwait_check __read_mostly;
>
>  static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;
>
> @@ -1866,6 +1867,9 @@ static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
>         unsigned int num_substates = (mwait_substates >> mwait_cstate * 4) &
>                                         MWAIT_SUBSTATE_MASK;
>
> +       if (skip_mwait_check)
> +               return true;
> +

I don't think that the static variable is needed here.

This function is only called from intel_idle_init_cstates_icpu() which
in turn is only called by intel_idle_cpuidle_driver_init().

The latter is called directly by intel_idle_vminit() in the next patch
and so it can be passed a bool arg indicating whether or not to skip
the mwait checks.

I would even check this arg in intel_idle_init_cstates_icpu() so it is
not necessary to look into intel_idle_verify_cstate() to see what it
is for.

>         /* Ignore the C-state if there are NO sub-states in CPUID for it. */
>         if (num_substates == 0)
>                 return false;
> --
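
The review suggestion above (pass a bool down the call chain and test it
in the caller, rather than keeping a static flag) could look roughly
like the user-space model below. It is a sketch under assumptions: the
function names are hypothetical, the sub-state decoding mirrors the
shift-and-mask expression in the quoted hunk with the C-state index
taken from bits 7:4 of the hint plus one, and the range guard on the
shift is an addition that is not in the quoted code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SUBSTATE_MASK 0xfu
#define CSTATE_MASK   0xfu
#define SUBSTATE_SIZE 4

/* True if the CPUID sub-state word advertises the C-state behind the hint. */
static bool verify_cstate(uint32_t mwait_substates, uint32_t mwait_hint)
{
	unsigned int cstate = ((mwait_hint >> SUBSTATE_SIZE) & CSTATE_MASK) + 1;
	unsigned int shift = cstate * 4;

	/* Guard added for this sketch: avoid shifting past the word size. */
	if (shift >= 32)
		return false;

	return ((mwait_substates >> shift) & SUBSTATE_MASK) != 0;
}

/*
 * Caller-side check: the bool is an ordinary parameter, so no global
 * state is needed and the skip is visible where the states are filtered.
 */
static size_t count_usable_states(const uint32_t *hints, size_t n,
				  uint32_t mwait_substates,
				  bool skip_mwait_check)
{
	size_t usable = 0;

	for (size_t i = 0; i < n; i++) {
		if (!skip_mwait_check &&
		    !verify_cstate(mwait_substates, hints[i]))
			continue;
		usable++;
	}
	return usable;
}
```

With a sub-state word of 0 (nothing advertised, as a guest without mwait
might see), every state is rejected unless skip_mwait_check is true.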

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/7] intel_idle: Add support for using intel_idle in a VM guest using just hlt
  2023-06-01 18:28 ` [PATCH 6/7] intel_idle: Add support for using intel_idle in a VM guest using just hlt arjan
@ 2023-06-04 15:59   ` Rafael J. Wysocki
  2023-06-04 22:34     ` Van De Ven, Arjan
  0 siblings, 1 reply; 17+ messages in thread
From: Rafael J. Wysocki @ 2023-06-04 15:59 UTC (permalink / raw)
  To: arjan; +Cc: linux-pm, artem.bityutskiy, rafael, Arjan van de Ven

On Thu, Jun 1, 2023 at 8:28 PM <arjan@linux.intel.com> wrote:
>
> From: Arjan van de Ven <arjan.van.de.ven@intel.com>
>
> In a typical VM guest, the mwait instruction is not available, leaving only the
> 'hlt' instruction (which causes a VMEXIT to the host).

What if it is available?  It can be AFAICS.

> So currently, intel_idle will detect the lack of mwait, and fail
> to initialize (after which another idle method would step in which will
> just use hlt always).
>
> By providing the capability to do this with the intel_idle driver, we can
> do better than this fallback. While this current change only gets us parity
> with the existing behavior, later patches in this series will add new capabilities.
>
> In order to do this, a simplified version of the initialization function
> for VM guests is created, and this will be called if the CPU is recognized,
> but mwait is not supported, and we're in a VM guest.

It will cause intel_idle to become the idle driver in some cases in
which ACPI idle is used nowadays if I'm not mistaken.  Wouldn't that
be regarded as a problem?

> One thing to note is that the latency (and break even) of this C1 state
> is higher than the typical bare metal C1 state, because hlt causes a vmexit,
> and the cost of vmexit + hypervisor overhead + vmenter is typically on the
> order of up to 5 microseconds, even if the hypervisor does not actually
> go into a hardware power saving state.
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  drivers/idle/intel_idle.c | 54 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 54 insertions(+)
>
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 55c3e6ece3dd..c4929d8a35a4 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -1280,6 +1280,18 @@ static struct cpuidle_state snr_cstates[] __initdata = {
>                 .enter = NULL }
>  };
>
> +static struct cpuidle_state vmguest_cstates[] __initdata = {
> +       {
> +               .name = "C1",
> +               .desc = "HLT",
> +               .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_IRQ_ENABLE,
> +               .exit_latency = 5,
> +               .target_residency = 10,
> +               .enter = &intel_idle_hlt_irq, },
> +       {
> +               .enter = NULL }
> +};
> +
>  static const struct idle_cpu idle_cpu_nehalem __initconst = {
>         .state_table = nehalem_cstates,
>         .auto_demotion_disable_flags = NHM_C1_AUTO_DEMOTE | NHM_C3_AUTO_DEMOTE,
> @@ -2105,6 +2117,46 @@ static void __init intel_idle_cpuidle_devices_uninit(void)
>                 cpuidle_unregister_device(per_cpu_ptr(intel_idle_cpuidle_devices, i));
>  }
>
> +static int __init intel_idle_vminit(const struct x86_cpu_id *id)
> +{
> +       int retval;
> +
> +       cpuidle_state_table = vmguest_cstates;
> +       skip_mwait_check = true; /* hypervisor hides mwait from us normally */
> +
> +       icpu = (const struct idle_cpu *)id->driver_data;
> +
> +       pr_debug("v" INTEL_IDLE_VERSION " model 0x%X\n",
> +                boot_cpu_data.x86_model);
> +
> +       intel_idle_cpuidle_devices = alloc_percpu(struct cpuidle_device);
> +       if (!intel_idle_cpuidle_devices)
> +               return -ENOMEM;
> +
> +       intel_idle_cpuidle_driver_init(&intel_idle_driver);
> +
> +       retval = cpuidle_register_driver(&intel_idle_driver);
> +       if (retval) {
> +               struct cpuidle_driver *drv = cpuidle_get_driver();
> +               printk(KERN_DEBUG pr_fmt("intel_idle yielding to %s\n"),
> +                      drv ? drv->name : "none");
> +               goto init_driver_fail;
> +       }
> +
> +       retval = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "idle/intel:online",
> +                                  intel_idle_cpu_online, NULL);
> +       if (retval < 0)
> +               goto hp_setup_fail;
> +
> +       return 0;
> +hp_setup_fail:
> +       intel_idle_cpuidle_devices_uninit();
> +       cpuidle_unregister_driver(&intel_idle_driver);
> +init_driver_fail:
> +       free_percpu(intel_idle_cpuidle_devices);
> +       return retval;
> +}
> +
>  static int __init intel_idle_init(void)
>  {
>         const struct x86_cpu_id *id;
> @@ -2123,6 +2175,8 @@ static int __init intel_idle_init(void)
>         id = x86_match_cpu(intel_idle_ids);
>         if (id) {
>                 if (!boot_cpu_has(X86_FEATURE_MWAIT)) {
> +                       if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
> +                               return intel_idle_vminit(id);
>                         pr_debug("Please enable MWAIT in BIOS SETUP\n");
>                         return -ENODEV;
>                 }
> --
> 2.40.1
>
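
The branch added to intel_idle_init() in the last hunk boils down to a
three-input decision. The user-space sketch below models only that
decision; the enum and function names are hypothetical, and the
"CPU not matched" case is handled by code outside the quoted hunk, so it
is reported here as a separate value rather than modelled.

```c
#include <stdbool.h>

enum init_path {
	PATH_OTHER,	/* CPU model not matched: outside the quoted hunk */
	PATH_MWAIT,	/* regular intel_idle initialization */
	PATH_VM_HLT,	/* the new guest-only path using hlt */
	PATH_NONE,	/* give up with -ENODEV */
};

/* Mirrors the nesting of the quoted hunk: match, then MWAIT, then guest. */
static enum init_path choose_init_path(bool cpu_matched, bool has_mwait,
				       bool is_guest)
{
	if (!cpu_matched)
		return PATH_OTHER;

	if (has_mwait)
		return PATH_MWAIT;

	if (is_guest)
		return PATH_VM_HLT;

	return PATH_NONE;
}
```

In this model a guest that does expose mwait takes the regular path; the
new hlt-based path is reached only when mwait is missing and the
hypervisor feature bit is set.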

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 6/7] intel_idle: Add support for using intel_idle in a VM guest using just hlt
  2023-06-04 15:59   ` Rafael J. Wysocki
@ 2023-06-04 22:34     ` Van De Ven, Arjan
  0 siblings, 0 replies; 17+ messages in thread
From: Van De Ven, Arjan @ 2023-06-04 22:34 UTC (permalink / raw)
  To: Rafael J. Wysocki, arjan; +Cc: linux-pm, artem.bityutskiy

> 
> On Thu, Jun 1, 2023 at 8:28 PM <arjan@linux.intel.com> wrote:
> >
> > From: Arjan van de Ven <arjan.van.de.ven@intel.com>
> >
> > In a typical VM guest, the mwait instruction is not available, leaving only the
> > 'hlt' instruction (which causes a VMEXIT to the host).
> 
> What if it is available?  It can be AFAICS.

If it is available, the normal intel_idle driver will be used as it is today
(this patch only installs the new handler inside the "no mwait" path).

> > In order to do this, a simplified version of the initialization function
> > for VM guests is created, and this will be called if the CPU is recognized,
> > but mwait is not supported, and we're in a VM guest.
> 
> It will cause intel_idle to become the idle driver in some cases in
> which ACPI idle is used nowadays if I'm not mistaken.  Wouldn't that
> be regarded as a problem?

ACPI idle will only expose hlt as well, since mwait is not available.
So even then intel_idle is the better choice in my view.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 2/7] intel_idle: clean up the (new) state_update_enter_method function
  2023-06-04 15:34   ` Rafael J. Wysocki
@ 2023-06-04 22:35     ` Van De Ven, Arjan
  0 siblings, 0 replies; 17+ messages in thread
From: Van De Ven, Arjan @ 2023-06-04 22:35 UTC (permalink / raw)
  To: Rafael J. Wysocki, arjan; +Cc: linux-pm, artem.bityutskiy

> 
> This and the [1/7] can be applied without the rest of the series, so
> please let me know if you want me to do that.


Yes, please do that; it's a useful cleanup either way and makes the next
version of the series smaller/simpler.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/7] intel_idle: Add a way to skip the mwait check on all states
  2023-06-04 15:54   ` Rafael J. Wysocki
@ 2023-06-05 15:24     ` Arjan van de Ven
  0 siblings, 0 replies; 17+ messages in thread
From: Arjan van de Ven @ 2023-06-05 15:24 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-pm, artem.bityutskiy, Arjan van de Ven


> I don't think that the static variable is needed here.
> 
> This function is only called from intel_idle_init_cstates_icpu() which
> in turn is only called by intel_idle_cpuidle_driver_init().
> 
> The latter is called directly by intel_idle_vminit() in the next patch
> and so it can be passed a bool arg indicating whether or not to skip
> the mwait checks.
> 

I think I found a nicer solution; it will be in the next revision shortly.


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-06-05 15:24 UTC | newest]

Thread overview: 17+ messages
2023-06-01 18:27 [PATCH 00/7 Add support for running in VM guests to intel_idle arjan
2023-06-01 18:27 ` [PATCH 1/7] intel_idle: refactor state->enter manipulation into its own function arjan
2023-06-01 18:27 ` [PATCH 2/7] intel_idle: clean up the (new) state_update_enter_method function arjan
2023-06-04 15:34   ` Rafael J. Wysocki
2023-06-04 22:35     ` Van De Ven, Arjan
2023-06-01 18:27 ` [PATCH 3/7] intel_idle: Add a sanity check in the new " arjan
2023-06-04 15:43   ` Rafael J. Wysocki
2023-06-01 18:27 ` [PATCH 4/7] intel_idle: Add helper functions to support 'hlt' as idle state arjan
2023-06-04 15:46   ` Rafael J. Wysocki
2023-06-01 18:27 ` [PATCH 5/7] intel_idle: Add a way to skip the mwait check on all states arjan
2023-06-04 15:54   ` Rafael J. Wysocki
2023-06-05 15:24     ` Arjan van de Ven
2023-06-01 18:28 ` [PATCH 6/7] intel_idle: Add support for using intel_idle in a VM guest using just hlt arjan
2023-06-04 15:59   ` Rafael J. Wysocki
2023-06-04 22:34     ` Van De Ven, Arjan
2023-06-01 18:28 ` [PATCH 7/7] intel_idle: Add a "Long HLT" C1 state for the VM guest mode arjan
2023-06-04 15:01 ` [PATCH 00/7 Add support for running in VM guests to intel_idle Rafael J. Wysocki
