All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] smp/hotplug, x86/vmware: Put offline vCPUs in halt instead of mwait
@ 2022-07-21 20:44 ` Srivatsa S. Bhat
  0 siblings, 0 replies; 12+ messages in thread
From: Srivatsa S. Bhat @ 2022-07-21 20:44 UTC (permalink / raw)
  To: linux-kernel, virtualization
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Alexey Makhalov, Juergen Gross, x86,
	VMware PV-Drivers Reviewers, ganb, sturlapati, bordoloih,
	ankitja, keerthanak, namit, srivatsa, srivatsab

From: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>

VMware ESXi allows enabling a passthru mwait CPU-idle state in the
guest using the following VMX option:

monitor_control.mwait_in_guest = "TRUE"

This lets a vCPU in mwait to remain in guest context (instead of
yielding to the hypervisor via a VMEXIT), which helps speed up
wakeups from idle.

However, this runs into problems with CPU hotplug, because the Linux
CPU offline path prefers to put the vCPU-to-be-offlined in mwait
state, whenever mwait is available. As a result, since a vCPU in mwait
remains in guest context and does not yield to the hypervisor, an
offline vCPU *appears* to be 100% busy as viewed from ESXi, which
prevents the hypervisor from running other vCPUs or workloads on the
corresponding pCPU (particularly when vCPU - pCPU mappings are
statically defined by the user). [ Note that such a vCPU is not
actually busy spinning though; it remains in mwait idle state in the
guest ].

Fix this by overriding the CPU offline play_dead() callback for VMware
hypervisor, by putting the CPU in halt state (which actually yields to
the hypervisor), even if mwait support is available.

Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexey Makhalov <amakhalov@vmware.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
---

 arch/x86/kernel/cpu/vmware.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index c04b933f48d3..420e359ed9bb 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -27,6 +27,7 @@
 #include <linux/clocksource.h>
 #include <linux/cpu.h>
 #include <linux/reboot.h>
+#include <linux/tboot.h>
 #include <linux/static_call.h>
 #include <asm/div64.h>
 #include <asm/x86_init.h>
@@ -312,6 +313,21 @@ static int vmware_cpu_down_prepare(unsigned int cpu)
 	local_irq_enable();
 	return 0;
 }
+
+static void vmware_play_dead(void)
+{
+	play_dead_common();
+	tboot_shutdown(TB_SHUTDOWN_WFS);
+
+	/*
+	 * Put the vCPU going offline in halt instead of mwait (even
+	 * if mwait support is available), to make sure that the
+	 * offline vCPU yields to the hypervisor (which may not happen
+	 * with mwait, for example, if the guest's VMX is configured
+	 * to retain the vCPU in guest context upon mwait).
+	 */
+	hlt_play_dead();
+}
 #endif
 
 static __init int activate_jump_labels(void)
@@ -349,6 +365,7 @@ static void __init vmware_paravirt_ops_setup(void)
 #ifdef CONFIG_SMP
 		smp_ops.smp_prepare_boot_cpu =
 			vmware_smp_prepare_boot_cpu;
+		smp_ops.play_dead = vmware_play_dead;
 		if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
 					      "x86/vmware:online",
 					      vmware_cpu_online,



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH] smp/hotplug, x86/vmware: Put offline vCPUs in halt instead of mwait
@ 2022-07-21 20:44 ` Srivatsa S. Bhat
  0 siblings, 0 replies; 12+ messages in thread
From: Srivatsa S. Bhat @ 2022-07-21 20:44 UTC (permalink / raw)
  To: linux-kernel, virtualization
  Cc: Juergen Gross, namit, x86, Alexey Makhalov, Peter Zijlstra,
	Dave Hansen, sturlapati, keerthanak, VMware PV-Drivers Reviewers,
	ganb, Ingo Molnar, Borislav Petkov, ankitja, H. Peter Anvin,
	bordoloih, Thomas Gleixner

From: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>

VMware ESXi allows enabling a passthru mwait CPU-idle state in the
guest using the following VMX option:

monitor_control.mwait_in_guest = "TRUE"

This lets a vCPU in mwait to remain in guest context (instead of
yielding to the hypervisor via a VMEXIT), which helps speed up
wakeups from idle.

However, this runs into problems with CPU hotplug, because the Linux
CPU offline path prefers to put the vCPU-to-be-offlined in mwait
state, whenever mwait is available. As a result, since a vCPU in mwait
remains in guest context and does not yield to the hypervisor, an
offline vCPU *appears* to be 100% busy as viewed from ESXi, which
prevents the hypervisor from running other vCPUs or workloads on the
corresponding pCPU (particularly when vCPU - pCPU mappings are
statically defined by the user). [ Note that such a vCPU is not
actually busy spinning though; it remains in mwait idle state in the
guest ].

Fix this by overriding the CPU offline play_dead() callback for VMware
hypervisor, by putting the CPU in halt state (which actually yields to
the hypervisor), even if mwait support is available.

Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexey Makhalov <amakhalov@vmware.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
---

 arch/x86/kernel/cpu/vmware.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index c04b933f48d3..420e359ed9bb 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -27,6 +27,7 @@
 #include <linux/clocksource.h>
 #include <linux/cpu.h>
 #include <linux/reboot.h>
+#include <linux/tboot.h>
 #include <linux/static_call.h>
 #include <asm/div64.h>
 #include <asm/x86_init.h>
@@ -312,6 +313,21 @@ static int vmware_cpu_down_prepare(unsigned int cpu)
 	local_irq_enable();
 	return 0;
 }
+
+static void vmware_play_dead(void)
+{
+	play_dead_common();
+	tboot_shutdown(TB_SHUTDOWN_WFS);
+
+	/*
+	 * Put the vCPU going offline in halt instead of mwait (even
+	 * if mwait support is available), to make sure that the
+	 * offline vCPU yields to the hypervisor (which may not happen
+	 * with mwait, for example, if the guest's VMX is configured
+	 * to retain the vCPU in guest context upon mwait).
+	 */
+	hlt_play_dead();
+}
 #endif
 
 static __init int activate_jump_labels(void)
@@ -349,6 +365,7 @@ static void __init vmware_paravirt_ops_setup(void)
 #ifdef CONFIG_SMP
 		smp_ops.smp_prepare_boot_cpu =
 			vmware_smp_prepare_boot_cpu;
+		smp_ops.play_dead = vmware_play_dead;
 		if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
 					      "x86/vmware:online",
 					      vmware_cpu_online,


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] smp/hotplug, x86/vmware: Put offline vCPUs in halt instead of mwait
  2022-07-21 20:44 ` Srivatsa S. Bhat
@ 2022-09-23  1:57   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 12+ messages in thread
From: Srivatsa S. Bhat @ 2022-09-23  1:57 UTC (permalink / raw)
  To: linux-kernel, virtualization
  Cc: Juergen Gross, namit, x86, Alexey Makhalov, Peter Zijlstra,
	Dave Hansen, sturlapati, keerthanak, VMware PV-Drivers Reviewers,
	ganb, Ingo Molnar, Borislav Petkov, ankitja, H. Peter Anvin,
	bordoloih, Thomas Gleixner


Hi Boris, Thomas, Peter,

On 7/21/22 1:44 PM, Srivatsa S. Bhat wrote:
> From: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
> 
> VMware ESXi allows enabling a passthru mwait CPU-idle state in the
> guest using the following VMX option:
> 
> monitor_control.mwait_in_guest = "TRUE"
> 
> This lets a vCPU in mwait to remain in guest context (instead of
> yielding to the hypervisor via a VMEXIT), which helps speed up
> wakeups from idle.
> 
> However, this runs into problems with CPU hotplug, because the Linux
> CPU offline path prefers to put the vCPU-to-be-offlined in mwait
> state, whenever mwait is available. As a result, since a vCPU in mwait
> remains in guest context and does not yield to the hypervisor, an
> offline vCPU *appears* to be 100% busy as viewed from ESXi, which
> prevents the hypervisor from running other vCPUs or workloads on the
> corresponding pCPU (particularly when vCPU - pCPU mappings are
> statically defined by the user). [ Note that such a vCPU is not
> actually busy spinning though; it remains in mwait idle state in the
> guest ].
> 
> Fix this by overriding the CPU offline play_dead() callback for VMware
> hypervisor, by putting the CPU in halt state (which actually yields to
> the hypervisor), even if mwait support is available.
> 
> Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Alexey Makhalov <amakhalov@vmware.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: x86@kernel.org
> Cc: VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
> ---

Could you share your thoughts on this patch when you get a chance,
please? I verified that this patch still applies cleanly on current
mainline (6.0-rc6). I'm happy to resend the patch though, if it helps.

Thank you!

Regards,
Srivatsa

> 
>  arch/x86/kernel/cpu/vmware.c |   17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
> index c04b933f48d3..420e359ed9bb 100644
> --- a/arch/x86/kernel/cpu/vmware.c
> +++ b/arch/x86/kernel/cpu/vmware.c
> @@ -27,6 +27,7 @@
>  #include <linux/clocksource.h>
>  #include <linux/cpu.h>
>  #include <linux/reboot.h>
> +#include <linux/tboot.h>
>  #include <linux/static_call.h>
>  #include <asm/div64.h>
>  #include <asm/x86_init.h>
> @@ -312,6 +313,21 @@ static int vmware_cpu_down_prepare(unsigned int cpu)
>  	local_irq_enable();
>  	return 0;
>  }
> +
> +static void vmware_play_dead(void)
> +{
> +	play_dead_common();
> +	tboot_shutdown(TB_SHUTDOWN_WFS);
> +
> +	/*
> +	 * Put the vCPU going offline in halt instead of mwait (even
> +	 * if mwait support is available), to make sure that the
> +	 * offline vCPU yields to the hypervisor (which may not happen
> +	 * with mwait, for example, if the guest's VMX is configured
> +	 * to retain the vCPU in guest context upon mwait).
> +	 */
> +	hlt_play_dead();
> +}
>  #endif
>  
>  static __init int activate_jump_labels(void)
> @@ -349,6 +365,7 @@ static void __init vmware_paravirt_ops_setup(void)
>  #ifdef CONFIG_SMP
>  		smp_ops.smp_prepare_boot_cpu =
>  			vmware_smp_prepare_boot_cpu;
> +		smp_ops.play_dead = vmware_play_dead;
>  		if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
>  					      "x86/vmware:online",
>  					      vmware_cpu_online,
> 
> 
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] smp/hotplug, x86/vmware: Put offline vCPUs in halt instead of mwait
@ 2022-09-23  1:57   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 12+ messages in thread
From: Srivatsa S. Bhat @ 2022-09-23  1:57 UTC (permalink / raw)
  To: linux-kernel, virtualization
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Alexey Makhalov, Juergen Gross, x86,
	VMware PV-Drivers Reviewers, ganb, sturlapati, bordoloih,
	ankitja, keerthanak, namit, srivatsab


Hi Boris, Thomas, Peter,

On 7/21/22 1:44 PM, Srivatsa S. Bhat wrote:
> From: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
> 
> VMware ESXi allows enabling a passthru mwait CPU-idle state in the
> guest using the following VMX option:
> 
> monitor_control.mwait_in_guest = "TRUE"
> 
> This lets a vCPU in mwait to remain in guest context (instead of
> yielding to the hypervisor via a VMEXIT), which helps speed up
> wakeups from idle.
> 
> However, this runs into problems with CPU hotplug, because the Linux
> CPU offline path prefers to put the vCPU-to-be-offlined in mwait
> state, whenever mwait is available. As a result, since a vCPU in mwait
> remains in guest context and does not yield to the hypervisor, an
> offline vCPU *appears* to be 100% busy as viewed from ESXi, which
> prevents the hypervisor from running other vCPUs or workloads on the
> corresponding pCPU (particularly when vCPU - pCPU mappings are
> statically defined by the user). [ Note that such a vCPU is not
> actually busy spinning though; it remains in mwait idle state in the
> guest ].
> 
> Fix this by overriding the CPU offline play_dead() callback for VMware
> hypervisor, by putting the CPU in halt state (which actually yields to
> the hypervisor), even if mwait support is available.
> 
> Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Alexey Makhalov <amakhalov@vmware.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: x86@kernel.org
> Cc: VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
> ---

Could you share your thoughts on this patch when you get a chance,
please? I verified that this patch still applies cleanly on current
mainline (6.0-rc6). I'm happy to resend the patch though, if it helps.

Thank you!

Regards,
Srivatsa

> 
>  arch/x86/kernel/cpu/vmware.c |   17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
> index c04b933f48d3..420e359ed9bb 100644
> --- a/arch/x86/kernel/cpu/vmware.c
> +++ b/arch/x86/kernel/cpu/vmware.c
> @@ -27,6 +27,7 @@
>  #include <linux/clocksource.h>
>  #include <linux/cpu.h>
>  #include <linux/reboot.h>
> +#include <linux/tboot.h>
>  #include <linux/static_call.h>
>  #include <asm/div64.h>
>  #include <asm/x86_init.h>
> @@ -312,6 +313,21 @@ static int vmware_cpu_down_prepare(unsigned int cpu)
>  	local_irq_enable();
>  	return 0;
>  }
> +
> +static void vmware_play_dead(void)
> +{
> +	play_dead_common();
> +	tboot_shutdown(TB_SHUTDOWN_WFS);
> +
> +	/*
> +	 * Put the vCPU going offline in halt instead of mwait (even
> +	 * if mwait support is available), to make sure that the
> +	 * offline vCPU yields to the hypervisor (which may not happen
> +	 * with mwait, for example, if the guest's VMX is configured
> +	 * to retain the vCPU in guest context upon mwait).
> +	 */
> +	hlt_play_dead();
> +}
>  #endif
>  
>  static __init int activate_jump_labels(void)
> @@ -349,6 +365,7 @@ static void __init vmware_paravirt_ops_setup(void)
>  #ifdef CONFIG_SMP
>  		smp_ops.smp_prepare_boot_cpu =
>  			vmware_smp_prepare_boot_cpu;
> +		smp_ops.play_dead = vmware_play_dead;
>  		if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
>  					      "x86/vmware:online",
>  					      vmware_cpu_online,
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] smp/hotplug, x86/vmware: Put offline vCPUs in halt instead of mwait
  2022-07-21 20:44 ` Srivatsa S. Bhat
@ 2022-09-23  7:05   ` Peter Zijlstra
  -1 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2022-09-23  7:05 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-kernel, virtualization, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Alexey Makhalov,
	Juergen Gross, x86, VMware PV-Drivers Reviewers, ganb,
	sturlapati, bordoloih, ankitja, keerthanak, namit, srivatsab

On Thu, Jul 21, 2022 at 01:44:33PM -0700, Srivatsa S. Bhat wrote:
> From: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
> 
> VMware ESXi allows enabling a passthru mwait CPU-idle state in the
> guest using the following VMX option:
> 
> monitor_control.mwait_in_guest = "TRUE"
> 
> This lets a vCPU in mwait to remain in guest context (instead of
> yielding to the hypervisor via a VMEXIT), which helps speed up
> wakeups from idle.
> 
> However, this runs into problems with CPU hotplug, because the Linux
> CPU offline path prefers to put the vCPU-to-be-offlined in mwait
> state, whenever mwait is available. As a result, since a vCPU in mwait
> remains in guest context and does not yield to the hypervisor, an
> offline vCPU *appears* to be 100% busy as viewed from ESXi, which
> prevents the hypervisor from running other vCPUs or workloads on the
> corresponding pCPU (particularly when vCPU - pCPU mappings are
> statically defined by the user).

I would hope vCPU pinning is a mandatory thing when MWAIT passthrough it
set?

> [ Note that such a vCPU is not
> actually busy spinning though; it remains in mwait idle state in the
> guest ].
> 
> Fix this by overriding the CPU offline play_dead() callback for VMware
> hypervisor, by putting the CPU in halt state (which actually yields to
> the hypervisor), even if mwait support is available.
> 
> Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
> ---

> +static void vmware_play_dead(void)
> +{
> +	play_dead_common();
> +	tboot_shutdown(TB_SHUTDOWN_WFS);
> +
> +	/*
> +	 * Put the vCPU going offline in halt instead of mwait (even
> +	 * if mwait support is available), to make sure that the
> +	 * offline vCPU yields to the hypervisor (which may not happen
> +	 * with mwait, for example, if the guest's VMX is configured
> +	 * to retain the vCPU in guest context upon mwait).
> +	 */
> +	hlt_play_dead();
> +}
>  #endif
>  
>  static __init int activate_jump_labels(void)
> @@ -349,6 +365,7 @@ static void __init vmware_paravirt_ops_setup(void)
>  #ifdef CONFIG_SMP
>  		smp_ops.smp_prepare_boot_cpu =
>  			vmware_smp_prepare_boot_cpu;
> +		smp_ops.play_dead = vmware_play_dead;
>  		if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
>  					      "x86/vmware:online",
>  					      vmware_cpu_online,

No real objection here; but would not something like the below fix the
problem more generally? I'm thinking MWAIT passthrough for *any*
hypervisor doesn't want play_dead to use it.

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f24227bc3220..166cb3aaca8a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1759,6 +1759,8 @@ static inline void mwait_play_dead(void)
 		return;
 	if (!this_cpu_has(X86_FEATURE_CLFLUSH))
 		return;
+	if (this_cpu_has(X86_FEATURE_HYPERVISOR))
+		return;
 	if (__this_cpu_read(cpu_info.cpuid_level) < CPUID_MWAIT_LEAF)
 		return;
 

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] smp/hotplug, x86/vmware: Put offline vCPUs in halt instead of mwait
@ 2022-09-23  7:05   ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2022-09-23  7:05 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Juergen Gross, namit, x86, Alexey Makhalov,
	VMware PV-Drivers Reviewers, Dave Hansen, linux-kernel,
	virtualization, keerthanak, ganb, Ingo Molnar, Borislav Petkov,
	ankitja, H. Peter Anvin, bordoloih, Thomas Gleixner, sturlapati

On Thu, Jul 21, 2022 at 01:44:33PM -0700, Srivatsa S. Bhat wrote:
> From: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
> 
> VMware ESXi allows enabling a passthru mwait CPU-idle state in the
> guest using the following VMX option:
> 
> monitor_control.mwait_in_guest = "TRUE"
> 
> This lets a vCPU in mwait to remain in guest context (instead of
> yielding to the hypervisor via a VMEXIT), which helps speed up
> wakeups from idle.
> 
> However, this runs into problems with CPU hotplug, because the Linux
> CPU offline path prefers to put the vCPU-to-be-offlined in mwait
> state, whenever mwait is available. As a result, since a vCPU in mwait
> remains in guest context and does not yield to the hypervisor, an
> offline vCPU *appears* to be 100% busy as viewed from ESXi, which
> prevents the hypervisor from running other vCPUs or workloads on the
> corresponding pCPU (particularly when vCPU - pCPU mappings are
> statically defined by the user).

I would hope vCPU pinning is a mandatory thing when MWAIT passthrough it
set?

> [ Note that such a vCPU is not
> actually busy spinning though; it remains in mwait idle state in the
> guest ].
> 
> Fix this by overriding the CPU offline play_dead() callback for VMware
> hypervisor, by putting the CPU in halt state (which actually yields to
> the hypervisor), even if mwait support is available.
> 
> Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
> ---

> +static void vmware_play_dead(void)
> +{
> +	play_dead_common();
> +	tboot_shutdown(TB_SHUTDOWN_WFS);
> +
> +	/*
> +	 * Put the vCPU going offline in halt instead of mwait (even
> +	 * if mwait support is available), to make sure that the
> +	 * offline vCPU yields to the hypervisor (which may not happen
> +	 * with mwait, for example, if the guest's VMX is configured
> +	 * to retain the vCPU in guest context upon mwait).
> +	 */
> +	hlt_play_dead();
> +}
>  #endif
>  
>  static __init int activate_jump_labels(void)
> @@ -349,6 +365,7 @@ static void __init vmware_paravirt_ops_setup(void)
>  #ifdef CONFIG_SMP
>  		smp_ops.smp_prepare_boot_cpu =
>  			vmware_smp_prepare_boot_cpu;
> +		smp_ops.play_dead = vmware_play_dead;
>  		if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
>  					      "x86/vmware:online",
>  					      vmware_cpu_online,

No real objection here; but would not something like the below fix the
problem more generally? I'm thinking MWAIT passthrough for *any*
hypervisor doesn't want play_dead to use it.

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f24227bc3220..166cb3aaca8a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1759,6 +1759,8 @@ static inline void mwait_play_dead(void)
 		return;
 	if (!this_cpu_has(X86_FEATURE_CLFLUSH))
 		return;
+	if (this_cpu_has(X86_FEATURE_HYPERVISOR))
+		return;
 	if (__this_cpu_read(cpu_info.cpuid_level) < CPUID_MWAIT_LEAF)
 		return;
 
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] smp/hotplug, x86/vmware: Put offline vCPUs in halt instead of mwait
  2022-09-23  7:05   ` Peter Zijlstra
@ 2022-09-23 10:45     ` Borislav Petkov
  -1 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2022-09-23 10:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srivatsa S. Bhat, linux-kernel, virtualization, Thomas Gleixner,
	Ingo Molnar, Dave Hansen, H. Peter Anvin, Alexey Makhalov,
	Juergen Gross, x86, VMware PV-Drivers Reviewers, ganb,
	sturlapati, bordoloih, ankitja, keerthanak, namit, srivatsab,
	kvm ML

+ kvm ML and leaving the whole mail quoted in for them.

On Fri, Sep 23, 2022 at 09:05:26AM +0200, Peter Zijlstra wrote:
> On Thu, Jul 21, 2022 at 01:44:33PM -0700, Srivatsa S. Bhat wrote:
> > From: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
> > 
> > VMware ESXi allows enabling a passthru mwait CPU-idle state in the
> > guest using the following VMX option:
> > 
> > monitor_control.mwait_in_guest = "TRUE"
> > 
> > This lets a vCPU in mwait to remain in guest context (instead of
> > yielding to the hypervisor via a VMEXIT), which helps speed up
> > wakeups from idle.
> > 
> > However, this runs into problems with CPU hotplug, because the Linux
> > CPU offline path prefers to put the vCPU-to-be-offlined in mwait
> > state, whenever mwait is available. As a result, since a vCPU in mwait
> > remains in guest context and does not yield to the hypervisor, an
> > offline vCPU *appears* to be 100% busy as viewed from ESXi, which
> > prevents the hypervisor from running other vCPUs or workloads on the
> > corresponding pCPU (particularly when vCPU - pCPU mappings are
> > statically defined by the user).
> 
> I would hope vCPU pinning is a mandatory thing when MWAIT passthrough it
> set?
> 
> > [ Note that such a vCPU is not
> > actually busy spinning though; it remains in mwait idle state in the
> > guest ].
> > 
> > Fix this by overriding the CPU offline play_dead() callback for VMware
> > hypervisor, by putting the CPU in halt state (which actually yields to
> > the hypervisor), even if mwait support is available.
> > 
> > Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
> > ---
> 
> > +static void vmware_play_dead(void)
> > +{
> > +	play_dead_common();
> > +	tboot_shutdown(TB_SHUTDOWN_WFS);
> > +
> > +	/*
> > +	 * Put the vCPU going offline in halt instead of mwait (even
> > +	 * if mwait support is available), to make sure that the
> > +	 * offline vCPU yields to the hypervisor (which may not happen
> > +	 * with mwait, for example, if the guest's VMX is configured
> > +	 * to retain the vCPU in guest context upon mwait).
> > +	 */
> > +	hlt_play_dead();
> > +}
> >  #endif
> >  
> >  static __init int activate_jump_labels(void)
> > @@ -349,6 +365,7 @@ static void __init vmware_paravirt_ops_setup(void)
> >  #ifdef CONFIG_SMP
> >  		smp_ops.smp_prepare_boot_cpu =
> >  			vmware_smp_prepare_boot_cpu;
> > +		smp_ops.play_dead = vmware_play_dead;
> >  		if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> >  					      "x86/vmware:online",
> >  					      vmware_cpu_online,
> 
> No real objection here; but would not something like the below fix the
> problem more generally? I'm thinking MWAIT passthrough for *any*
> hypervisor doesn't want play_dead to use it.
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index f24227bc3220..166cb3aaca8a 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1759,6 +1759,8 @@ static inline void mwait_play_dead(void)
>  		return;
>  	if (!this_cpu_has(X86_FEATURE_CLFLUSH))
>  		return;
> +	if (this_cpu_has(X86_FEATURE_HYPERVISOR))
> +		return;
>  	if (__this_cpu_read(cpu_info.cpuid_level) < CPUID_MWAIT_LEAF)
>  		return;

Yeah, it would be nice if we could get a consensus here from all
relevant HVs.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] smp/hotplug, x86/vmware: Put offline vCPUs in halt instead of mwait
@ 2022-09-23 10:45     ` Borislav Petkov
  0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2022-09-23 10:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juergen Gross, x86, Alexey Makhalov, kvm ML,
	VMware PV-Drivers Reviewers, Dave Hansen, linux-kernel,
	virtualization, keerthanak, ganb, Ingo Molnar, namit, ankitja,
	H. Peter Anvin, bordoloih, Thomas Gleixner, sturlapati

+ kvm ML and leaving the whole mail quoted in for them.

On Fri, Sep 23, 2022 at 09:05:26AM +0200, Peter Zijlstra wrote:
> On Thu, Jul 21, 2022 at 01:44:33PM -0700, Srivatsa S. Bhat wrote:
> > From: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
> > 
> > VMware ESXi allows enabling a passthru mwait CPU-idle state in the
> > guest using the following VMX option:
> > 
> > monitor_control.mwait_in_guest = "TRUE"
> > 
> > This lets a vCPU in mwait to remain in guest context (instead of
> > yielding to the hypervisor via a VMEXIT), which helps speed up
> > wakeups from idle.
> > 
> > However, this runs into problems with CPU hotplug, because the Linux
> > CPU offline path prefers to put the vCPU-to-be-offlined in mwait
> > state, whenever mwait is available. As a result, since a vCPU in mwait
> > remains in guest context and does not yield to the hypervisor, an
> > offline vCPU *appears* to be 100% busy as viewed from ESXi, which
> > prevents the hypervisor from running other vCPUs or workloads on the
> > corresponding pCPU (particularly when vCPU - pCPU mappings are
> > statically defined by the user).
> 
> I would hope vCPU pinning is a mandatory thing when MWAIT passthrough it
> set?
> 
> > [ Note that such a vCPU is not
> > actually busy spinning though; it remains in mwait idle state in the
> > guest ].
> > 
> > Fix this by overriding the CPU offline play_dead() callback for VMware
> > hypervisor, by putting the CPU in halt state (which actually yields to
> > the hypervisor), even if mwait support is available.
> > 
> > Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
> > ---
> 
> > +static void vmware_play_dead(void)
> > +{
> > +	play_dead_common();
> > +	tboot_shutdown(TB_SHUTDOWN_WFS);
> > +
> > +	/*
> > +	 * Put the vCPU going offline in halt instead of mwait (even
> > +	 * if mwait support is available), to make sure that the
> > +	 * offline vCPU yields to the hypervisor (which may not happen
> > +	 * with mwait, for example, if the guest's VMX is configured
> > +	 * to retain the vCPU in guest context upon mwait).
> > +	 */
> > +	hlt_play_dead();
> > +}
> >  #endif
> >  
> >  static __init int activate_jump_labels(void)
> > @@ -349,6 +365,7 @@ static void __init vmware_paravirt_ops_setup(void)
> >  #ifdef CONFIG_SMP
> >  		smp_ops.smp_prepare_boot_cpu =
> >  			vmware_smp_prepare_boot_cpu;
> > +		smp_ops.play_dead = vmware_play_dead;
> >  		if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> >  					      "x86/vmware:online",
> >  					      vmware_cpu_online,
> 
> No real objection here; but would not something like the below fix the
> problem more generally? I'm thinking MWAIT passthrough for *any*
> hypervisor doesn't want play_dead to use it.
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index f24227bc3220..166cb3aaca8a 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1759,6 +1759,8 @@ static inline void mwait_play_dead(void)
>  		return;
>  	if (!this_cpu_has(X86_FEATURE_CLFLUSH))
>  		return;
> +	if (this_cpu_has(X86_FEATURE_HYPERVISOR))
> +		return;
>  	if (__this_cpu_read(cpu_info.cpuid_level) < CPUID_MWAIT_LEAF)
>  		return;

Yeah, it would be nice if we could get a consensus here from all
relevant HVs.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] smp/hotplug, x86/vmware: Put offline vCPUs in halt instead of mwait
  2022-09-23  7:05   ` Peter Zijlstra
@ 2022-09-23 10:52     ` Juergen Gross via Virtualization
  -1 siblings, 0 replies; 12+ messages in thread
From: Juergen Gross @ 2022-09-23 10:52 UTC (permalink / raw)
  To: Peter Zijlstra, Srivatsa S. Bhat
  Cc: linux-kernel, virtualization, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Alexey Makhalov,
	x86, VMware PV-Drivers Reviewers, ganb, sturlapati, bordoloih,
	ankitja, keerthanak, namit, srivatsab


[-- Attachment #1.1.1: Type: text/plain, Size: 3200 bytes --]

On 23.09.22 09:05, Peter Zijlstra wrote:
> On Thu, Jul 21, 2022 at 01:44:33PM -0700, Srivatsa S. Bhat wrote:
>> From: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
>>
>> VMware ESXi allows enabling a passthru mwait CPU-idle state in the
>> guest using the following VMX option:
>>
>> monitor_control.mwait_in_guest = "TRUE"
>>
>> This lets a vCPU in mwait to remain in guest context (instead of
>> yielding to the hypervisor via a VMEXIT), which helps speed up
>> wakeups from idle.
>>
>> However, this runs into problems with CPU hotplug, because the Linux
>> CPU offline path prefers to put the vCPU-to-be-offlined in mwait
>> state, whenever mwait is available. As a result, since a vCPU in mwait
>> remains in guest context and does not yield to the hypervisor, an
>> offline vCPU *appears* to be 100% busy as viewed from ESXi, which
>> prevents the hypervisor from running other vCPUs or workloads on the
>> corresponding pCPU (particularly when vCPU - pCPU mappings are
>> statically defined by the user).
> 
> I would hope vCPU pinning is a mandatory thing when MWAIT passthrough it
> set?
> 
>> [ Note that such a vCPU is not
>> actually busy spinning though; it remains in mwait idle state in the
>> guest ].
>>
>> Fix this by overriding the CPU offline play_dead() callback for VMware
>> hypervisor, by putting the CPU in halt state (which actually yields to
>> the hypervisor), even if mwait support is available.
>>
>> Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
>> ---
> 
>> +static void vmware_play_dead(void)
>> +{
>> +	play_dead_common();
>> +	tboot_shutdown(TB_SHUTDOWN_WFS);
>> +
>> +	/*
>> +	 * Put the vCPU going offline in halt instead of mwait (even
>> +	 * if mwait support is available), to make sure that the
>> +	 * offline vCPU yields to the hypervisor (which may not happen
>> +	 * with mwait, for example, if the guest's VMX is configured
>> +	 * to retain the vCPU in guest context upon mwait).
>> +	 */
>> +	hlt_play_dead();
>> +}
>>   #endif
>>   
>>   static __init int activate_jump_labels(void)
>> @@ -349,6 +365,7 @@ static void __init vmware_paravirt_ops_setup(void)
>>   #ifdef CONFIG_SMP
>>   		smp_ops.smp_prepare_boot_cpu =
>>   			vmware_smp_prepare_boot_cpu;
>> +		smp_ops.play_dead = vmware_play_dead;
>>   		if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
>>   					      "x86/vmware:online",
>>   					      vmware_cpu_online,
> 
> No real objection here; but would not something like the below fix the
> problem more generally? I'm thinking MWAIT passthrough for *any*
> hypervisor doesn't want play_dead to use it.
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index f24227bc3220..166cb3aaca8a 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1759,6 +1759,8 @@ static inline void mwait_play_dead(void)
>   		return;
>   	if (!this_cpu_has(X86_FEATURE_CLFLUSH))
>   		return;
> +	if (this_cpu_has(X86_FEATURE_HYPERVISOR))
> +		return;
>   	if (__this_cpu_read(cpu_info.cpuid_level) < CPUID_MWAIT_LEAF)
>   		return;
>   

With my Xen hat on I agree with this approach.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] smp/hotplug, x86/vmware: Put offline vCPUs in halt instead of mwait
@ 2022-09-23 10:52     ` Juergen Gross via Virtualization
  0 siblings, 0 replies; 12+ messages in thread
From: Juergen Gross via Virtualization @ 2022-09-23 10:52 UTC (permalink / raw)
  To: Peter Zijlstra, Srivatsa S. Bhat
  Cc: namit, x86, Alexey Makhalov, VMware PV-Drivers Reviewers,
	Dave Hansen, linux-kernel, virtualization, keerthanak, ganb,
	Ingo Molnar, Borislav Petkov, ankitja, H. Peter Anvin, bordoloih,
	Thomas Gleixner, sturlapati


[-- Attachment #1.1.1.1: Type: text/plain, Size: 3200 bytes --]

On 23.09.22 09:05, Peter Zijlstra wrote:
> On Thu, Jul 21, 2022 at 01:44:33PM -0700, Srivatsa S. Bhat wrote:
>> From: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
>>
>> VMware ESXi allows enabling a passthru mwait CPU-idle state in the
>> guest using the following VMX option:
>>
>> monitor_control.mwait_in_guest = "TRUE"
>>
>> This lets a vCPU in mwait to remain in guest context (instead of
>> yielding to the hypervisor via a VMEXIT), which helps speed up
>> wakeups from idle.
>>
>> However, this runs into problems with CPU hotplug, because the Linux
>> CPU offline path prefers to put the vCPU-to-be-offlined in mwait
>> state, whenever mwait is available. As a result, since a vCPU in mwait
>> remains in guest context and does not yield to the hypervisor, an
>> offline vCPU *appears* to be 100% busy as viewed from ESXi, which
>> prevents the hypervisor from running other vCPUs or workloads on the
>> corresponding pCPU (particularly when vCPU - pCPU mappings are
>> statically defined by the user).
> 
> I would hope vCPU pinning is a mandatory thing when MWAIT passthrough it
> set?
> 
>> [ Note that such a vCPU is not
>> actually busy spinning though; it remains in mwait idle state in the
>> guest ].
>>
>> Fix this by overriding the CPU offline play_dead() callback for VMware
>> hypervisor, by putting the CPU in halt state (which actually yields to
>> the hypervisor), even if mwait support is available.
>>
>> Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
>> ---
> 
>> +static void vmware_play_dead(void)
>> +{
>> +	play_dead_common();
>> +	tboot_shutdown(TB_SHUTDOWN_WFS);
>> +
>> +	/*
>> +	 * Put the vCPU going offline in halt instead of mwait (even
>> +	 * if mwait support is available), to make sure that the
>> +	 * offline vCPU yields to the hypervisor (which may not happen
>> +	 * with mwait, for example, if the guest's VMX is configured
>> +	 * to retain the vCPU in guest context upon mwait).
>> +	 */
>> +	hlt_play_dead();
>> +}
>>   #endif
>>   
>>   static __init int activate_jump_labels(void)
>> @@ -349,6 +365,7 @@ static void __init vmware_paravirt_ops_setup(void)
>>   #ifdef CONFIG_SMP
>>   		smp_ops.smp_prepare_boot_cpu =
>>   			vmware_smp_prepare_boot_cpu;
>> +		smp_ops.play_dead = vmware_play_dead;
>>   		if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
>>   					      "x86/vmware:online",
>>   					      vmware_cpu_online,
> 
> No real objection here; but would not something like the below fix the
> problem more generally? I'm thinking MWAIT passthrough for *any*
> hypervisor doesn't want play_dead to use it.
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index f24227bc3220..166cb3aaca8a 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1759,6 +1759,8 @@ static inline void mwait_play_dead(void)
>   		return;
>   	if (!this_cpu_has(X86_FEATURE_CLFLUSH))
>   		return;
> +	if (this_cpu_has(X86_FEATURE_HYPERVISOR))
> +		return;
>   	if (__this_cpu_read(cpu_info.cpuid_level) < CPUID_MWAIT_LEAF)
>   		return;
>   

With my Xen hat on I agree with this approach.


Juergen

[-- Attachment #1.1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] smp/hotplug, x86/vmware: Put offline vCPUs in halt instead of mwait
  2022-09-23 10:45     ` Borislav Petkov
@ 2022-09-26 22:41       ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 12+ messages in thread
From: Srivatsa S. Bhat @ 2022-09-26 22:41 UTC (permalink / raw)
  To: Borislav Petkov, Peter Zijlstra
  Cc: linux-kernel, virtualization, Thomas Gleixner, Ingo Molnar,
	Dave Hansen, H. Peter Anvin, Alexey Makhalov, Juergen Gross, x86,
	VMware PV-Drivers Reviewers, ganb, sturlapati, bordoloih,
	ankitja, keerthanak, namit, srivatsab, kvm ML

On 9/23/22 3:45 AM, Borislav Petkov wrote:
> + kvm ML and leaving the whole mail quoted in for them.
> 
> On Fri, Sep 23, 2022 at 09:05:26AM +0200, Peter Zijlstra wrote:
>> On Thu, Jul 21, 2022 at 01:44:33PM -0700, Srivatsa S. Bhat wrote:
>>> From: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
>>>
>>> VMware ESXi allows enabling a passthru mwait CPU-idle state in the
>>> guest using the following VMX option:
>>>
>>> monitor_control.mwait_in_guest = "TRUE"
>>>
>>> This lets a vCPU in mwait to remain in guest context (instead of
>>> yielding to the hypervisor via a VMEXIT), which helps speed up
>>> wakeups from idle.
>>>
>>> However, this runs into problems with CPU hotplug, because the Linux
>>> CPU offline path prefers to put the vCPU-to-be-offlined in mwait
>>> state, whenever mwait is available. As a result, since a vCPU in mwait
>>> remains in guest context and does not yield to the hypervisor, an
>>> offline vCPU *appears* to be 100% busy as viewed from ESXi, which
>>> prevents the hypervisor from running other vCPUs or workloads on the
>>> corresponding pCPU (particularly when vCPU - pCPU mappings are
>>> statically defined by the user).
>>
>> I would hope vCPU pinning is a mandatory thing when MWAIT passthrough it
>> set?
>>
>>> [ Note that such a vCPU is not
>>> actually busy spinning though; it remains in mwait idle state in the
>>> guest ].
>>>
>>> Fix this by overriding the CPU offline play_dead() callback for VMware
>>> hypervisor, by putting the CPU in halt state (which actually yields to
>>> the hypervisor), even if mwait support is available.
>>>
>>> Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
>>> ---
>>
>>> +static void vmware_play_dead(void)
>>> +{
>>> +	play_dead_common();
>>> +	tboot_shutdown(TB_SHUTDOWN_WFS);
>>> +
>>> +	/*
>>> +	 * Put the vCPU going offline in halt instead of mwait (even
>>> +	 * if mwait support is available), to make sure that the
>>> +	 * offline vCPU yields to the hypervisor (which may not happen
>>> +	 * with mwait, for example, if the guest's VMX is configured
>>> +	 * to retain the vCPU in guest context upon mwait).
>>> +	 */
>>> +	hlt_play_dead();
>>> +}
>>>  #endif
>>>  
>>>  static __init int activate_jump_labels(void)
>>> @@ -349,6 +365,7 @@ static void __init vmware_paravirt_ops_setup(void)
>>>  #ifdef CONFIG_SMP
>>>  		smp_ops.smp_prepare_boot_cpu =
>>>  			vmware_smp_prepare_boot_cpu;
>>> +		smp_ops.play_dead = vmware_play_dead;
>>>  		if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
>>>  					      "x86/vmware:online",
>>>  					      vmware_cpu_online,
>>
>> No real objection here; but would not something like the below fix the
>> problem more generally? I'm thinking MWAIT passthrough for *any*
>> hypervisor doesn't want play_dead to use it.
>>

That would be better indeed, thank you!

>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index f24227bc3220..166cb3aaca8a 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -1759,6 +1759,8 @@ static inline void mwait_play_dead(void)
>>  		return;
>>  	if (!this_cpu_has(X86_FEATURE_CLFLUSH))
>>  		return;
>> +	if (this_cpu_has(X86_FEATURE_HYPERVISOR))
>> +		return;
>>  	if (__this_cpu_read(cpu_info.cpuid_level) < CPUID_MWAIT_LEAF)
>>  		return;
> 
> Yeah, it would be nice if we could get a consensus here from all
> relevant HVs.
> 

I'll send out a v2 after trying out these changes.

Thank you!

Regards,
Srivatsa
VMware Photon OS

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] smp/hotplug, x86/vmware: Put offline vCPUs in halt instead of mwait
@ 2022-09-26 22:41       ` Srivatsa S. Bhat
  0 siblings, 0 replies; 12+ messages in thread
From: Srivatsa S. Bhat @ 2022-09-26 22:41 UTC (permalink / raw)
  To: Borislav Petkov, Peter Zijlstra
  Cc: Juergen Gross, x86, Alexey Makhalov, kvm ML,
	VMware PV-Drivers Reviewers, Dave Hansen, linux-kernel,
	virtualization, keerthanak, ganb, Ingo Molnar, namit, ankitja,
	H. Peter Anvin, bordoloih, Thomas Gleixner, sturlapati

On 9/23/22 3:45 AM, Borislav Petkov wrote:
> + kvm ML and leaving the whole mail quoted in for them.
> 
> On Fri, Sep 23, 2022 at 09:05:26AM +0200, Peter Zijlstra wrote:
>> On Thu, Jul 21, 2022 at 01:44:33PM -0700, Srivatsa S. Bhat wrote:
>>> From: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
>>>
>>> VMware ESXi allows enabling a passthru mwait CPU-idle state in the
>>> guest using the following VMX option:
>>>
>>> monitor_control.mwait_in_guest = "TRUE"
>>>
>>> This lets a vCPU in mwait to remain in guest context (instead of
>>> yielding to the hypervisor via a VMEXIT), which helps speed up
>>> wakeups from idle.
>>>
>>> However, this runs into problems with CPU hotplug, because the Linux
>>> CPU offline path prefers to put the vCPU-to-be-offlined in mwait
>>> state, whenever mwait is available. As a result, since a vCPU in mwait
>>> remains in guest context and does not yield to the hypervisor, an
>>> offline vCPU *appears* to be 100% busy as viewed from ESXi, which
>>> prevents the hypervisor from running other vCPUs or workloads on the
>>> corresponding pCPU (particularly when vCPU - pCPU mappings are
>>> statically defined by the user).
>>
>> I would hope vCPU pinning is a mandatory thing when MWAIT passthrough it
>> set?
>>
>>> [ Note that such a vCPU is not
>>> actually busy spinning though; it remains in mwait idle state in the
>>> guest ].
>>>
>>> Fix this by overriding the CPU offline play_dead() callback for VMware
>>> hypervisor, by putting the CPU in halt state (which actually yields to
>>> the hypervisor), even if mwait support is available.
>>>
>>> Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
>>> ---
>>
>>> +static void vmware_play_dead(void)
>>> +{
>>> +	play_dead_common();
>>> +	tboot_shutdown(TB_SHUTDOWN_WFS);
>>> +
>>> +	/*
>>> +	 * Put the vCPU going offline in halt instead of mwait (even
>>> +	 * if mwait support is available), to make sure that the
>>> +	 * offline vCPU yields to the hypervisor (which may not happen
>>> +	 * with mwait, for example, if the guest's VMX is configured
>>> +	 * to retain the vCPU in guest context upon mwait).
>>> +	 */
>>> +	hlt_play_dead();
>>> +}
>>>  #endif
>>>  
>>>  static __init int activate_jump_labels(void)
>>> @@ -349,6 +365,7 @@ static void __init vmware_paravirt_ops_setup(void)
>>>  #ifdef CONFIG_SMP
>>>  		smp_ops.smp_prepare_boot_cpu =
>>>  			vmware_smp_prepare_boot_cpu;
>>> +		smp_ops.play_dead = vmware_play_dead;
>>>  		if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
>>>  					      "x86/vmware:online",
>>>  					      vmware_cpu_online,
>>
>> No real objection here; but would not something like the below fix the
>> problem more generally? I'm thinking MWAIT passthrough for *any*
>> hypervisor doesn't want play_dead to use it.
>>

That would be better indeed, thank you!

>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index f24227bc3220..166cb3aaca8a 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -1759,6 +1759,8 @@ static inline void mwait_play_dead(void)
>>  		return;
>>  	if (!this_cpu_has(X86_FEATURE_CLFLUSH))
>>  		return;
>> +	if (this_cpu_has(X86_FEATURE_HYPERVISOR))
>> +		return;
>>  	if (__this_cpu_read(cpu_info.cpuid_level) < CPUID_MWAIT_LEAF)
>>  		return;
> 
> Yeah, it would be nice if we could get a consensus here from all
> relevant HVs.
> 

I'll send out a v2 after trying out these changes.

Thank you!

Regards,
Srivatsa
VMware Photon OS
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-09-26 22:41 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-21 20:44 [PATCH] smp/hotplug, x86/vmware: Put offline vCPUs in halt instead of mwait Srivatsa S. Bhat
2022-07-21 20:44 ` Srivatsa S. Bhat
2022-09-23  1:57 ` Srivatsa S. Bhat
2022-09-23  1:57   ` Srivatsa S. Bhat
2022-09-23  7:05 ` Peter Zijlstra
2022-09-23  7:05   ` Peter Zijlstra
2022-09-23 10:45   ` Borislav Petkov
2022-09-23 10:45     ` Borislav Petkov
2022-09-26 22:41     ` Srivatsa S. Bhat
2022-09-26 22:41       ` Srivatsa S. Bhat
2022-09-23 10:52   ` Juergen Gross
2022-09-23 10:52     ` Juergen Gross via Virtualization

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.