* [PATCH 0/3] KVM: Tie MWAIT/HLT/PAUSE interception to initially disabled capabilities
@ 2017-11-25 13:09 Jan H. Schönherr
  2017-11-25 13:09 ` [PATCH 1/3] KVM: Don't enable MWAIT in guest by default Jan H. Schönherr
                   ` (3 more replies)
  0 siblings, 4 replies; 33+ messages in thread
From: Jan H. Schönherr @ 2017-11-25 13:09 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Jan H. Schönherr, Joerg Roedel, Michael S. Tsirkin,
	KarimAllah Ahmed, kvm

KVM stopped intercepting MWAIT instructions in Linux 4.12 commit 668fffa3f838
("kvm: better MWAIT emulation for guests") to improve latency in some
workloads.

This series takes the idea further and makes the interception of HLT (patch 2)
and PAUSE (patch 3) optional as well, for similar reasons. Either interception
can be turned off for an individual VM by enabling the corresponding capability.

It also converts KVM_CAP_X86_GUEST_MWAIT into an initially disabled capability
that has to be enabled explicitly per VM. This restores the pre-4.12 default
behavior, so that a guest cannot put CPUs into low power states whose wake-up
latency may exceed the host's requirements (patch 1).

Jan H. Schönherr (3):
  KVM: Don't enable MWAIT in guest by default
  KVM: Add capability to not exit on HLT
  KVM: Add capability to not exit on PAUSE

 Documentation/virtual/kvm/api.txt | 38 +++++++++++++++++++------
 arch/x86/include/asm/kvm_host.h   |  4 +++
 arch/x86/kvm/svm.c                |  8 ++++--
 arch/x86/kvm/vmx.c                | 59 +++++++++++++++++++++++++++++----------
 arch/x86/kvm/x86.c                | 54 ++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.h                | 41 +++++++--------------------
 include/uapi/linux/kvm.h          |  2 ++
 7 files changed, 148 insertions(+), 58 deletions(-)

-- 
2.3.1.dirty

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/3] KVM: Don't enable MWAIT in guest by default
  2017-11-25 13:09 [PATCH 0/3] KVM: Tie MWAIT/HLT/PAUSE interception to initially disabled capabilities Jan H. Schönherr
@ 2017-11-25 13:09 ` Jan H. Schönherr
  2017-11-27 18:13   ` Jim Mattson
                     ` (2 more replies)
  2017-11-25 13:09 ` [PATCH 2/3] KVM: Add capability to not exit on HLT Jan H. Schönherr
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 33+ messages in thread
From: Jan H. Schönherr @ 2017-11-25 13:09 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Jan H. Schönherr, Joerg Roedel, Michael S. Tsirkin,
	KarimAllah Ahmed, kvm

Allowing a guest to execute MWAIT without interception enables the guest
to put a (physical) CPU into a power saving state from which it may take
longer to return than the host desires.

Don't give a guest that power over a host by default, especially since
nothing prevents a guest from using MWAIT even when it is not advertised
via CPUID.

This restores the behavior from before Linux 4.12 commit 668fffa3f838
("kvm: better MWAIT emulation for guests") but keeps the option to
enable MWAIT in the guest for individual VMs.

Suggested-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
---
Note: AMD code paths are only compile tested
---
 Documentation/virtual/kvm/api.txt | 20 ++++++++++--------
 arch/x86/include/asm/kvm_host.h   |  2 ++
 arch/x86/kvm/svm.c                |  2 +-
 arch/x86/kvm/vmx.c                |  9 ++++----
 arch/x86/kvm/x86.c                | 44 ++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.h                | 35 ++-----------------------------
 6 files changed, 64 insertions(+), 48 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index f670e4b..0ee812c 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4163,6 +4163,17 @@ enables QEMU to build error log and branch to guest kernel registered
 machine check handling routine. Without this capability KVM will
 branch to guests' 0x200 interrupt vector.
 
+7.13 KVM_CAP_X86_GUEST_MWAIT
+
+Architectures: x86
+Parameters: none
+Returns: 0 on success
+
+This capability indicates that a guest using memory monitoring instructions
+(MWAIT/MWAITX) to stop a virtual CPU will not cause a VM exit. As such, time
+spent while a virtual CPU is halted in this way will then be accounted for as
+guest running time on the host (as opposed to e.g. HLT).
+
 8. Other capabilities.
 ----------------------
 
@@ -4275,15 +4286,6 @@ reserved.
     Both registers and addresses are 64-bits wide.
     It will be possible to run 64-bit or 32-bit guest code.
 
-8.8 KVM_CAP_X86_GUEST_MWAIT
-
-Architectures: x86
-
-This capability indicates that guest using memory monotoring instructions
-(MWAIT/MWAITX) to stop the virtual CPU will not cause a VM exit.  As such time
-spent while virtual CPU is halted in this way will then be accounted for as
-guest running time on the host (as opposed to e.g. HLT).
-
 8.9 KVM_CAP_ARM_USER_IRQ
 
 Architectures: arm, arm64
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b97726e..f7bcfaa 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -781,6 +781,8 @@ struct kvm_arch {
 
 	gpa_t wall_clock;
 
+	bool mwait_in_guest;
+
 	bool ept_identity_pagetable_done;
 	gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1f3e7f2..ef1b320 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1253,7 +1253,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 	set_intercept(svm, INTERCEPT_WBINVD);
 	set_intercept(svm, INTERCEPT_XSETBV);
 
-	if (!kvm_mwait_in_guest()) {
+	if (!kvm_mwait_in_guest(svm->vcpu.kvm)) {
 		set_intercept(svm, INTERCEPT_MONITOR);
 		set_intercept(svm, INTERCEPT_MWAIT);
 	}
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1eb7053..a067735 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3635,13 +3635,11 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 	      CPU_BASED_USE_IO_BITMAPS |
 	      CPU_BASED_MOV_DR_EXITING |
 	      CPU_BASED_USE_TSC_OFFSETING |
+	      CPU_BASED_MWAIT_EXITING |
+	      CPU_BASED_MONITOR_EXITING |
 	      CPU_BASED_INVLPG_EXITING |
 	      CPU_BASED_RDPMC_EXITING;
 
-	if (!kvm_mwait_in_guest())
-		min |= CPU_BASED_MWAIT_EXITING |
-			CPU_BASED_MONITOR_EXITING;
-
 	opt = CPU_BASED_TPR_SHADOW |
 	      CPU_BASED_USE_MSR_BITMAPS |
 	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
@@ -5297,6 +5295,9 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
 		exec_control |= CPU_BASED_CR3_STORE_EXITING |
 				CPU_BASED_CR3_LOAD_EXITING  |
 				CPU_BASED_INVLPG_EXITING;
+	if (kvm_mwait_in_guest(vmx->vcpu.kvm))
+		exec_control &= ~(CPU_BASED_MWAIT_EXITING |
+				  CPU_BASED_MONITOR_EXITING);
 	return exec_control;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 985a305..fe6627a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -67,6 +67,7 @@
 #include <asm/pvclock.h>
 #include <asm/div64.h>
 #include <asm/irq_remapping.h>
+#include <asm/mwait.h>
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -2672,6 +2673,40 @@ static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
 	return r;
 }
 
+static bool kvm_mwait_in_guest_possible(void)
+{
+	unsigned int eax, ebx, ecx, edx;
+
+	if (!cpu_has(&boot_cpu_data, X86_FEATURE_MWAIT))
+		return false;
+
+	switch (boot_cpu_data.x86_vendor) {
+	case X86_VENDOR_AMD:
+		/* All AMD CPUs have a working MWAIT implementation */
+		return true;
+	case X86_VENDOR_INTEL:
+		/* Handle Intel below */
+		break;
+	default:
+		return false;
+	}
+
+	/*
+	 * Intel CPUs without CPUID5_ECX_INTERRUPT_BREAK are problematic as
+	 * they would allow guest to stop the CPU completely by disabling
+	 * interrupts then invoking MWAIT.
+	 */
+	if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
+		return false;
+
+	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
+
+	if (!(ecx & CPUID5_ECX_INTERRUPT_BREAK))
+		return false;
+
+	return true;
+}
+
 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 {
 	int r;
@@ -2726,7 +2761,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = KVM_CLOCK_TSC_STABLE;
 		break;
 	case KVM_CAP_X86_GUEST_MWAIT:
-		r = kvm_mwait_in_guest();
+		r = kvm_mwait_in_guest_possible();
 		break;
 	case KVM_CAP_X86_SMM:
 		/* SMBASE is usually relocated above 1M on modern chipsets,
@@ -4026,6 +4061,13 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 
 		r = 0;
 		break;
+	case KVM_CAP_X86_GUEST_MWAIT:
+		r = -EINVAL;
+		if (kvm_mwait_in_guest_possible()) {
+			kvm->arch.mwait_in_guest = true;
+			r = 0;
+		}
+		break;
 	default:
 		r = -EINVAL;
 		break;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index d0b95b7..ed8e150 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -2,8 +2,6 @@
 #ifndef ARCH_X86_KVM_X86_H
 #define ARCH_X86_KVM_X86_H
 
-#include <asm/processor.h>
-#include <asm/mwait.h>
 #include <linux/kvm_host.h>
 #include <asm/pvclock.h>
 #include "kvm_cache_regs.h"
@@ -263,38 +261,9 @@ static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
 	    __rem;						\
 	 })
 
-static inline bool kvm_mwait_in_guest(void)
+static inline bool kvm_mwait_in_guest(struct kvm *kvm)
 {
-	unsigned int eax, ebx, ecx, edx;
-
-	if (!cpu_has(&boot_cpu_data, X86_FEATURE_MWAIT))
-		return false;
-
-	switch (boot_cpu_data.x86_vendor) {
-	case X86_VENDOR_AMD:
-		/* All AMD CPUs have a working MWAIT implementation */
-		return true;
-	case X86_VENDOR_INTEL:
-		/* Handle Intel below */
-		break;
-	default:
-		return false;
-	}
-
-	/*
-	 * Intel CPUs without CPUID5_ECX_INTERRUPT_BREAK are problematic as
-	 * they would allow guest to stop the CPU completely by disabling
-	 * interrupts then invoking MWAIT.
-	 */
-	if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
-		return false;
-
-	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
-
-	if (!(ecx & CPUID5_ECX_INTERRUPT_BREAK))
-		return false;
-
-	return true;
+	return kvm->arch.mwait_in_guest;
 }
 
 #endif
-- 
2.3.1.dirty


* [PATCH 2/3] KVM: Add capability to not exit on HLT
  2017-11-25 13:09 [PATCH 0/3] KVM: Tie MWAIT/HLT/PAUSE interception to initially disabled capabilities Jan H. Schönherr
  2017-11-25 13:09 ` [PATCH 1/3] KVM: Don't enable MWAIT in guest by default Jan H. Schönherr
@ 2017-11-25 13:09 ` Jan H. Schönherr
  2017-11-27  1:32   ` Wanpeng Li
                     ` (2 more replies)
  2017-11-25 13:09 ` [PATCH 3/3] KVM: Add capability to not exit on PAUSE Jan H. Schönherr
       [not found] ` <a3c80a22-ff69-fa51-ea90-48f039eb449a@redhat.com>
  3 siblings, 3 replies; 33+ messages in thread
From: Jan H. Schönherr @ 2017-11-25 13:09 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Jan H. Schönherr, Joerg Roedel, Michael S. Tsirkin,
	KarimAllah Ahmed, kvm

If host CPUs are dedicated to a VM, we can avoid VM exits on HLT,
reducing the wake-up latency on posted interrupts.

This reintroduces a feature that existed at some point --
see Linux 3.4 commit 10166744b80a ("KVM: VMX: remove yield_on_hlt")
for the removal -- but with the additional ability to enable it only
for selected VMs (and with SVM support as well).

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
---
Note: AMD code paths are only compile tested
---
 Documentation/virtual/kvm/api.txt | 12 +++++++++++-
 arch/x86/include/asm/kvm_host.h   |  1 +
 arch/x86/kvm/svm.c                |  3 ++-
 arch/x86/kvm/vmx.c                | 33 +++++++++++++++++++++++++++------
 arch/x86/kvm/x86.c                |  5 +++++
 arch/x86/kvm/x86.h                |  5 +++++
 include/uapi/linux/kvm.h          |  1 +
 7 files changed, 52 insertions(+), 8 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 0ee812c..c06bb41 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4172,7 +4172,17 @@ Returns: 0 on success
 This capability indicates that a guest using memory monitoring instructions
 (MWAIT/MWAITX) to stop a virtual CPU will not cause a VM exit. As such, time
 spent while a virtual CPU is halted in this way will then be accounted for as
-guest running time on the host (as opposed to e.g. HLT).
+guest running time on the host.
+
+7.14 KVM_CAP_X86_GUEST_HLT
+
+Architectures: x86
+Parameters: none
+Returns: 0 on success
+
+This capability indicates that a guest using HLT to stop a virtual CPU will not
+cause a VM exit. As such, time spent while a virtual CPU is halted in this way
+will then be accounted for as guest running time on the host.
 
 8. Other capabilities.
 ----------------------
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f7bcfaa..3197c2d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -781,6 +781,7 @@ struct kvm_arch {
 
 	gpa_t wall_clock;
 
+	bool hlt_in_guest;
 	bool mwait_in_guest;
 
 	bool ept_identity_pagetable_done;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ef1b320..c135b98 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1236,7 +1236,6 @@ static void init_vmcb(struct vcpu_svm *svm)
 	set_intercept(svm, INTERCEPT_RDPMC);
 	set_intercept(svm, INTERCEPT_CPUID);
 	set_intercept(svm, INTERCEPT_INVD);
-	set_intercept(svm, INTERCEPT_HLT);
 	set_intercept(svm, INTERCEPT_INVLPG);
 	set_intercept(svm, INTERCEPT_INVLPGA);
 	set_intercept(svm, INTERCEPT_IOIO_PROT);
@@ -1257,6 +1256,8 @@ static void init_vmcb(struct vcpu_svm *svm)
 		set_intercept(svm, INTERCEPT_MONITOR);
 		set_intercept(svm, INTERCEPT_MWAIT);
 	}
+	if (!kvm_hlt_in_guest(svm->vcpu.kvm))
+		set_intercept(svm, INTERCEPT_HLT);
 
 	control->iopm_base_pa = __sme_set(iopm_base);
 	control->msrpm_base_pa = __sme_set(__pa(svm->msrpm));
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a067735..1b67433 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2446,6 +2446,25 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	vmx_set_interrupt_shadow(vcpu, 0);
 }
 
+static void vmx_set_intr_info(struct kvm_vcpu *vcpu, u32 intr)
+{
+	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);
+
+	/*
+	 * Ensure that we clear the HLT state in the VMCS.  We don't need to
+	 * explicitly skip the instruction because if the HLT state is set, then
+	 * the instruction is already executing and RIP has already been
+	 * advanced.
+	 */
+	if (!kvm_hlt_in_guest(vcpu->kvm) || !(intr & INTR_INFO_VALID_MASK))
+		return;
+	if (is_external_interrupt(intr) || is_nmi(intr))
+		return;
+	if (vmcs_read32(GUEST_ACTIVITY_STATE) != GUEST_ACTIVITY_HLT)
+		return;
+	vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
+}
+
 static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
 					       unsigned long exit_qual)
 {
@@ -2540,7 +2559,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
 	} else
 		intr_info |= INTR_TYPE_HARD_EXCEPTION;
 
-	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
+	vmx_set_intr_info(vcpu, intr_info);
 }
 
 static bool vmx_rdtscp_supported(void)
@@ -5298,6 +5317,8 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
 	if (kvm_mwait_in_guest(vmx->vcpu.kvm))
 		exec_control &= ~(CPU_BASED_MWAIT_EXITING |
 				  CPU_BASED_MONITOR_EXITING);
+	if (kvm_hlt_in_guest(vmx->vcpu.kvm))
+		exec_control &= ~CPU_BASED_HLT_EXITING;
 	return exec_control;
 }
 
@@ -5635,7 +5656,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	setup_msrs(vmx);
 
-	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);  /* 22.2.1 */
+	vmx_set_intr_info(vcpu, 0);  /* 22.2.1 */
 
 	if (cpu_has_vmx_tpr_shadow() && !init_event) {
 		vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, 0);
@@ -5729,7 +5750,7 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu)
 			     vmx->vcpu.arch.event_exit_inst_len);
 	} else
 		intr |= INTR_TYPE_EXT_INTR;
-	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);
+	vmx_set_intr_info(vcpu, intr);
 }
 
 static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
@@ -5758,8 +5779,8 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
 		return;
 	}
 
-	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
-			INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
+	vmx_set_intr_info(vcpu, INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK |
+				NMI_VECTOR);
 }
 
 static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu)
@@ -9301,7 +9322,7 @@ static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
 				  VM_ENTRY_INSTRUCTION_LEN,
 				  VM_ENTRY_EXCEPTION_ERROR_CODE);
 
-	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
+	vmx_set_intr_info(vcpu, 0);
 }
 
 static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fe6627a..f17c520 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2755,6 +2755,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_SET_BOOT_CPU_ID:
  	case KVM_CAP_SPLIT_IRQCHIP:
 	case KVM_CAP_IMMEDIATE_EXIT:
+	case KVM_CAP_X86_GUEST_HLT:
 		r = 1;
 		break;
 	case KVM_CAP_ADJUST_CLOCK:
@@ -4068,6 +4069,10 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			r = 0;
 		}
 		break;
+	case KVM_CAP_X86_GUEST_HLT:
+		kvm->arch.hlt_in_guest = true;
+		r = 0;
+		break;
 	default:
 		r = -EINVAL;
 		break;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index ed8e150..b2066aa 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -266,4 +266,9 @@ static inline bool kvm_mwait_in_guest(struct kvm *kvm)
 	return kvm->arch.mwait_in_guest;
 }
 
+static inline bool kvm_hlt_in_guest(struct kvm *kvm)
+{
+	return kvm->arch.hlt_in_guest;
+}
+
 #endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 282d7613..ff8f266 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -932,6 +932,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_HYPERV_SYNIC2 148
 #define KVM_CAP_HYPERV_VP_INDEX 149
 #define KVM_CAP_S390_AIS_MIGRATION 150
+#define KVM_CAP_X86_GUEST_HLT 151
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.3.1.dirty


* [PATCH 3/3] KVM: Add capability to not exit on PAUSE
  2017-11-25 13:09 [PATCH 0/3] KVM: Tie MWAIT/HLT/PAUSE interception to initially disabled capabilities Jan H. Schönherr
  2017-11-25 13:09 ` [PATCH 1/3] KVM: Don't enable MWAIT in guest by default Jan H. Schönherr
  2017-11-25 13:09 ` [PATCH 2/3] KVM: Add capability to not exit on HLT Jan H. Schönherr
@ 2017-11-25 13:09 ` Jan H. Schönherr
  2017-11-27 20:48   ` Michael S. Tsirkin
  2017-11-28  3:37   ` Longpeng (Mike)
       [not found] ` <a3c80a22-ff69-fa51-ea90-48f039eb449a@redhat.com>
  3 siblings, 2 replies; 33+ messages in thread
From: Jan H. Schönherr @ 2017-11-25 13:09 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Jan H. Schönherr, Joerg Roedel, Michael S. Tsirkin,
	KarimAllah Ahmed, kvm

Allow disabling pause loop exiting (VMX) / pause filtering (SVM) on a
per-VM basis.

VMs with dedicated host CPUs are then no longer negatively affected by
needlessly intercepted PAUSE instructions.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
---
Note: AMD code paths are only compile tested
---
 Documentation/virtual/kvm/api.txt |  8 ++++++++
 arch/x86/include/asm/kvm_host.h   |  1 +
 arch/x86/kvm/svm.c                |  3 ++-
 arch/x86/kvm/vmx.c                | 17 +++++++++++++----
 arch/x86/kvm/x86.c                |  5 +++++
 arch/x86/kvm/x86.h                |  5 +++++
 include/uapi/linux/kvm.h          |  1 +
 7 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index c06bb41..42a54d1 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4184,6 +4184,14 @@ This capability indicates that a guest using HLT to stop a virtual CPU will not
 cause a VM exit. As such, time spent while a virtual CPU is halted in this way
 will then be accounted for as guest running time on the host.
 
+7.15 KVM_CAP_X86_GUEST_PAUSE
+
+Architectures: x86
+Parameters: none
+Returns: 0 on success
+
+This capability indicates that a guest using PAUSE will not cause a VM exit.
+
 8. Other capabilities.
 ----------------------
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3197c2d..0d4ea32 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -782,6 +782,7 @@ struct kvm_arch {
 	gpa_t wall_clock;
 
 	bool hlt_in_guest;
+	bool pause_in_guest;
 	bool mwait_in_guest;
 
 	bool ept_identity_pagetable_done;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c135b98..a5eb60a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1314,7 +1314,8 @@ static void init_vmcb(struct vcpu_svm *svm)
 	svm->nested.vmcb = 0;
 	svm->vcpu.arch.hflags = 0;
 
-	if (boot_cpu_has(X86_FEATURE_PAUSEFILTER)) {
+	if (boot_cpu_has(X86_FEATURE_PAUSEFILTER) &&
+	    !kvm_pause_in_guest(svm->vcpu.kvm)) {
 		control->pause_filter_count = 3000;
 		set_intercept(svm, INTERCEPT_PAUSE);
 	}
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1b67433..5f8c33b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5352,7 +5352,7 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
 	}
 	if (!enable_unrestricted_guest)
 		exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
-	if (!ple_gap)
+	if (kvm_pause_in_guest(vmx->vcpu.kvm))
 		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
 	if (!kvm_vcpu_apicv_active(vcpu))
 		exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT |
@@ -5519,7 +5519,7 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
 		vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
 	}
 
-	if (ple_gap) {
+	if (!kvm_pause_in_guest(vmx->vcpu.kvm)) {
 		vmcs_write32(PLE_GAP, ple_gap);
 		vmx->ple_window = ple_window;
 		vmx->ple_window_dirty = true;
@@ -6975,7 +6975,7 @@ static __exit void hardware_unsetup(void)
  */
 static int handle_pause(struct kvm_vcpu *vcpu)
 {
-	if (ple_gap)
+	if (!kvm_pause_in_guest(vcpu->kvm))
 		grow_ple_window(vcpu);
 
 	/*
@@ -9730,6 +9730,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	return ERR_PTR(err);
 }
 
+static int vmx_vm_init(struct kvm *kvm)
+{
+	if (!ple_gap)
+		kvm->arch.pause_in_guest = true;
+	return 0;
+}
+
 static void __init vmx_check_processor_compat(void *rtn)
 {
 	struct vmcs_config vmcs_conf;
@@ -11793,7 +11800,7 @@ static void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu)
 
 static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
-	if (ple_gap)
+	if (!kvm_pause_in_guest(vcpu->kvm))
 		shrink_ple_window(vcpu);
 }
 
@@ -12152,6 +12159,8 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.cpu_has_accelerated_tpr = report_flexpriority,
 	.cpu_has_high_real_mode_segbase = vmx_has_high_real_mode_segbase,
 
+	.vm_init = vmx_vm_init,
+
 	.vcpu_create = vmx_create_vcpu,
 	.vcpu_free = vmx_free_vcpu,
 	.vcpu_reset = vmx_vcpu_reset,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f17c520..e13df00 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2756,6 +2756,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
  	case KVM_CAP_SPLIT_IRQCHIP:
 	case KVM_CAP_IMMEDIATE_EXIT:
 	case KVM_CAP_X86_GUEST_HLT:
+	case KVM_CAP_X86_GUEST_PAUSE:
 		r = 1;
 		break;
 	case KVM_CAP_ADJUST_CLOCK:
@@ -4073,6 +4074,10 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		kvm->arch.hlt_in_guest = true;
 		r = 0;
 		break;
+	case KVM_CAP_X86_GUEST_PAUSE:
+		kvm->arch.pause_in_guest = true;
+		r = 0;
+		break;
 	default:
 		r = -EINVAL;
 		break;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index b2066aa..56297c4 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -271,4 +271,9 @@ static inline bool kvm_hlt_in_guest(struct kvm *kvm)
 	return kvm->arch.hlt_in_guest;
 }
 
+static inline bool kvm_pause_in_guest(struct kvm *kvm)
+{
+	return kvm->arch.pause_in_guest;
+}
+
 #endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index ff8f266..bc2b654 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -933,6 +933,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_HYPERV_VP_INDEX 149
 #define KVM_CAP_S390_AIS_MIGRATION 150
 #define KVM_CAP_X86_GUEST_HLT 151
+#define KVM_CAP_X86_GUEST_PAUSE 152
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.3.1.dirty


* Re: [PATCH 2/3] KVM: Add capability to not exit on HLT
  2017-11-25 13:09 ` [PATCH 2/3] KVM: Add capability to not exit on HLT Jan H. Schönherr
@ 2017-11-27  1:32   ` Wanpeng Li
  2017-11-27  1:47     ` Wanpeng Li
       [not found]     ` <421c71fd-6dff-c01e-9e78-42f114711ea9@redhat.com>
       [not found]   ` <e17ea420-c141-18b6-2622-e33a3f540c61@redhat.com>
  2017-11-27 20:45   ` Michael S. Tsirkin
  2 siblings, 2 replies; 33+ messages in thread
From: Wanpeng Li @ 2017-11-27  1:32 UTC (permalink / raw)
  To: Jan H. Schönherr
  Cc: Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Michael S. Tsirkin, KarimAllah Ahmed, kvm

2017-11-25 21:09 GMT+08:00 Jan H. Schönherr <jschoenh@amazon.de>:
> If host CPUs are dedicated to a VM, we can avoid VM exits on HLT,
> reducing the wake-up latency on posted interrupts.
>
> This reintroduces a feature that existed at some point --
> see Linux 3.4 commit 10166744b80a ("KVM: VMX: remove yield_on_hlt")
> for the removal -- but with the additional ability to enable it only
> for selected VMs (and with SVM support as well).
>
> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>

This patch results in a guest hang during boot, the same as when I
tried to avoid VM exits on HLT several months ago:
http://www.spinics.net/lists/kvm/msg152397.html

Regards,
Wanpeng Li

> [remainder of quoted patch trimmed]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] KVM: Add capability to not exit on HLT
  2017-11-27  1:32   ` Wanpeng Li
@ 2017-11-27  1:47     ` Wanpeng Li
       [not found]       ` <a2f4cf7f-5d7b-a1cc-30d5-d18df4d49173@redhat.com>
       [not found]     ` <421c71fd-6dff-c01e-9e78-42f114711ea9@redhat.com>
  1 sibling, 1 reply; 33+ messages in thread
From: Wanpeng Li @ 2017-11-27  1:47 UTC (permalink / raw)
  To: Jan H. Schönherr
  Cc: Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Michael S. Tsirkin, KarimAllah Ahmed, kvm,
	Eduardo Valentin

Cc Eduardo,
2017-11-27 9:32 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> 2017-11-25 21:09 GMT+08:00 Jan H. Schönherr <jschoenh@amazon.de>:
>> If host CPUs are dedicated to a VM, we can avoid VM exits on HLT,
>> reducing the wake-up latency on posted interrupts.
>>
>> This reintroduces a feature that has been there at some point --
>> see Linux 3.4 commit 10166744b80a ("KVM: VMX: remove yield_on_hlt")
>> for the removal -- but with the additional ability to enable it only
>> for selected VMs (and supporting SVM as well).
>>
>> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
>
> This patch results in a guest hang during boot, just the same as when
> I tried to avoid VM exits on HLT several months ago.
> http://www.spinics.net/lists/kvm/msg152397.html

https://lkml.org/lkml/2017/11/6/863 With PV_DEDICATED, the guest will
use qspinlocks instead of pvspinlocks, and that is the same scenario as
"host CPUs are dedicated to a VM". I think this can work around the
issue which I mentioned.

>
> Regards,
> Wanpeng Li
>
>> ---
>> Note: AMD code paths are only compile tested
>> ---
>>  Documentation/virtual/kvm/api.txt | 12 +++++++++++-
>>  arch/x86/include/asm/kvm_host.h   |  1 +
>>  arch/x86/kvm/svm.c                |  3 ++-
>>  arch/x86/kvm/vmx.c                | 33 +++++++++++++++++++++++++++------
>>  arch/x86/kvm/x86.c                |  5 +++++
>>  arch/x86/kvm/x86.h                |  5 +++++
>>  include/uapi/linux/kvm.h          |  1 +
>>  7 files changed, 52 insertions(+), 8 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index 0ee812c..c06bb41 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -4172,7 +4172,17 @@ Returns: 0 on success
>>  This capability indicates that a guest using memory monitoring instructions
>>  (MWAIT/MWAITX) to stop a virtual CPU will not cause a VM exit. As such, time
>>  spent while a virtual CPU is halted in this way will then be accounted for as
>> -guest running time on the host (as opposed to e.g. HLT).
>> +guest running time on the host.
>> +
>> +7.14 KVM_CAP_X86_GUEST_HLT
>> +
>> +Architectures: x86
>> +Parameters: none
>> +Returns: 0 on success
>> +
>> +This capability indicates that a guest using HLT to stop a virtual CPU will not
>> +cause a VM exit. As such, time spent while a virtual CPU is halted in this way
>> +will then be accounted for as guest running time on the host.
>>
>>  8. Other capabilities.
>>  ----------------------
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index f7bcfaa..3197c2d 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -781,6 +781,7 @@ struct kvm_arch {
>>
>>         gpa_t wall_clock;
>>
>> +       bool hlt_in_guest;
>>         bool mwait_in_guest;
>>
>>         bool ept_identity_pagetable_done;
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index ef1b320..c135b98 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -1236,7 +1236,6 @@ static void init_vmcb(struct vcpu_svm *svm)
>>         set_intercept(svm, INTERCEPT_RDPMC);
>>         set_intercept(svm, INTERCEPT_CPUID);
>>         set_intercept(svm, INTERCEPT_INVD);
>> -       set_intercept(svm, INTERCEPT_HLT);
>>         set_intercept(svm, INTERCEPT_INVLPG);
>>         set_intercept(svm, INTERCEPT_INVLPGA);
>>         set_intercept(svm, INTERCEPT_IOIO_PROT);
>> @@ -1257,6 +1256,8 @@ static void init_vmcb(struct vcpu_svm *svm)
>>                 set_intercept(svm, INTERCEPT_MONITOR);
>>                 set_intercept(svm, INTERCEPT_MWAIT);
>>         }
>> +       if (!kvm_hlt_in_guest(svm->vcpu.kvm))
>> +               set_intercept(svm, INTERCEPT_HLT);
>>
>>         control->iopm_base_pa = __sme_set(iopm_base);
>>         control->msrpm_base_pa = __sme_set(__pa(svm->msrpm));
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index a067735..1b67433 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -2446,6 +2446,25 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
>>         vmx_set_interrupt_shadow(vcpu, 0);
>>  }
>>
>> +static void vmx_set_intr_info(struct kvm_vcpu *vcpu, u32 intr)
>> +{
>> +       vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);
>> +
>> +       /*
>> +        * Ensure that we clear the HLT state in the VMCS.  We don't need to
>> +        * explicitly skip the instruction because if the HLT state is set, then
>> +        * the instruction is already executing and RIP has already been
>> +        * advanced.
>> +        */
>> +       if (!kvm_hlt_in_guest(vcpu->kvm) || !(intr & INTR_INFO_VALID_MASK))
>> +               return;
>> +       if (is_external_interrupt(intr) || is_nmi(intr))
>> +               return;
>> +       if (vmcs_read32(GUEST_ACTIVITY_STATE) != GUEST_ACTIVITY_HLT)
>> +               return;
>> +       vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
>> +}
>> +
>>  static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
>>                                                unsigned long exit_qual)
>>  {
>> @@ -2540,7 +2559,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
>>         } else
>>                 intr_info |= INTR_TYPE_HARD_EXCEPTION;
>>
>> -       vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
>> +       vmx_set_intr_info(vcpu, intr_info);
>>  }
>>
>>  static bool vmx_rdtscp_supported(void)
>> @@ -5298,6 +5317,8 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
>>         if (kvm_mwait_in_guest(vmx->vcpu.kvm))
>>                 exec_control &= ~(CPU_BASED_MWAIT_EXITING |
>>                                   CPU_BASED_MONITOR_EXITING);
>> +       if (kvm_hlt_in_guest(vmx->vcpu.kvm))
>> +               exec_control &= ~CPU_BASED_HLT_EXITING;
>>         return exec_control;
>>  }
>>
>> @@ -5635,7 +5656,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>>
>>         setup_msrs(vmx);
>>
>> -       vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);  /* 22.2.1 */
>> +       vmx_set_intr_info(vcpu, 0);  /* 22.2.1 */
>>
>>         if (cpu_has_vmx_tpr_shadow() && !init_event) {
>>                 vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, 0);
>> @@ -5729,7 +5750,7 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu)
>>                              vmx->vcpu.arch.event_exit_inst_len);
>>         } else
>>                 intr |= INTR_TYPE_EXT_INTR;
>> -       vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);
>> +       vmx_set_intr_info(vcpu, intr);
>>  }
>>
>>  static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
>> @@ -5758,8 +5779,8 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
>>                 return;
>>         }
>>
>> -       vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
>> -                       INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
>> +       vmx_set_intr_info(vcpu, INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK |
>> +                               NMI_VECTOR);
>>  }
>>
>>  static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu)
>> @@ -9301,7 +9322,7 @@ static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
>>                                   VM_ENTRY_INSTRUCTION_LEN,
>>                                   VM_ENTRY_EXCEPTION_ERROR_CODE);
>>
>> -       vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
>> +       vmx_set_intr_info(vcpu, 0);
>>  }
>>
>>  static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index fe6627a..f17c520 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -2755,6 +2755,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>         case KVM_CAP_SET_BOOT_CPU_ID:
>>         case KVM_CAP_SPLIT_IRQCHIP:
>>         case KVM_CAP_IMMEDIATE_EXIT:
>> +       case KVM_CAP_X86_GUEST_HLT:
>>                 r = 1;
>>                 break;
>>         case KVM_CAP_ADJUST_CLOCK:
>> @@ -4068,6 +4069,10 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>>                         r = 0;
>>                 }
>>                 break;
>> +       case KVM_CAP_X86_GUEST_HLT:
>> +               kvm->arch.hlt_in_guest = true;
>> +               r = 0;
>> +               break;
>>         default:
>>                 r = -EINVAL;
>>                 break;
>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>> index ed8e150..b2066aa 100644
>> --- a/arch/x86/kvm/x86.h
>> +++ b/arch/x86/kvm/x86.h
>> @@ -266,4 +266,9 @@ static inline bool kvm_mwait_in_guest(struct kvm *kvm)
>>         return kvm->arch.mwait_in_guest;
>>  }
>>
>> +static inline bool kvm_hlt_in_guest(struct kvm *kvm)
>> +{
>> +       return kvm->arch.hlt_in_guest;
>> +}
>> +
>>  #endif
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 282d7613..ff8f266 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -932,6 +932,7 @@ struct kvm_ppc_resize_hpt {
>>  #define KVM_CAP_HYPERV_SYNIC2 148
>>  #define KVM_CAP_HYPERV_VP_INDEX 149
>>  #define KVM_CAP_S390_AIS_MIGRATION 150
>> +#define KVM_CAP_X86_GUEST_HLT 151
>>
>>  #ifdef KVM_CAP_IRQ_ROUTING
>>
>> --
>> 2.3.1.dirty
>>


* Re: [PATCH 2/3] KVM: Add capability to not exit on HLT
       [not found]       ` <a2f4cf7f-5d7b-a1cc-30d5-d18df4d49173@redhat.com>
@ 2017-11-27 12:29         ` Jan H. Schönherr
  0 siblings, 0 replies; 33+ messages in thread
From: Jan H. Schönherr @ 2017-11-27 12:29 UTC (permalink / raw)
  To: Paolo Bonzini, Wanpeng Li
  Cc: Radim Krčmář,
	Joerg Roedel, Michael S. Tsirkin, KarimAllah Ahmed, kvm,
	Eduardo Valentin

On 11/27/2017 12:40 PM, Paolo Bonzini wrote:
> Only on new guests that know PV_DEDICATED, which makes it a bit flaky.
> I'd rather document that you should not enable KVM_FEATURE_PV_UNHALT if
> you block HLT.

Yes, PV_UNHALT and disabled HLT exiting shouldn't be combined.
I'll add that to the documentation.

Regards
Jan


* Re: [PATCH 2/3] KVM: Add capability to not exit on HLT
       [not found]     ` <421c71fd-6dff-c01e-9e78-42f114711ea9@redhat.com>
@ 2017-11-27 15:27       ` Jan H. Schönherr
  0 siblings, 0 replies; 33+ messages in thread
From: Jan H. Schönherr @ 2017-11-27 15:27 UTC (permalink / raw)
  To: Paolo Bonzini, Wanpeng Li
  Cc: Radim Krčmář,
	Joerg Roedel, Michael S. Tsirkin, KarimAllah Ahmed, kvm

On 11/27/2017 12:56 PM, Paolo Bonzini wrote:
> On 27/11/2017 02:32, Wanpeng Li wrote:
>> 2017-11-25 21:09 GMT+08:00 Jan H. Schönherr <jschoenh@amazon.de>:
>>> If host CPUs are dedicated to a VM, we can avoid VM exits on HLT,
>>> reducing the wake-up latency on posted interrupts.
>>>
>>> This reintroduces a feature that has been there at some point --
>>> see Linux 3.4 commit 10166744b80a ("KVM: VMX: remove yield_on_hlt")
>>> for the removal -- but with the additional ability to enable it only
>>> for selected VMs (and supporting SVM as well).
>>>
>>> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
>>
>> This patch results in guest hang during boot. Just the same when I
>> tried to avoid VM exits on HLT several months ago.
>> http://www.spinics.net/lists/kvm/msg152397.html

I missed that thread. Thanks for the pointer. The extra things to take
away from that conversation are the erratum Paolo pointed out below as
well, and Radim's comment about cancel_injection().

Do we have to handle the cancel_injection() case in some way?

> There is also errata #415 in family 0x10.
> 
> Can you change kvm_mwait_in_guest() to a kvm_x86_ops hook
> kvm_x86_can_disable_exits, so that Intel can use it to force mwait and
> AMD can use it to force hlt?
> 
> From http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf,
> bit 2 of MSR_AMD64_OSVW_STATUS tells you that you have erratum #415.

I'll have a look at this for v2.

Regards
Jan


* Re: [PATCH 2/3] KVM: Add capability to not exit on HLT
       [not found]   ` <e17ea420-c141-18b6-2622-e33a3f540c61@redhat.com>
@ 2017-11-27 16:12     ` Jan H. Schönherr
  0 siblings, 0 replies; 33+ messages in thread
From: Jan H. Schönherr @ 2017-11-27 16:12 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Joerg Roedel, Michael S. Tsirkin, KarimAllah Ahmed, kvm

On 11/27/2017 12:37 PM, Paolo Bonzini wrote:
> On 25/11/2017 14:09, Jan H. Schönherr wrote:
>> +static void vmx_set_intr_info(struct kvm_vcpu *vcpu, u32 intr)
>> +{
>> +	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);
> 
> Would you mind introducing vmx_set_intr_info in a separate patch?

No problem.

>> +	/*
>> +	 * Ensure that we clear the HLT state in the VMCS.  We don't need to
>> +	 * explicitly skip the instruction because if the HLT state is set, then
>> +	 * the instruction is already executing and RIP has already been
>> +	 * advanced.
>> +	 */
>> +	if (!kvm_hlt_in_guest(vcpu->kvm) || !(intr & INTR_INFO_VALID_MASK))
>> +		return;
> 
> I think this is unnecessary; IMO you can just do
> 
> 	if (intr & INTR_INFO_VALID_MASK)
> 		vmcs_write32(GUEST_ACTIVITY_STATE,
> 			     GUEST_ACTIVITY_ACTIVE);
> 
> after the write to VM_ENTRY_INTR_INFO_FIELD.  This is because:
> 
> - if !kvm_hlt_in_guest(vcpu->kvm), the activity state is always "active"

If we're intercepting HLT, the activity state is always active and there
is no need to overwrite it in the first place. That's why it is there
(and should stay there, in my opinion).

> - a vmread is almost as expensive as a vmwrite, so the vmread+vmwrite is
> almost never cheaper especially in the case where guest activity is
> "hlt" (which is also when sensitivity to latency is highest)

So, just blindly overwrite, but only in cases when it actually could
be "halted"... will do -- though see the comment at the bottom.

>> +	if (is_external_interrupt(intr) || is_nmi(intr))
>> +		return;
> 
> What is this for?

This avoids the vmwrite when INTR_INFO is set to something that is able
to get the vCPU out of "halted" in hardware. There is one more such case
in the SDM (an MTF VM exit) in addition to those, but I thought this
covers the most likely ones.

> 
> Thanks,
> 
> Paolo
> 
>> +	if (vmcs_read32(GUEST_ACTIVITY_STATE) != GUEST_ACTIVITY_HLT)
>> +		return;

Will drop this "if" as result of the discussion above.

Do we care about potential future support for other activity states
(shutdown, wait-for-sipi) at this point? In that case, the "if" may
still be needed.

Regards
Jan

>> +	vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
>> +}
>> +


* Re: [PATCH 1/3] KVM: Don't enable MWAIT in guest by default
  2017-11-25 13:09 ` [PATCH 1/3] KVM: Don't enable MWAIT in guest by default Jan H. Schönherr
@ 2017-11-27 18:13   ` Jim Mattson
       [not found]     ` <82e1f7c8-fdd6-835e-319a-bec72d771ef9@redhat.com>
  2017-11-28 23:58     ` Jan H. Schönherr
  2017-11-27 20:46   ` Michael S. Tsirkin
  2017-11-27 20:50   ` Michael S. Tsirkin
  2 siblings, 2 replies; 33+ messages in thread
From: Jim Mattson @ 2017-11-27 18:13 UTC (permalink / raw)
  To: Jan H. Schönherr
  Cc: Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Michael S. Tsirkin, KarimAllah Ahmed, kvm list

I don't understand the concern regarding CPUID5_ECX_INTERRUPT_BREAK.
Even if the CPU has this feature, can't the guest bypass it by
disabling interrupts and invoking MWAIT with bit 0 of ECX clear?

On Sat, Nov 25, 2017 at 5:09 AM, Jan H. Schönherr <jschoenh@amazon.de> wrote:
> Allowing a guest to execute MWAIT without interception enables a guest
> to put a (physical) CPU into a power saving state, where it takes
> longer to return from than what may be desired by the host.
>
> Don't give a guest that power over a host by default. (Especially,
> since nothing prevents a guest from using MWAIT even when it is not
> advertised via CPUID.)
>
> This restores the behavior from before Linux 4.12 commit 668fffa3f838
> ("kvm: better MWAIT emulation for guests") but keeps the option to
> enable MWAIT in guest for individual VMs.
>
> Suggested-by: KarimAllah Ahmed <karahmed@amazon.de>
> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
> ---
> Note: AMD code paths are only compile tested
> ---
>  Documentation/virtual/kvm/api.txt | 20 ++++++++++--------
>  arch/x86/include/asm/kvm_host.h   |  2 ++
>  arch/x86/kvm/svm.c                |  2 +-
>  arch/x86/kvm/vmx.c                |  9 ++++----
>  arch/x86/kvm/x86.c                | 44 ++++++++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/x86.h                | 35 ++-----------------------------
>  6 files changed, 64 insertions(+), 48 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index f670e4b..0ee812c 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -4163,6 +4163,17 @@ enables QEMU to build error log and branch to guest kernel registered
>  machine check handling routine. Without this capability KVM will
>  branch to guests' 0x200 interrupt vector.
>
> +7.13 KVM_CAP_X86_GUEST_MWAIT
> +
> +Architectures: x86
> +Parameters: none
> +Returns: 0 on success
> +
> +This capability indicates that a guest using memory monitoring instructions
> +(MWAIT/MWAITX) to stop a virtual CPU will not cause a VM exit. As such, time
> +spent while a virtual CPU is halted in this way will then be accounted for as
> +guest running time on the host (as opposed to e.g. HLT).
> +
>  8. Other capabilities.
>  ----------------------
>
> @@ -4275,15 +4286,6 @@ reserved.
>      Both registers and addresses are 64-bits wide.
>      It will be possible to run 64-bit or 32-bit guest code.
>
> -8.8 KVM_CAP_X86_GUEST_MWAIT
> -
> -Architectures: x86
> -
> -This capability indicates that guest using memory monotoring instructions
> -(MWAIT/MWAITX) to stop the virtual CPU will not cause a VM exit.  As such time
> -spent while virtual CPU is halted in this way will then be accounted for as
> -guest running time on the host (as opposed to e.g. HLT).
> -
>  8.9 KVM_CAP_ARM_USER_IRQ
>
>  Architectures: arm, arm64
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index b97726e..f7bcfaa 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -781,6 +781,8 @@ struct kvm_arch {
>
>         gpa_t wall_clock;
>
> +       bool mwait_in_guest;
> +
>         bool ept_identity_pagetable_done;
>         gpa_t ept_identity_map_addr;
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 1f3e7f2..ef1b320 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1253,7 +1253,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>         set_intercept(svm, INTERCEPT_WBINVD);
>         set_intercept(svm, INTERCEPT_XSETBV);
>
> -       if (!kvm_mwait_in_guest()) {
> +       if (!kvm_mwait_in_guest(svm->vcpu.kvm)) {
>                 set_intercept(svm, INTERCEPT_MONITOR);
>                 set_intercept(svm, INTERCEPT_MWAIT);
>         }
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 1eb7053..a067735 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3635,13 +3635,11 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>               CPU_BASED_USE_IO_BITMAPS |
>               CPU_BASED_MOV_DR_EXITING |
>               CPU_BASED_USE_TSC_OFFSETING |
> +             CPU_BASED_MWAIT_EXITING |
> +             CPU_BASED_MONITOR_EXITING |
>               CPU_BASED_INVLPG_EXITING |
>               CPU_BASED_RDPMC_EXITING;
>
> -       if (!kvm_mwait_in_guest())
> -               min |= CPU_BASED_MWAIT_EXITING |
> -                       CPU_BASED_MONITOR_EXITING;
> -
>         opt = CPU_BASED_TPR_SHADOW |
>               CPU_BASED_USE_MSR_BITMAPS |
>               CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
> @@ -5297,6 +5295,9 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
>                 exec_control |= CPU_BASED_CR3_STORE_EXITING |
>                                 CPU_BASED_CR3_LOAD_EXITING  |
>                                 CPU_BASED_INVLPG_EXITING;
> +       if (kvm_mwait_in_guest(vmx->vcpu.kvm))
> +               exec_control &= ~(CPU_BASED_MWAIT_EXITING |
> +                                 CPU_BASED_MONITOR_EXITING);
>         return exec_control;
>  }
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 985a305..fe6627a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -67,6 +67,7 @@
>  #include <asm/pvclock.h>
>  #include <asm/div64.h>
>  #include <asm/irq_remapping.h>
> +#include <asm/mwait.h>
>
>  #define CREATE_TRACE_POINTS
>  #include "trace.h"
> @@ -2672,6 +2673,40 @@ static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
>         return r;
>  }
>
> +static bool kvm_mwait_in_guest_possible(void)
> +{
> +       unsigned int eax, ebx, ecx, edx;
> +
> +       if (!cpu_has(&boot_cpu_data, X86_FEATURE_MWAIT))
> +               return false;
> +
> +       switch (boot_cpu_data.x86_vendor) {
> +       case X86_VENDOR_AMD:
> +               /* All AMD CPUs have a working MWAIT implementation */
> +               return true;
> +       case X86_VENDOR_INTEL:
> +               /* Handle Intel below */
> +               break;
> +       default:
> +               return false;
> +       }
> +
> +       /*
> +        * Intel CPUs without CPUID5_ECX_INTERRUPT_BREAK are problematic as
> +        * they would allow guest to stop the CPU completely by disabling
> +        * interrupts then invoking MWAIT.
> +        */
> +       if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
> +               return false;
> +
> +       cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
> +
> +       if (!(ecx & CPUID5_ECX_INTERRUPT_BREAK))
> +               return false;
> +
> +       return true;
> +}
> +
>  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  {
>         int r;
> @@ -2726,7 +2761,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>                 r = KVM_CLOCK_TSC_STABLE;
>                 break;
>         case KVM_CAP_X86_GUEST_MWAIT:
> -               r = kvm_mwait_in_guest();
> +               r = kvm_mwait_in_guest_possible();
>                 break;
>         case KVM_CAP_X86_SMM:
>                 /* SMBASE is usually relocated above 1M on modern chipsets,
> @@ -4026,6 +4061,13 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>
>                 r = 0;
>                 break;
> +       case KVM_CAP_X86_GUEST_MWAIT:
> +               r = -EINVAL;
> +               if (kvm_mwait_in_guest_possible()) {
> +                       kvm->arch.mwait_in_guest = true;
> +                       r = 0;
> +               }
> +               break;
>         default:
>                 r = -EINVAL;
>                 break;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index d0b95b7..ed8e150 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -2,8 +2,6 @@
>  #ifndef ARCH_X86_KVM_X86_H
>  #define ARCH_X86_KVM_X86_H
>
> -#include <asm/processor.h>
> -#include <asm/mwait.h>
>  #include <linux/kvm_host.h>
>  #include <asm/pvclock.h>
>  #include "kvm_cache_regs.h"
> @@ -263,38 +261,9 @@ static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
>             __rem;                                              \
>          })
>
> -static inline bool kvm_mwait_in_guest(void)
> +static inline bool kvm_mwait_in_guest(struct kvm *kvm)
>  {
> -       unsigned int eax, ebx, ecx, edx;
> -
> -       if (!cpu_has(&boot_cpu_data, X86_FEATURE_MWAIT))
> -               return false;
> -
> -       switch (boot_cpu_data.x86_vendor) {
> -       case X86_VENDOR_AMD:
> -               /* All AMD CPUs have a working MWAIT implementation */
> -               return true;
> -       case X86_VENDOR_INTEL:
> -               /* Handle Intel below */
> -               break;
> -       default:
> -               return false;
> -       }
> -
> -       /*
> -        * Intel CPUs without CPUID5_ECX_INTERRUPT_BREAK are problematic as
> -        * they would allow guest to stop the CPU completely by disabling
> -        * interrupts then invoking MWAIT.
> -        */
> -       if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
> -               return false;
> -
> -       cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
> -
> -       if (!(ecx & CPUID5_ECX_INTERRUPT_BREAK))
> -               return false;
> -
> -       return true;
> +       return kvm->arch.mwait_in_guest;
>  }
>
>  #endif
> --
> 2.3.1.dirty
>


* Re: [PATCH 1/3] KVM: Don't enable MWAIT in guest by default
       [not found]     ` <82e1f7c8-fdd6-835e-319a-bec72d771ef9@redhat.com>
@ 2017-11-27 18:32       ` Jim Mattson
  0 siblings, 0 replies; 33+ messages in thread
From: Jim Mattson @ 2017-11-27 18:32 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jan H. Schönherr, Radim Krčmář,
	Joerg Roedel, Michael S. Tsirkin, KarimAllah Ahmed, kvm list

Right. The guest's RFLAGS.IF may be clear, but "an external interrupt
causes a VM exit if the 'external-interrupt exiting' VM-execution
control is 1."

On Mon, Nov 27, 2017 at 10:27 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 27/11/2017 19:13, Jim Mattson wrote:
>> I don't understand the concern regarding CPUID5_ECX_INTERRUPT_BREAK.
>> Even if the CPU has this feature, can't the guest bypass it by
>> disabling interrupts and invoking MWAIT with bit 0 of ECX clear?
>
> Hmm, you're right.  In fact, an external interrupt should cause a VM
> exit if the “external-interrupt exiting” VM-execution control is 1, even
> if the VM is executing MWAIT.
>
> We should write a unit test for that though.
>
> Thanks,
>
> Paolo


* Re: [PATCH 2/3] KVM: Add capability to not exit on HLT
  2017-11-25 13:09 ` [PATCH 2/3] KVM: Add capability to not exit on HLT Jan H. Schönherr
  2017-11-27  1:32   ` Wanpeng Li
       [not found]   ` <e17ea420-c141-18b6-2622-e33a3f540c61@redhat.com>
@ 2017-11-27 20:45   ` Michael S. Tsirkin
       [not found]     ` <8ce45bad-b43c-4e97-aa69-74d7fc9cecb5@redhat.com>
  2 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2017-11-27 20:45 UTC (permalink / raw)
  To: Jan H. Schönherr
  Cc: Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, KarimAllah Ahmed, kvm

On Sat, Nov 25, 2017 at 02:09:32PM +0100, Jan H. Schönherr wrote:
> If host CPUs are dedicated to a VM, we can avoid VM exits on HLT,
> reducing the wake-up latency on posted interrupts.
> 
> This reintroduces a feature that has been there at some point --
> see Linux 3.4 commit 10166744b80a ("KVM: VMX: remove yield_on_hlt")
> for the removal -- but with the additional ability to enable it only
> for selected VMs (and supporting SVM as well).
> 
> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>


If you are going to do this, why not expose mwait in the cpuid, thus
making guests use mwait to halt?

What are the advantages of doing this using halt specifically?



> ---
> Note: AMD code paths are only compile tested
> ---
>  Documentation/virtual/kvm/api.txt | 12 +++++++++++-
>  arch/x86/include/asm/kvm_host.h   |  1 +
>  arch/x86/kvm/svm.c                |  3 ++-
>  arch/x86/kvm/vmx.c                | 33 +++++++++++++++++++++++++++------
>  arch/x86/kvm/x86.c                |  5 +++++
>  arch/x86/kvm/x86.h                |  5 +++++
>  include/uapi/linux/kvm.h          |  1 +
>  7 files changed, 52 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 0ee812c..c06bb41 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -4172,7 +4172,17 @@ Returns: 0 on success
>  This capability indicates that a guest using memory monitoring instructions
>  (MWAIT/MWAITX) to stop a virtual CPU will not cause a VM exit. As such, time
>  spent while a virtual CPU is halted in this way will then be accounted for as
> -guest running time on the host (as opposed to e.g. HLT).
> +guest running time on the host.
> +
> +7.14 KVM_CAP_X86_GUEST_HLT
> +
> +Architectures: x86
> +Parameters: none
> +Returns: 0 on success
> +
> +This capability indicates that a guest using HLT to stop a virtual CPU will not
> +cause a VM exit. As such, time spent while a virtual CPU is halted in this way
> +will then be accounted for as guest running time on the host.
>  
>  8. Other capabilities.
>  ----------------------
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f7bcfaa..3197c2d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -781,6 +781,7 @@ struct kvm_arch {
>  
>  	gpa_t wall_clock;
>  
> +	bool hlt_in_guest;
>  	bool mwait_in_guest;
>  
>  	bool ept_identity_pagetable_done;
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index ef1b320..c135b98 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1236,7 +1236,6 @@ static void init_vmcb(struct vcpu_svm *svm)
>  	set_intercept(svm, INTERCEPT_RDPMC);
>  	set_intercept(svm, INTERCEPT_CPUID);
>  	set_intercept(svm, INTERCEPT_INVD);
> -	set_intercept(svm, INTERCEPT_HLT);
>  	set_intercept(svm, INTERCEPT_INVLPG);
>  	set_intercept(svm, INTERCEPT_INVLPGA);
>  	set_intercept(svm, INTERCEPT_IOIO_PROT);
> @@ -1257,6 +1256,8 @@ static void init_vmcb(struct vcpu_svm *svm)
>  		set_intercept(svm, INTERCEPT_MONITOR);
>  		set_intercept(svm, INTERCEPT_MWAIT);
>  	}
> +	if (!kvm_hlt_in_guest(svm->vcpu.kvm))
> +		set_intercept(svm, INTERCEPT_HLT);
>  
>  	control->iopm_base_pa = __sme_set(iopm_base);
>  	control->msrpm_base_pa = __sme_set(__pa(svm->msrpm));
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a067735..1b67433 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2446,6 +2446,25 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
>  	vmx_set_interrupt_shadow(vcpu, 0);
>  }
>  
> +static void vmx_set_intr_info(struct kvm_vcpu *vcpu, u32 intr)
> +{
> +	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);
> +
> +	/*
> +	 * Ensure that we clear the HLT state in the VMCS.  We don't need to
> +	 * explicitly skip the instruction because if the HLT state is set, then
> +	 * the instruction is already executing and RIP has already been
> +	 * advanced.
> +	 */
> +	if (!kvm_hlt_in_guest(vcpu->kvm) || !(intr & INTR_INFO_VALID_MASK))
> +		return;
> +	if (is_external_interrupt(intr) || is_nmi(intr))
> +		return;
> +	if (vmcs_read32(GUEST_ACTIVITY_STATE) != GUEST_ACTIVITY_HLT)
> +		return;
> +	vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
> +}
> +
>  static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
>  					       unsigned long exit_qual)
>  {
> @@ -2540,7 +2559,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
>  	} else
>  		intr_info |= INTR_TYPE_HARD_EXCEPTION;
>  
> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
> +	vmx_set_intr_info(vcpu, intr_info);
>  }
>  
>  static bool vmx_rdtscp_supported(void)
> @@ -5298,6 +5317,8 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
>  	if (kvm_mwait_in_guest(vmx->vcpu.kvm))
>  		exec_control &= ~(CPU_BASED_MWAIT_EXITING |
>  				  CPU_BASED_MONITOR_EXITING);
> +	if (kvm_hlt_in_guest(vmx->vcpu.kvm))
> +		exec_control &= ~CPU_BASED_HLT_EXITING;
>  	return exec_control;
>  }
>  
> @@ -5635,7 +5656,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>  
>  	setup_msrs(vmx);
>  
> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);  /* 22.2.1 */
> +	vmx_set_intr_info(vcpu, 0);  /* 22.2.1 */
>  
>  	if (cpu_has_vmx_tpr_shadow() && !init_event) {
>  		vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, 0);
> @@ -5729,7 +5750,7 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu)
>  			     vmx->vcpu.arch.event_exit_inst_len);
>  	} else
>  		intr |= INTR_TYPE_EXT_INTR;
> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);
> +	vmx_set_intr_info(vcpu, intr);
>  }
>  
>  static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
> @@ -5758,8 +5779,8 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
>  		return;
>  	}
>  
> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
> -			INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
> +	vmx_set_intr_info(vcpu, INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK |
> +				NMI_VECTOR);
>  }
>  
>  static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu)
> @@ -9301,7 +9322,7 @@ static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
>  				  VM_ENTRY_INSTRUCTION_LEN,
>  				  VM_ENTRY_EXCEPTION_ERROR_CODE);
>  
> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
> +	vmx_set_intr_info(vcpu, 0);
>  }
>  
>  static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fe6627a..f17c520 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2755,6 +2755,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_SET_BOOT_CPU_ID:
>   	case KVM_CAP_SPLIT_IRQCHIP:
>  	case KVM_CAP_IMMEDIATE_EXIT:
> +	case KVM_CAP_X86_GUEST_HLT:
>  		r = 1;
>  		break;
>  	case KVM_CAP_ADJUST_CLOCK:
> @@ -4068,6 +4069,10 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  			r = 0;
>  		}
>  		break;
> +	case KVM_CAP_X86_GUEST_HLT:
> +		kvm->arch.hlt_in_guest = true;
> +		r = 0;
> +		break;
>  	default:
>  		r = -EINVAL;
>  		break;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index ed8e150..b2066aa 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -266,4 +266,9 @@ static inline bool kvm_mwait_in_guest(struct kvm *kvm)
>  	return kvm->arch.mwait_in_guest;
>  }
>  
> +static inline bool kvm_hlt_in_guest(struct kvm *kvm)
> +{
> +	return kvm->arch.hlt_in_guest;
> +}
> +
>  #endif
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 282d7613..ff8f266 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -932,6 +932,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_HYPERV_SYNIC2 148
>  #define KVM_CAP_HYPERV_VP_INDEX 149
>  #define KVM_CAP_S390_AIS_MIGRATION 150
> +#define KVM_CAP_X86_GUEST_HLT 151
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> -- 
> 2.3.1.dirty


* Re: [PATCH 1/3] KVM: Don't enable MWAIT in guest by default
  2017-11-25 13:09 ` [PATCH 1/3] KVM: Don't enable MWAIT in guest by default Jan H. Schönherr
  2017-11-27 18:13   ` Jim Mattson
@ 2017-11-27 20:46   ` Michael S. Tsirkin
  2017-11-27 22:36     ` Jan H. Schönherr
  2017-11-27 20:50   ` Michael S. Tsirkin
  2 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2017-11-27 20:46 UTC (permalink / raw)
  To: Jan H. Schönherr
  Cc: Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, KarimAllah Ahmed, kvm

On Sat, Nov 25, 2017 at 02:09:31PM +0100, Jan H. Schönherr wrote:
> Allowing a guest to execute MWAIT without interception enables the guest
> to put a (physical) CPU into a power-saving state from which it may take
> longer to return than the host desires.
> 
> Don't give a guest that power over a host by default. (Especially,
> since nothing prevents a guest from using MWAIT even when it is not
> advertised via CPUID.)
> 
> This restores the behavior from before Linux 4.12 commit 668fffa3f838
> ("kvm: better MWAIT emulation for guests") but keeps the option to
> enable MWAIT in guest for individual VMs.

As others pointed out, an interrupt will wake up the host CPU anyway.

Given that, what's the actual motivation here?

> Suggested-by: KarimAllah Ahmed <karahmed@amazon.de>
> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
> ---
> Note: AMD code paths are only compile tested
> ---
>  Documentation/virtual/kvm/api.txt | 20 ++++++++++--------
>  arch/x86/include/asm/kvm_host.h   |  2 ++
>  arch/x86/kvm/svm.c                |  2 +-
>  arch/x86/kvm/vmx.c                |  9 ++++----
>  arch/x86/kvm/x86.c                | 44 ++++++++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/x86.h                | 35 ++-----------------------------
>  6 files changed, 64 insertions(+), 48 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index f670e4b..0ee812c 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -4163,6 +4163,17 @@ enables QEMU to build error log and branch to guest kernel registered
>  machine check handling routine. Without this capability KVM will
>  branch to guests' 0x200 interrupt vector.
>  
> +7.13 KVM_CAP_X86_GUEST_MWAIT
> +
> +Architectures: x86
> +Parameters: none
> +Returns: 0 on success
> +
> +This capability indicates that a guest using memory monitoring instructions
> +(MWAIT/MWAITX) to stop a virtual CPU will not cause a VM exit. As such, time
> +spent while a virtual CPU is halted in this way will then be accounted for as
> +guest running time on the host (as opposed to e.g. HLT).
> +
>  8. Other capabilities.
>  ----------------------
>  
> @@ -4275,15 +4286,6 @@ reserved.
>      Both registers and addresses are 64-bits wide.
>      It will be possible to run 64-bit or 32-bit guest code.
>  
> -8.8 KVM_CAP_X86_GUEST_MWAIT
> -
> -Architectures: x86
> -
> -This capability indicates that guest using memory monotoring instructions
> -(MWAIT/MWAITX) to stop the virtual CPU will not cause a VM exit.  As such time
> -spent while virtual CPU is halted in this way will then be accounted for as
> -guest running time on the host (as opposed to e.g. HLT).
> -
>  8.9 KVM_CAP_ARM_USER_IRQ
>  
>  Architectures: arm, arm64
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index b97726e..f7bcfaa 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -781,6 +781,8 @@ struct kvm_arch {
>  
>  	gpa_t wall_clock;
>  
> +	bool mwait_in_guest;
> +
>  	bool ept_identity_pagetable_done;
>  	gpa_t ept_identity_map_addr;
>  
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 1f3e7f2..ef1b320 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1253,7 +1253,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>  	set_intercept(svm, INTERCEPT_WBINVD);
>  	set_intercept(svm, INTERCEPT_XSETBV);
>  
> -	if (!kvm_mwait_in_guest()) {
> +	if (!kvm_mwait_in_guest(svm->vcpu.kvm)) {
>  		set_intercept(svm, INTERCEPT_MONITOR);
>  		set_intercept(svm, INTERCEPT_MWAIT);
>  	}
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 1eb7053..a067735 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3635,13 +3635,11 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>  	      CPU_BASED_USE_IO_BITMAPS |
>  	      CPU_BASED_MOV_DR_EXITING |
>  	      CPU_BASED_USE_TSC_OFFSETING |
> +	      CPU_BASED_MWAIT_EXITING |
> +	      CPU_BASED_MONITOR_EXITING |
>  	      CPU_BASED_INVLPG_EXITING |
>  	      CPU_BASED_RDPMC_EXITING;
>  
> -	if (!kvm_mwait_in_guest())
> -		min |= CPU_BASED_MWAIT_EXITING |
> -			CPU_BASED_MONITOR_EXITING;
> -
>  	opt = CPU_BASED_TPR_SHADOW |
>  	      CPU_BASED_USE_MSR_BITMAPS |
>  	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
> @@ -5297,6 +5295,9 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
>  		exec_control |= CPU_BASED_CR3_STORE_EXITING |
>  				CPU_BASED_CR3_LOAD_EXITING  |
>  				CPU_BASED_INVLPG_EXITING;
> +	if (kvm_mwait_in_guest(vmx->vcpu.kvm))
> +		exec_control &= ~(CPU_BASED_MWAIT_EXITING |
> +				  CPU_BASED_MONITOR_EXITING);
>  	return exec_control;
>  }
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 985a305..fe6627a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -67,6 +67,7 @@
>  #include <asm/pvclock.h>
>  #include <asm/div64.h>
>  #include <asm/irq_remapping.h>
> +#include <asm/mwait.h>
>  
>  #define CREATE_TRACE_POINTS
>  #include "trace.h"
> @@ -2672,6 +2673,40 @@ static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
>  	return r;
>  }
>  
> +static bool kvm_mwait_in_guest_possible(void)
> +{
> +	unsigned int eax, ebx, ecx, edx;
> +
> +	if (!cpu_has(&boot_cpu_data, X86_FEATURE_MWAIT))
> +		return false;
> +
> +	switch (boot_cpu_data.x86_vendor) {
> +	case X86_VENDOR_AMD:
> +		/* All AMD CPUs have a working MWAIT implementation */
> +		return true;
> +	case X86_VENDOR_INTEL:
> +		/* Handle Intel below */
> +		break;
> +	default:
> +		return false;
> +	}
> +
> +	/*
> +	 * Intel CPUs without CPUID5_ECX_INTERRUPT_BREAK are problematic as
> +	 * they would allow guest to stop the CPU completely by disabling
> +	 * interrupts then invoking MWAIT.
> +	 */
> +	if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
> +		return false;
> +
> +	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
> +
> +	if (!(ecx & CPUID5_ECX_INTERRUPT_BREAK))
> +		return false;
> +
> +	return true;
> +}
> +
>  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  {
>  	int r;
> @@ -2726,7 +2761,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  		r = KVM_CLOCK_TSC_STABLE;
>  		break;
>  	case KVM_CAP_X86_GUEST_MWAIT:
> -		r = kvm_mwait_in_guest();
> +		r = kvm_mwait_in_guest_possible();
>  		break;
>  	case KVM_CAP_X86_SMM:
>  		/* SMBASE is usually relocated above 1M on modern chipsets,
> @@ -4026,6 +4061,13 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  
>  		r = 0;
>  		break;
> +	case KVM_CAP_X86_GUEST_MWAIT:
> +		r = -EINVAL;
> +		if (kvm_mwait_in_guest_possible()) {
> +			kvm->arch.mwait_in_guest = true;
> +			r = 0;
> +		}
> +		break;
>  	default:
>  		r = -EINVAL;
>  		break;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index d0b95b7..ed8e150 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -2,8 +2,6 @@
>  #ifndef ARCH_X86_KVM_X86_H
>  #define ARCH_X86_KVM_X86_H
>  
> -#include <asm/processor.h>
> -#include <asm/mwait.h>
>  #include <linux/kvm_host.h>
>  #include <asm/pvclock.h>
>  #include "kvm_cache_regs.h"
> @@ -263,38 +261,9 @@ static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
>  	    __rem;						\
>  	 })
>  
> -static inline bool kvm_mwait_in_guest(void)
> +static inline bool kvm_mwait_in_guest(struct kvm *kvm)
>  {
> -	unsigned int eax, ebx, ecx, edx;
> -
> -	if (!cpu_has(&boot_cpu_data, X86_FEATURE_MWAIT))
> -		return false;
> -
> -	switch (boot_cpu_data.x86_vendor) {
> -	case X86_VENDOR_AMD:
> -		/* All AMD CPUs have a working MWAIT implementation */
> -		return true;
> -	case X86_VENDOR_INTEL:
> -		/* Handle Intel below */
> -		break;
> -	default:
> -		return false;
> -	}
> -
> -	/*
> -	 * Intel CPUs without CPUID5_ECX_INTERRUPT_BREAK are problematic as
> -	 * they would allow guest to stop the CPU completely by disabling
> -	 * interrupts then invoking MWAIT.
> -	 */
> -	if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
> -		return false;
> -
> -	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
> -
> -	if (!(ecx & CPUID5_ECX_INTERRUPT_BREAK))
> -		return false;
> -
> -	return true;
> +	return kvm->arch.mwait_in_guest;
>  }
>  
>  #endif
> -- 
> 2.3.1.dirty


* Re: [PATCH 3/3] KVM: Add capability to not exit on PAUSE
  2017-11-25 13:09 ` [PATCH 3/3] KVM: Add capability to not exit on PAUSE Jan H. Schönherr
@ 2017-11-27 20:48   ` Michael S. Tsirkin
  2017-11-28  3:37   ` Longpeng (Mike)
  1 sibling, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2017-11-27 20:48 UTC (permalink / raw)
  To: Jan H. Schönherr
  Cc: Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, KarimAllah Ahmed, kvm

On Sat, Nov 25, 2017 at 02:09:33PM +0100, Jan H. Schönherr wrote:
> Allow pause loop exiting/pause filtering to be disabled on a per-VM basis.
> 
> If some VMs have dedicated host CPUs, they won't be negatively affected
> due to needlessly intercepted PAUSE instructions.
> 
> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>

Nice, thanks!

> ---
> Note: AMD code paths are only compile tested
> ---
>  Documentation/virtual/kvm/api.txt |  8 ++++++++
>  arch/x86/include/asm/kvm_host.h   |  1 +
>  arch/x86/kvm/svm.c                |  3 ++-
>  arch/x86/kvm/vmx.c                | 17 +++++++++++++----
>  arch/x86/kvm/x86.c                |  5 +++++
>  arch/x86/kvm/x86.h                |  5 +++++
>  include/uapi/linux/kvm.h          |  1 +
>  7 files changed, 35 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index c06bb41..42a54d1 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -4184,6 +4184,14 @@ This capability indicates that a guest using HLT to stop a virtual CPU will not
>  cause a VM exit. As such, time spent while a virtual CPU is halted in this way
>  will then be accounted for as guest running time on the host.
>  
> +7.15 KVM_CAP_X86_GUEST_PAUSE
> +
> +Architectures: x86
> +Parameters: none
> +Returns: 0 on success
> +
> +This capability indicates that a guest using PAUSE will not cause a VM exit.
> +
>  8. Other capabilities.
>  ----------------------
>  
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3197c2d..0d4ea32 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -782,6 +782,7 @@ struct kvm_arch {
>  	gpa_t wall_clock;
>  
>  	bool hlt_in_guest;
> +	bool pause_in_guest;
>  	bool mwait_in_guest;
>  
>  	bool ept_identity_pagetable_done;
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index c135b98..a5eb60a 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1314,7 +1314,8 @@ static void init_vmcb(struct vcpu_svm *svm)
>  	svm->nested.vmcb = 0;
>  	svm->vcpu.arch.hflags = 0;
>  
> -	if (boot_cpu_has(X86_FEATURE_PAUSEFILTER)) {
> +	if (boot_cpu_has(X86_FEATURE_PAUSEFILTER) &&
> +	    !kvm_pause_in_guest(svm->vcpu.kvm)) {
>  		control->pause_filter_count = 3000;
>  		set_intercept(svm, INTERCEPT_PAUSE);
>  	}
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 1b67433..5f8c33b 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5352,7 +5352,7 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
>  	}
>  	if (!enable_unrestricted_guest)
>  		exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
> -	if (!ple_gap)
> +	if (kvm_pause_in_guest(vmx->vcpu.kvm))
>  		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
>  	if (!kvm_vcpu_apicv_active(vcpu))
>  		exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT |
> @@ -5519,7 +5519,7 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
>  		vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
>  	}
>  
> -	if (ple_gap) {
> +	if (!kvm_pause_in_guest(vmx->vcpu.kvm)) {
>  		vmcs_write32(PLE_GAP, ple_gap);
>  		vmx->ple_window = ple_window;
>  		vmx->ple_window_dirty = true;
> @@ -6975,7 +6975,7 @@ static __exit void hardware_unsetup(void)
>   */
>  static int handle_pause(struct kvm_vcpu *vcpu)
>  {
> -	if (ple_gap)
> +	if (!kvm_pause_in_guest(vcpu->kvm))
>  		grow_ple_window(vcpu);
>  
>  	/*
> @@ -9730,6 +9730,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
>  	return ERR_PTR(err);
>  }
>  
> +static int vmx_vm_init(struct kvm *kvm)
> +{
> +	if (!ple_gap)
> +		kvm->arch.pause_in_guest = true;
> +	return 0;
> +}
> +
>  static void __init vmx_check_processor_compat(void *rtn)
>  {
>  	struct vmcs_config vmcs_conf;
> @@ -11793,7 +11800,7 @@ static void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu)
>  
>  static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
>  {
> -	if (ple_gap)
> +	if (!kvm_pause_in_guest(vcpu->kvm))
>  		shrink_ple_window(vcpu);
>  }
>  
> @@ -12152,6 +12159,8 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>  	.cpu_has_accelerated_tpr = report_flexpriority,
>  	.cpu_has_high_real_mode_segbase = vmx_has_high_real_mode_segbase,
>  
> +	.vm_init = vmx_vm_init,
> +
>  	.vcpu_create = vmx_create_vcpu,
>  	.vcpu_free = vmx_free_vcpu,
>  	.vcpu_reset = vmx_vcpu_reset,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f17c520..e13df00 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2756,6 +2756,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   	case KVM_CAP_SPLIT_IRQCHIP:
>  	case KVM_CAP_IMMEDIATE_EXIT:
>  	case KVM_CAP_X86_GUEST_HLT:
> +	case KVM_CAP_X86_GUEST_PAUSE:
>  		r = 1;
>  		break;
>  	case KVM_CAP_ADJUST_CLOCK:
> @@ -4073,6 +4074,10 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		kvm->arch.hlt_in_guest = true;
>  		r = 0;
>  		break;
> +	case KVM_CAP_X86_GUEST_PAUSE:
> +		kvm->arch.pause_in_guest = true;
> +		r = 0;
> +		break;
>  	default:
>  		r = -EINVAL;
>  		break;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index b2066aa..56297c4 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -271,4 +271,9 @@ static inline bool kvm_hlt_in_guest(struct kvm *kvm)
>  	return kvm->arch.hlt_in_guest;
>  }
>  
> +static inline bool kvm_pause_in_guest(struct kvm *kvm)
> +{
> +	return kvm->arch.pause_in_guest;
> +}
> +
>  #endif
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index ff8f266..bc2b654 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -933,6 +933,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_HYPERV_VP_INDEX 149
>  #define KVM_CAP_S390_AIS_MIGRATION 150
>  #define KVM_CAP_X86_GUEST_HLT 151
> +#define KVM_CAP_X86_GUEST_PAUSE 152
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> -- 
> 2.3.1.dirty


* Re: [PATCH 1/3] KVM: Don't enable MWAIT in guest by default
  2017-11-25 13:09 ` [PATCH 1/3] KVM: Don't enable MWAIT in guest by default Jan H. Schönherr
  2017-11-27 18:13   ` Jim Mattson
  2017-11-27 20:46   ` Michael S. Tsirkin
@ 2017-11-27 20:50   ` Michael S. Tsirkin
       [not found]     ` <90f7f081-95d7-f573-8b57-5c6e86fd2a8d@redhat.com>
  2 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2017-11-27 20:50 UTC (permalink / raw)
  To: Jan H. Schönherr
  Cc: Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, KarimAllah Ahmed, kvm

On Sat, Nov 25, 2017 at 02:09:31PM +0100, Jan H. Schönherr wrote:
> Allowing a guest to execute MWAIT without interception enables the guest
> to put a (physical) CPU into a power-saving state from which it may take
> longer to return than the host desires.
> 
> Don't give a guest that power over a host by default. (Especially,
> since nothing prevents a guest from using MWAIT even when it is not
> advertised via CPUID.)
> 
> This restores the behavior from before Linux 4.12 commit 668fffa3f838
> ("kvm: better MWAIT emulation for guests") but keeps the option to
> enable MWAIT in guest for individual VMs.
> 
> Suggested-by: KarimAllah Ahmed <karahmed@amazon.de>
> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>

I don't think we should play with the defaults like this; userspace has
no way to detect what is supported or what the actual value is.

I think we should either leave the default alone, or remove this
capability and add a new one.





* Re: [PATCH 2/3] KVM: Add capability to not exit on HLT
       [not found]     ` <8ce45bad-b43c-4e97-aa69-74d7fc9cecb5@redhat.com>
@ 2017-11-27 20:55       ` Michael S. Tsirkin
  2017-11-28  1:34         ` Longpeng (Mike)
  0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2017-11-27 20:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jan H. Schönherr, Radim Krčmář,
	Joerg Roedel, KarimAllah Ahmed, kvm

On Mon, Nov 27, 2017 at 09:51:27PM +0100, Paolo Bonzini wrote:
> On 27/11/2017 21:45, Michael S. Tsirkin wrote:
> > On Sat, Nov 25, 2017 at 02:09:32PM +0100, Jan H. Schönherr wrote:
> >> If host CPUs are dedicated to a VM, we can avoid VM exits on HLT,
> >> reducing the wake-up latency on posted interrupts.
> >>
> >> This reintroduces a feature that has been there at some point --
> >> see Linux 3.4 commit 10166744b80a ("KVM: VMX: remove yield_on_hlt")
> >> for the removal -- but with the additional ability to enable it only
> >> for selected VMs (and supporting SVM as well).
> >>
> >> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
> > 
> > 
> > If you are going to do this, why not expose mwait
> > in the cpuid thus making guests use mwait to halt?
> > 
> > What are the advantages of doing this using halt specifically?
> 
> Not all guests use MWAIT, I suppose.
> 
> Paolo

In that case, it would be nice to document which guests of interest
don't.  E.g.  I don't think there are still supported versions of RHEL
that don't use MWAIT.



> > 
> >> ---
> >> Note: AMD code paths are only compile tested
> >> ---
> >>  Documentation/virtual/kvm/api.txt | 12 +++++++++++-
> >>  arch/x86/include/asm/kvm_host.h   |  1 +
> >>  arch/x86/kvm/svm.c                |  3 ++-
> >>  arch/x86/kvm/vmx.c                | 33 +++++++++++++++++++++++++++------
> >>  arch/x86/kvm/x86.c                |  5 +++++
> >>  arch/x86/kvm/x86.h                |  5 +++++
> >>  include/uapi/linux/kvm.h          |  1 +
> >>  7 files changed, 52 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >> index 0ee812c..c06bb41 100644
> >> --- a/Documentation/virtual/kvm/api.txt
> >> +++ b/Documentation/virtual/kvm/api.txt
> >> @@ -4172,7 +4172,17 @@ Returns: 0 on success
> >>  This capability indicates that a guest using memory monitoring instructions
> >>  (MWAIT/MWAITX) to stop a virtual CPU will not cause a VM exit. As such, time
> >>  spent while a virtual CPU is halted in this way will then be accounted for as
> >> -guest running time on the host (as opposed to e.g. HLT).
> >> +guest running time on the host.
> >> +
> >> +7.14 KVM_CAP_X86_GUEST_HLT
> >> +
> >> +Architectures: x86
> >> +Parameters: none
> >> +Returns: 0 on success
> >> +
> >> +This capability indicates that a guest using HLT to stop a virtual CPU will not
> >> +cause a VM exit. As such, time spent while a virtual CPU is halted in this way
> >> +will then be accounted for as guest running time on the host.
> >>  
> >>  8. Other capabilities.
> >>  ----------------------
> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> index f7bcfaa..3197c2d 100644
> >> --- a/arch/x86/include/asm/kvm_host.h
> >> +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -781,6 +781,7 @@ struct kvm_arch {
> >>  
> >>  	gpa_t wall_clock;
> >>  
> >> +	bool hlt_in_guest;
> >>  	bool mwait_in_guest;
> >>  
> >>  	bool ept_identity_pagetable_done;
> >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >> index ef1b320..c135b98 100644
> >> --- a/arch/x86/kvm/svm.c
> >> +++ b/arch/x86/kvm/svm.c
> >> @@ -1236,7 +1236,6 @@ static void init_vmcb(struct vcpu_svm *svm)
> >>  	set_intercept(svm, INTERCEPT_RDPMC);
> >>  	set_intercept(svm, INTERCEPT_CPUID);
> >>  	set_intercept(svm, INTERCEPT_INVD);
> >> -	set_intercept(svm, INTERCEPT_HLT);
> >>  	set_intercept(svm, INTERCEPT_INVLPG);
> >>  	set_intercept(svm, INTERCEPT_INVLPGA);
> >>  	set_intercept(svm, INTERCEPT_IOIO_PROT);
> >> @@ -1257,6 +1256,8 @@ static void init_vmcb(struct vcpu_svm *svm)
> >>  		set_intercept(svm, INTERCEPT_MONITOR);
> >>  		set_intercept(svm, INTERCEPT_MWAIT);
> >>  	}
> >> +	if (!kvm_hlt_in_guest(svm->vcpu.kvm))
> >> +		set_intercept(svm, INTERCEPT_HLT);
> >>  
> >>  	control->iopm_base_pa = __sme_set(iopm_base);
> >>  	control->msrpm_base_pa = __sme_set(__pa(svm->msrpm));
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> index a067735..1b67433 100644
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -2446,6 +2446,25 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
> >>  	vmx_set_interrupt_shadow(vcpu, 0);
> >>  }
> >>  
> >> +static void vmx_set_intr_info(struct kvm_vcpu *vcpu, u32 intr)
> >> +{
> >> +	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);
> >> +
> >> +	/*
> >> +	 * Ensure that we clear the HLT state in the VMCS.  We don't need to
> >> +	 * explicitly skip the instruction because if the HLT state is set, then
> >> +	 * the instruction is already executing and RIP has already been
> >> +	 * advanced.
> >> +	 */
> >> +	if (!kvm_hlt_in_guest(vcpu->kvm) || !(intr & INTR_INFO_VALID_MASK))
> >> +		return;
> >> +	if (is_external_interrupt(intr) || is_nmi(intr))
> >> +		return;
> >> +	if (vmcs_read32(GUEST_ACTIVITY_STATE) != GUEST_ACTIVITY_HLT)
> >> +		return;
> >> +	vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
> >> +}
> >> +
> >>  static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
> >>  					       unsigned long exit_qual)
> >>  {
> >> @@ -2540,7 +2559,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
> >>  	} else
> >>  		intr_info |= INTR_TYPE_HARD_EXCEPTION;
> >>  
> >> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
> >> +	vmx_set_intr_info(vcpu, intr_info);
> >>  }
> >>  
> >>  static bool vmx_rdtscp_supported(void)
> >> @@ -5298,6 +5317,8 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
> >>  	if (kvm_mwait_in_guest(vmx->vcpu.kvm))
> >>  		exec_control &= ~(CPU_BASED_MWAIT_EXITING |
> >>  				  CPU_BASED_MONITOR_EXITING);
> >> +	if (kvm_hlt_in_guest(vmx->vcpu.kvm))
> >> +		exec_control &= ~CPU_BASED_HLT_EXITING;
> >>  	return exec_control;
> >>  }
> >>  
> >> @@ -5635,7 +5656,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> >>  
> >>  	setup_msrs(vmx);
> >>  
> >> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);  /* 22.2.1 */
> >> +	vmx_set_intr_info(vcpu, 0);  /* 22.2.1 */
> >>  
> >>  	if (cpu_has_vmx_tpr_shadow() && !init_event) {
> >>  		vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, 0);
> >> @@ -5729,7 +5750,7 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu)
> >>  			     vmx->vcpu.arch.event_exit_inst_len);
> >>  	} else
> >>  		intr |= INTR_TYPE_EXT_INTR;
> >> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);
> >> +	vmx_set_intr_info(vcpu, intr);
> >>  }
> >>  
> >>  static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
> >> @@ -5758,8 +5779,8 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
> >>  		return;
> >>  	}
> >>  
> >> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
> >> -			INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
> >> +	vmx_set_intr_info(vcpu, INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK |
> >> +				NMI_VECTOR);
> >>  }
> >>  
> >>  static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu)
> >> @@ -9301,7 +9322,7 @@ static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
> >>  				  VM_ENTRY_INSTRUCTION_LEN,
> >>  				  VM_ENTRY_EXCEPTION_ERROR_CODE);
> >>  
> >> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
> >> +	vmx_set_intr_info(vcpu, 0);
> >>  }
> >>  
> >>  static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index fe6627a..f17c520 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -2755,6 +2755,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >>  	case KVM_CAP_SET_BOOT_CPU_ID:
> >>   	case KVM_CAP_SPLIT_IRQCHIP:
> >>  	case KVM_CAP_IMMEDIATE_EXIT:
> >> +	case KVM_CAP_X86_GUEST_HLT:
> >>  		r = 1;
> >>  		break;
> >>  	case KVM_CAP_ADJUST_CLOCK:
> >> @@ -4068,6 +4069,10 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >>  			r = 0;
> >>  		}
> >>  		break;
> >> +	case KVM_CAP_X86_GUEST_HLT:
> >> +		kvm->arch.hlt_in_guest = true;
> >> +		r = 0;
> >> +		break;
> >>  	default:
> >>  		r = -EINVAL;
> >>  		break;
> >> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> >> index ed8e150..b2066aa 100644
> >> --- a/arch/x86/kvm/x86.h
> >> +++ b/arch/x86/kvm/x86.h
> >> @@ -266,4 +266,9 @@ static inline bool kvm_mwait_in_guest(struct kvm *kvm)
> >>  	return kvm->arch.mwait_in_guest;
> >>  }
> >>  
> >> +static inline bool kvm_hlt_in_guest(struct kvm *kvm)
> >> +{
> >> +	return kvm->arch.hlt_in_guest;
> >> +}
> >> +
> >>  #endif
> >> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >> index 282d7613..ff8f266 100644
> >> --- a/include/uapi/linux/kvm.h
> >> +++ b/include/uapi/linux/kvm.h
> >> @@ -932,6 +932,7 @@ struct kvm_ppc_resize_hpt {
> >>  #define KVM_CAP_HYPERV_SYNIC2 148
> >>  #define KVM_CAP_HYPERV_VP_INDEX 149
> >>  #define KVM_CAP_S390_AIS_MIGRATION 150
> >> +#define KVM_CAP_X86_GUEST_HLT 151
> >>  
> >>  #ifdef KVM_CAP_IRQ_ROUTING
> >>  
> >> -- 
> >> 2.3.1.dirty

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] KVM: Don't enable MWAIT in guest by default
       [not found]     ` <90f7f081-95d7-f573-8b57-5c6e86fd2a8d@redhat.com>
@ 2017-11-27 20:57       ` Michael S. Tsirkin
  0 siblings, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2017-11-27 20:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jan H. Schönherr, Radim Krčmář,
	Joerg Roedel, KarimAllah Ahmed, kvm

On Mon, Nov 27, 2017 at 09:52:24PM +0100, Paolo Bonzini wrote:
> On 27/11/2017 21:50, Michael S. Tsirkin wrote:
> > On Sat, Nov 25, 2017 at 02:09:31PM +0100, Jan H. Schönherr wrote:
> >> Allowing a guest to execute MWAIT without interception enables a guest
> >> to put a (physical) CPU into a power saving state, where it takes
> >> longer to return from than what may be desired by the host.
> >>
> >> Don't give a guest that power over a host by default. (Especially,
> >> since nothing prevents a guest from using MWAIT even when it is not
> >> advertised via CPUID.)
> >>
> >> This restores the behavior from before Linux 4.12 commit 668fffa3f838
> >> ("kvm: better MWAIT emulation for guests") but keeps the option to
> >> enable MWAIT in guest for individual VMs.
> >>
> >> Suggested-by: KarimAllah Ahmed <karahmed@amazon.de>
> >> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
> > 
> > I don't think we should play with the defaults like this, userspace has
> > no way to detect what is supported and what is the
> > actual value.
> > 
> > I think we should either leave the default alone, or remove this
> > capability and add a new one.
> 
> Are there any users of KVM_CHECK_EXTENSION(KVM_CAP_X86_GUEST_MWAIT)?

Not that I know though there's no sure way to find out.

> Because if not it's all academic...
> 
> Paolo

Presumably there will be. How will they know what is the actual default
value?

> > 
> > 
> > 
> >> ---
> >> Note: AMD code paths are only compile tested
> >> ---
> >>  Documentation/virtual/kvm/api.txt | 20 ++++++++++--------
> >>  arch/x86/include/asm/kvm_host.h   |  2 ++
> >>  arch/x86/kvm/svm.c                |  2 +-
> >>  arch/x86/kvm/vmx.c                |  9 ++++----
> >>  arch/x86/kvm/x86.c                | 44 ++++++++++++++++++++++++++++++++++++++-
> >>  arch/x86/kvm/x86.h                | 35 ++-----------------------------
> >>  6 files changed, 64 insertions(+), 48 deletions(-)
> >>
> >> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >> index f670e4b..0ee812c 100644
> >> --- a/Documentation/virtual/kvm/api.txt
> >> +++ b/Documentation/virtual/kvm/api.txt
> >> @@ -4163,6 +4163,17 @@ enables QEMU to build error log and branch to guest kernel registered
> >>  machine check handling routine. Without this capability KVM will
> >>  branch to guests' 0x200 interrupt vector.
> >>  
> >> +7.13 KVM_CAP_X86_GUEST_MWAIT
> >> +
> >> +Architectures: x86
> >> +Parameters: none
> >> +Returns: 0 on success
> >> +
> >> +This capability indicates that a guest using memory monitoring instructions
> >> +(MWAIT/MWAITX) to stop a virtual CPU will not cause a VM exit. As such, time
> >> +spent while a virtual CPU is halted in this way will then be accounted for as
> >> +guest running time on the host (as opposed to e.g. HLT).
> >> +
> >>  8. Other capabilities.
> >>  ----------------------
> >>  
> >> @@ -4275,15 +4286,6 @@ reserved.
> >>      Both registers and addresses are 64-bits wide.
> >>      It will be possible to run 64-bit or 32-bit guest code.
> >>  
> >> -8.8 KVM_CAP_X86_GUEST_MWAIT
> >> -
> >> -Architectures: x86
> >> -
> >> -This capability indicates that guest using memory monotoring instructions
> >> -(MWAIT/MWAITX) to stop the virtual CPU will not cause a VM exit.  As such time
> >> -spent while virtual CPU is halted in this way will then be accounted for as
> >> -guest running time on the host (as opposed to e.g. HLT).
> >> -
> >>  8.9 KVM_CAP_ARM_USER_IRQ
> >>  
> >>  Architectures: arm, arm64
> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> index b97726e..f7bcfaa 100644
> >> --- a/arch/x86/include/asm/kvm_host.h
> >> +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -781,6 +781,8 @@ struct kvm_arch {
> >>  
> >>  	gpa_t wall_clock;
> >>  
> >> +	bool mwait_in_guest;
> >> +
> >>  	bool ept_identity_pagetable_done;
> >>  	gpa_t ept_identity_map_addr;
> >>  
> >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >> index 1f3e7f2..ef1b320 100644
> >> --- a/arch/x86/kvm/svm.c
> >> +++ b/arch/x86/kvm/svm.c
> >> @@ -1253,7 +1253,7 @@ static void init_vmcb(struct vcpu_svm *svm)
> >>  	set_intercept(svm, INTERCEPT_WBINVD);
> >>  	set_intercept(svm, INTERCEPT_XSETBV);
> >>  
> >> -	if (!kvm_mwait_in_guest()) {
> >> +	if (!kvm_mwait_in_guest(svm->vcpu.kvm)) {
> >>  		set_intercept(svm, INTERCEPT_MONITOR);
> >>  		set_intercept(svm, INTERCEPT_MWAIT);
> >>  	}
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> index 1eb7053..a067735 100644
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -3635,13 +3635,11 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
> >>  	      CPU_BASED_USE_IO_BITMAPS |
> >>  	      CPU_BASED_MOV_DR_EXITING |
> >>  	      CPU_BASED_USE_TSC_OFFSETING |
> >> +	      CPU_BASED_MWAIT_EXITING |
> >> +	      CPU_BASED_MONITOR_EXITING |
> >>  	      CPU_BASED_INVLPG_EXITING |
> >>  	      CPU_BASED_RDPMC_EXITING;
> >>  
> >> -	if (!kvm_mwait_in_guest())
> >> -		min |= CPU_BASED_MWAIT_EXITING |
> >> -			CPU_BASED_MONITOR_EXITING;
> >> -
> >>  	opt = CPU_BASED_TPR_SHADOW |
> >>  	      CPU_BASED_USE_MSR_BITMAPS |
> >>  	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
> >> @@ -5297,6 +5295,9 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
> >>  		exec_control |= CPU_BASED_CR3_STORE_EXITING |
> >>  				CPU_BASED_CR3_LOAD_EXITING  |
> >>  				CPU_BASED_INVLPG_EXITING;
> >> +	if (kvm_mwait_in_guest(vmx->vcpu.kvm))
> >> +		exec_control &= ~(CPU_BASED_MWAIT_EXITING |
> >> +				  CPU_BASED_MONITOR_EXITING);
> >>  	return exec_control;
> >>  }
> >>  
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index 985a305..fe6627a 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -67,6 +67,7 @@
> >>  #include <asm/pvclock.h>
> >>  #include <asm/div64.h>
> >>  #include <asm/irq_remapping.h>
> >> +#include <asm/mwait.h>
> >>  
> >>  #define CREATE_TRACE_POINTS
> >>  #include "trace.h"
> >> @@ -2672,6 +2673,40 @@ static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
> >>  	return r;
> >>  }
> >>  
> >> +static bool kvm_mwait_in_guest_possible(void)
> >> +{
> >> +	unsigned int eax, ebx, ecx, edx;
> >> +
> >> +	if (!cpu_has(&boot_cpu_data, X86_FEATURE_MWAIT))
> >> +		return false;
> >> +
> >> +	switch (boot_cpu_data.x86_vendor) {
> >> +	case X86_VENDOR_AMD:
> >> +		/* All AMD CPUs have a working MWAIT implementation */
> >> +		return true;
> >> +	case X86_VENDOR_INTEL:
> >> +		/* Handle Intel below */
> >> +		break;
> >> +	default:
> >> +		return false;
> >> +	}
> >> +
> >> +	/*
> >> +	 * Intel CPUs without CPUID5_ECX_INTERRUPT_BREAK are problematic as
> >> +	 * they would allow guest to stop the CPU completely by disabling
> >> +	 * interrupts then invoking MWAIT.
> >> +	 */
> >> +	if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
> >> +		return false;
> >> +
> >> +	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
> >> +
> >> +	if (!(ecx & CPUID5_ECX_INTERRUPT_BREAK))
> >> +		return false;
> >> +
> >> +	return true;
> >> +}
> >> +
> >>  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >>  {
> >>  	int r;
> >> @@ -2726,7 +2761,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >>  		r = KVM_CLOCK_TSC_STABLE;
> >>  		break;
> >>  	case KVM_CAP_X86_GUEST_MWAIT:
> >> -		r = kvm_mwait_in_guest();
> >> +		r = kvm_mwait_in_guest_possible();
> >>  		break;
> >>  	case KVM_CAP_X86_SMM:
> >>  		/* SMBASE is usually relocated above 1M on modern chipsets,
> >> @@ -4026,6 +4061,13 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >>  
> >>  		r = 0;
> >>  		break;
> >> +	case KVM_CAP_X86_GUEST_MWAIT:
> >> +		r = -EINVAL;
> >> +		if (kvm_mwait_in_guest_possible()) {
> >> +			kvm->arch.mwait_in_guest = true;
> >> +			r = 0;
> >> +		}
> >> +		break;
> >>  	default:
> >>  		r = -EINVAL;
> >>  		break;
> >> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> >> index d0b95b7..ed8e150 100644
> >> --- a/arch/x86/kvm/x86.h
> >> +++ b/arch/x86/kvm/x86.h
> >> @@ -2,8 +2,6 @@
> >>  #ifndef ARCH_X86_KVM_X86_H
> >>  #define ARCH_X86_KVM_X86_H
> >>  
> >> -#include <asm/processor.h>
> >> -#include <asm/mwait.h>
> >>  #include <linux/kvm_host.h>
> >>  #include <asm/pvclock.h>
> >>  #include "kvm_cache_regs.h"
> >> @@ -263,38 +261,9 @@ static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
> >>  	    __rem;						\
> >>  	 })
> >>  
> >> -static inline bool kvm_mwait_in_guest(void)
> >> +static inline bool kvm_mwait_in_guest(struct kvm *kvm)
> >>  {
> >> -	unsigned int eax, ebx, ecx, edx;
> >> -
> >> -	if (!cpu_has(&boot_cpu_data, X86_FEATURE_MWAIT))
> >> -		return false;
> >> -
> >> -	switch (boot_cpu_data.x86_vendor) {
> >> -	case X86_VENDOR_AMD:
> >> -		/* All AMD CPUs have a working MWAIT implementation */
> >> -		return true;
> >> -	case X86_VENDOR_INTEL:
> >> -		/* Handle Intel below */
> >> -		break;
> >> -	default:
> >> -		return false;
> >> -	}
> >> -
> >> -	/*
> >> -	 * Intel CPUs without CPUID5_ECX_INTERRUPT_BREAK are problematic as
> >> -	 * they would allow guest to stop the CPU completely by disabling
> >> -	 * interrupts then invoking MWAIT.
> >> -	 */
> >> -	if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
> >> -		return false;
> >> -
> >> -	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
> >> -
> >> -	if (!(ecx & CPUID5_ECX_INTERRUPT_BREAK))
> >> -		return false;
> >> -
> >> -	return true;
> >> +	return kvm->arch.mwait_in_guest;
> >>  }
> >>  
> >>  #endif
> >> -- 
> >> 2.3.1.dirty


* Re: [PATCH 1/3] KVM: Don't enable MWAIT in guest by default
  2017-11-27 20:46   ` Michael S. Tsirkin
@ 2017-11-27 22:36     ` Jan H. Schönherr
  2017-11-28 14:00       ` Michael S. Tsirkin
  0 siblings, 1 reply; 33+ messages in thread
From: Jan H. Schönherr @ 2017-11-27 22:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, KarimAllah Ahmed, kvm

On 11/27/2017 09:46 PM, Michael S. Tsirkin wrote:
> On Sat, Nov 25, 2017 at 02:09:31PM +0100, Jan H. Schönherr wrote:
>> Allowing a guest to execute MWAIT without interception enables a guest
>> to put a (physical) CPU into a power saving state, where it takes
>> longer to return from than what may be desired by the host.
>>
>> Don't give a guest that power over a host by default. (Especially,
>> since nothing prevents a guest from using MWAIT even when it is not
>> advertised via CPUID.)
>>
>> This restores the behavior from before Linux 4.12 commit 668fffa3f838
>> ("kvm: better MWAIT emulation for guests") but keeps the option to
>> enable MWAIT in guest for individual VMs.
> 
> As others pointed out, an interrupt will wake up the host CPU anyway.
> 
> Given that, what's the actual motivation here?

The CPU will wake up, but the time that takes depends on which C-state the
guest requested and actually managed to enter.

Since Linux 4.11 and 4.14, the menu and ladder cpuidle governors, respectively,
allow setting a per-CPU wakeup latency requirement -- see, e.g., commit c523c68da211
("cpuidle: ladder: Add per CPU PM QoS resume latency support"). Allowing a guest
to execute MWAIT without moderation bypasses that functionality.

Thus, while MWAIT-in-guest is a great thing to have when there are dedicated
CPUs, it is less so, when the CPUs still have to do other activities.

Regards
Jan
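
For reference, the per-CPU latency requirement mentioned above is exposed to
userspace via sysfs. A minimal configuration sketch (the path follows the
per-CPU PM QoS interface referenced by the cited commits; the CPU number and
the 20us value are examples, and writing requires root):

```shell
# Cap the tolerated wakeup latency of CPU 2 to 20 microseconds; the
# cpuidle governor will then avoid C-states whose exit latency exceeds it.
echo 20 > /sys/devices/system/cpu/cpu2/power/pm_qos_resume_latency_us

# Read back the currently configured constraint.
cat /sys/devices/system/cpu/cpu2/power/pm_qos_resume_latency_us
```

A guest executing MWAIT without interception requests C-states directly on the
hardware, so a constraint like this on the host has no effect on it.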


* Re: [PATCH 0/3] KVM: Tie MWAIT/HLT/PAUSE interception to initially disabled capabilities
       [not found] ` <a3c80a22-ff69-fa51-ea90-48f039eb449a@redhat.com>
@ 2017-11-28  0:15   ` Jan H. Schönherr
       [not found]     ` <8971d9e0-388c-9934-1ab2-33508cbbeb8f@redhat.com>
  0 siblings, 1 reply; 33+ messages in thread
From: Jan H. Schönherr @ 2017-11-28  0:15 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Joerg Roedel, Michael S. Tsirkin, KarimAllah Ahmed, kvm

On 11/27/2017 12:46 PM, Paolo Bonzini wrote:
> On 25/11/2017 14:09, Jan H. Schönherr wrote:
>> KVM no longer intercepts MWAIT instructions since Linux 4.12 commit 668fffa3f838
>> ("kvm: better MWAIT emulation for guests") for improved latency in some
>> workloads.
>>
>> This series takes the idea further and makes the interception of HLT (patch 2)
>> and PAUSE (patch 3) optional as well for similar reasons. Both interceptions
>> can be turned off for an individual VM by enabling the corresponding capability.
>>
>> It also converts KVM_CAP_X86_GUEST_MWAIT into an initially disabled capability
>> that has to be enabled explicitly for a VM. This restores pre Linux 4.12
>> behavior in the default case, so that a guest cannot put CPUs into low power
>> states, which may exceed the host's latency requirements (patch 1).
> 
> Nice!  Regarding the userspace ABI, we could have a single capability
> KVM_CAP_X86_DISABLE_EXITS that is just KVM_CAP_X86_GUEST_MWAIT renamed.
> The value returned by KVM_CHECK_EXTENSION would be defined like this:
> 
> - if bit 16 is 0, bit 0..15 says which exits are disabled
> 
> - if bit 16 is 1, no exits are disabled by default but the capability
> supports KVM_ENABLE_CAP

... and bits 0..15 indicate which exits can be disabled by specifying
them as an argument to KVM_ENABLE_CAP?

> 
> With
> 
> #define KVM_X86_DISABLE_EXITS_MWAIT		1
> #define KVM_X86_DISABLE_EXITS_HLT		2
> #define KVM_X86_DISABLE_EXITS_PAUSE		4
> #define KVM_X86_DISABLE_EXITS_WITH_ENABLE	0x10000

Is that bit 16 an attempt at backwards compatibility with the current state?
That would only work if any potential user actually checks with "==1" instead
of "!=0".

What is the benefit of doing this with bitmasks as opposed to separate
capabilities?

Regards
Jan


* Re: [PATCH 2/3] KVM: Add capability to not exit on HLT
  2017-11-27 20:55       ` Michael S. Tsirkin
@ 2017-11-28  1:34         ` Longpeng (Mike)
  2017-11-28 14:04           ` Michael S. Tsirkin
  0 siblings, 1 reply; 33+ messages in thread
From: Longpeng (Mike) @ 2017-11-28  1:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Bonzini, "Jan H. Schönherr",
	Radim Krčmář,
	Joerg Roedel, KarimAllah Ahmed, kvm



On 2017/11/28 4:55, Michael S. Tsirkin wrote:

> On Mon, Nov 27, 2017 at 09:51:27PM +0100, Paolo Bonzini wrote:
>> On 27/11/2017 21:45, Michael S. Tsirkin wrote:
>>> On Sat, Nov 25, 2017 at 02:09:32PM +0100, Jan H. Schönherr wrote:
>>>> If host CPUs are dedicated to a VM, we can avoid VM exits on HLT,
>>>> reducing the wake-up latency on posted interrupts.
>>>>
>>>> This reintroduces a feature that has been there at some point --
>>>> see Linux 3.4 commit 10166744b80a ("KVM: VMX: remove yield_on_hlt")
>>>> for the removal -- but with the additional ability to enable it only
>>>> for selected VMs (and supporting SVM as well).
>>>>
>>>> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
>>>
>>>
>>> If you are going to do this, why not expose mwait
>>> in the cpuid thus making guests use mwait to halt?
>>>
>>> What are the advantages of doing this using halt specifically?
>>
>> Not all guests use MWAIT, I suppose.
>>
>> Paolo
> 
> In that case, it would be nice to document which guests of interest
> don't.  E.g.  I don't think there are still supported versions of RHEL
> that don't use MWAIT.
> 


Some old kernels, e.g. my 3.10.0-514 kernel based on RHEL 7.3, don't use
MWAIT when no cpuidle driver is supported (in the guest,
"/sys/devices/system/cpu/cpuidle/current_driver" then reads "none"). In that
case the guest falls back to the kernel's default idle routine.

The default idle routine is selected at boot:

old kernel:
'''
select_idle_routine()
	if (cpu_has_bug(c, X86_BUG_AMD_APIC_C1E)) {
		x86_idle = amd_e400_idle;
	} else
		x86_idle = default_idle;
'''

newer kernel:
'''
select_idle_routine()
	if (boot_cpu_has_bug(X86_BUG_AMD_E400)) {
		pr_info("using AMD E400 aware idle routine\n");
		x86_idle = amd_e400_idle;
	} else if (prefer_mwait_c1_over_halt(c)) {
		pr_info("using mwait in idle threads\n");
		x86_idle = mwait_idle;
	} else
		x86_idle = default_idle;
'''

So, some old guests don't use MWAIT as their default idle routine.
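
A quick way to check which path a given guest took, using the sysfs file and
the boot messages referenced above (an inspection sketch; the exact dmesg
wording depends on the kernel version):

```shell
# "none" here means no cpuidle driver is bound, as in the RHEL 7.3 case above.
cat /sys/devices/system/cpu/cpuidle/current_driver

# Newer kernels log the chosen routine at boot, e.g.
# "using mwait in idle threads" (see select_idle_routine() above).
dmesg | grep -i "idle"
```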

> 
> 
>>>
>>>> ---
>>>> Note: AMD code paths are only compile tested
>>>> ---
>>>>  Documentation/virtual/kvm/api.txt | 12 +++++++++++-
>>>>  arch/x86/include/asm/kvm_host.h   |  1 +
>>>>  arch/x86/kvm/svm.c                |  3 ++-
>>>>  arch/x86/kvm/vmx.c                | 33 +++++++++++++++++++++++++++------
>>>>  arch/x86/kvm/x86.c                |  5 +++++
>>>>  arch/x86/kvm/x86.h                |  5 +++++
>>>>  include/uapi/linux/kvm.h          |  1 +
>>>>  7 files changed, 52 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>> index 0ee812c..c06bb41 100644
>>>> --- a/Documentation/virtual/kvm/api.txt
>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>> @@ -4172,7 +4172,17 @@ Returns: 0 on success
>>>>  This capability indicates that a guest using memory monitoring instructions
>>>>  (MWAIT/MWAITX) to stop a virtual CPU will not cause a VM exit. As such, time
>>>>  spent while a virtual CPU is halted in this way will then be accounted for as
>>>> -guest running time on the host (as opposed to e.g. HLT).
>>>> +guest running time on the host.
>>>> +
>>>> +7.14 KVM_CAP_X86_GUEST_HLT
>>>> +
>>>> +Architectures: x86
>>>> +Parameters: none
>>>> +Returns: 0 on success
>>>> +
>>>> +This capability indicates that a guest using HLT to stop a virtual CPU will not
>>>> +cause a VM exit. As such, time spent while a virtual CPU is halted in this way
>>>> +will then be accounted for as guest running time on the host.
>>>>  
>>>>  8. Other capabilities.
>>>>  ----------------------
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index f7bcfaa..3197c2d 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -781,6 +781,7 @@ struct kvm_arch {
>>>>  
>>>>  	gpa_t wall_clock;
>>>>  
>>>> +	bool hlt_in_guest;
>>>>  	bool mwait_in_guest;
>>>>  
>>>>  	bool ept_identity_pagetable_done;
>>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>>> index ef1b320..c135b98 100644
>>>> --- a/arch/x86/kvm/svm.c
>>>> +++ b/arch/x86/kvm/svm.c
>>>> @@ -1236,7 +1236,6 @@ static void init_vmcb(struct vcpu_svm *svm)
>>>>  	set_intercept(svm, INTERCEPT_RDPMC);
>>>>  	set_intercept(svm, INTERCEPT_CPUID);
>>>>  	set_intercept(svm, INTERCEPT_INVD);
>>>> -	set_intercept(svm, INTERCEPT_HLT);
>>>>  	set_intercept(svm, INTERCEPT_INVLPG);
>>>>  	set_intercept(svm, INTERCEPT_INVLPGA);
>>>>  	set_intercept(svm, INTERCEPT_IOIO_PROT);
>>>> @@ -1257,6 +1256,8 @@ static void init_vmcb(struct vcpu_svm *svm)
>>>>  		set_intercept(svm, INTERCEPT_MONITOR);
>>>>  		set_intercept(svm, INTERCEPT_MWAIT);
>>>>  	}
>>>> +	if (!kvm_hlt_in_guest(svm->vcpu.kvm))
>>>> +		set_intercept(svm, INTERCEPT_HLT);
>>>>  
>>>>  	control->iopm_base_pa = __sme_set(iopm_base);
>>>>  	control->msrpm_base_pa = __sme_set(__pa(svm->msrpm));
>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>> index a067735..1b67433 100644
>>>> --- a/arch/x86/kvm/vmx.c
>>>> +++ b/arch/x86/kvm/vmx.c
>>>> @@ -2446,6 +2446,25 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
>>>>  	vmx_set_interrupt_shadow(vcpu, 0);
>>>>  }
>>>>  
>>>> +static void vmx_set_intr_info(struct kvm_vcpu *vcpu, u32 intr)
>>>> +{
>>>> +	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);
>>>> +
>>>> +	/*
>>>> +	 * Ensure that we clear the HLT state in the VMCS.  We don't need to
>>>> +	 * explicitly skip the instruction because if the HLT state is set, then
>>>> +	 * the instruction is already executing and RIP has already been
>>>> +	 * advanced.
>>>> +	 */
>>>> +	if (!kvm_hlt_in_guest(vcpu->kvm) || !(intr & INTR_INFO_VALID_MASK))
>>>> +		return;
>>>> +	if (is_external_interrupt(intr) || is_nmi(intr))
>>>> +		return;
>>>> +	if (vmcs_read32(GUEST_ACTIVITY_STATE) != GUEST_ACTIVITY_HLT)
>>>> +		return;
>>>> +	vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
>>>> +}
>>>> +
>>>>  static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
>>>>  					       unsigned long exit_qual)
>>>>  {
>>>> @@ -2540,7 +2559,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
>>>>  	} else
>>>>  		intr_info |= INTR_TYPE_HARD_EXCEPTION;
>>>>  
>>>> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
>>>> +	vmx_set_intr_info(vcpu, intr_info);
>>>>  }
>>>>  
>>>>  static bool vmx_rdtscp_supported(void)
>>>> @@ -5298,6 +5317,8 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
>>>>  	if (kvm_mwait_in_guest(vmx->vcpu.kvm))
>>>>  		exec_control &= ~(CPU_BASED_MWAIT_EXITING |
>>>>  				  CPU_BASED_MONITOR_EXITING);
>>>> +	if (kvm_hlt_in_guest(vmx->vcpu.kvm))
>>>> +		exec_control &= ~CPU_BASED_HLT_EXITING;
>>>>  	return exec_control;
>>>>  }
>>>>  
>>>> @@ -5635,7 +5656,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>>>>  
>>>>  	setup_msrs(vmx);
>>>>  
>>>> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);  /* 22.2.1 */
>>>> +	vmx_set_intr_info(vcpu, 0);  /* 22.2.1 */
>>>>  
>>>>  	if (cpu_has_vmx_tpr_shadow() && !init_event) {
>>>>  		vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, 0);
>>>> @@ -5729,7 +5750,7 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu)
>>>>  			     vmx->vcpu.arch.event_exit_inst_len);
>>>>  	} else
>>>>  		intr |= INTR_TYPE_EXT_INTR;
>>>> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);
>>>> +	vmx_set_intr_info(vcpu, intr);
>>>>  }
>>>>  
>>>>  static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
>>>> @@ -5758,8 +5779,8 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
>>>>  		return;
>>>>  	}
>>>>  
>>>> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
>>>> -			INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
>>>> +	vmx_set_intr_info(vcpu, INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK |
>>>> +				NMI_VECTOR);
>>>>  }
>>>>  
>>>>  static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu)
>>>> @@ -9301,7 +9322,7 @@ static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
>>>>  				  VM_ENTRY_INSTRUCTION_LEN,
>>>>  				  VM_ENTRY_EXCEPTION_ERROR_CODE);
>>>>  
>>>> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
>>>> +	vmx_set_intr_info(vcpu, 0);
>>>>  }
>>>>  
>>>>  static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index fe6627a..f17c520 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -2755,6 +2755,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>>  	case KVM_CAP_SET_BOOT_CPU_ID:
>>>>   	case KVM_CAP_SPLIT_IRQCHIP:
>>>>  	case KVM_CAP_IMMEDIATE_EXIT:
>>>> +	case KVM_CAP_X86_GUEST_HLT:
>>>>  		r = 1;
>>>>  		break;
>>>>  	case KVM_CAP_ADJUST_CLOCK:
>>>> @@ -4068,6 +4069,10 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>>>>  			r = 0;
>>>>  		}
>>>>  		break;
>>>> +	case KVM_CAP_X86_GUEST_HLT:
>>>> +		kvm->arch.hlt_in_guest = true;
>>>> +		r = 0;
>>>> +		break;
>>>>  	default:
>>>>  		r = -EINVAL;
>>>>  		break;
>>>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>>>> index ed8e150..b2066aa 100644
>>>> --- a/arch/x86/kvm/x86.h
>>>> +++ b/arch/x86/kvm/x86.h
>>>> @@ -266,4 +266,9 @@ static inline bool kvm_mwait_in_guest(struct kvm *kvm)
>>>>  	return kvm->arch.mwait_in_guest;
>>>>  }
>>>>  
>>>> +static inline bool kvm_hlt_in_guest(struct kvm *kvm)
>>>> +{
>>>> +	return kvm->arch.hlt_in_guest;
>>>> +}
>>>> +
>>>>  #endif
>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>> index 282d7613..ff8f266 100644
>>>> --- a/include/uapi/linux/kvm.h
>>>> +++ b/include/uapi/linux/kvm.h
>>>> @@ -932,6 +932,7 @@ struct kvm_ppc_resize_hpt {
>>>>  #define KVM_CAP_HYPERV_SYNIC2 148
>>>>  #define KVM_CAP_HYPERV_VP_INDEX 149
>>>>  #define KVM_CAP_S390_AIS_MIGRATION 150
>>>> +#define KVM_CAP_X86_GUEST_HLT 151
>>>>  
>>>>  #ifdef KVM_CAP_IRQ_ROUTING
>>>>  
>>>> -- 
>>>> 2.3.1.dirty
> 
> .
> 


-- 
Regards,
Longpeng(Mike)


* Re: [PATCH 3/3] KVM: Add capability to not exit on PAUSE
  2017-11-25 13:09 ` [PATCH 3/3] KVM: Add capability to not exit on PAUSE Jan H. Schönherr
  2017-11-27 20:48   ` Michael S. Tsirkin
@ 2017-11-28  3:37   ` Longpeng (Mike)
  2017-11-29  0:09     ` Jan H. Schönherr
  1 sibling, 1 reply; 33+ messages in thread
From: Longpeng (Mike) @ 2017-11-28  3:37 UTC (permalink / raw)
  To: "Jan H. Schönherr"
  Cc: Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Michael S. Tsirkin, KarimAllah Ahmed, kvm



On 2017/11/25 21:09, Jan H. Schönherr wrote:

> Allow to disable pause loop exit/pause filtering on a per VM basis.
> 
> If some VMs have dedicated host CPUs, they won't be negatively affected
> due to needlessly intercepted PAUSE instructions.
> 

Hi Jan,

Is there any difference between 'disable PLE in the VMCS' and 'make ple_gap
per VM/vCPU and set ple_gap=0 for vCPUs that are dedicated'?

> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
> ---
> Note: AMD code paths are only compile tested
> ---
>  Documentation/virtual/kvm/api.txt |  8 ++++++++
>  arch/x86/include/asm/kvm_host.h   |  1 +
>  arch/x86/kvm/svm.c                |  3 ++-
>  arch/x86/kvm/vmx.c                | 17 +++++++++++++----
>  arch/x86/kvm/x86.c                |  5 +++++
>  arch/x86/kvm/x86.h                |  5 +++++
>  include/uapi/linux/kvm.h          |  1 +
>  7 files changed, 35 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index c06bb41..42a54d1 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -4184,6 +4184,14 @@ This capability indicates that a guest using HLT to stop a virtual CPU will not
>  cause a VM exit. As such, time spent while a virtual CPU is halted in this way
>  will then be accounted for as guest running time on the host.
>  
> +7.15 KVM_CAP_X86_GUEST_PAUSE
> +
> +Architectures: x86
> +Parameters: none
> +Returns: 0 on success
> +
> +This capability indicates that a guest using PAUSE will not cause a VM exit.
> +
>  8. Other capabilities.
>  ----------------------
>  
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3197c2d..0d4ea32 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -782,6 +782,7 @@ struct kvm_arch {
>  	gpa_t wall_clock;
>  
>  	bool hlt_in_guest;
> +	bool pause_in_guest;
>  	bool mwait_in_guest;
>  
>  	bool ept_identity_pagetable_done;
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index c135b98..a5eb60a 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1314,7 +1314,8 @@ static void init_vmcb(struct vcpu_svm *svm)
>  	svm->nested.vmcb = 0;
>  	svm->vcpu.arch.hflags = 0;
>  
> -	if (boot_cpu_has(X86_FEATURE_PAUSEFILTER)) {
> +	if (boot_cpu_has(X86_FEATURE_PAUSEFILTER) &&
> +	    !kvm_pause_in_guest(svm->vcpu.kvm)) {
>  		control->pause_filter_count = 3000;
>  		set_intercept(svm, INTERCEPT_PAUSE);
>  	}
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 1b67433..5f8c33b 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5352,7 +5352,7 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
>  	}
>  	if (!enable_unrestricted_guest)
>  		exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
> -	if (!ple_gap)
> +	if (kvm_pause_in_guest(vmx->vcpu.kvm))
>  		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
>  	if (!kvm_vcpu_apicv_active(vcpu))
>  		exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT |
> @@ -5519,7 +5519,7 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
>  		vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
>  	}
>  
> -	if (ple_gap) {
> +	if (!kvm_pause_in_guest(vmx->vcpu.kvm)) {
>  		vmcs_write32(PLE_GAP, ple_gap);
>  		vmx->ple_window = ple_window;
>  		vmx->ple_window_dirty = true;
> @@ -6975,7 +6975,7 @@ static __exit void hardware_unsetup(void)
>   */
>  static int handle_pause(struct kvm_vcpu *vcpu)
>  {
> -	if (ple_gap)
> +	if (!kvm_pause_in_guest(vcpu->kvm))
>  		grow_ple_window(vcpu);
>  
>  	/*
> @@ -9730,6 +9730,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
>  	return ERR_PTR(err);
>  }
>  
> +static int vmx_vm_init(struct kvm *kvm)
> +{
> +	if (!ple_gap)
> +		kvm->arch.pause_in_guest = true;
> +	return 0;
> +}
> +
>  static void __init vmx_check_processor_compat(void *rtn)
>  {
>  	struct vmcs_config vmcs_conf;
> @@ -11793,7 +11800,7 @@ static void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu)
>  
>  static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
>  {
> -	if (ple_gap)
> +	if (!kvm_pause_in_guest(vcpu->kvm))
>  		shrink_ple_window(vcpu);
>  }
>  
> @@ -12152,6 +12159,8 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>  	.cpu_has_accelerated_tpr = report_flexpriority,
>  	.cpu_has_high_real_mode_segbase = vmx_has_high_real_mode_segbase,
>  
> +	.vm_init = vmx_vm_init,
> +
>  	.vcpu_create = vmx_create_vcpu,
>  	.vcpu_free = vmx_free_vcpu,
>  	.vcpu_reset = vmx_vcpu_reset,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f17c520..e13df00 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2756,6 +2756,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   	case KVM_CAP_SPLIT_IRQCHIP:
>  	case KVM_CAP_IMMEDIATE_EXIT:
>  	case KVM_CAP_X86_GUEST_HLT:
> +	case KVM_CAP_X86_GUEST_PAUSE:
>  		r = 1;
>  		break;
>  	case KVM_CAP_ADJUST_CLOCK:
> @@ -4073,6 +4074,10 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		kvm->arch.hlt_in_guest = true;
>  		r = 0;
>  		break;
> +	case KVM_CAP_X86_GUEST_PAUSE:
> +		kvm->arch.pause_in_guest = true;
> +		r = 0;
> +		break;
>  	default:
>  		r = -EINVAL;
>  		break;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index b2066aa..56297c4 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -271,4 +271,9 @@ static inline bool kvm_hlt_in_guest(struct kvm *kvm)
>  	return kvm->arch.hlt_in_guest;
>  }
>  
> +static inline bool kvm_pause_in_guest(struct kvm *kvm)
> +{
> +	return kvm->arch.pause_in_guest;
> +}
> +
>  #endif
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index ff8f266..bc2b654 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -933,6 +933,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_HYPERV_VP_INDEX 149
>  #define KVM_CAP_S390_AIS_MIGRATION 150
>  #define KVM_CAP_X86_GUEST_HLT 151
> +#define KVM_CAP_X86_GUEST_PAUSE 152
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  


-- 
Regards,
Longpeng(Mike)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/3] KVM: Tie MWAIT/HLT/PAUSE interception to initially disabled capabilities
       [not found]     ` <8971d9e0-388c-9934-1ab2-33508cbbeb8f@redhat.com>
@ 2017-11-28 10:42       ` Jan H. Schönherr
  2017-11-28 14:08       ` Michael S. Tsirkin
  1 sibling, 0 replies; 33+ messages in thread
From: Jan H. Schönherr @ 2017-11-28 10:42 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Joerg Roedel, Michael S. Tsirkin, KarimAllah Ahmed, kvm

On 11/28/2017 09:42 AM, Paolo Bonzini wrote:
> Fair enough.  Let's get rid of KVM_CAP_X86_GUEST_MWAIT altogether and
> add a new capability without bit 16.  But let's use just one.
> 
>> What is the benefit of doing this with bitmasks as opposed to separate
>> capabilities?
> 
> The three capabilities are more or less all doing the same thing.
> Perhaps it would make some sense to only leave PAUSE spin loops in
> guest, but not HLT/MWAIT; but apart from that I think users would
> probably enable all of them.  So I think we should put in the
> documentation that blindly passing the KVM_CHECK_EXTENSION result to
> KVM_ENABLE_CAP is a valid thing to do when vCPUs are associated to
> dedicated physical CPUs.

I'll give that a shot for v2.

Regards
Jan


* Re: [PATCH 1/3] KVM: Don't enable MWAIT in guest by default
  2017-11-27 22:36     ` Jan H. Schönherr
@ 2017-11-28 14:00       ` Michael S. Tsirkin
  0 siblings, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2017-11-28 14:00 UTC (permalink / raw)
  To: Jan H. Schönherr
  Cc: Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, KarimAllah Ahmed, kvm

On Mon, Nov 27, 2017 at 11:36:11PM +0100, Jan H. Schönherr wrote:
> On 11/27/2017 09:46 PM, Michael S. Tsirkin wrote:
> > On Sat, Nov 25, 2017 at 02:09:31PM +0100, Jan H. Schönherr wrote:
> >> Allowing a guest to execute MWAIT without interception enables a guest
> >> to put a (physical) CPU into a power saving state, where it takes
> >> longer to return from than what may be desired by the host.
> >>
> >> Don't give a guest that power over a host by default. (Especially,
> >> since nothing prevents a guest from using MWAIT even when it is not
> >> advertised via CPUID.)
> >>
> >> This restores the behavior from before Linux 4.12 commit 668fffa3f838
> >> ("kvm: better MWAIT emulation for guests") but keeps the option to
> >> enable MWAIT in guest for individual VMs.
> > 
> > As others pointed out, an interrupt will wake up the host CPU anyway.
> > 
> > Given that, what's the actual motivation here?
> 
> The CPU will wake up, but it will take time depending which C-state the
> guest requested and achieved to enter.
> 
> Since Linux 4.11 and 4.14 the menu and ladder cpuidle governors, respectively,
> allow setting a per CPU wakeup latency requirement -- see, eg, commit c523c68da211
> ("cpuidle: ladder: Add per CPU PM QoS resume latency support"). Allowing a guest
> to execute MWAIT without moderation will bypass that functionality.
> 
> Thus, while MWAIT-in-guest is a great thing to have when there are dedicated
> CPUs, it is less so, when the CPUs still have to do other activities.
> 
> Regards
> Jan

I see. And the example given there is a CPU running real time
applications.

If you do another version, it might be helpful to add this info, e.g.:
"... than desired by the host, e.g. when the same CPU is running
another real-time application".

I wonder how well it works to mix real-time apps
and VMs on the same CPU.

-- 
MST


* Re: [PATCH 2/3] KVM: Add capability to not exit on HLT
  2017-11-28  1:34         ` Longpeng (Mike)
@ 2017-11-28 14:04           ` Michael S. Tsirkin
  0 siblings, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2017-11-28 14:04 UTC (permalink / raw)
  To: Longpeng (Mike)
  Cc: Paolo Bonzini, "Jan H. Schönherr",
	Radim Krčmář,
	Joerg Roedel, KarimAllah Ahmed, kvm

On Tue, Nov 28, 2017 at 09:34:56AM +0800, Longpeng (Mike) wrote:
> 
> 
> On 2017/11/28 4:55, Michael S. Tsirkin wrote:
> 
> > On Mon, Nov 27, 2017 at 09:51:27PM +0100, Paolo Bonzini wrote:
> >> On 27/11/2017 21:45, Michael S. Tsirkin wrote:
> >>> On Sat, Nov 25, 2017 at 02:09:32PM +0100, Jan H. Schönherr wrote:
> >>>> If host CPUs are dedicated to a VM, we can avoid VM exits on HLT,
> >>>> reducing the wake-up latency on posted interrupts.
> >>>>
> >>>> This reintroduces a feature that has been there at some point --
> >>>> see Linux 3.4 commit 10166744b80a ("KVM: VMX: remove yield_on_hlt")
> >>>> for the removal -- but with the additional ability to enable it only
> >>>> for selected VMs (and supporting SVM as well).
> >>>>
> >>>> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
> >>>
> >>>
> >>> If you are going to do this, why not expose mwait
> >>> in the cpuid thus making guests use mwait to halt?
> >>>
> >>> What are the advantages of doing this using halt specifically?
> >>
> >> Not all guests use MWAIT, I suppose.
> >>
> >> Paolo
> > 
> > In that case, it would be nice to document which guests of interest
> > don't.  E.g.  I don't think there are still supported versions of RHEL
> > that don't use MWAIT.
> > 
> 
> 
> Some old kernels (e.g. my kernel is 3.10.0-514, based on RHEL 7.3) don't use
> MWAIT if the idle driver is not supported (in the guest,
> "/sys/devices/system/cpu/cpuidle/current_driver" shows "none").


This is exactly what I was saying: MWAIT is supported by this guest;
the idle driver is not used because QEMU does not enable the MWAIT
flag in CPUID.

You just need a QEMU patch to expose CPUID.

However, if MWAIT is disabled to keep the guest from entering low power
states, then it may make sense to allow a non-exiting HLT instead.
Please add that to the commit log as motivation.


> So the guest falls back to the kernel's default idle routine.
> 
> The default idle routine is selected at boot:
> 
> old kernel:
> '''
> select_idle_routine()
> 	if (cpu_has_bug(c, X86_BUG_AMD_APIC_C1E)) {
> 		x86_idle = amd_e400_idle;
> 	} else
> 		x86_idle = default_idle;
> '''
> 
> newer kernel:
> '''
> select_idle_routine()
> 	if (boot_cpu_has_bug(X86_BUG_AMD_E400)) {
> 		pr_info("using AMD E400 aware idle routine\n");
> 		x86_idle = amd_e400_idle;
> 	} else if (prefer_mwait_c1_over_halt(c)) {
> 		pr_info("using mwait in idle threads\n");
> 		x86_idle = mwait_idle;
> 	} else
> 		x86_idle = default_idle;
> '''
> 
> So, some old guests don't use MWAIT as default idle routine.
> 
> > 
> > 
> >>>
> >>>> ---
> >>>> Note: AMD code paths are only compile tested
> >>>> ---
> >>>>  Documentation/virtual/kvm/api.txt | 12 +++++++++++-
> >>>>  arch/x86/include/asm/kvm_host.h   |  1 +
> >>>>  arch/x86/kvm/svm.c                |  3 ++-
> >>>>  arch/x86/kvm/vmx.c                | 33 +++++++++++++++++++++++++++------
> >>>>  arch/x86/kvm/x86.c                |  5 +++++
> >>>>  arch/x86/kvm/x86.h                |  5 +++++
> >>>>  include/uapi/linux/kvm.h          |  1 +
> >>>>  7 files changed, 52 insertions(+), 8 deletions(-)
> >>>>
> >>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >>>> index 0ee812c..c06bb41 100644
> >>>> --- a/Documentation/virtual/kvm/api.txt
> >>>> +++ b/Documentation/virtual/kvm/api.txt
> >>>> @@ -4172,7 +4172,17 @@ Returns: 0 on success
> >>>>  This capability indicates that a guest using memory monitoring instructions
> >>>>  (MWAIT/MWAITX) to stop a virtual CPU will not cause a VM exit. As such, time
> >>>>  spent while a virtual CPU is halted in this way will then be accounted for as
> >>>> -guest running time on the host (as opposed to e.g. HLT).
> >>>> +guest running time on the host.
> >>>> +
> >>>> +7.14 KVM_CAP_X86_GUEST_HLT
> >>>> +
> >>>> +Architectures: x86
> >>>> +Parameters: none
> >>>> +Returns: 0 on success
> >>>> +
> >>>> +This capability indicates that a guest using HLT to stop a virtual CPU will not
> >>>> +cause a VM exit. As such, time spent while a virtual CPU is halted in this way
> >>>> +will then be accounted for as guest running time on the host.
> >>>>  
> >>>>  8. Other capabilities.
> >>>>  ----------------------
> >>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >>>> index f7bcfaa..3197c2d 100644
> >>>> --- a/arch/x86/include/asm/kvm_host.h
> >>>> +++ b/arch/x86/include/asm/kvm_host.h
> >>>> @@ -781,6 +781,7 @@ struct kvm_arch {
> >>>>  
> >>>>  	gpa_t wall_clock;
> >>>>  
> >>>> +	bool hlt_in_guest;
> >>>>  	bool mwait_in_guest;
> >>>>  
> >>>>  	bool ept_identity_pagetable_done;
> >>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >>>> index ef1b320..c135b98 100644
> >>>> --- a/arch/x86/kvm/svm.c
> >>>> +++ b/arch/x86/kvm/svm.c
> >>>> @@ -1236,7 +1236,6 @@ static void init_vmcb(struct vcpu_svm *svm)
> >>>>  	set_intercept(svm, INTERCEPT_RDPMC);
> >>>>  	set_intercept(svm, INTERCEPT_CPUID);
> >>>>  	set_intercept(svm, INTERCEPT_INVD);
> >>>> -	set_intercept(svm, INTERCEPT_HLT);
> >>>>  	set_intercept(svm, INTERCEPT_INVLPG);
> >>>>  	set_intercept(svm, INTERCEPT_INVLPGA);
> >>>>  	set_intercept(svm, INTERCEPT_IOIO_PROT);
> >>>> @@ -1257,6 +1256,8 @@ static void init_vmcb(struct vcpu_svm *svm)
> >>>>  		set_intercept(svm, INTERCEPT_MONITOR);
> >>>>  		set_intercept(svm, INTERCEPT_MWAIT);
> >>>>  	}
> >>>> +	if (!kvm_hlt_in_guest(svm->vcpu.kvm))
> >>>> +		set_intercept(svm, INTERCEPT_HLT);
> >>>>  
> >>>>  	control->iopm_base_pa = __sme_set(iopm_base);
> >>>>  	control->msrpm_base_pa = __sme_set(__pa(svm->msrpm));
> >>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>>> index a067735..1b67433 100644
> >>>> --- a/arch/x86/kvm/vmx.c
> >>>> +++ b/arch/x86/kvm/vmx.c
> >>>> @@ -2446,6 +2446,25 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
> >>>>  	vmx_set_interrupt_shadow(vcpu, 0);
> >>>>  }
> >>>>  
> >>>> +static void vmx_set_intr_info(struct kvm_vcpu *vcpu, u32 intr)
> >>>> +{
> >>>> +	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);
> >>>> +
> >>>> +	/*
> >>>> +	 * Ensure that we clear the HLT state in the VMCS.  We don't need to
> >>>> +	 * explicitly skip the instruction because if the HLT state is set, then
> >>>> +	 * the instruction is already executing and RIP has already been
> >>>> +	 * advanced.
> >>>> +	 */
> >>>> +	if (!kvm_hlt_in_guest(vcpu->kvm) || !(intr & INTR_INFO_VALID_MASK))
> >>>> +		return;
> >>>> +	if (is_external_interrupt(intr) || is_nmi(intr))
> >>>> +		return;
> >>>> +	if (vmcs_read32(GUEST_ACTIVITY_STATE) != GUEST_ACTIVITY_HLT)
> >>>> +		return;
> >>>> +	vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
> >>>> +}
> >>>> +
> >>>>  static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
> >>>>  					       unsigned long exit_qual)
> >>>>  {
> >>>> @@ -2540,7 +2559,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
> >>>>  	} else
> >>>>  		intr_info |= INTR_TYPE_HARD_EXCEPTION;
> >>>>  
> >>>> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
> >>>> +	vmx_set_intr_info(vcpu, intr_info);
> >>>>  }
> >>>>  
> >>>>  static bool vmx_rdtscp_supported(void)
> >>>> @@ -5298,6 +5317,8 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
> >>>>  	if (kvm_mwait_in_guest(vmx->vcpu.kvm))
> >>>>  		exec_control &= ~(CPU_BASED_MWAIT_EXITING |
> >>>>  				  CPU_BASED_MONITOR_EXITING);
> >>>> +	if (kvm_hlt_in_guest(vmx->vcpu.kvm))
> >>>> +		exec_control &= ~CPU_BASED_HLT_EXITING;
> >>>>  	return exec_control;
> >>>>  }
> >>>>  
> >>>> @@ -5635,7 +5656,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> >>>>  
> >>>>  	setup_msrs(vmx);
> >>>>  
> >>>> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);  /* 22.2.1 */
> >>>> +	vmx_set_intr_info(vcpu, 0);  /* 22.2.1 */
> >>>>  
> >>>>  	if (cpu_has_vmx_tpr_shadow() && !init_event) {
> >>>>  		vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, 0);
> >>>> @@ -5729,7 +5750,7 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu)
> >>>>  			     vmx->vcpu.arch.event_exit_inst_len);
> >>>>  	} else
> >>>>  		intr |= INTR_TYPE_EXT_INTR;
> >>>> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);
> >>>> +	vmx_set_intr_info(vcpu, intr);
> >>>>  }
> >>>>  
> >>>>  static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
> >>>> @@ -5758,8 +5779,8 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
> >>>>  		return;
> >>>>  	}
> >>>>  
> >>>> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
> >>>> -			INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
> >>>> +	vmx_set_intr_info(vcpu, INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK |
> >>>> +				NMI_VECTOR);
> >>>>  }
> >>>>  
> >>>>  static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu)
> >>>> @@ -9301,7 +9322,7 @@ static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
> >>>>  				  VM_ENTRY_INSTRUCTION_LEN,
> >>>>  				  VM_ENTRY_EXCEPTION_ERROR_CODE);
> >>>>  
> >>>> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
> >>>> +	vmx_set_intr_info(vcpu, 0);
> >>>>  }
> >>>>  
> >>>>  static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
> >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>>> index fe6627a..f17c520 100644
> >>>> --- a/arch/x86/kvm/x86.c
> >>>> +++ b/arch/x86/kvm/x86.c
> >>>> @@ -2755,6 +2755,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >>>>  	case KVM_CAP_SET_BOOT_CPU_ID:
> >>>>   	case KVM_CAP_SPLIT_IRQCHIP:
> >>>>  	case KVM_CAP_IMMEDIATE_EXIT:
> >>>> +	case KVM_CAP_X86_GUEST_HLT:
> >>>>  		r = 1;
> >>>>  		break;
> >>>>  	case KVM_CAP_ADJUST_CLOCK:
> >>>> @@ -4068,6 +4069,10 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >>>>  			r = 0;
> >>>>  		}
> >>>>  		break;
> >>>> +	case KVM_CAP_X86_GUEST_HLT:
> >>>> +		kvm->arch.hlt_in_guest = true;
> >>>> +		r = 0;
> >>>> +		break;
> >>>>  	default:
> >>>>  		r = -EINVAL;
> >>>>  		break;
> >>>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> >>>> index ed8e150..b2066aa 100644
> >>>> --- a/arch/x86/kvm/x86.h
> >>>> +++ b/arch/x86/kvm/x86.h
> >>>> @@ -266,4 +266,9 @@ static inline bool kvm_mwait_in_guest(struct kvm *kvm)
> >>>>  	return kvm->arch.mwait_in_guest;
> >>>>  }
> >>>>  
> >>>> +static inline bool kvm_hlt_in_guest(struct kvm *kvm)
> >>>> +{
> >>>> +	return kvm->arch.hlt_in_guest;
> >>>> +}
> >>>> +
> >>>>  #endif
> >>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >>>> index 282d7613..ff8f266 100644
> >>>> --- a/include/uapi/linux/kvm.h
> >>>> +++ b/include/uapi/linux/kvm.h
> >>>> @@ -932,6 +932,7 @@ struct kvm_ppc_resize_hpt {
> >>>>  #define KVM_CAP_HYPERV_SYNIC2 148
> >>>>  #define KVM_CAP_HYPERV_VP_INDEX 149
> >>>>  #define KVM_CAP_S390_AIS_MIGRATION 150
> >>>> +#define KVM_CAP_X86_GUEST_HLT 151
> >>>>  
> >>>>  #ifdef KVM_CAP_IRQ_ROUTING
> >>>>  
> >>>> -- 
> >>>> 2.3.1.dirty
> > 
> > .
> > 
> 
> 
> -- 
> Regards,
> Longpeng(Mike)


* Re: [PATCH 0/3] KVM: Tie MWAIT/HLT/PAUSE interception to initially disabled capabilities
       [not found]     ` <8971d9e0-388c-9934-1ab2-33508cbbeb8f@redhat.com>
  2017-11-28 10:42       ` Jan H. Schönherr
@ 2017-11-28 14:08       ` Michael S. Tsirkin
       [not found]         ` <e61d93f0-17d9-d182-83ae-b7165ae3dcb0@redhat.com>
  1 sibling, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2017-11-28 14:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jan H. Schönherr, Radim Krčmář,
	Joerg Roedel, KarimAllah Ahmed, kvm

On Tue, Nov 28, 2017 at 09:42:41AM +0100, Paolo Bonzini wrote:
> On 28/11/2017 01:15, Jan H. Schönherr wrote:
> > On 11/27/2017 12:46 PM, Paolo Bonzini wrote:
> >> On 25/11/2017 14:09, Jan H. Schönherr wrote:
> >>> KVM no longer intercepts MWAIT instructions since Linux 4.12 commit 668fffa3f838
> >>> ("kvm: better MWAIT emulation for guests") for improved latency in some
> >>> workloads.
> >>>
> >>> This series takes the idea further and makes the interception of HLT (patch 2)
> >>> and PAUSE (patch 3) optional as well for similar reasons. Both interceptions
> >>> can be turned off for an individual VM by enabling the corresponding capability.
> >>>
> >>> It also converts KVM_CAP_X86_GUEST_MWAIT into an initially disabled capability
> >>> that has to be enabled explicitly for a VM. This restores pre Linux 4.12
> >>> behavior in the default case, so that a guest cannot put CPUs into low power
> >>> states, which may exceed the host's latency requirements (patch 1).
> >>
> >> Nice!  Regarding the userspace ABI, we could have a single capability
> >> KVM_CAP_X86_DISABLE_EXITS that is just KVM_CAP_X86_GUEST_MWAIT renamed.
> >> The value returned by KVM_CHECK_EXTENSION would be defined like this:
> >>
> >> - if bit 16 is 0, bit 0..15 says which exits are disabled
> >>
> >> - if bit 16 is 1, no exits are disabled by default but the capability
> >> supports KVM_ENABLE_CAP
> > 
> > ... and bits 0..15 indicate which exits can be disabled by specifying
> > them as argument to KVM_ENABLE_CAP?
> 
> Yes.
> 
> >> With
> >>
> >> #define KVM_X86_DISABLE_EXITS_MWAIT		1
> >> #define KVM_X86_DISABLE_EXITS_HLT		2
> >> #define KVM_X86_DISABLE_EXITS_PAUSE		4
> >> #define KVM_X86_DISABLE_EXITS_WITH_ENABLE	0x10000
> > 
> > Is that bit 16 an attempt at backwards compatibility with the current state?
> > That would only work if any potential user actually checks with "==1" instead
> > of "!=0".
> 
> Fair enough.  Let's get rid of KVM_CAP_X86_GUEST_MWAIT altogether and
> add a new capability without bit 16.  But let's use just one.
> 
> > What is the benefit of doing this with bitmasks as opposed to separate
> > capabilities?
> 
> The three capabilities are more or less all doing the same thing.
> Perhaps it would make some sense to only leave PAUSE spin loops in
> guest, but not HLT/MWAIT; but apart from that I think users would
> probably enable all of them.

I am not sure I agree.  I think guests still want some way to halt when
giving up CPU for a long time.

If you are not worried about guests entering low power states,
then you only need MWAIT and maybe PAUSE.

HLT within the guest only makes sense if you do not want to
allow the guest to enter a low power state.

If you don't exit on any of these, you want some other way
to actually halt the VCPU. Maybe add an IO register for that.

>  So I think we should put in the
> documentation that blindly passing the KVM_CHECK_EXTENSION result to
> KVM_ENABLE_CAP is a valid thing to do when vCPUs are associated to
> dedicated physical CPUs.
> 
> Paolo


* Re: [PATCH 1/3] KVM: Don't enable MWAIT in guest by default
  2017-11-27 18:13   ` Jim Mattson
       [not found]     ` <82e1f7c8-fdd6-835e-319a-bec72d771ef9@redhat.com>
@ 2017-11-28 23:58     ` Jan H. Schönherr
  2017-11-29 16:58       ` Radim Krčmář
  1 sibling, 1 reply; 33+ messages in thread
From: Jan H. Schönherr @ 2017-11-28 23:58 UTC (permalink / raw)
  To: Jim Mattson, Radim Krčmář
  Cc: Paolo Bonzini, Joerg Roedel, Michael S. Tsirkin,
	KarimAllah Ahmed, kvm list

On 11/27/2017 07:13 PM, Jim Mattson wrote:
> I don't understand the concern regarding CPUID5_ECX_INTERRUPT_BREAK.
> Even if the CPU has this feature, can't the guest bypass it by
> disabling interrupts and invoking MWAIT with bit 0 of ECX clear?

Note that there was this series at some point:

  [PATCH 0/4] KVM: x86: kvm_mwait_in_guest() cleanup and fixes
  https://www.spinics.net/lists/kvm/msg149238.html

It throws away most of that function.
Not sure what happened to it. Radim?

Regards
Jan


* Re: [PATCH 3/3] KVM: Add capability to not exit on PAUSE
  2017-11-28  3:37   ` Longpeng (Mike)
@ 2017-11-29  0:09     ` Jan H. Schönherr
  2017-11-29  4:34       ` Longpeng (Mike)
  0 siblings, 1 reply; 33+ messages in thread
From: Jan H. Schönherr @ 2017-11-29  0:09 UTC (permalink / raw)
  To: Longpeng (Mike)
  Cc: Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Michael S. Tsirkin, KarimAllah Ahmed, kvm

On 11/28/2017 04:37 AM, Longpeng (Mike) wrote:
> 
> On 2017/11/25 21:09, Jan H. Schönherr wrote:
> 
>> Allow to disable pause loop exit/pause filtering on a per VM basis.
>>
>> If some VMs have dedicated host CPUs, they won't be negatively affected
>> due to needlessly intercepted PAUSE instructions.
>>
> Hi Jan,
> 
> Is there any difference between 'disable PLE in the VMCS' and 'make ple_gap per
> VM/vCPU and set ple_gap=0 for vCPUs that are dedicated'?

"Just" disabling PLE in vmcs would still call into some of the PLE window
adjustment paths and potentially do some VMCS writes at times. My patch should
have eliminated these cases as well.

However, making all the PLE configuration knobs per VM has the difficulty
that you'd need to be able to specify them in some way. That would not only
be an x86-specific interface, but a VMX-specific one as well. VMX pause loop
exiting and SVM pause filtering don't look compatible enough for a shared
configuration.

That's why I only went for the binary on/off interface.

Regards
Jan


* Re: [PATCH 0/3] KVM: Tie MWAIT/HLT/PAUSE interception to initially disabled capabilities
       [not found]         ` <e61d93f0-17d9-d182-83ae-b7165ae3dcb0@redhat.com>
@ 2017-11-29  0:20           ` Michael S. Tsirkin
  2017-11-29  0:24             ` Michael S. Tsirkin
       [not found]             ` <8e559062-e459-5a85-a4a3-72a4baf7764c@redhat.com>
  0 siblings, 2 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2017-11-29  0:20 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jan H. Schönherr, Radim Krčmář,
	Joerg Roedel, KarimAllah Ahmed, kvm

On Wed, Nov 29, 2017 at 01:07:32AM +0100, Paolo Bonzini wrote:
> On 28/11/2017 15:08, Michael S. Tsirkin wrote:
> > I think guests still want some way to halt when
> > giving up CPU for a long time.
> > 
> > If you are not worried about guests entering low power states,
> > then you only need MWAIT and maybe PAUSE.
> > 
> > HLT within guest only makes sense if you do not want to
> > allow guest to enter power state.
> > 
> > If you don't exit on any of these, you want some other way
> > to actually halt the VCPU.
> 
> If you want to do something in userspace, send a signal.  Otherwise, it
> doesn't really matter (if you have a dedicated physical CPU) whether the
> task is runnable or not, as long as the CPU isn't in C0.
> 
> Paolo

If VCPU wants to give up its timeslice, how is it supposed to do it
if all exits are blocked?

-- 
MST


* Re: [PATCH 0/3] KVM: Tie MWAIT/HLT/PAUSE interception to initially disabled capabilities
  2017-11-29  0:20           ` Michael S. Tsirkin
@ 2017-11-29  0:24             ` Michael S. Tsirkin
       [not found]             ` <8e559062-e459-5a85-a4a3-72a4baf7764c@redhat.com>
  1 sibling, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2017-11-29  0:24 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jan H. Schönherr, Radim Krčmář,
	Joerg Roedel, KarimAllah Ahmed, kvm

On Wed, Nov 29, 2017 at 02:20:34AM +0200, Michael S. Tsirkin wrote:
> On Wed, Nov 29, 2017 at 01:07:32AM +0100, Paolo Bonzini wrote:
> > On 28/11/2017 15:08, Michael S. Tsirkin wrote:
> > > I think guests still want some way to halt when
> > > giving up CPU for a long time.
> > > 
> > > If you are not worried about guests entering low power states,
> > > then you only need MWAIT and maybe PAUSE.
> > > 
> > > HLT within guest only makes sense if you do not want to
> > > allow guest to enter power state.
> > > 
> > > If you don't exit on any of these, you want some other way
> > > to actually halt the VCPU.
> > 
> > If you want to do something in userspace, send a signal.  Otherwise, it
> > doesn't really matter (if you have a dedicated physical CPU) whether the
> > task is runnable or not, as long as the CPU isn't in C0.
> > 
> > Paolo
> 
> If VCPU wants to give up its timeslice, how is it supposed to do it
> if all exits are blocked?

All this doesn't mean the patch can't be applied. But I think
it would be handy to add another way to exit to the host and
stop until an interrupt arrives, e.g. an access to a special
address could do it.

> -- 
> MST


* Re: [PATCH 3/3] KVM: Add capability to not exit on PAUSE
  2017-11-29  0:09     ` Jan H. Schönherr
@ 2017-11-29  4:34       ` Longpeng (Mike)
  2017-11-29 12:20         ` Jan H. Schönherr
  0 siblings, 1 reply; 33+ messages in thread
From: Longpeng (Mike) @ 2017-11-29  4:34 UTC (permalink / raw)
  To: "Jan H. Schönherr"
  Cc: Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Michael S. Tsirkin, KarimAllah Ahmed, kvm



On 2017/11/29 8:09, Jan H. Schönherr wrote:

> On 11/28/2017 04:37 AM, Longpeng (Mike) wrote:
>>
>> On 2017/11/25 21:09, Jan H. Schönherr wrote:
>>
>>> Allow to disable pause loop exit/pause filtering on a per VM basis.
>>>
>>> If some VMs have dedicated host CPUs, they won't be negatively affected
>>> due to needlessly intercepted PAUSE instructions.
>>>
>> Hi Jan,
>>
>> Is there any difference between 'disable PLE in the VMCS' and 'make ple_gap per
>> VM/vCPU and set ple_gap=0 for vCPUs that are dedicated'?
> 
> "Just" disabling PLE in vmcs would still call into some of the PLE window
> adjustment paths and potentially do some VMCS writes at times. My patch should
> have eliminated these cases as well.
> 

Ah, I see, thanks. :)

We used a proprietary test suite to benchmark performance with ple_gap=0 and
ple_gap=1 (with very few PLE-induced VM exits in the kvm trace), and we found
that ple_gap=1 is always better than ple_gap=0. We don't know why; maybe it
affects the hardware logic. Do you have any idea about this?

> However, making all the PLE configuration knobs per VM has the difficulty
> that you'd need to be able to specify them in some way. That would not only
> be a x86 specific interface, but a VMX specific one as well. VMX-PLE and
> SVM-PF don't look compatible enough for a shared configuration.
> 
> That's why I only went for the binary on/off interface.
> 

Yes, I think your patch is pretty good.

> Regards
> Jan
> 
> 
> 
> .
> 


-- 
Regards,
Longpeng(Mike)


* Re: [PATCH 3/3] KVM: Add capability to not exit on PAUSE
  2017-11-29  4:34       ` Longpeng (Mike)
@ 2017-11-29 12:20         ` Jan H. Schönherr
  0 siblings, 0 replies; 33+ messages in thread
From: Jan H. Schönherr @ 2017-11-29 12:20 UTC (permalink / raw)
  To: Longpeng (Mike)
  Cc: Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Michael S. Tsirkin, KarimAllah Ahmed, kvm

On 11/29/2017 05:34 AM, Longpeng (Mike) wrote:
> We used a proprietary test suite to benchmark performance with ple_gap=0 and
> ple_gap=1 (very little VMexits due to PLE in kvm trace), we found that ple_gap=1
> is always better than ple_gap=0, we don't know why, maybe impacts the hardware
> logical. Do you have any idea about this ?

No. Sorry.

Regards
Jan


* Re: [PATCH 0/3] KVM: Tie MWAIT/HLT/PAUSE interception to initially disabled capabilities
       [not found]             ` <8e559062-e459-5a85-a4a3-72a4baf7764c@redhat.com>
@ 2017-11-29 15:13               ` Michael S. Tsirkin
  0 siblings, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2017-11-29 15:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jan H. Schönherr, Radim Krčmář,
	Joerg Roedel, KarimAllah Ahmed, kvm

On Wed, Nov 29, 2017 at 09:43:22AM +0100, Paolo Bonzini wrote:
> On 29/11/2017 01:20, Michael S. Tsirkin wrote:
> > On Wed, Nov 29, 2017 at 01:07:32AM +0100, Paolo Bonzini wrote:
> >> On 28/11/2017 15:08, Michael S. Tsirkin wrote:
> >>> I think guests still want some way to halt when
> >>> giving up CPU for a long time.
> >>>
> >>> If you are not worried about guests entering low power states,
> >>> then you only need MWAIT and maybe PAUSE.
> >>>
> >>> HLT within the guest only makes sense if you do not want to
> >>> allow the guest to enter a low power state.
> >>>
> >>> If you don't exit on any of these, you want some other way
> >>> to actually halt the VCPU.
> >>
> >> If you want to do something in userspace, send a signal.  Otherwise, it
> >> doesn't really matter (if you have a dedicated physical CPU) whether the
> >> task is runnable or not, as long as the CPU isn't in C0.
> > 
> > If VCPU wants to give up its timeslice, how is it supposed to do it
> > if all exits are blocked?
> 
> It's the host userspace who told KVM it doesn't care.  Why would the
> vCPU care?
> 
> Paolo

I guess you assume this will only be used for dedicated host CPUs.
Fair enough. And please do not take this as a critique of the patchset;
I'm just trying to look a bit further.

My assumption is that some userspace might want to disable MWAIT
exiting due to latency concerns. To get low-latency wakeups
it might also want to disable HLT exiting. How does one
then give up the timeslice if the vCPU is known to be halted for a long time?

I guess we could allow userspace to register an address;
IO to that address would halt the vCPU.


-- 
MST


* Re: [PATCH 1/3] KVM: Don't enable MWAIT in guest by default
  2017-11-28 23:58     ` Jan H. Schönherr
@ 2017-11-29 16:58       ` Radim Krčmář
  0 siblings, 0 replies; 33+ messages in thread
From: Radim Krčmář @ 2017-11-29 16:58 UTC (permalink / raw)
  To: Jan H. Schönherr
  Cc: Jim Mattson, Paolo Bonzini, Joerg Roedel, Michael S. Tsirkin,
	KarimAllah Ahmed, kvm list

2017-11-29 00:58+0100, Jan H. Schönherr:
> On 11/27/2017 07:13 PM, Jim Mattson wrote:
> > I don't understand the concern regarding CPUID5_ECX_INTERRUPT_BREAK.
> > Even if the CPU has this feature, can't the guest bypass it by
> > disabling interrupts and invoking MWAIT with bit 0 of ECX clear?
> 
> Note that there was this series at some point:
> 
>   [PATCH 0/4] KVM: x86: kvm_mwait_in_guest() cleanup and fixes
>   https://www.spinics.net/lists/kvm/msg149238.html
> 
> That throws away most of the function.
> Not sure what happened to that. Radim?

I forgot about it while waiting to see whether the discussion would continue.
Going to send v2,

thanks.


end of thread, other threads:[~2017-11-29 16:58 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-25 13:09 [PATCH 0/3] KVM: Tie MWAIT/HLT/PAUSE interception to initially disabled capabilities Jan H. Schönherr
2017-11-25 13:09 ` [PATCH 1/3] KVM: Don't enable MWAIT in guest by default Jan H. Schönherr
2017-11-27 18:13   ` Jim Mattson
     [not found]     ` <82e1f7c8-fdd6-835e-319a-bec72d771ef9@redhat.com>
2017-11-27 18:32       ` Jim Mattson
2017-11-28 23:58     ` Jan H. Schönherr
2017-11-29 16:58       ` Radim Krčmář
2017-11-27 20:46   ` Michael S. Tsirkin
2017-11-27 22:36     ` Jan H. Schönherr
2017-11-28 14:00       ` Michael S. Tsirkin
2017-11-27 20:50   ` Michael S. Tsirkin
     [not found]     ` <90f7f081-95d7-f573-8b57-5c6e86fd2a8d@redhat.com>
2017-11-27 20:57       ` Michael S. Tsirkin
2017-11-25 13:09 ` [PATCH 2/3] KVM: Add capability to not exit on HLT Jan H. Schönherr
2017-11-27  1:32   ` Wanpeng Li
2017-11-27  1:47     ` Wanpeng Li
     [not found]       ` <a2f4cf7f-5d7b-a1cc-30d5-d18df4d49173@redhat.com>
2017-11-27 12:29         ` Jan H. Schönherr
     [not found]     ` <421c71fd-6dff-c01e-9e78-42f114711ea9@redhat.com>
2017-11-27 15:27       ` Jan H. Schönherr
     [not found]   ` <e17ea420-c141-18b6-2622-e33a3f540c61@redhat.com>
2017-11-27 16:12     ` Jan H. Schönherr
2017-11-27 20:45   ` Michael S. Tsirkin
     [not found]     ` <8ce45bad-b43c-4e97-aa69-74d7fc9cecb5@redhat.com>
2017-11-27 20:55       ` Michael S. Tsirkin
2017-11-28  1:34         ` Longpeng (Mike)
2017-11-28 14:04           ` Michael S. Tsirkin
2017-11-25 13:09 ` [PATCH 3/3] KVM: Add capability to not exit on PAUSE Jan H. Schönherr
2017-11-27 20:48   ` Michael S. Tsirkin
2017-11-28  3:37   ` Longpeng (Mike)
2017-11-29  0:09     ` Jan H. Schönherr
2017-11-29  4:34       ` Longpeng (Mike)
2017-11-29 12:20         ` Jan H. Schönherr
     [not found] ` <a3c80a22-ff69-fa51-ea90-48f039eb449a@redhat.com>
2017-11-28  0:15   ` [PATCH 0/3] KVM: Tie MWAIT/HLT/PAUSE interception to initially disabled capabilities Jan H. Schönherr
     [not found]     ` <8971d9e0-388c-9934-1ab2-33508cbbeb8f@redhat.com>
2017-11-28 10:42       ` Jan H. Schönherr
2017-11-28 14:08       ` Michael S. Tsirkin
     [not found]         ` <e61d93f0-17d9-d182-83ae-b7165ae3dcb0@redhat.com>
2017-11-29  0:20           ` Michael S. Tsirkin
2017-11-29  0:24             ` Michael S. Tsirkin
     [not found]             ` <8e559062-e459-5a85-a4a3-72a4baf7764c@redhat.com>
2017-11-29 15:13               ` Michael S. Tsirkin
