* [RFC PATCH v2 0/3] KVM: x86: add per-vCPU exits disable capability
@ 2021-12-21  9:04 Kechen Lu
  2021-12-21  9:04 ` [RFC PATCH v2 1/3] KVM: x86: only allow exits disable before vCPUs created Kechen Lu
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Kechen Lu @ 2021-12-21  9:04 UTC (permalink / raw)
  To: kvm, pbonzini, seanjc
  Cc: wanpengli, vkuznets, mst, somduttar, kechenl, linux-kernel

Summary
===========
Introduce support for a vCPU-scoped ioctl with the KVM_CAP_X86_DISABLE_EXITS
capability so that exits can be disabled per vCPU instead of only for the
whole guest, allowing finer-grained control over VM exits. This patch series
wires up the vCPU-scoped control for HLT VM-exits.
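
For reference, a minimal sketch of the intended userspace flow (error
handling and the usual includes omitted; vm_fd and vcpu_fd are the fds
returned by KVM_CREATE_VM and KVM_CREATE_VCPU):

	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_X86_DISABLE_EXITS,
		.args[0] = KVM_X86_DISABLE_EXITS_HLT,
	};

	/* VM-scoped: only allowed before any vCPU is created (patch 1). */
	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

	/* vCPU-scoped: disable HLT exits for this vCPU only (patch 3). */
	ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);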

Motivation
============
For use cases such as a Windows guest running heavy CPU-bound workloads,
disabling HLT VM-exits can mitigate host scheduler context-switch overhead.
Simply disabling HLT exits on all vCPUs can bring performance benefits, but
if no pCPUs are reserved for host threads, it can lead to forced preemption
because the host no longer knows when it can schedule other host threads
that want to run. With this patch series, HLT exits can be disabled on only
a subset of a guest's vCPUs, which keeps the performance benefits while
remaining resilient to host stressing workloads running at the same time.

Performance and Testing
=========================
In an experiment running a host stressing workload alongside a Windows
guest with heavy CPU-bound workloads, the series shows good resiliency and
a ~3% performance improvement. E.g. Passmark running in a Windows guest
with this patch disabling HLT exits on only half of the vCPUs still shows
a 2.4% higher main score vs. the baseline.

Tested everything on AMD machines.


v1->v2 (Sean Christopherson):
- Add an explicit restriction that VM-scoped exits disabling must be done
  before vCPU creation (patch 1)
- Use a vCPU ioctl instead of a 64-bit vCPU bitmask (patch 3), and make the
  disabled-exits flag checks purely per-vCPU instead of per-VM (patch 2)


Best Regards,
Kechen

Kechen Lu (3):
  KVM: x86: only allow exits disable before vCPUs created
  KVM: x86: move ()_in_guest checking to vCPU scope
  KVM: x86: add vCPU ioctl for HLT exits disable capability

 Documentation/virt/kvm/api.rst     |  4 +++-
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  7 +++++++
 arch/x86/kvm/cpuid.c               |  2 +-
 arch/x86/kvm/lapic.c               |  2 +-
 arch/x86/kvm/svm/svm.c             | 20 +++++++++++++++-----
 arch/x86/kvm/vmx/vmx.c             | 26 ++++++++++++++++++--------
 arch/x86/kvm/x86.c                 | 24 +++++++++++++++++++++++-
 arch/x86/kvm/x86.h                 | 16 ++++++++--------
 9 files changed, 77 insertions(+), 25 deletions(-)

-- 
2.30.2



* [RFC PATCH v2 1/3] KVM: x86: only allow exits disable before vCPUs created
  2021-12-21  9:04 [RFC PATCH v2 0/3] KVM: x86: add per-vCPU exits disable capability Kechen Lu
@ 2021-12-21  9:04 ` Kechen Lu
  2022-01-10 18:50   ` Sean Christopherson
  2021-12-21  9:04 ` [RFC PATCH v2 2/3] KVM: x86: move ()_in_guest checking to vCPU scope Kechen Lu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Kechen Lu @ 2021-12-21  9:04 UTC (permalink / raw)
  To: kvm, pbonzini, seanjc
  Cc: wanpengli, vkuznets, mst, somduttar, kechenl, linux-kernel

Since neither VMX nor SVM would ever update the control bits if exits
are disabled after vCPUs are created, only allow setting the exits
disable flags before vCPU creation.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kechen Lu <kechenl@nvidia.com>
---
 Documentation/virt/kvm/api.rst | 1 +
 arch/x86/kvm/x86.c             | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index aeeb071c7688..d1c50b95bbc1 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6581,6 +6581,7 @@ branch to guests' 0x200 interrupt vector.
 :Architectures: x86
 :Parameters: args[0] defines which exits are disabled
 :Returns: 0 on success, -EINVAL when args[0] contains invalid exits
+          or if any vCPU has already been created
 
 Valid bits in args[0] are::
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0cf1082455df..37529c0c279d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5764,6 +5764,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
 			break;
 
+		mutex_lock(&kvm->lock);
+		if (kvm->created_vcpus)
+			goto disable_exits_unlock;
+
 		if ((cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT) &&
 			kvm_can_mwait_in_guest())
 			kvm->arch.mwait_in_guest = true;
@@ -5774,6 +5778,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
 			kvm->arch.cstate_in_guest = true;
 		r = 0;
+disable_exits_unlock:
+		mutex_unlock(&kvm->lock);
 		break;
 	case KVM_CAP_MSR_PLATFORM_INFO:
 		kvm->arch.guest_can_read_msr_platform_info = cap->args[0];
-- 
2.30.2



* [RFC PATCH v2 2/3] KVM: x86: move ()_in_guest checking to vCPU scope
  2021-12-21  9:04 [RFC PATCH v2 0/3] KVM: x86: add per-vCPU exits disable capability Kechen Lu
  2021-12-21  9:04 ` [RFC PATCH v2 1/3] KVM: x86: only allow exits disable before vCPUs created Kechen Lu
@ 2021-12-21  9:04 ` Kechen Lu
  2022-01-10 19:35   ` Sean Christopherson
  2021-12-21  9:04 ` [RFC PATCH v2 3/3] KVM: x86: add vCPU ioctl for HLT exits disable capability Kechen Lu
  2022-01-10 21:18 ` [RFC PATCH v2 0/3] KVM: x86: add per-vCPU " Michael S. Tsirkin
  3 siblings, 1 reply; 14+ messages in thread
From: Kechen Lu @ 2021-12-21  9:04 UTC (permalink / raw)
  To: kvm, pbonzini, seanjc
  Cc: wanpengli, vkuznets, mst, somduttar, kechenl, linux-kernel

For further extensions of finer-grained per-vCPU exits disable
control, and because the VM-scoped control is restricted to before
vCPU creation, the runtime disabled-exits flag check could be purely
vCPU scope.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kechen Lu <kechenl@nvidia.com>
---
 arch/x86/include/asm/kvm_host.h |  5 +++++
 arch/x86/kvm/cpuid.c            |  2 +-
 arch/x86/kvm/lapic.c            |  2 +-
 arch/x86/kvm/svm/svm.c          | 10 +++++-----
 arch/x86/kvm/vmx/vmx.c          | 16 ++++++++--------
 arch/x86/kvm/x86.c              |  6 +++++-
 arch/x86/kvm/x86.h              | 16 ++++++++--------
 7 files changed, 33 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2164b9f4c7b0..edc5fca4d8c8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -908,6 +908,11 @@ struct kvm_vcpu_arch {
 #if IS_ENABLED(CONFIG_HYPERV)
 	hpa_t hv_root_tdp;
 #endif
+
+	bool mwait_in_guest;
+	bool hlt_in_guest;
+	bool pause_in_guest;
+	bool cstate_in_guest;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 07e9215e911d..6291e15710ba 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -177,7 +177,7 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
 
 	best = kvm_find_kvm_cpuid_features(vcpu);
-	if (kvm_hlt_in_guest(vcpu->kvm) && best &&
+	if (kvm_hlt_in_guest(vcpu) && best &&
 		(best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
 		best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
 
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index f206fc35deff..effb994e6642 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -119,7 +119,7 @@ static bool kvm_can_post_timer_interrupt(struct kvm_vcpu *vcpu)
 bool kvm_can_use_hv_timer(struct kvm_vcpu *vcpu)
 {
 	return kvm_x86_ops.set_hv_timer
-	       && !(kvm_mwait_in_guest(vcpu->kvm) ||
+	       && !(kvm_mwait_in_guest(vcpu) ||
 		    kvm_can_post_timer_interrupt(vcpu));
 }
 EXPORT_SYMBOL_GPL(kvm_can_use_hv_timer);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d0f68d11ec70..70e393c6dfb5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1271,12 +1271,12 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	svm_set_intercept(svm, INTERCEPT_RDPRU);
 	svm_set_intercept(svm, INTERCEPT_RSM);
 
-	if (!kvm_mwait_in_guest(vcpu->kvm)) {
+	if (!kvm_mwait_in_guest(vcpu)) {
 		svm_set_intercept(svm, INTERCEPT_MONITOR);
 		svm_set_intercept(svm, INTERCEPT_MWAIT);
 	}
 
-	if (!kvm_hlt_in_guest(vcpu->kvm))
+	if (!kvm_hlt_in_guest(vcpu))
 		svm_set_intercept(svm, INTERCEPT_HLT);
 
 	control->iopm_base_pa = __sme_set(iopm_base);
@@ -1320,7 +1320,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	svm->nested.vmcb12_gpa = INVALID_GPA;
 	svm->nested.last_vmcb12_gpa = INVALID_GPA;
 
-	if (!kvm_pause_in_guest(vcpu->kvm)) {
+	if (!kvm_pause_in_guest(vcpu)) {
 		control->pause_filter_count = pause_filter_count;
 		if (pause_filter_thresh)
 			control->pause_filter_thresh = pause_filter_thresh;
@@ -3095,7 +3095,7 @@ static int pause_interception(struct kvm_vcpu *vcpu)
 	 */
 	in_kernel = !sev_es_guest(vcpu->kvm) && svm_get_cpl(vcpu) == 0;
 
-	if (!kvm_pause_in_guest(vcpu->kvm))
+	if (!kvm_pause_in_guest(vcpu))
 		grow_ple_window(vcpu);
 
 	kvm_vcpu_on_spin(vcpu, in_kernel);
@@ -4308,7 +4308,7 @@ static void svm_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 
 static void svm_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
-	if (!kvm_pause_in_guest(vcpu->kvm))
+	if (!kvm_pause_in_guest(vcpu))
 		shrink_ple_window(vcpu);
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5aadad3e7367..b5133619dea1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1585,7 +1585,7 @@ static void vmx_clear_hlt(struct kvm_vcpu *vcpu)
 	 * then the instruction is already executing and RIP has already been
 	 * advanced.
 	 */
-	if (kvm_hlt_in_guest(vcpu->kvm) &&
+	if (kvm_hlt_in_guest(vcpu) &&
 			vmcs_read32(GUEST_ACTIVITY_STATE) == GUEST_ACTIVITY_HLT)
 		vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
 }
@@ -4120,10 +4120,10 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
 		exec_control |= CPU_BASED_CR3_STORE_EXITING |
 				CPU_BASED_CR3_LOAD_EXITING  |
 				CPU_BASED_INVLPG_EXITING;
-	if (kvm_mwait_in_guest(vmx->vcpu.kvm))
+	if (kvm_mwait_in_guest(&vmx->vcpu))
 		exec_control &= ~(CPU_BASED_MWAIT_EXITING |
 				CPU_BASED_MONITOR_EXITING);
-	if (kvm_hlt_in_guest(vmx->vcpu.kvm))
+	if (kvm_hlt_in_guest(&vmx->vcpu))
 		exec_control &= ~CPU_BASED_HLT_EXITING;
 	return exec_control;
 }
@@ -4202,7 +4202,7 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
 	}
 	if (!enable_unrestricted_guest)
 		exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
-	if (kvm_pause_in_guest(vmx->vcpu.kvm))
+	if (kvm_pause_in_guest(vcpu))
 		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
 	if (!kvm_vcpu_apicv_active(vcpu))
 		exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT |
@@ -4305,7 +4305,7 @@ static void init_vmcs(struct vcpu_vmx *vmx)
 		vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
 	}
 
-	if (!kvm_pause_in_guest(vmx->vcpu.kvm)) {
+	if (!kvm_pause_in_guest(&vmx->vcpu)) {
 		vmcs_write32(PLE_GAP, ple_gap);
 		vmx->ple_window = ple_window;
 		vmx->ple_window_dirty = true;
@@ -5412,7 +5412,7 @@ static void shrink_ple_window(struct kvm_vcpu *vcpu)
  */
 static int handle_pause(struct kvm_vcpu *vcpu)
 {
-	if (!kvm_pause_in_guest(vcpu->kvm))
+	if (!kvm_pause_in_guest(vcpu))
 		grow_ple_window(vcpu);
 
 	/*
@@ -6859,7 +6859,7 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
 	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
 	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
 	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
-	if (kvm_cstate_in_guest(vcpu->kvm)) {
+	if (kvm_cstate_in_guest(vcpu)) {
 		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C1_RES, MSR_TYPE_R);
 		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
 		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
@@ -7401,7 +7401,7 @@ static void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu)
 
 static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
-	if (!kvm_pause_in_guest(vcpu->kvm))
+	if (!kvm_pause_in_guest(vcpu))
 		shrink_ple_window(vcpu);
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 37529c0c279d..d5d0d99b584e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10941,6 +10941,10 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 #if IS_ENABLED(CONFIG_HYPERV)
 	vcpu->arch.hv_root_tdp = INVALID_PAGE;
 #endif
+	vcpu->arch.mwait_in_guest = vcpu->kvm->arch.mwait_in_guest;
+	vcpu->arch.hlt_in_guest = vcpu->kvm->arch.hlt_in_guest;
+	vcpu->arch.pause_in_guest = vcpu->kvm->arch.pause_in_guest;
+	vcpu->arch.cstate_in_guest = vcpu->kvm->arch.cstate_in_guest;
 
 	r = static_call(kvm_x86_vcpu_create)(vcpu);
 	if (r)
@@ -12086,7 +12090,7 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
 		     vcpu->arch.exception.pending))
 		return false;
 
-	if (kvm_hlt_in_guest(vcpu->kvm) && !kvm_can_deliver_async_pf(vcpu))
+	if (kvm_hlt_in_guest(vcpu) && !kvm_can_deliver_async_pf(vcpu))
 		return false;
 
 	/*
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 4abcd8d9836d..0da4b09535a5 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -372,24 +372,24 @@ static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
 	    __rem;						\
 	 })
 
-static inline bool kvm_mwait_in_guest(struct kvm *kvm)
+static inline bool kvm_mwait_in_guest(struct kvm_vcpu *vcpu)
 {
-	return kvm->arch.mwait_in_guest;
+	return vcpu->arch.mwait_in_guest;
 }
 
-static inline bool kvm_hlt_in_guest(struct kvm *kvm)
+static inline bool kvm_hlt_in_guest(struct kvm_vcpu *vcpu)
 {
-	return kvm->arch.hlt_in_guest;
+	return vcpu->arch.hlt_in_guest;
 }
 
-static inline bool kvm_pause_in_guest(struct kvm *kvm)
+static inline bool kvm_pause_in_guest(struct kvm_vcpu *vcpu)
 {
-	return kvm->arch.pause_in_guest;
+	return vcpu->arch.pause_in_guest;
 }
 
-static inline bool kvm_cstate_in_guest(struct kvm *kvm)
+static inline bool kvm_cstate_in_guest(struct kvm_vcpu *vcpu)
 {
-	return kvm->arch.cstate_in_guest;
+	return vcpu->arch.cstate_in_guest;
 }
 
 DECLARE_PER_CPU(struct kvm_vcpu *, current_vcpu);
-- 
2.30.2



* [RFC PATCH v2 3/3] KVM: x86: add vCPU ioctl for HLT exits disable capability
  2021-12-21  9:04 [RFC PATCH v2 0/3] KVM: x86: add per-vCPU exits disable capability Kechen Lu
  2021-12-21  9:04 ` [RFC PATCH v2 1/3] KVM: x86: only allow exits disable before vCPUs created Kechen Lu
  2021-12-21  9:04 ` [RFC PATCH v2 2/3] KVM: x86: move ()_in_guest checking to vCPU scope Kechen Lu
@ 2021-12-21  9:04 ` Kechen Lu
  2022-01-10 20:56   ` Sean Christopherson
  2022-01-10 21:18 ` [RFC PATCH v2 0/3] KVM: x86: add per-vCPU " Michael S. Tsirkin
  3 siblings, 1 reply; 14+ messages in thread
From: Kechen Lu @ 2021-12-21  9:04 UTC (permalink / raw)
  To: kvm, pbonzini, seanjc
  Cc: wanpengli, vkuznets, mst, somduttar, kechenl, linux-kernel

Introduce support for a vCPU-scoped ioctl with the KVM_CAP_X86_DISABLE_EXITS
capability so that exits can be disabled per vCPU instead of only for the
whole guest, allowing finer-grained control over VM exits. This patch wires
up the vCPU-scoped control for HLT VM-exits.

For use cases such as a Windows guest running heavy CPU-bound workloads,
disabling HLT VM-exits can mitigate host scheduler context-switch overhead.
Simply disabling HLT exits on all vCPUs can bring performance benefits, but
if no pCPUs are reserved for host threads, it can lead to forced preemption
because the host no longer knows when it can schedule other host threads
that want to run. With this patch, HLT exits can be disabled on only a
subset of a guest's vCPUs, which keeps the performance benefits while
remaining resilient to host stressing workloads running at the same time.

In an experiment running a host stressing workload alongside a Windows
guest with heavy CPU-bound workloads, it shows good resiliency and a ~3%
performance improvement.
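
E.g. a rough userspace sketch of disabling HLT exits on only the first
half of a guest's vCPUs (vcpu_fds[] and nr_vcpus are illustrative names
for the VMM's own vCPU bookkeeping):

	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_X86_DISABLE_EXITS,
		.args[0] = KVM_X86_DISABLE_EXITS_HLT,
	};
	int i;

	for (i = 0; i < nr_vcpus / 2; i++)
		ioctl(vcpu_fds[i], KVM_ENABLE_CAP, &cap);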

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kechen Lu <kechenl@nvidia.com>
---
 Documentation/virt/kvm/api.rst     |  5 +++--
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/svm/svm.c             | 10 ++++++++++
 arch/x86/kvm/vmx/vmx.c             | 10 ++++++++++
 arch/x86/kvm/x86.c                 | 12 ++++++++++++
 6 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index d1c50b95bbc1..b340e36a34f3 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6581,7 +6581,7 @@ branch to guests' 0x200 interrupt vector.
 :Architectures: x86
 :Parameters: args[0] defines which exits are disabled
 :Returns: 0 on success, -EINVAL when args[0] contains invalid exits
-          or if any vCPU has already been created
+          or if any vCPU has already been created for vm ioctl
 
 Valid bits in args[0] are::
 
@@ -6595,7 +6595,8 @@ longer intercept some instructions for improved latency in some
 workloads, and is suggested when vCPUs are associated to dedicated
 physical CPUs.  More bits can be added in the future; userspace can
 just pass the KVM_CHECK_EXTENSION result to KVM_ENABLE_CAP to disable
-all such vmexits.
+all such vmexits. vCPU-scoped capability support is also added for
+HLT exits.
 
 Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
 
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index cefe1d81e2e8..3e54400535f2 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -121,6 +121,7 @@ KVM_X86_OP_NULL(enable_direct_tlbflush)
 KVM_X86_OP_NULL(migrate_timers)
 KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP_NULL(complete_emulated_msr)
+KVM_X86_OP(update_disabled_exits)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_NULL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index edc5fca4d8c8..20a1f34c772c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1498,6 +1498,8 @@ struct kvm_x86_ops {
 	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
 
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
+
+	void (*update_disabled_exits)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 70e393c6dfb5..6cad069a540a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4556,6 +4556,14 @@ static void svm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 	sev_vcpu_deliver_sipi_vector(vcpu, vector);
 }
 
+static void svm_update_disabled_exits(struct kvm_vcpu *vcpu)
+{
+	if (kvm_hlt_in_guest(vcpu))
+		svm_clr_intercept(to_svm(vcpu), INTERCEPT_HLT);
+	else
+		svm_set_intercept(to_svm(vcpu), INTERCEPT_HLT);
+}
+
 static void svm_vm_destroy(struct kvm *kvm)
 {
 	avic_vm_destroy(kvm);
@@ -4705,6 +4713,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.complete_emulated_msr = svm_complete_emulated_msr,
 
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
+
+	.update_disabled_exits = svm_update_disabled_exits,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b5133619dea1..149eb621b124 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7536,6 +7536,14 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
 	return supported & BIT(bit);
 }
 
+static void vmx_update_disabled_exits(struct kvm_vcpu *vcpu)
+{
+	if (kvm_hlt_in_guest(vcpu))
+		exec_controls_clearbit(to_vmx(vcpu), CPU_BASED_HLT_EXITING);
+	else
+		exec_controls_setbit(to_vmx(vcpu), CPU_BASED_HLT_EXITING);
+}
+
 static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.name = "kvm_intel",
 
@@ -7672,6 +7680,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.complete_emulated_msr = kvm_complete_insn_gp,
 
 	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
+
+	.update_disabled_exits = vmx_update_disabled_exits,
 };
 
 static __init void vmx_setup_user_return_msrs(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d5d0d99b584e..d7b4a3e360bb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5072,6 +5072,18 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 			kvm_update_pv_runtime(vcpu);
 
 		return 0;
+
+	case KVM_CAP_X86_DISABLE_EXITS:
+		if (cap->args[0] && (cap->args[0] &
+				~KVM_X86_DISABLE_VALID_EXITS))
+			return -EINVAL;
+
+		vcpu->arch.hlt_in_guest = (cap->args[0] &
+			KVM_X86_DISABLE_EXITS_HLT) ? true : false;
+
+		static_call(kvm_x86_update_disabled_exits)(vcpu);
+		return 0;
+
 	default:
 		return -EINVAL;
 	}
-- 
2.30.2



* Re: [RFC PATCH v2 1/3] KVM: x86: only allow exits disable before vCPUs created
  2021-12-21  9:04 ` [RFC PATCH v2 1/3] KVM: x86: only allow exits disable before vCPUs created Kechen Lu
@ 2022-01-10 18:50   ` Sean Christopherson
  2022-01-11  6:38     ` Kechen Lu
  0 siblings, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2022-01-10 18:50 UTC (permalink / raw)
  To: Kechen Lu
  Cc: kvm, pbonzini, wanpengli, vkuznets, mst, somduttar, linux-kernel

On Tue, Dec 21, 2021, Kechen Lu wrote:
> Since VMX and SVM both would never update the control bits if exits
> are disable after vCPUs are created, only allow setting exits
> disable flag before vCPU creation.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

For this to carry my SOB, I should be attributed as the author, or add a
Co-developed-by: for me.  I'm also totally ok with a Suggested-by: or Reported-by:

And we should at least have

  Fixes: 4d5422cea3b6 ("KVM: X86: Provide a capability to disable MWAIT intercepts")

and maybe Cc: stable@vger.kernel.org, though I'm not entirely sure this is stable
material as it could in theory do more harm than good if there's a busted userspace
out there.

If this doesn't carry my SOB...

Reviewed-by: Sean Christopherson <seanjc@google.com>

> Signed-off-by: Kechen Lu <kechenl@nvidia.com>
> ---
>  Documentation/virt/kvm/api.rst | 1 +
>  arch/x86/kvm/x86.c             | 6 ++++++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index aeeb071c7688..d1c50b95bbc1 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6581,6 +6581,7 @@ branch to guests' 0x200 interrupt vector.
>  :Architectures: x86
>  :Parameters: args[0] defines which exits are disabled
>  :Returns: 0 on success, -EINVAL when args[0] contains invalid exits
> +          or if any vCPU has already been created
>  
>  Valid bits in args[0] are::
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0cf1082455df..37529c0c279d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5764,6 +5764,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
>  			break;
>  
> +		mutex_lock(&kvm->lock);
> +		if (kvm->created_vcpus)
> +			goto disable_exits_unlock;
> +
>  		if ((cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT) &&
>  			kvm_can_mwait_in_guest())
>  			kvm->arch.mwait_in_guest = true;
> @@ -5774,6 +5778,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
>  			kvm->arch.cstate_in_guest = true;
>  		r = 0;
> +disable_exits_unlock:
> +		mutex_unlock(&kvm->lock);
>  		break;
>  	case KVM_CAP_MSR_PLATFORM_INFO:
>  		kvm->arch.guest_can_read_msr_platform_info = cap->args[0];
> -- 
> 2.30.2
> 


* Re: [RFC PATCH v2 2/3] KVM: x86: move ()_in_guest checking to vCPU scope
  2021-12-21  9:04 ` [RFC PATCH v2 2/3] KVM: x86: move ()_in_guest checking to vCPU scope Kechen Lu
@ 2022-01-10 19:35   ` Sean Christopherson
  2022-01-11  6:37     ` Kechen Lu
  0 siblings, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2022-01-10 19:35 UTC (permalink / raw)
  To: Kechen Lu
  Cc: kvm, pbonzini, wanpengli, vkuznets, mst, somduttar, linux-kernel

The shortlog is weird, I get what you're going for with the "()", but it honestly
looks like a typo :-)  And add "power management" so that there's a bit more context
in the shortlog?  Maybe this?

  KVM: x86: Move *_in_guest power management flags to vCPU scope

On Tue, Dec 21, 2021, Kechen Lu wrote:
> For futher extensions on finer-grained control on per-vCPU exits
> disable control, and because VM-scoped restricted to set before
> vCPUs creation, runtime disabled exits flag check could be purely
> vCPU scope.

State what the patch does, not what it "could" do.  E.g.

Make the runtime disabled mwait/hlt/pause/cstate exits flags vCPU scope
to allow finer-grained, per-vCPU control.  The VM-scoped control is only
allowed before vCPUs are created, thus preserving the existing behavior
is a simple matter of snapshotting the flags at vCPU creation.

A few nits below that aren't even from this path, but otherwise,

Reviewed-by: Sean Christopherson <seanjc@google.com>

> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Kechen Lu <kechenl@nvidia.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  5 +++++
>  arch/x86/kvm/cpuid.c            |  2 +-
>  arch/x86/kvm/lapic.c            |  2 +-
>  arch/x86/kvm/svm/svm.c          | 10 +++++-----
>  arch/x86/kvm/vmx/vmx.c          | 16 ++++++++--------
>  arch/x86/kvm/x86.c              |  6 +++++-
>  arch/x86/kvm/x86.h              | 16 ++++++++--------
>  7 files changed, 33 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 2164b9f4c7b0..edc5fca4d8c8 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -908,6 +908,11 @@ struct kvm_vcpu_arch {
>  #if IS_ENABLED(CONFIG_HYPERV)
>  	hpa_t hv_root_tdp;
>  #endif
> +
> +	bool mwait_in_guest;
> +	bool hlt_in_guest;
> +	bool pause_in_guest;
> +	bool cstate_in_guest;
>  };
>  
>  struct kvm_lpage_info {
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 07e9215e911d..6291e15710ba 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -177,7 +177,7 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
>  		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
>  
>  	best = kvm_find_kvm_cpuid_features(vcpu);
> -	if (kvm_hlt_in_guest(vcpu->kvm) && best &&
> +	if (kvm_hlt_in_guest(vcpu) && best &&
>  		(best->eax & (1 << KVM_FEATURE_PV_UNHALT)))

Can you (or Paolo?) opportunistically align this?  And maybe even shuffle the
check on "best" to pair the !NULL check with the functional check?  E.g.

	if (kvm_hlt_in_guest(vcpu) &&
	    best && (best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
		best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);

>  		best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
>  
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index f206fc35deff..effb994e6642 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -119,7 +119,7 @@ static bool kvm_can_post_timer_interrupt(struct kvm_vcpu *vcpu)
>  bool kvm_can_use_hv_timer(struct kvm_vcpu *vcpu)
>  {
>  	return kvm_x86_ops.set_hv_timer
> -	       && !(kvm_mwait_in_guest(vcpu->kvm) ||
> +	       && !(kvm_mwait_in_guest(vcpu) ||

And another opportunistic tweak?  I've been itching for an excuse to "fix" this
particular helper for quite some time :-)

	return kvm_x86_ops.set_hv_timer &&
	       !(kvm_mwait_in_guest(vcpu) || kvm_can_post_timer_interrupt(vcpu));

That overruns the 80 char soft limit, but IMO it's worth it in this case as the
resulting code is more readable.


>  		    kvm_can_post_timer_interrupt(vcpu));
>  }
>  EXPORT_SYMBOL_GPL(kvm_can_use_hv_timer);


* Re: [RFC PATCH v2 3/3] KVM: x86: add vCPU ioctl for HLT exits disable capability
  2021-12-21  9:04 ` [RFC PATCH v2 3/3] KVM: x86: add vCPU ioctl for HLT exits disable capability Kechen Lu
@ 2022-01-10 20:56   ` Sean Christopherson
  2022-01-10 21:00     ` Sean Christopherson
  2022-01-11  6:35     ` Kechen Lu
  0 siblings, 2 replies; 14+ messages in thread
From: Sean Christopherson @ 2022-01-10 20:56 UTC (permalink / raw)
  To: Kechen Lu
  Cc: kvm, pbonzini, wanpengli, vkuznets, mst, somduttar, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3250 bytes --]

On Tue, Dec 21, 2021, Kechen Lu wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d5d0d99b584e..d7b4a3e360bb 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5072,6 +5072,18 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
>  			kvm_update_pv_runtime(vcpu);
>  
>  		return 0;
> +
> +	case KVM_CAP_X86_DISABLE_EXITS:
> +		if (cap->args[0] && (cap->args[0] &
> +				~KVM_X86_DISABLE_VALID_EXITS))

Bad alignment, but there's no need for the !0 in the first place, i.e.

		if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)

but that's also incomplete as this path only supports toggling HLT, yet allows
all flavors of KVM_X86_DISABLE_VALID_EXITS.  Unless there's a good reason to not
allow maniuplating the other exits, the proper fix is to just support everything.

> +			return -EINVAL;
> +
> +		vcpu->arch.hlt_in_guest = (cap->args[0] &
> +			KVM_X86_DISABLE_EXITS_HLT) ? true : false;

Hmm, this behavior diverges from the per-VM ioctl, which doesn't allow re-enabling
a disabled exit.  We can't change the per-VM behavior without breaking backwards
compatibility, e.g. if userspace does:

	if (allow_mwait)
		kvm_vm_disable_exit(KVM_X86_DISABLE_EXITS_MWAIT)
	if (allow_hlt)
		kvm_vm_disable_exit(KVM_X86_DISABLE_EXITS_HLT)

then changing KVM behavior would result in MWAIT behavior intercepted when previously
it would have been allowed.  We have a userspace VMM that operates like this...

Does your use case require toggling intercepts?  Or is the configuration static?
If it's static, then the easiest thing would be to follow the per-VM behavior so
that there are no surprises.  If toggling is required, then I think the best thing
would be to add a prep patch to add an override flag to the per-VM ioctl, and then
share code between the per-VM and per-vCPU paths for modifying the flags (attached
as patch 0003).

Somewhat related, there's another bug of sorts that I think we can safely fix.
KVM doesn't reject disabling of MWAIT exits when MWAIT isn't allowed in the guest,
and instead ignores the bad input.  Not a big deal, but fixing that means KVM
doesn't need to check kvm_can_mwait_in_guest() when processing the args to update
flags.  If that breaks someone's userspace, the alternative would be to tweak the
attached patch 0003 to introduce the OVERRIDE, e.g.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f611a49ceba4..3bac756bab79 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5053,6 +5053,8 @@ static int kvm_vcpu_ioctl_device_attr(struct kvm_vcpu *vcpu,

 #define kvm_ioctl_disable_exits(a, mask)                                    \
 ({                                                                          \
+       if (!kvm_can_mwait_in_guest())                                       \
+               (mask) &= KVM_X86_DISABLE_EXITS_MWAIT;                       \
        if ((mask) & KVM_X86_DISABLE_EXITS_OVERRIDE) {                       \
                (a).mwait_in_guest = (mask) & KVM_X86_DISABLE_EXITS_MWAIT;   \
                (a).hlt_in_guest = (mask) & KVM_X86_DISABLE_EXITS_HLT;       \


If toggling is not required, then I still think it makes sense to add a macro to
handle propagating the capability args to the arch flags.

[-- Attachment #2: 0001-KVM-x86-Reject-disabling-of-MWAIT-interception-when-.patch --]
[-- Type: text/x-diff, Size: 2358 bytes --]

From 10798eb3fc1fe7f7240acd0f2667a6519ae445c5 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Mon, 10 Jan 2022 12:29:28 -0800
Subject: [PATCH] KVM: x86: Reject disabling of MWAIT interception when not
 allowed

Reject KVM_CAP_X86_DISABLE_EXITS if userspace attempts to disable MWAIT
exits and KVM previously reported (via KVM_CHECK_EXTENSION) that MWAIT is
not allowed in guest, e.g. because it's not supported or the CPU doesn't
have an always-running APIC timer.

Fixes: 4d5422cea3b6 ("KVM: X86: Provide a capability to disable MWAIT intercepts")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c194a8cbd25f..9de22dceb49b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4087,6 +4087,17 @@ static inline bool kvm_can_mwait_in_guest(void)
 		boot_cpu_has(X86_FEATURE_ARAT);
 }
 
+static u64 kvm_get_allowed_disable_exits(void)
+{
+	u64 r = KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE |
+		KVM_X86_DISABLE_EXITS_CSTATE;
+
+	if(kvm_can_mwait_in_guest())
+		r |= KVM_X86_DISABLE_EXITS_MWAIT;
+
+	return r;
+}
+
 static int kvm_ioctl_get_supported_hv_cpuid(struct kvm_vcpu *vcpu,
 					    struct kvm_cpuid2 __user *cpuid_arg)
 {
@@ -4202,10 +4213,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = KVM_CLOCK_VALID_FLAGS;
 		break;
 	case KVM_CAP_X86_DISABLE_EXITS:
-		r |=  KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE |
-		      KVM_X86_DISABLE_EXITS_CSTATE;
-		if(kvm_can_mwait_in_guest())
-			r |= KVM_X86_DISABLE_EXITS_MWAIT;
+		r |= kvm_get_allowed_disable_exits();
 		break;
 	case KVM_CAP_X86_SMM:
 		/* SMBASE is usually relocated above 1M on modern chipsets,
@@ -5779,11 +5787,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		break;
 	case KVM_CAP_X86_DISABLE_EXITS:
 		r = -EINVAL;
-		if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
+		if (cap->args[0] & ~kvm_get_allowed_disable_exits())
 			break;
 
-		if ((cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT) &&
-			kvm_can_mwait_in_guest())
+		if (cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT)
 			kvm->arch.mwait_in_guest = true;
 		if (cap->args[0] & KVM_X86_DISABLE_EXITS_HLT)
 			kvm->arch.hlt_in_guest = true;
-- 
2.34.1.575.g55b058a8bb-goog


[-- Attachment #3: 0003-KVM-x86-Let-userspace-re-enable-previously-disabled-.patch --]
[-- Type: text/x-diff, Size: 5151 bytes --]

From a8e75742293c1c400dd5aa1d111825046cf26ab2 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Mon, 10 Jan 2022 12:40:31 -0800
Subject: [PATCH] KVM: x86: Let userspace re-enable previously disabled exits

Add an OVERRIDE flag to KVM_CAP_X86_DISABLE_EXITS to allow userspace to
re-enable exits and/or override previous settings.  There's no real use
case for the per-VM ioctl, but a future per-vCPU variant wants to let
userspace toggle interception while the vCPU is running; add the OVERRIDE
functionality now to provide consistent behavior between the per-VM and
per-vCPU variants.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/api.rst |  5 +++++
 arch/x86/kvm/x86.c             | 37 +++++++++++++++++++++++-----------
 include/uapi/linux/kvm.h       |  4 +++-
 3 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 9460941d38d7..c42e653891a2 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6617,6 +6617,7 @@ Valid bits in args[0] are::
   #define KVM_X86_DISABLE_EXITS_HLT              (1 << 1)
   #define KVM_X86_DISABLE_EXITS_PAUSE            (1 << 2)
   #define KVM_X86_DISABLE_EXITS_CSTATE           (1 << 3)
+  #define KVM_X86_DISABLE_EXITS_OVERRIDE         (1ull << 63)
 
 Enabling this capability on a VM provides userspace with a way to no
 longer intercept some instructions for improved latency in some
@@ -6625,6 +6626,10 @@ physical CPUs.  More bits can be added in the future; userspace can
 just pass the KVM_CHECK_EXTENSION result to KVM_ENABLE_CAP to disable
 all such vmexits.
 
+By default, this capability only disables exits.  To re-enable an exit, or to
+override previous settings, userspace can set KVM_X86_DISABLE_EXITS_OVERRIDE,
+in which case KVM will enable/disable according to the mask (a '1' == disable).
+
 Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
 
 7.14 KVM_CAP_S390_HPAGE_1M
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7741a5980334..f611a49ceba4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4089,11 +4089,10 @@ static inline bool kvm_can_mwait_in_guest(void)
 
 static u64 kvm_get_allowed_disable_exits(void)
 {
-	u64 r = KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE |
-		KVM_X86_DISABLE_EXITS_CSTATE;
+	u64 r = KVM_X86_DISABLE_VALID_EXITS;
 
-	if(kvm_can_mwait_in_guest())
-		r |= KVM_X86_DISABLE_EXITS_MWAIT;
+	if (!kvm_can_mwait_in_guest())
+		r &= ~KVM_X86_DISABLE_EXITS_MWAIT;
 
 	return r;
 }
@@ -5051,6 +5050,26 @@ static int kvm_vcpu_ioctl_device_attr(struct kvm_vcpu *vcpu,
 	return r;
 }
 
+
+#define kvm_ioctl_disable_exits(a, mask)				     \
+({									     \
+	if ((mask) & KVM_X86_DISABLE_EXITS_OVERRIDE) {			     \
+		(a).mwait_in_guest = (mask) & KVM_X86_DISABLE_EXITS_MWAIT;   \
+		(a).hlt_in_guest = (mask) & KVM_X86_DISABLE_EXITS_HLT;	     \
+		(a).pause_in_guest = (mask) & KVM_X86_DISABLE_EXITS_PAUSE;   \
+		(a).cstate_in_guest = (mask) & KVM_X86_DISABLE_EXITS_CSTATE; \
+	} else {							     \
+		if ((mask) & KVM_X86_DISABLE_EXITS_MWAIT)		     \
+			(a).mwait_in_guest = true;			     \
+		if ((mask) & KVM_X86_DISABLE_EXITS_HLT)			     \
+			(a).hlt_in_guest = true;			     \
+		if ((mask) & KVM_X86_DISABLE_EXITS_PAUSE)		     \
+			(a).pause_in_guest = true;			     \
+		if ((mask) & KVM_X86_DISABLE_EXITS_CSTATE)		     \
+			(a).cstate_in_guest = true;			     \
+	}								     \
+})
+
 static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 				     struct kvm_enable_cap *cap)
 {
@@ -5794,14 +5813,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		if (kvm->created_vcpus)
 			goto disable_exits_unlock;
 
-		if (cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT)
-			kvm->arch.mwait_in_guest = true;
-		if (cap->args[0] & KVM_X86_DISABLE_EXITS_HLT)
-			kvm->arch.hlt_in_guest = true;
-		if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
-			kvm->arch.pause_in_guest = true;
-		if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
-			kvm->arch.cstate_in_guest = true;
+		kvm_ioctl_disable_exits(kvm->arch, cap->args[0]);
+
 		r = 0;
 disable_exits_unlock:
 		mutex_unlock(&kvm->lock);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index fbfd70d965c6..5e0ffc87a578 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -798,10 +798,12 @@ struct kvm_ioeventfd {
 #define KVM_X86_DISABLE_EXITS_HLT            (1 << 1)
 #define KVM_X86_DISABLE_EXITS_PAUSE          (1 << 2)
 #define KVM_X86_DISABLE_EXITS_CSTATE         (1 << 3)
+#define KVM_X86_DISABLE_EXITS_OVERRIDE	     (1ull << 63)
 #define KVM_X86_DISABLE_VALID_EXITS          (KVM_X86_DISABLE_EXITS_MWAIT | \
                                               KVM_X86_DISABLE_EXITS_HLT | \
                                               KVM_X86_DISABLE_EXITS_PAUSE | \
-                                              KVM_X86_DISABLE_EXITS_CSTATE)
+                                              KVM_X86_DISABLE_EXITS_CSTATE | \
+					      KVM_X86_DISABLE_EXITS_OVERRIDE)
 
 /* for KVM_ENABLE_CAP */
 struct kvm_enable_cap {
-- 
2.34.1.575.g55b058a8bb-goog



* Re: [RFC PATCH v2 3/3] KVM: x86: add vCPU ioctl for HLT exits disable capability
  2022-01-10 20:56   ` Sean Christopherson
@ 2022-01-10 21:00     ` Sean Christopherson
  2022-01-11  6:36       ` Kechen Lu
  2022-01-11  6:35     ` Kechen Lu
  1 sibling, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2022-01-10 21:00 UTC (permalink / raw)
  To: Kechen Lu
  Cc: kvm, pbonzini, wanpengli, vkuznets, mst, somduttar, linux-kernel

On Mon, Jan 10, 2022, Sean Christopherson wrote:
> Does your use case require toggling intercepts?  Or is the configuration static?
> If it's static, then the easiest thing would be to follow the per-VM behavior so
> that there are no suprises.  If toggling is required, then I think the best thing
> would be to add a prep patch to add an override flag to the per-VM ioctl, and then
> share code between the per-VM and per-vCPU paths for modifying the flags (attached
> as patch 0003).

...
 
> If toggling is not required, then I still think it makes sense to add a macro to
> handle propagating the capability args to the arch flags.

Almost forgot.  Can you please add selftests to verify whatever per-VM and
per-vCPU behavior we end up implementing?  Thanks!


* Re: [RFC PATCH v2 0/3] KVM: x86: add per-vCPU exits disable capability
  2021-12-21  9:04 [RFC PATCH v2 0/3] KVM: x86: add per-vCPU exits disable capability Kechen Lu
                   ` (2 preceding siblings ...)
  2021-12-21  9:04 ` [RFC PATCH v2 3/3] KVM: x86: add vCPU ioctl for HLT exits disable capability Kechen Lu
@ 2022-01-10 21:18 ` Michael S. Tsirkin
  2022-01-11  6:34   ` Kechen Lu
  3 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2022-01-10 21:18 UTC (permalink / raw)
  To: Kechen Lu
  Cc: kvm, pbonzini, seanjc, wanpengli, vkuznets, somduttar, linux-kernel

On Tue, Dec 21, 2021 at 01:04:46AM -0800, Kechen Lu wrote:
> Summary
> ===========
> Introduce support of vCPU-scoped ioctl with KVM_CAP_X86_DISABLE_EXITS
> cap for disabling exits to enable finer-grained VM exits disabling
> on per vCPU scales instead of whole guest. This patch series enabled
> the vCPU-scoped exits control on HLT VM-exits.
> 
> Motivation
> ============
> In use cases like Windows guest running heavy CPU-bound
> workloads, disabling HLT VM-exits could mitigate host sched ctx switch
> overhead. Simply HLT disabling on all vCPUs could bring
> performance benefits, but if no pCPUs reserved for host threads, could
> happened to the forced preemption as host does not know the time to do
> the schedule for other host threads want to run. With this patch, we
> could only disable part of vCPUs HLT exits for one guest, this still
> keeps performance benefits, and also shows resiliency to host stressing
> workload running at the same time.
> 
> Performance and Testing
> =========================
> In the host stressing workload experiment with Windows guest heavy
> CPU-bound workloads, it shows good resiliency and having the ~3%
> performance improvement. E.g. Passmark running in a Windows guest
> with this patch disabling HLT exits on only half of vCPUs still
> showing 2.4% higher main score v/s baseline.
> 
> Tested everything on AMD machines.
> 
> 
> v1->v2 (Sean Christopherson) :
> - Add explicit restriction for VM-scoped exits disabling to be called
>   before vCPUs creation (patch 1)
> - Use vCPU ioctl instead of 64bit vCPU bitmask (patch 3), and make exits
>   disable flags check purely for vCPU instead of VM (patch 2)

This is still quite blunt and assumes a ton of configuration on the host
exactly matching the workload within guest. Which seems a waste since
guests actually have the smarts to know what's happening within them.

If you are going to allow the guest to halt a vCPU, how about
working on exposing mwait to the guest cleanly instead?
The idea is to expose this in ACPI - linux guests
ignore ACPI and go by CPUID but windows guests follow
ACPI. Linux can be patched ;)

What we would have is a mirror of host ACPI states,
such that lower states invoke HLT and exit, higher
power states invoke mwait and wait within guest.

The nice thing with this approach is that it's already supported
by the host kernel, so it's just a question of coding up ACPI.



> 
> Best Regards,
> Kechen
> 
> Kechen Lu (3):
>   KVM: x86: only allow exits disable before vCPUs created
>   KVM: x86: move ()_in_guest checking to vCPU scope
>   KVM: x86: add vCPU ioctl for HLT exits disable capability
> 
>  Documentation/virt/kvm/api.rst     |  4 +++-
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  7 +++++++
>  arch/x86/kvm/cpuid.c               |  2 +-
>  arch/x86/kvm/lapic.c               |  2 +-
>  arch/x86/kvm/svm/svm.c             | 20 +++++++++++++++-----
>  arch/x86/kvm/vmx/vmx.c             | 26 ++++++++++++++++++--------
>  arch/x86/kvm/x86.c                 | 24 +++++++++++++++++++++++-
>  arch/x86/kvm/x86.h                 | 16 ++++++++--------
>  9 files changed, 77 insertions(+), 25 deletions(-)
> 
> -- 
> 2.30.2



* RE: [RFC PATCH v2 0/3] KVM: x86: add per-vCPU exits disable capability
  2022-01-10 21:18 ` [RFC PATCH v2 0/3] KVM: x86: add per-vCPU " Michael S. Tsirkin
@ 2022-01-11  6:34   ` Kechen Lu
  0 siblings, 0 replies; 14+ messages in thread
From: Kechen Lu @ 2022-01-11  6:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, pbonzini, seanjc, wanpengli, vkuznets, Somdutta Roy, linux-kernel

Hi Michael,

> -----Original Message-----
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Monday, January 10, 2022 1:18 PM
> To: Kechen Lu <kechenl@nvidia.com>
> Cc: kvm@vger.kernel.org; pbonzini@redhat.com; seanjc@google.com;
> wanpengli@tencent.com; vkuznets@redhat.com; Somdutta Roy
> <somduttar@nvidia.com>; linux-kernel@vger.kernel.org
> Subject: Re: [RFC PATCH v2 0/3] KVM: x86: add per-vCPU exits disable
> capability
> 
> External email: Use caution opening links or attachments
> 
> 
> On Tue, Dec 21, 2021 at 01:04:46AM -0800, Kechen Lu wrote:
> > Summary
> > ===========
> > Introduce support of vCPU-scoped ioctl with
> KVM_CAP_X86_DISABLE_EXITS
> > cap for disabling exits to enable finer-grained VM exits disabling on
> > per vCPU scales instead of whole guest. This patch series enabled the
> > vCPU-scoped exits control on HLT VM-exits.
> >
> > Motivation
> > ============
> > In use cases like Windows guest running heavy CPU-bound workloads,
> > disabling HLT VM-exits could mitigate host sched ctx switch overhead.
> > Simply HLT disabling on all vCPUs could bring performance benefits,
> > but if no pCPUs reserved for host threads, could happened to the
> > forced preemption as host does not know the time to do the schedule
> > for other host threads want to run. With this patch, we could only
> > disable part of vCPUs HLT exits for one guest, this still keeps
> > performance benefits, and also shows resiliency to host stressing
> > workload running at the same time.
> >
> > Performance and Testing
> > =========================
> > In the host stressing workload experiment with Windows guest heavy
> > CPU-bound workloads, it shows good resiliency and having the ~3%
> > performance improvement. E.g. Passmark running in a Windows guest with
> > this patch disabling HLT exits on only half of vCPUs still showing
> > 2.4% higher main score v/s baseline.
> >
> > Tested everything on AMD machines.
> >
> >
> > v1->v2 (Sean Christopherson) :
> > - Add explicit restriction for VM-scoped exits disabling to be called
> >   before vCPUs creation (patch 1)
> > - Use vCPU ioctl instead of 64bit vCPU bitmask (patch 3), and make exits
> >   disable flags check purely for vCPU instead of VM (patch 2)
> 
> This is still quite blunt and assumes a ton of configuration on the host exactly
> matching the workload within guest. Which seems a waste since guests
> actually have the smarts to know what's happening within them.
> 

For now we use a fixed configuration on the host for our guests; it still
gives promising performance benefits on most workloads in our use case. But
yeah, it's not adaptive or flexible with respect to the workloads in the guest.

> If you are going to allow guest to halt a vCPU, how about working on
> exposing mwait to guest cleanly instead?
> The idea is to expose this in ACPI - linux guests ignore ACPI and go by CPUID
> but windows guests follow ACPI. Linux can be patched ;)
> 
> What we would have is a mirror of host ACPI states, such that lower states
> invoke HLT and exit, higher power states invoke mwait and wait within guest.
> 
> The nice thing with this approach is that it's already supported by the host
> kernel, so it's just a question of coding up ACPI.
> 

This idea looks really interesting! If, through ACPI config, we could have longer
idle periods (deeper power states) invoke HLT and exit, and shorter idle periods
(higher power states) use mwait within the guest, that's indeed a more adaptive
and cleaner approach. But especially for Windows guests, the idle process execution
and idle/sleep state switching logic seems not well documented, so we need to figure
out the impacts of the change on the idle process and OS PM behaviors.

Much thanks for this suggestion; I will try to explore it a bit
and will post updates.

Thanks!

Best Regards,
Kechen

> 
> 
> >
> > Best Regards,
> > Kechen
> >
> > Kechen Lu (3):
> >   KVM: x86: only allow exits disable before vCPUs created
> >   KVM: x86: move ()_in_guest checking to vCPU scope
> >   KVM: x86: add vCPU ioctl for HLT exits disable capability
> >
> >  Documentation/virt/kvm/api.rst     |  4 +++-
> >  arch/x86/include/asm/kvm-x86-ops.h |  1 +
> >  arch/x86/include/asm/kvm_host.h    |  7 +++++++
> >  arch/x86/kvm/cpuid.c               |  2 +-
> >  arch/x86/kvm/lapic.c               |  2 +-
> >  arch/x86/kvm/svm/svm.c             | 20 +++++++++++++++-----
> >  arch/x86/kvm/vmx/vmx.c             | 26 ++++++++++++++++++--------
> >  arch/x86/kvm/x86.c                 | 24 +++++++++++++++++++++++-
> >  arch/x86/kvm/x86.h                 | 16 ++++++++--------
> >  9 files changed, 77 insertions(+), 25 deletions(-)
> >
> > --
> > 2.30.2



* RE: [RFC PATCH v2 3/3] KVM: x86: add vCPU ioctl for HLT exits disable capability
  2022-01-10 20:56   ` Sean Christopherson
  2022-01-10 21:00     ` Sean Christopherson
@ 2022-01-11  6:35     ` Kechen Lu
  1 sibling, 0 replies; 14+ messages in thread
From: Kechen Lu @ 2022-01-11  6:35 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, pbonzini, wanpengli, vkuznets, mst, Somdutta Roy, linux-kernel



> -----Original Message-----
> From: Sean Christopherson <seanjc@google.com>
> Sent: Monday, January 10, 2022 12:56 PM
> To: Kechen Lu <kechenl@nvidia.com>
> Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wanpengli@tencent.com;
> vkuznets@redhat.com; mst@redhat.com; Somdutta Roy
> <somduttar@nvidia.com>; linux-kernel@vger.kernel.org
> Subject: Re: [RFC PATCH v2 3/3] KVM: x86: add vCPU ioctl for HLT exits
> disable capability
> 
> External email: Use caution opening links or attachments
> 
> 
> On Tue, Dec 21, 2021, Kechen Lu wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index
> > d5d0d99b584e..d7b4a3e360bb 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5072,6 +5072,18 @@ static int kvm_vcpu_ioctl_enable_cap(struct
> kvm_vcpu *vcpu,
> >                       kvm_update_pv_runtime(vcpu);
> >
> >               return 0;
> > +
> > +     case KVM_CAP_X86_DISABLE_EXITS:
> > +             if (cap->args[0] && (cap->args[0] &
> > +                             ~KVM_X86_DISABLE_VALID_EXITS))
> 
> Bad alignment, but there's no need for the !0 in the first place, i.e.
> 
>                 if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
> 

Ack.

> but that's also incomplete as this path only supports toggling HLT, yet allows
> all flavors of KVM_X86_DISABLE_VALID_EXITS.  Unless there's a good reason
> to not allow maniuplating the other exits, the proper fix is to just support
> everything.
> 

Makes sense. When I implemented this patch version, I was only thinking about the
HLT intercept use case and wanted to see more comments on whether this framework
looks good. Will complete this to support the other exits.

> > +                     return -EINVAL;
> > +
> > +             vcpu->arch.hlt_in_guest = (cap->args[0] &
> > +                     KVM_X86_DISABLE_EXITS_HLT) ? true : false;
> 
> Hmm, this behavior diverges from the per-VM ioctl, which doesn't allow re-
> enabling a disabled exit.  We can't change the per-VM behavior without
> breaking backwards compatibility, e.g. if userspace does:
> 
>         if (allow_mwait)
>                 kvm_vm_disable_exit(KVM_X86_DISABLE_EXITS_MWAIT)
>         if (allow_hlt)
>                 kvm_vm_disable_exit(KVM_X86_DISABLE_EXITS_HLT)
> 
> then changing KVM behavior would result in MWAIT behavior intercepted
> when previously it would have been allowed.  We have a userspace VMM
> that operates like this...
> 
> Does your use case require toggling intercepts?  Or is the configuration
> static?
> If it's static, then the easiest thing would be to follow the per-VM behavior so
> that there are no suprises.  If toggling is required, then I think the best thing
> would be to add a prep patch to add an override flag to the per-VM ioctl, and
> then share code between the per-VM and per-vCPU paths for modifying the
> flags (attached as patch 0003).
> 

Our use case for now is a static configuration. But since the per-vCPU ioctl is
in any case executed at runtime after vCPU creation, it is effectively "toggling"
and needs overrides on some vCPUs. The "OVERRIDE" flag makes much sense to me,
and the macro looks clean and neat for reducing redundant code. Thanks a lot
for the patch.

> Somewhat related, there's another bug of sorts that I think we can safely fix.
> KVM doesn't reject disabling of MWAIT exits when MWAIT isn't allowed in
> the guest, and instead ignores the bad input.  Not a big deal, but fixing that
> means KVM doesn't need to check kvm_can_mwait_in_guest() when
> processing the args to update flags.  If that breaks someone's userspace, the
> alternative would be to tweak the attached patch 0003 to introduce the
> OVERRIDE, e.g.
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index
> f611a49ceba4..3bac756bab79 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5053,6 +5053,8 @@ static int kvm_vcpu_ioctl_device_attr(struct
> kvm_vcpu *vcpu,
> 
>  #define kvm_ioctl_disable_exits(a, mask)                                    \
>  ({                                                                          \
> +       if (!kvm_can_mwait_in_guest())                                       \
> +               (mask) &= KVM_X86_DISABLE_EXITS_MWAIT;                       \

For some userspaces' backward compatibility, adding this tweak to the attached
patch 0003 makes sense. BTW, (mask) &= KVM_X86_DISABLE_EXITS_MWAIT
seems like it should be (mask) &= ~KVM_X86_DISABLE_EXITS_MWAIT; I guess it's a
typo :P. Will include the attached patch 0001 in the next version as well. Thanks
for all the help!

Best Regards,
Kechen

>         if ((mask) & KVM_X86_DISABLE_EXITS_OVERRIDE) {                       \
>                 (a).mwait_in_guest = (mask) & KVM_X86_DISABLE_EXITS_MWAIT;
> \
>                 (a).hlt_in_guest = (mask) & KVM_X86_DISABLE_EXITS_HLT;       \
> 
> 
> If toggling is not required, then I still think it makes sense to add a macro to
> handle propagating the capability args to the arch flags.


* RE: [RFC PATCH v2 3/3] KVM: x86: add vCPU ioctl for HLT exits disable capability
  2022-01-10 21:00     ` Sean Christopherson
@ 2022-01-11  6:36       ` Kechen Lu
  0 siblings, 0 replies; 14+ messages in thread
From: Kechen Lu @ 2022-01-11  6:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, pbonzini, wanpengli, vkuznets, mst, Somdutta Roy, linux-kernel



> -----Original Message-----
> From: Sean Christopherson <seanjc@google.com>
> Sent: Monday, January 10, 2022 1:01 PM
> To: Kechen Lu <kechenl@nvidia.com>
> Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wanpengli@tencent.com;
> vkuznets@redhat.com; mst@redhat.com; Somdutta Roy
> <somduttar@nvidia.com>; linux-kernel@vger.kernel.org
> Subject: Re: [RFC PATCH v2 3/3] KVM: x86: add vCPU ioctl for HLT exits
> disable capability
> 
> External email: Use caution opening links or attachments
> 
> 
> On Mon, Jan 10, 2022, Sean Christopherson wrote:
> > Does your use case require toggling intercepts?  Or is the configuration
> static?
> > If it's static, then the easiest thing would be to follow the per-VM
> > behavior so that there are no suprises.  If toggling is required, then
> > I think the best thing would be to add a prep patch to add an override
> > flag to the per-VM ioctl, and then share code between the per-VM and
> > per-vCPU paths for modifying the flags (attached as patch 0003).
> 
> ...
> 
> > If toggling is not required, then I still think it makes sense to add
> > a macro to handle propagating the capability args to the arch flags.
> 
> Almost forgot.  Can you please add selftests to verify whatever per-VM and
> per-vCPU behavior we end up implementing?  Thanks!

Sure, will add selftests for the per-VM and per-vCPU disable-exits caps, roughly
along the lines of the sketch below.
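(A sketch using raw ioctls rather than the selftest helpers; error handling and
guest code are omitted, and the per-vCPU KVM_ENABLE_CAP at the end is the
behavior this series proposes, not existing uAPI.)

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm_fd = open("/dev/kvm", O_RDWR);
	int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);

	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_X86_DISABLE_EXITS,
		.args[0] = KVM_X86_DISABLE_EXITS_HLT,
	};

	/* Per-VM: accepted only while no vCPUs exist (patch 1). */
	int r_vm_early = ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

	int vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0);

	/* Per-VM after a vCPU exists: expected to fail (errno == EINVAL) with patch 1. */
	int r_vm_late = ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

	/* Per-vCPU: the new path from patch 3, disabling HLT exits on this vCPU only. */
	int r_vcpu = ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);

	return !(r_vm_early == 0 && r_vm_late < 0 && r_vcpu == 0);
}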

Thanks,
Kechen

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [RFC PATCH v2 2/3] KVM: x86: move ()_in_guest checking to vCPU scope
  2022-01-10 19:35   ` Sean Christopherson
@ 2022-01-11  6:37     ` Kechen Lu
  0 siblings, 0 replies; 14+ messages in thread
From: Kechen Lu @ 2022-01-11  6:37 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, pbonzini, wanpengli, vkuznets, mst, Somdutta Roy, linux-kernel



> -----Original Message-----
> From: Sean Christopherson <seanjc@google.com>
> Sent: Monday, January 10, 2022 11:36 AM
> To: Kechen Lu <kechenl@nvidia.com>
> Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wanpengli@tencent.com;
> vkuznets@redhat.com; mst@redhat.com; Somdutta Roy
> <somduttar@nvidia.com>; linux-kernel@vger.kernel.org
> Subject: Re: [RFC PATCH v2 2/3] KVM: x86: move ()_in_guest checking to
> vCPU scope
> 
> 
> The shortlog is weird, I get what you're going for with the "()", but it honestly
> looks like a typo :-)  And add "power management" so that there's a bit more
> context in the shortlog?  Maybe this?
> 
>   KVM: x86: Move *_in_guest power management flags to vCPU scope
> 

Ack. Yeah it's a typo :)

> On Tue, Dec 21, 2021, Kechen Lu wrote:
> > For futher extensions on finer-grained control on per-vCPU exits
> > disable control, and because VM-scoped restricted to set before vCPUs
> > creation, runtime disabled exits flag check could be purely vCPU
> > scope.
> 
> State what the patch does, not what it "could" do.  E.g.
> 
> Make the runtime disabled mwait/hlt/pause/cstate exits flags vCPU scope to
> allow finer-grained, per-vCPU control.  The VM-scoped control is only
> allowed before vCPUs are created, thus preserving the existing behavior is a
> simple matter of snapshotting the flags at vCPU creation.
> 

Ack! Thanks for the patient refinement of the description wording :P
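As a note for the respin, the snapshot at vCPU creation would look roughly like
the sketch below (the wrapper name is made up; the vcpu->arch fields are the
ones this patch adds, and the exact call site in kvm_arch_vcpu_create() is an
assumption):

static void kvm_vcpu_snapshot_disabled_exits(struct kvm_vcpu *vcpu)
{
	struct kvm *kvm = vcpu->kvm;

	/* Copy the VM-scoped flags so later per-vCPU overrides don't affect others. */
	vcpu->arch.mwait_in_guest = kvm->arch.mwait_in_guest;
	vcpu->arch.hlt_in_guest = kvm->arch.hlt_in_guest;
	vcpu->arch.pause_in_guest = kvm->arch.pause_in_guest;
	vcpu->arch.cstate_in_guest = kvm->arch.cstate_in_guest;
}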

> A few nits below that aren't even from this patch, but otherwise,
> 
> Reviewed-by: Sean Christopherson <seanjc@google.com>
> 
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Kechen Lu <kechenl@nvidia.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  5 +++++
> >  arch/x86/kvm/cpuid.c            |  2 +-
> >  arch/x86/kvm/lapic.c            |  2 +-
> >  arch/x86/kvm/svm/svm.c          | 10 +++++-----
> >  arch/x86/kvm/vmx/vmx.c          | 16 ++++++++--------
> >  arch/x86/kvm/x86.c              |  6 +++++-
> >  arch/x86/kvm/x86.h              | 16 ++++++++--------
> >  7 files changed, 33 insertions(+), 24 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 2164b9f4c7b0..edc5fca4d8c8 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -908,6 +908,11 @@ struct kvm_vcpu_arch {
> >  #if IS_ENABLED(CONFIG_HYPERV)
> >       hpa_t hv_root_tdp;
> >  #endif
> > +
> > +     bool mwait_in_guest;
> > +     bool hlt_in_guest;
> > +     bool pause_in_guest;
> > +     bool cstate_in_guest;
> >  };
> >
> >  struct kvm_lpage_info {
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 07e9215e911d..6291e15710ba 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -177,7 +177,7 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
> >               best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
> >
> >       best = kvm_find_kvm_cpuid_features(vcpu);
> > -     if (kvm_hlt_in_guest(vcpu->kvm) && best &&
> > +     if (kvm_hlt_in_guest(vcpu) && best &&
> >               (best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
> 
> Can you (or Paolo?) opportunistically align this?  And maybe even shuffle the
> check on "best" to pair the !NULL check with the functional check?  E.g.
> 
>         if (kvm_hlt_in_guest(vcpu) &&
>             best && (best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
>                 best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
> 

Makes sense. Let me align and rework this in this patch.

> >               best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index f206fc35deff..effb994e6642 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -119,7 +119,7 @@ static bool kvm_can_post_timer_interrupt(struct kvm_vcpu *vcpu)
> >  bool kvm_can_use_hv_timer(struct kvm_vcpu *vcpu)
> >  {
> >       return kvm_x86_ops.set_hv_timer
> > -            && !(kvm_mwait_in_guest(vcpu->kvm) ||
> > +            && !(kvm_mwait_in_guest(vcpu) ||
> 
> And another opportunistic tweak?  I've been itching for an excuse to "fix" this
> particular helper for quite some time :-)
> 
>         return kvm_x86_ops.set_hv_timer &&
>                !(kvm_mwait_in_guest(vcpu) ||
> kvm_can_post_timer_interrupt(vcpu));
> 
> That overruns the 80 char soft limit, but IMO it's worth it in this case as the
> resulting code is more readable.
> 

Sure!

Best Regards,
Kechen

> 
> >                   kvm_can_post_timer_interrupt(vcpu));
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_can_use_hv_timer);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [RFC PATCH v2 1/3] KVM: x86: only allow exits disable before vCPUs created
  2022-01-10 18:50   ` Sean Christopherson
@ 2022-01-11  6:38     ` Kechen Lu
  0 siblings, 0 replies; 14+ messages in thread
From: Kechen Lu @ 2022-01-11  6:38 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, pbonzini, wanpengli, vkuznets, mst, Somdutta Roy, linux-kernel

Hi Sean,

> -----Original Message-----
> From: Sean Christopherson <seanjc@google.com>
> Sent: Monday, January 10, 2022 10:50 AM
> To: Kechen Lu <kechenl@nvidia.com>
> Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wanpengli@tencent.com;
> vkuznets@redhat.com; mst@redhat.com; Somdutta Roy
> <somduttar@nvidia.com>; linux-kernel@vger.kernel.org
> Subject: Re: [RFC PATCH v2 1/3] KVM: x86: only allow exits disable before
> vCPUs created
> 
> 
> On Tue, Dec 21, 2021, Kechen Lu wrote:
> > Since VMX and SVM both would never update the control bits if exits
> > are disable after vCPUs are created, only allow setting exits disable
> > flag before vCPU creation.
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> 
> For this to carry my SOB, I should be attributed as the author, or add a
> Co-developed-by: for me.  I'm also totally ok with a Suggested-by: or
> Reported-by:
> 

My apologies for the incorrect SOB format :P Will fix it!

> And we should at least have
> 
>   Fixes: 4d5422cea3b6 ("KVM: X86: Provide a capability to disable MWAIT
> intercepts")
> 

Ack! Will mention it in the description.

> and maybe Cc: stable@vger.kernel.org, though I'm not entirely sure this is
> stable material as it could in theory do more harm than good if there's a
> busted userspace out there.
> 

I see, will Cc the stable mailing list. IMO, with this patch, incorrect behavior from
userspace only causes the set flag to be ineffective; not sure if this breaks any
userspace seriously.

Best Regards,
Kechen

> If this doesn't carry my SOB...
> 
> Reviewed-by: Sean Christopherson <seanjc@google.com>
> 
> > Signed-off-by: Kechen Lu <kechenl@nvidia.com>
> > ---
> >  Documentation/virt/kvm/api.rst | 1 +
> >  arch/x86/kvm/x86.c             | 6 ++++++
> >  2 files changed, 7 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index aeeb071c7688..d1c50b95bbc1 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6581,6 +6581,7 @@ branch to guests' 0x200 interrupt vector.
> >  :Architectures: x86
> >  :Parameters: args[0] defines which exits are disabled
> >  :Returns: 0 on success, -EINVAL when args[0] contains invalid exits
> > +          or if any vCPU has already been created
> >
> >  Valid bits in args[0] are::
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 0cf1082455df..37529c0c279d 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5764,6 +5764,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >               if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
> >                       break;
> >
> > +             mutex_lock(&kvm->lock);
> > +             if (kvm->created_vcpus)
> > +                     goto disable_exits_unlock;
> > +
> >               if ((cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT) &&
> >                       kvm_can_mwait_in_guest())
> >                       kvm->arch.mwait_in_guest = true;
> > @@ -5774,6 +5778,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >               if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
> >                       kvm->arch.cstate_in_guest = true;
> >               r = 0;
> > +disable_exits_unlock:
> > +             mutex_unlock(&kvm->lock);
> >               break;
> >       case KVM_CAP_MSR_PLATFORM_INFO:
> >               kvm->arch.guest_can_read_msr_platform_info = cap->args[0];
> > --
> > 2.30.2
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-01-11  6:38 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-21  9:04 [RFC PATCH v2 0/3] KVM: x86: add per-vCPU exits disable capability Kechen Lu
2021-12-21  9:04 ` [RFC PATCH v2 1/3] KVM: x86: only allow exits disable before vCPUs created Kechen Lu
2022-01-10 18:50   ` Sean Christopherson
2022-01-11  6:38     ` Kechen Lu
2021-12-21  9:04 ` [RFC PATCH v2 2/3] KVM: x86: move ()_in_guest checking to vCPU scope Kechen Lu
2022-01-10 19:35   ` Sean Christopherson
2022-01-11  6:37     ` Kechen Lu
2021-12-21  9:04 ` [RFC PATCH v2 3/3] KVM: x86: add vCPU ioctl for HLT exits disable capability Kechen Lu
2022-01-10 20:56   ` Sean Christopherson
2022-01-10 21:00     ` Sean Christopherson
2022-01-11  6:36       ` Kechen Lu
2022-01-11  6:35     ` Kechen Lu
2022-01-10 21:18 ` [RFC PATCH v2 0/3] KVM: x86: add per-vCPU " Michael S. Tsirkin
2022-01-11  6:34   ` Kechen Lu
