kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC 0/2] Add support for bus lock VM exit
@ 2020-06-28  8:53 Chenyi Qiang
  2020-06-28  8:53 ` [RFC 1/2] KVM: VMX: Convert vcpu_vmx.exit_reason to a union Chenyi Qiang
  2020-06-28  8:53 ` [RFC 2/2] KVM: VMX: Enable bus lock VM exit Chenyi Qiang
  0 siblings, 2 replies; 12+ messages in thread
From: Chenyi Qiang @ 2020-06-28  8:53 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Xiaoyao Li
  Cc: kvm, linux-kernel

This serial adds the support for bus lock VM exit, which is a
sub-feature of bus lock detection in KVM. The left part concerning bus
lock debug exception support will be sent out once the kernel part is
ready.

The first patch applies Sean's refactor to vcpu_vmx.exit_reason at
https://patchwork.kernel.org/patch/11500659
It is necessary as bus lock VM exit adds a new modifier bit(bit 26) in
exit_reason field in VMCS.

The second patch is the enabling work for bus lock VM exit.

Document for Bus Lock VM exit is now available at the latest "Intel
Architecture Instruction Set Extensions Programming Reference".

Document Link:
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Chenyi Qiang (1):
  KVM: VMX: Enable bus lock VM exit

Sean Christopherson (1):
  KVM: VMX: Convert vcpu_vmx.exit_reason to a union

 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/include/asm/vmx.h         |  1 +
 arch/x86/include/asm/vmxfeatures.h |  1 +
 arch/x86/include/uapi/asm/vmx.h    |  4 +-
 arch/x86/kvm/vmx/nested.c          | 42 ++++++++++------
 arch/x86/kvm/vmx/vmx.c             | 81 ++++++++++++++++++------------
 arch/x86/kvm/vmx/vmx.h             | 25 ++++++++-
 arch/x86/kvm/x86.c                 |  1 +
 8 files changed, 107 insertions(+), 49 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC 1/2] KVM: VMX: Convert vcpu_vmx.exit_reason to a union
  2020-06-28  8:53 [RFC 0/2] Add support for bus lock VM exit Chenyi Qiang
@ 2020-06-28  8:53 ` Chenyi Qiang
  2020-06-28  8:53 ` [RFC 2/2] KVM: VMX: Enable bus lock VM exit Chenyi Qiang
  1 sibling, 0 replies; 12+ messages in thread
From: Chenyi Qiang @ 2020-06-28  8:53 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Xiaoyao Li
  Cc: kvm, linux-kernel

From: Sean Christopherson <sean.j.christopherson@intel.com>

Convert vcpu_vmx.exit_reason from a u32 to a union (of size u32).  The
full VM_EXIT_REASON field is comprised of a 16-bit basic exit reason in
bits 15:0, and single-bit modifiers in bits 31:16.

Historically, KVM has only had to worry about handling the "failed
VM-Entry" modifier, which could only be set in very specific flows and
required dedicated handling.  I.e. manually stripping the FAILED_VMENTRY
bit was a somewhat viable approach.  But even with only a single bit to
worry about, KVM has had several bugs related to comparing a basic exit
reason against the full exit reason store in vcpu_vmx.

Upcoming Intel features, e.g. SGX, will add new modifier bits that can
be set on more or less any VM-Exit, as opposed to the significantly more
restricted FAILED_VMENTRY, i.e. correctly handling everything in one-off
flows isn't scalable.  Tracking exit reason in a union forces code to
explicitly choose between consuming the full exit reason and the basic
exit, and is a convenient way to document and access the modifiers.

No functional change intended.

Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 42 ++++++++++++++++---------
 arch/x86/kvm/vmx/vmx.c    | 64 ++++++++++++++++++++-------------------
 arch/x86/kvm/vmx/vmx.h    | 25 ++++++++++++++-
 3 files changed, 84 insertions(+), 47 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d1af20b050a8..77f732ad6085 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3255,7 +3255,11 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 	enum vm_entry_failure_code entry_failure_code;
 	bool evaluate_pending_interrupts;
-	u32 exit_reason, failed_index;
+	union vmx_exit_reason exit_reason = {
+		.basic = EXIT_REASON_INVALID_STATE,
+		.failed_vmentry = 1,
+	};
+	u32 failed_index;
 
 	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
 		kvm_vcpu_flush_tlb_current(vcpu);
@@ -3305,7 +3309,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 
 		if (nested_vmx_check_guest_state(vcpu, vmcs12,
 						 &entry_failure_code)) {
-			exit_reason = EXIT_REASON_INVALID_STATE;
+			exit_reason.basic = EXIT_REASON_INVALID_STATE;
 			vmcs12->exit_qualification = entry_failure_code;
 			goto vmentry_fail_vmexit;
 		}
@@ -3316,7 +3320,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 		vcpu->arch.tsc_offset += vmcs12->tsc_offset;
 
 	if (prepare_vmcs02(vcpu, vmcs12, &entry_failure_code)) {
-		exit_reason = EXIT_REASON_INVALID_STATE;
+		exit_reason.basic = EXIT_REASON_INVALID_STATE;
 		vmcs12->exit_qualification = entry_failure_code;
 		goto vmentry_fail_vmexit_guest_mode;
 	}
@@ -3326,7 +3330,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 						   vmcs12->vm_entry_msr_load_addr,
 						   vmcs12->vm_entry_msr_load_count);
 		if (failed_index) {
-			exit_reason = EXIT_REASON_MSR_LOAD_FAIL;
+			exit_reason.basic = EXIT_REASON_MSR_LOAD_FAIL;
 			vmcs12->exit_qualification = failed_index;
 			goto vmentry_fail_vmexit_guest_mode;
 		}
@@ -3394,7 +3398,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 		return NVMX_VMENTRY_VMEXIT;
 
 	load_vmcs12_host_state(vcpu, vmcs12);
-	vmcs12->vm_exit_reason = exit_reason | VMX_EXIT_REASONS_FAILED_VMENTRY;
+	vmcs12->vm_exit_reason = exit_reason.full;
 	if (enable_shadow_vmcs || vmx->nested.hv_evmcs)
 		vmx->nested.need_vmcs12_to_shadow_sync = true;
 	return NVMX_VMENTRY_VMEXIT;
@@ -5449,7 +5453,12 @@ static int handle_vmfunc(struct kvm_vcpu *vcpu)
 	return kvm_skip_emulated_instruction(vcpu);
 
 fail:
-	nested_vmx_vmexit(vcpu, vmx->exit_reason,
+	/*
+	 * This is effectively a reflected VM-Exit, as opposed to a synthesized
+	 * nested VM-Exit.  Pass the original exit reason, i.e. don't hardcode
+	 * EXIT_REASON_VMFUNC as the exit reason.
+	 */
+	nested_vmx_vmexit(vcpu, vmx->exit_reason.full,
 			  vmx_get_intr_info(vcpu),
 			  vmx_get_exit_qual(vcpu));
 	return 1;
@@ -5517,7 +5526,8 @@ static bool nested_vmx_exit_handled_io(struct kvm_vcpu *vcpu,
  * MSR bitmap. This may be the case even when L0 doesn't use MSR bitmaps.
  */
 static bool nested_vmx_exit_handled_msr(struct kvm_vcpu *vcpu,
-	struct vmcs12 *vmcs12, u32 exit_reason)
+					struct vmcs12 *vmcs12,
+					union vmx_exit_reason exit_reason)
 {
 	u32 msr_index = kvm_rcx_read(vcpu);
 	gpa_t bitmap;
@@ -5531,7 +5541,7 @@ static bool nested_vmx_exit_handled_msr(struct kvm_vcpu *vcpu,
 	 * First we need to figure out which of the four to use:
 	 */
 	bitmap = vmcs12->msr_bitmap;
-	if (exit_reason == EXIT_REASON_MSR_WRITE)
+	if (exit_reason.basic == EXIT_REASON_MSR_WRITE)
 		bitmap += 2048;
 	if (msr_index >= 0xc0000000) {
 		msr_index -= 0xc0000000;
@@ -5668,11 +5678,12 @@ static bool nested_vmx_exit_handled_mtf(struct vmcs12 *vmcs12)
  * Return true if L0 wants to handle an exit from L2 regardless of whether or not
  * L1 wants the exit.  Only call this when in is_guest_mode (L2).
  */
-static bool nested_vmx_l0_wants_exit(struct kvm_vcpu *vcpu, u32 exit_reason)
+static bool nested_vmx_l0_wants_exit(struct kvm_vcpu *vcpu,
+				     union vmx_exit_reason exit_reason)
 {
 	u32 intr_info;
 
-	switch ((u16)exit_reason) {
+	switch ((u16)exit_reason.basic) {
 	case EXIT_REASON_EXCEPTION_NMI:
 		intr_info = vmx_get_intr_info(vcpu);
 		if (is_nmi(intr_info))
@@ -5728,12 +5739,13 @@ static bool nested_vmx_l0_wants_exit(struct kvm_vcpu *vcpu, u32 exit_reason)
  * Return 1 if L1 wants to intercept an exit from L2.  Only call this when in
  * is_guest_mode (L2).
  */
-static bool nested_vmx_l1_wants_exit(struct kvm_vcpu *vcpu, u32 exit_reason)
+static bool nested_vmx_l1_wants_exit(struct kvm_vcpu *vcpu,
+				     union vmx_exit_reason exit_reason)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 	u32 intr_info;
 
-	switch ((u16)exit_reason) {
+	switch ((u16)exit_reason.basic) {
 	case EXIT_REASON_EXCEPTION_NMI:
 		intr_info = vmx_get_intr_info(vcpu);
 		if (is_nmi(intr_info))
@@ -5852,7 +5864,7 @@ static bool nested_vmx_l1_wants_exit(struct kvm_vcpu *vcpu, u32 exit_reason)
 bool nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	u32 exit_reason = vmx->exit_reason;
+	union vmx_exit_reason exit_reason = vmx->exit_reason;
 	unsigned long exit_qual;
 	u32 exit_intr_info;
 
@@ -5874,7 +5886,7 @@ bool nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu)
 	exit_intr_info = vmx_get_intr_info(vcpu);
 	exit_qual = vmx_get_exit_qual(vcpu);
 
-	trace_kvm_nested_vmexit(kvm_rip_read(vcpu), exit_reason, exit_qual,
+	trace_kvm_nested_vmexit(kvm_rip_read(vcpu), exit_reason.full, exit_qual,
 				vmx->idt_vectoring_info, exit_intr_info,
 				vmcs_read32(VM_EXIT_INTR_ERROR_CODE),
 				KVM_ISA_VMX);
@@ -5903,7 +5915,7 @@ bool nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu)
 	}
 
 reflect_vmexit:
-	nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info, exit_qual);
+	nested_vmx_vmexit(vcpu, exit_reason.full, exit_intr_info, exit_qual);
 	return true;
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b1a23ad986ff..647ee9a1b4e6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1576,7 +1576,7 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	 * i.e. we end up advancing IP with some random value.
 	 */
 	if (!static_cpu_has(X86_FEATURE_HYPERVISOR) ||
-	    to_vmx(vcpu)->exit_reason != EXIT_REASON_EPT_MISCONFIG) {
+	    to_vmx(vcpu)->exit_reason.basic != EXIT_REASON_EPT_MISCONFIG) {
 		orig_rip = kvm_rip_read(vcpu);
 		rip = orig_rip + vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
 #ifdef CONFIG_X86_64
@@ -5961,8 +5961,9 @@ void dump_vmcs(void)
 static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	u32 exit_reason = vmx->exit_reason;
+	union vmx_exit_reason exit_reason = vmx->exit_reason;
 	u32 vectoring_info = vmx->idt_vectoring_info;
+	u16 exit_handler_index;
 
 	/*
 	 * Flush logged GPAs PML buffer, this will make dirty_bitmap more
@@ -6004,11 +6005,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 			return 1;
 	}
 
-	if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
+	if (exit_reason.failed_vmentry) {
 		dump_vmcs();
 		vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
 		vcpu->run->fail_entry.hardware_entry_failure_reason
-			= exit_reason;
+			= exit_reason.full;
 		return 0;
 	}
 
@@ -6028,17 +6029,17 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	 * will cause infinite loop.
 	 */
 	if ((vectoring_info & VECTORING_INFO_VALID_MASK) &&
-			(exit_reason != EXIT_REASON_EXCEPTION_NMI &&
-			exit_reason != EXIT_REASON_EPT_VIOLATION &&
-			exit_reason != EXIT_REASON_PML_FULL &&
-			exit_reason != EXIT_REASON_TASK_SWITCH)) {
+	    (exit_reason.basic != EXIT_REASON_EXCEPTION_NMI &&
+	     exit_reason.basic != EXIT_REASON_EPT_VIOLATION &&
+	     exit_reason.basic != EXIT_REASON_PML_FULL &&
+	     exit_reason.basic != EXIT_REASON_TASK_SWITCH)) {
 		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_DELIVERY_EV;
 		vcpu->run->internal.ndata = 3;
 		vcpu->run->internal.data[0] = vectoring_info;
-		vcpu->run->internal.data[1] = exit_reason;
+		vcpu->run->internal.data[1] = exit_reason.full;
 		vcpu->run->internal.data[2] = vcpu->arch.exit_qualification;
-		if (exit_reason == EXIT_REASON_EPT_MISCONFIG) {
+		if (exit_reason.basic == EXIT_REASON_EPT_MISCONFIG) {
 			vcpu->run->internal.ndata++;
 			vcpu->run->internal.data[3] =
 				vmcs_read64(GUEST_PHYSICAL_ADDRESS);
@@ -6068,38 +6069,39 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	if (exit_fastpath != EXIT_FASTPATH_NONE)
 		return 1;
 
-	if (exit_reason >= kvm_vmx_max_exit_handlers)
+	if (exit_reason.basic >= kvm_vmx_max_exit_handlers)
 		goto unexpected_vmexit;
 #ifdef CONFIG_RETPOLINE
-	if (exit_reason == EXIT_REASON_MSR_WRITE)
+	if (exit_reason.basic == EXIT_REASON_MSR_WRITE)
 		return kvm_emulate_wrmsr(vcpu);
-	else if (exit_reason == EXIT_REASON_PREEMPTION_TIMER)
+	else if (exit_reason.basic == EXIT_REASON_PREEMPTION_TIMER)
 		return handle_preemption_timer(vcpu);
-	else if (exit_reason == EXIT_REASON_INTERRUPT_WINDOW)
+	else if (exit_reason.basic == EXIT_REASON_INTERRUPT_WINDOW)
 		return handle_interrupt_window(vcpu);
-	else if (exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
+	else if (exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT)
 		return handle_external_interrupt(vcpu);
-	else if (exit_reason == EXIT_REASON_HLT)
+	else if (exit_reason.basic == EXIT_REASON_HLT)
 		return kvm_emulate_halt(vcpu);
-	else if (exit_reason == EXIT_REASON_EPT_MISCONFIG)
+	else if (exit_reason.basic == EXIT_REASON_EPT_MISCONFIG)
 		return handle_ept_misconfig(vcpu);
 #endif
 
-	exit_reason = array_index_nospec(exit_reason,
-					 kvm_vmx_max_exit_handlers);
-	if (!kvm_vmx_exit_handlers[exit_reason])
+	exit_handler_index = array_index_nospec((u16)exit_reason.basic,
+						kvm_vmx_max_exit_handlers);
+	if (!kvm_vmx_exit_handlers[exit_handler_index])
 		goto unexpected_vmexit;
 
-	return kvm_vmx_exit_handlers[exit_reason](vcpu);
+	return kvm_vmx_exit_handlers[exit_handler_index](vcpu);
 
 unexpected_vmexit:
-	vcpu_unimpl(vcpu, "vmx: unexpected exit reason 0x%x\n", exit_reason);
+	vcpu_unimpl(vcpu, "vmx: unexpected exit reason 0x%x\n",
+		    exit_reason.full);
 	dump_vmcs();
 	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 	vcpu->run->internal.suberror =
 			KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
 	vcpu->run->internal.ndata = 1;
-	vcpu->run->internal.data[0] = exit_reason;
+	vcpu->run->internal.data[0] = exit_reason.full;
 	return 0;
 }
 
@@ -6452,9 +6454,9 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
-	if (vmx->exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
+	if (vmx->exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT)
 		handle_external_interrupt_irqoff(vcpu);
-	else if (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI)
+	else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI)
 		handle_exception_nmi_irqoff(vmx);
 }
 
@@ -6659,7 +6661,7 @@ void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp)
 
 static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
 {
-	switch (to_vmx(vcpu)->exit_reason) {
+	switch (to_vmx(vcpu)->exit_reason.basic) {
 	case EXIT_REASON_MSR_WRITE:
 		return handle_fastpath_set_msr_irqoff(vcpu);
 	case EXIT_REASON_PREEMPTION_TIMER:
@@ -6815,17 +6817,17 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	vmx->idt_vectoring_info = 0;
 
 	if (unlikely(vmx->fail)) {
-		vmx->exit_reason = 0xdead;
+		vmx->exit_reason.full = 0xdead;
 		return EXIT_FASTPATH_NONE;
 	}
 
-	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
-	if (unlikely((u16)vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY))
+	vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
+	if (unlikely((u16)vmx->exit_reason.basic == EXIT_REASON_MCE_DURING_VMENTRY))
 		kvm_machine_check();
 
-	trace_kvm_exit(vmx->exit_reason, vcpu, KVM_ISA_VMX);
+	trace_kvm_exit(vmx->exit_reason.full, vcpu, KVM_ISA_VMX);
 
-	if (unlikely(vmx->exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
+	if (unlikely(vmx->exit_reason.failed_vmentry))
 		return EXIT_FASTPATH_NONE;
 
 	vmx->loaded_vmcs->launched = 1;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 8a83b5edc820..a676db7ac03c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -91,6 +91,29 @@ struct pt_desc {
 	struct pt_ctx guest;
 };
 
+union vmx_exit_reason {
+	struct {
+		u32	basic			: 16;
+		u32	reserved16		: 1;
+		u32	reserved17		: 1;
+		u32	reserved18		: 1;
+		u32	reserved19		: 1;
+		u32	reserved20		: 1;
+		u32	reserved21		: 1;
+		u32	reserved22		: 1;
+		u32	reserved23		: 1;
+		u32	reserved24		: 1;
+		u32	reserved25		: 1;
+		u32	reserved26		: 1;
+		u32	enclave_mode		: 1;
+		u32	smi_pending_mtf		: 1;
+		u32	smi_from_vmx_root	: 1;
+		u32	reserved30		: 1;
+		u32	failed_vmentry		: 1;
+	};
+	u32 full;
+};
+
 /*
  * The nested_vmx structure is part of vcpu_vmx, and holds information we need
  * for correct emulation of VMX (i.e., nested VMX) on this vcpu.
@@ -265,7 +288,7 @@ struct vcpu_vmx {
 	int vpid;
 	bool emulation_required;
 
-	u32 exit_reason;
+	union vmx_exit_reason exit_reason;
 
 	/* Posted interrupt descriptor */
 	struct pi_desc pi_desc;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC 2/2] KVM: VMX: Enable bus lock VM exit
  2020-06-28  8:53 [RFC 0/2] Add support for bus lock VM exit Chenyi Qiang
  2020-06-28  8:53 ` [RFC 1/2] KVM: VMX: Convert vcpu_vmx.exit_reason to a union Chenyi Qiang
@ 2020-06-28  8:53 ` Chenyi Qiang
  2020-07-01  9:04   ` Vitaly Kuznetsov
  1 sibling, 1 reply; 12+ messages in thread
From: Chenyi Qiang @ 2020-06-28  8:53 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Xiaoyao Li
  Cc: kvm, linux-kernel

Virtual Machine can exploit bus locks to degrade the performance of
system. Bus lock can be caused by split locked access to writeback(WB)
memory or by using locks on uncacheable(UC) memory. The bus lock is
typically >1000 cycles slower than an atomic operation within a cache
line. It also disrupts performance on other cores (which must wait for
the bus lock to be released before their memory operations can
complete).

To address the threat, bus lock VM exit is introduced to notify the VMM
when a bus lock was acquired, allowing it to enforce throttling or other
policy based mitigations.

A VMM can enable VM exit due to bus locks by setting a new "Bus Lock
Detection" VM-execution control(bit 30 of Secondary Processor-based VM
execution controls). If delivery of this VM exit was pre-empted by a
higher priority VM exit, bit 26 of exit-reason is set to 1.

In current implementation, it only records every bus lock acquired in
non-root mode in vcpu->stat.bus_locks and exposes the data through
debugfs. It doesn't implement any further handling but leave it to user.

Document for Bus Lock VM exit is now available at the latest "Intel
Architecture Instruction Set Extensions Programming Reference".

Document Link:
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/include/asm/vmx.h         |  1 +
 arch/x86/include/asm/vmxfeatures.h |  1 +
 arch/x86/include/uapi/asm/vmx.h    |  4 +++-
 arch/x86/kvm/vmx/vmx.c             | 17 ++++++++++++++++-
 arch/x86/kvm/vmx/vmx.h             |  2 +-
 arch/x86/kvm/x86.c                 |  1 +
 7 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f852ee350beb..efb5a117e11a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1051,6 +1051,7 @@ struct kvm_vcpu_stat {
 	u64 req_event;
 	u64 halt_poll_success_ns;
 	u64 halt_poll_fail_ns;
+	u64 bus_locks;
 };
 
 struct x86_instruction_info;
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index cd7de4b401fe..93a880bc31a7 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -73,6 +73,7 @@
 #define SECONDARY_EXEC_PT_USE_GPA		VMCS_CONTROL_BIT(PT_USE_GPA)
 #define SECONDARY_EXEC_TSC_SCALING              VMCS_CONTROL_BIT(TSC_SCALING)
 #define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE	VMCS_CONTROL_BIT(USR_WAIT_PAUSE)
+#define SECONDARY_EXEC_BUS_LOCK_DETECTION	VMCS_CONTROL_BIT(BUS_LOCK_DETECTION)
 
 #define PIN_BASED_EXT_INTR_MASK                 VMCS_CONTROL_BIT(INTR_EXITING)
 #define PIN_BASED_NMI_EXITING                   VMCS_CONTROL_BIT(NMI_EXITING)
diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h
index 9915990fd8cf..d9a74681a77d 100644
--- a/arch/x86/include/asm/vmxfeatures.h
+++ b/arch/x86/include/asm/vmxfeatures.h
@@ -83,5 +83,6 @@
 #define VMX_FEATURE_TSC_SCALING		( 2*32+ 25) /* Scale hardware TSC when read in guest */
 #define VMX_FEATURE_USR_WAIT_PAUSE	( 2*32+ 26) /* Enable TPAUSE, UMONITOR, UMWAIT in guest */
 #define VMX_FEATURE_ENCLV_EXITING	( 2*32+ 28) /* "" VM-Exit on ENCLV (leaf dependent) */
+#define VMX_FEATURE_BUS_LOCK_DETECTION	( 2*32+ 30) /* "" VM-Exit when bus lock caused */
 
 #endif /* _ASM_X86_VMXFEATURES_H */
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index b8ff9e8ac0d5..6bddfd7b87be 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -88,6 +88,7 @@
 #define EXIT_REASON_XRSTORS             64
 #define EXIT_REASON_UMWAIT              67
 #define EXIT_REASON_TPAUSE              68
+#define EXIT_REASON_BUS_LOCK		74
 
 #define VMX_EXIT_REASONS \
 	{ EXIT_REASON_EXCEPTION_NMI,         "EXCEPTION_NMI" }, \
@@ -148,7 +149,8 @@
 	{ EXIT_REASON_XSAVES,                "XSAVES" }, \
 	{ EXIT_REASON_XRSTORS,               "XRSTORS" }, \
 	{ EXIT_REASON_UMWAIT,                "UMWAIT" }, \
-	{ EXIT_REASON_TPAUSE,                "TPAUSE" }
+	{ EXIT_REASON_TPAUSE,                "TPAUSE" }, \
+	{ EXIT_REASON_BUS_LOCK,              "BUS_LOCK" }
 
 #define VMX_EXIT_REASON_FLAGS \
 	{ VMX_EXIT_REASONS_FAILED_VMENTRY,	"FAILED_VMENTRY" }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 647ee9a1b4e6..9622d7486f6d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2463,7 +2463,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 			SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE |
 			SECONDARY_EXEC_PT_USE_GPA |
 			SECONDARY_EXEC_PT_CONCEAL_VMX |
-			SECONDARY_EXEC_ENABLE_VMFUNC;
+			SECONDARY_EXEC_ENABLE_VMFUNC |
+			SECONDARY_EXEC_BUS_LOCK_DETECTION;
 		if (cpu_has_sgx())
 			opt2 |= SECONDARY_EXEC_ENCLS_EXITING;
 		if (adjust_vmx_controls(min2, opt2,
@@ -5664,6 +5665,12 @@ static int handle_encls(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int handle_bus_lock(struct kvm_vcpu *vcpu)
+{
+	vcpu->stat.bus_locks++;
+	return 1;
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
@@ -5720,6 +5727,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_VMFUNC]		      = handle_vmx_instruction,
 	[EXIT_REASON_PREEMPTION_TIMER]	      = handle_preemption_timer,
 	[EXIT_REASON_ENCLS]		      = handle_encls,
+	[EXIT_REASON_BUS_LOCK]                = handle_bus_lock,
 };
 
 static const int kvm_vmx_max_exit_handlers =
@@ -6830,6 +6838,13 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	if (unlikely(vmx->exit_reason.failed_vmentry))
 		return EXIT_FASTPATH_NONE;
 
+	/*
+	 * check the exit_reason to see if there is a bus lock
+	 * happened in guest.
+	 */
+	if (vmx->exit_reason.bus_lock_detected)
+		handle_bus_lock(vcpu);
+
 	vmx->loaded_vmcs->launched = 1;
 	vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index a676db7ac03c..b501a26778fc 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -104,7 +104,7 @@ union vmx_exit_reason {
 		u32	reserved23		: 1;
 		u32	reserved24		: 1;
 		u32	reserved25		: 1;
-		u32	reserved26		: 1;
+		u32	bus_lock_detected	: 1;
 		u32	enclave_mode		: 1;
 		u32	smi_pending_mtf		: 1;
 		u32	smi_from_vmx_root	: 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 00c88c2f34e4..6258b453b8dd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -220,6 +220,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("l1d_flush", l1d_flush),
 	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
 	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
+	VCPU_STAT("bus_locks", bus_locks),
 	VM_STAT("mmu_shadow_zapped", mmu_shadow_zapped),
 	VM_STAT("mmu_pte_write", mmu_pte_write),
 	VM_STAT("mmu_pte_updated", mmu_pte_updated),
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC 2/2] KVM: VMX: Enable bus lock VM exit
  2020-06-28  8:53 ` [RFC 2/2] KVM: VMX: Enable bus lock VM exit Chenyi Qiang
@ 2020-07-01  9:04   ` Vitaly Kuznetsov
  2020-07-01  9:32     ` Xiaoyao Li
  0 siblings, 1 reply; 12+ messages in thread
From: Vitaly Kuznetsov @ 2020-07-01  9:04 UTC (permalink / raw)
  To: Chenyi Qiang
  Cc: kvm, linux-kernel, Paolo Bonzini, Sean Christopherson,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Xiaoyao Li

Chenyi Qiang <chenyi.qiang@intel.com> writes:

> Virtual Machine can exploit bus locks to degrade the performance of
> system. Bus lock can be caused by split locked access to writeback(WB)
> memory or by using locks on uncacheable(UC) memory. The bus lock is
> typically >1000 cycles slower than an atomic operation within a cache
> line. It also disrupts performance on other cores (which must wait for
> the bus lock to be released before their memory operations can
> complete).
>
> To address the threat, bus lock VM exit is introduced to notify the VMM
> when a bus lock was acquired, allowing it to enforce throttling or other
> policy based mitigations.
>
> A VMM can enable VM exit due to bus locks by setting a new "Bus Lock
> Detection" VM-execution control(bit 30 of Secondary Processor-based VM
> execution controls). If delivery of this VM exit was pre-empted by a
> higher priority VM exit, bit 26 of exit-reason is set to 1.
>
> In current implementation, it only records every bus lock acquired in
> non-root mode in vcpu->stat.bus_locks and exposes the data through
> debugfs. It doesn't implement any further handling but leave it to user.
>
> Document for Bus Lock VM exit is now available at the latest "Intel
> Architecture Instruction Set Extensions Programming Reference".
>
> Document Link:
> https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
>
> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h    |  1 +
>  arch/x86/include/asm/vmx.h         |  1 +
>  arch/x86/include/asm/vmxfeatures.h |  1 +
>  arch/x86/include/uapi/asm/vmx.h    |  4 +++-
>  arch/x86/kvm/vmx/vmx.c             | 17 ++++++++++++++++-
>  arch/x86/kvm/vmx/vmx.h             |  2 +-
>  arch/x86/kvm/x86.c                 |  1 +
>  7 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f852ee350beb..efb5a117e11a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1051,6 +1051,7 @@ struct kvm_vcpu_stat {
>  	u64 req_event;
>  	u64 halt_poll_success_ns;
>  	u64 halt_poll_fail_ns;
> +	u64 bus_locks;
>  };
>  
>  struct x86_instruction_info;
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index cd7de4b401fe..93a880bc31a7 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -73,6 +73,7 @@
>  #define SECONDARY_EXEC_PT_USE_GPA		VMCS_CONTROL_BIT(PT_USE_GPA)
>  #define SECONDARY_EXEC_TSC_SCALING              VMCS_CONTROL_BIT(TSC_SCALING)
>  #define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE	VMCS_CONTROL_BIT(USR_WAIT_PAUSE)
> +#define SECONDARY_EXEC_BUS_LOCK_DETECTION	VMCS_CONTROL_BIT(BUS_LOCK_DETECTION)
>  
>  #define PIN_BASED_EXT_INTR_MASK                 VMCS_CONTROL_BIT(INTR_EXITING)
>  #define PIN_BASED_NMI_EXITING                   VMCS_CONTROL_BIT(NMI_EXITING)
> diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h
> index 9915990fd8cf..d9a74681a77d 100644
> --- a/arch/x86/include/asm/vmxfeatures.h
> +++ b/arch/x86/include/asm/vmxfeatures.h
> @@ -83,5 +83,6 @@
>  #define VMX_FEATURE_TSC_SCALING		( 2*32+ 25) /* Scale hardware TSC when read in guest */
>  #define VMX_FEATURE_USR_WAIT_PAUSE	( 2*32+ 26) /* Enable TPAUSE, UMONITOR, UMWAIT in guest */
>  #define VMX_FEATURE_ENCLV_EXITING	( 2*32+ 28) /* "" VM-Exit on ENCLV (leaf dependent) */
> +#define VMX_FEATURE_BUS_LOCK_DETECTION	( 2*32+ 30) /* "" VM-Exit when bus lock caused */
>  
>  #endif /* _ASM_X86_VMXFEATURES_H */
> diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
> index b8ff9e8ac0d5..6bddfd7b87be 100644
> --- a/arch/x86/include/uapi/asm/vmx.h
> +++ b/arch/x86/include/uapi/asm/vmx.h
> @@ -88,6 +88,7 @@
>  #define EXIT_REASON_XRSTORS             64
>  #define EXIT_REASON_UMWAIT              67
>  #define EXIT_REASON_TPAUSE              68
> +#define EXIT_REASON_BUS_LOCK		74
>  
>  #define VMX_EXIT_REASONS \
>  	{ EXIT_REASON_EXCEPTION_NMI,         "EXCEPTION_NMI" }, \
> @@ -148,7 +149,8 @@
>  	{ EXIT_REASON_XSAVES,                "XSAVES" }, \
>  	{ EXIT_REASON_XRSTORS,               "XRSTORS" }, \
>  	{ EXIT_REASON_UMWAIT,                "UMWAIT" }, \
> -	{ EXIT_REASON_TPAUSE,                "TPAUSE" }
> +	{ EXIT_REASON_TPAUSE,                "TPAUSE" }, \
> +	{ EXIT_REASON_BUS_LOCK,              "BUS_LOCK" }
>  
>  #define VMX_EXIT_REASON_FLAGS \
>  	{ VMX_EXIT_REASONS_FAILED_VMENTRY,	"FAILED_VMENTRY" }
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 647ee9a1b4e6..9622d7486f6d 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2463,7 +2463,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>  			SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE |
>  			SECONDARY_EXEC_PT_USE_GPA |
>  			SECONDARY_EXEC_PT_CONCEAL_VMX |
> -			SECONDARY_EXEC_ENABLE_VMFUNC;
> +			SECONDARY_EXEC_ENABLE_VMFUNC |
> +			SECONDARY_EXEC_BUS_LOCK_DETECTION;
>  		if (cpu_has_sgx())
>  			opt2 |= SECONDARY_EXEC_ENCLS_EXITING;
>  		if (adjust_vmx_controls(min2, opt2,
> @@ -5664,6 +5665,12 @@ static int handle_encls(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>  
> +static int handle_bus_lock(struct kvm_vcpu *vcpu)
> +{
> +	vcpu->stat.bus_locks++;
> +	return 1;
> +}
> +
>  /*
>   * The exit handlers return 1 if the exit was handled fully and guest execution
>   * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
> @@ -5720,6 +5727,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>  	[EXIT_REASON_VMFUNC]		      = handle_vmx_instruction,
>  	[EXIT_REASON_PREEMPTION_TIMER]	      = handle_preemption_timer,
>  	[EXIT_REASON_ENCLS]		      = handle_encls,
> +	[EXIT_REASON_BUS_LOCK]                = handle_bus_lock,
>  };
>  
>  static const int kvm_vmx_max_exit_handlers =
> @@ -6830,6 +6838,13 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  	if (unlikely(vmx->exit_reason.failed_vmentry))
>  		return EXIT_FASTPATH_NONE;
>  
> +	/*
> +	 * check the exit_reason to see if there is a bus lock
> +	 * happened in guest.
> +	 */
> +	if (vmx->exit_reason.bus_lock_detected)
> +		handle_bus_lock(vcpu);

In case the ultimate goal is to have an exit to userspace on bus lock,
the two ways to reach handle_bus_lock() are very different: in case
we're handling EXIT_REASON_BUS_LOCK we can easily drop to userspace by
returning 0 but what are we going to do in case of
exit_reason.bus_lock_detected? The 'higher priority VM exit' may require
exit to userspace too. So what's the plan? Maybe we can ignore the case
when we're exiting to userspace for some other reason as this is slow
already and force the exit otherwise? And should we actually introduce
the KVM_EXIT_BUS_LOCK and a capability to enable it here?

> +
>  	vmx->loaded_vmcs->launched = 1;
>  	vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
>  
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index a676db7ac03c..b501a26778fc 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -104,7 +104,7 @@ union vmx_exit_reason {
>  		u32	reserved23		: 1;
>  		u32	reserved24		: 1;
>  		u32	reserved25		: 1;
> -		u32	reserved26		: 1;
> +		u32	bus_lock_detected	: 1;
>  		u32	enclave_mode		: 1;
>  		u32	smi_pending_mtf		: 1;
>  		u32	smi_from_vmx_root	: 1;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 00c88c2f34e4..6258b453b8dd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -220,6 +220,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>  	VCPU_STAT("l1d_flush", l1d_flush),
>  	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
>  	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> +	VCPU_STAT("bus_locks", bus_locks),
>  	VM_STAT("mmu_shadow_zapped", mmu_shadow_zapped),
>  	VM_STAT("mmu_pte_write", mmu_pte_write),
>  	VM_STAT("mmu_pte_updated", mmu_pte_updated),

-- 
Vitaly


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 2/2] KVM: VMX: Enable bus lock VM exit
  2020-07-01  9:04   ` Vitaly Kuznetsov
@ 2020-07-01  9:32     ` Xiaoyao Li
  2020-07-01 12:44       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 12+ messages in thread
From: Xiaoyao Li @ 2020-07-01  9:32 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Chenyi Qiang
  Cc: kvm, linux-kernel, Paolo Bonzini, Sean Christopherson,
	Wanpeng Li, Jim Mattson, Joerg Roedel

On 7/1/2020 5:04 PM, Vitaly Kuznetsov wrote:
> Chenyi Qiang <chenyi.qiang@intel.com> writes:
[...]
>>   static const int kvm_vmx_max_exit_handlers =
>> @@ -6830,6 +6838,13 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>   	if (unlikely(vmx->exit_reason.failed_vmentry))
>>   		return EXIT_FASTPATH_NONE;
>>   
>> +	/*
>> +	 * check the exit_reason to see if there is a bus lock
>> +	 * happened in guest.
>> +	 */
>> +	if (vmx->exit_reason.bus_lock_detected)
>> +		handle_bus_lock(vcpu);
> 
> In case the ultimate goal is to have an exit to userspace on bus lock,

I don't think we will need an exit to userspace on bus lock. See below.

> the two ways to reach handle_bus_lock() are very different: in case
> we're handling EXIT_REASON_BUS_LOCK we can easily drop to userspace by
> returning 0 but what are we going to do in case of
> exit_reason.bus_lock_detected? The 'higher priority VM exit' may require
> exit to userspace too. So what's the plan? Maybe we can ignore the case
> when we're exiting to userspace for some other reason as this is slow
> already and force the exit otherwise? 

> And should we actually introduce
> the KVM_EXIT_BUS_LOCK and a capability to enable it here?
> 

Introducing KVM_EXIT_BUS_LOCK maybe help nothing. No matter 
EXIT_REASON_BUS_LOCK or exit_reason.bus_lock_detected, the bus lock has 
already happened. Exit to userspace cannot prevent bus lock, so what 
userspace can do is recording and counting as what this patch does in 
vcpu->stat.bus_locks.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 2/2] KVM: VMX: Enable bus lock VM exit
  2020-07-01  9:32     ` Xiaoyao Li
@ 2020-07-01 12:44       ` Vitaly Kuznetsov
  2020-07-01 14:12         ` Xiaoyao Li
  0 siblings, 1 reply; 12+ messages in thread
From: Vitaly Kuznetsov @ 2020-07-01 12:44 UTC (permalink / raw)
  To: Xiaoyao Li, Chenyi Qiang
  Cc: kvm, linux-kernel, Paolo Bonzini, Sean Christopherson,
	Wanpeng Li, Jim Mattson, Joerg Roedel

Xiaoyao Li <xiaoyao.li@intel.com> writes:

> On 7/1/2020 5:04 PM, Vitaly Kuznetsov wrote:
>> Chenyi Qiang <chenyi.qiang@intel.com> writes:
> [...]
>>>   static const int kvm_vmx_max_exit_handlers =
>>> @@ -6830,6 +6838,13 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>>   	if (unlikely(vmx->exit_reason.failed_vmentry))
>>>   		return EXIT_FASTPATH_NONE;
>>>   
>>> +	/*
>>> +	 * check the exit_reason to see if there is a bus lock
>>> +	 * happened in guest.
>>> +	 */
>>> +	if (vmx->exit_reason.bus_lock_detected)
>>> +		handle_bus_lock(vcpu);
>> 
>> In case the ultimate goal is to have an exit to userspace on bus lock,
>
> I don't think we will need an exit to userspace on bus lock. See below.
>
>> the two ways to reach handle_bus_lock() are very different: in case
>> we're handling EXIT_REASON_BUS_LOCK we can easily drop to userspace by
>> returning 0 but what are we going to do in case of
>> exit_reason.bus_lock_detected? The 'higher priority VM exit' may require
>> exit to userspace too. So what's the plan? Maybe we can ignore the case
>> when we're exiting to userspace for some other reason as this is slow
>> already and force the exit otherwise? 
>
>> And should we actually introduce
>> the KVM_EXIT_BUS_LOCK and a capability to enable it here?
>> 
>
> Introducing KVM_EXIT_BUS_LOCK maybe help nothing. No matter 
> EXIT_REASON_BUS_LOCK or exit_reason.bus_lock_detected, the bus lock has 
> already happened. Exit to userspace cannot prevent bus lock, so what 
> userspace can do is recording and counting as what this patch does in 
> vcpu->stat.bus_locks.

Exiting to userspace would allow to implement custom 'throttling'
policies to mitigate the 'noisy neighbour' problem. The simplest would
be to just inject some sleep time.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 2/2] KVM: VMX: Enable bus lock VM exit
  2020-07-01 12:44       ` Vitaly Kuznetsov
@ 2020-07-01 14:12         ` Xiaoyao Li
  2020-07-01 14:49           ` Vitaly Kuznetsov
  0 siblings, 1 reply; 12+ messages in thread
From: Xiaoyao Li @ 2020-07-01 14:12 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Chenyi Qiang
  Cc: kvm, linux-kernel, Paolo Bonzini, Sean Christopherson,
	Wanpeng Li, Jim Mattson, Joerg Roedel

On 7/1/2020 8:44 PM, Vitaly Kuznetsov wrote:
> Xiaoyao Li <xiaoyao.li@intel.com> writes:
> 
>> On 7/1/2020 5:04 PM, Vitaly Kuznetsov wrote:
>>> Chenyi Qiang <chenyi.qiang@intel.com> writes:
>> [...]
>>>>    static const int kvm_vmx_max_exit_handlers =
>>>> @@ -6830,6 +6838,13 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>>>    	if (unlikely(vmx->exit_reason.failed_vmentry))
>>>>    		return EXIT_FASTPATH_NONE;
>>>>    
>>>> +	/*
>>>> +	 * check the exit_reason to see if there is a bus lock
>>>> +	 * happened in guest.
>>>> +	 */
>>>> +	if (vmx->exit_reason.bus_lock_detected)
>>>> +		handle_bus_lock(vcpu);
>>>
>>> In case the ultimate goal is to have an exit to userspace on bus lock,
>>
>> I don't think we will need an exit to userspace on bus lock. See below.
>>
>>> the two ways to reach handle_bus_lock() are very different: in case
>>> we're handling EXIT_REASON_BUS_LOCK we can easily drop to userspace by
>>> returning 0 but what are we going to do in case of
>>> exit_reason.bus_lock_detected? The 'higher priority VM exit' may require
>>> exit to userspace too. So what's the plan? Maybe we can ignore the case
>>> when we're exiting to userspace for some other reason as this is slow
>>> already and force the exit otherwise?
>>
>>> And should we actually introduce
>>> the KVM_EXIT_BUS_LOCK and a capability to enable it here?
>>>
>>
>> Introducing KVM_EXIT_BUS_LOCK maybe help nothing. No matter
>> EXIT_REASON_BUS_LOCK or exit_reason.bus_lock_detected, the bus lock has
>> already happened. Exit to userspace cannot prevent bus lock, so what
>> userspace can do is recording and counting as what this patch does in
>> vcpu->stat.bus_locks.
> 
> Exiting to userspace would allow to implement custom 'throttling'
> policies to mitigate the 'noisy neighbour' problem. The simplest would
> be to just inject some sleep time.
> 

So you want an exit to userspace for every bus lock and leave it all to 
userspace. Yes, it's doable.

As you said, the exit_reason.bus_lock_detected case is the tricky one. 
We cannot do the similar to extend vcpu->run->exit_reason, this breaks 
ABI. Maybe we can extend the vcpu->run->flags to indicate bus lock 
detected for the other exit reason?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 2/2] KVM: VMX: Enable bus lock VM exit
  2020-07-01 14:12         ` Xiaoyao Li
@ 2020-07-01 14:49           ` Vitaly Kuznetsov
  2020-07-02  9:15             ` Xiaoyao Li
  2020-07-23  1:21             ` Sean Christopherson
  0 siblings, 2 replies; 12+ messages in thread
From: Vitaly Kuznetsov @ 2020-07-01 14:49 UTC (permalink / raw)
  To: Xiaoyao Li, Chenyi Qiang
  Cc: kvm, linux-kernel, Paolo Bonzini, Sean Christopherson,
	Wanpeng Li, Jim Mattson, Joerg Roedel

Xiaoyao Li <xiaoyao.li@intel.com> writes:

> On 7/1/2020 8:44 PM, Vitaly Kuznetsov wrote:
>> Xiaoyao Li <xiaoyao.li@intel.com> writes:
>> 
>>> On 7/1/2020 5:04 PM, Vitaly Kuznetsov wrote:
>>>> Chenyi Qiang <chenyi.qiang@intel.com> writes:
>>> [...]
>>>>>    static const int kvm_vmx_max_exit_handlers =
>>>>> @@ -6830,6 +6838,13 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>>>>    	if (unlikely(vmx->exit_reason.failed_vmentry))
>>>>>    		return EXIT_FASTPATH_NONE;
>>>>>    
>>>>> +	/*
>>>>> +	 * check the exit_reason to see if there is a bus lock
>>>>> +	 * happened in guest.
>>>>> +	 */
>>>>> +	if (vmx->exit_reason.bus_lock_detected)
>>>>> +		handle_bus_lock(vcpu);
>>>>
>>>> In case the ultimate goal is to have an exit to userspace on bus lock,
>>>
>>> I don't think we will need an exit to userspace on bus lock. See below.
>>>
>>>> the two ways to reach handle_bus_lock() are very different: in case
>>>> we're handling EXIT_REASON_BUS_LOCK we can easily drop to userspace by
>>>> returning 0 but what are we going to do in case of
>>>> exit_reason.bus_lock_detected? The 'higher priority VM exit' may require
>>>> exit to userspace too. So what's the plan? Maybe we can ignore the case
>>>> when we're exiting to userspace for some other reason as this is slow
>>>> already and force the exit otherwise?
>>>
>>>> And should we actually introduce
>>>> the KVM_EXIT_BUS_LOCK and a capability to enable it here?
>>>>
>>>
>>> Introducing KVM_EXIT_BUS_LOCK maybe help nothing. No matter
>>> EXIT_REASON_BUS_LOCK or exit_reason.bus_lock_detected, the bus lock has
>>> already happened. Exit to userspace cannot prevent bus lock, so what
>>> userspace can do is recording and counting as what this patch does in
>>> vcpu->stat.bus_locks.
>> 
>> Exiting to userspace would allow to implement custom 'throttling'
>> policies to mitigate the 'noisy neighbour' problem. The simplest would
>> be to just inject some sleep time.
>> 
>
> So you want an exit to userspace for every bus lock and leave it all to 
> userspace. Yes, it's doable.
>

In some cases we may not even want to have a VM exit: think
e.g. real-time/partitioning case when even in case of bus lock we may
not want to add additional latency just to count such events. I'd
suggest we make the new capability tri-state:
- disabled (no vmexit, default)
- stats only (what this patch does)
- userspace exit
But maybe this is an overkill, I'd like to hear what others think.

> As you said, the exit_reason.bus_lock_detected case is the tricky one. 
> We cannot do the similar to extend vcpu->run->exit_reason, this breaks 
> ABI. Maybe we can extend the vcpu->run->flags to indicate bus lock 
> detected for the other exit reason?

This is likely the easiest solution.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 2/2] KVM: VMX: Enable bus lock VM exit
  2020-07-01 14:49           ` Vitaly Kuznetsov
@ 2020-07-02  9:15             ` Xiaoyao Li
  2020-07-23  1:21             ` Sean Christopherson
  1 sibling, 0 replies; 12+ messages in thread
From: Xiaoyao Li @ 2020-07-02  9:15 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Chenyi Qiang
  Cc: kvm, linux-kernel, Paolo Bonzini, Sean Christopherson,
	Wanpeng Li, Jim Mattson, Joerg Roedel

On 7/1/2020 10:49 PM, Vitaly Kuznetsov wrote:
> Xiaoyao Li <xiaoyao.li@intel.com> writes:
> 
>> On 7/1/2020 8:44 PM, Vitaly Kuznetsov wrote:
>>> Xiaoyao Li <xiaoyao.li@intel.com> writes:
>>>
>>>> On 7/1/2020 5:04 PM, Vitaly Kuznetsov wrote:
>>>>> Chenyi Qiang <chenyi.qiang@intel.com> writes:
>>>> [...]
>>>>>>     static const int kvm_vmx_max_exit_handlers =
>>>>>> @@ -6830,6 +6838,13 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>>>>>     	if (unlikely(vmx->exit_reason.failed_vmentry))
>>>>>>     		return EXIT_FASTPATH_NONE;
>>>>>>     
>>>>>> +	/*
>>>>>> +	 * check the exit_reason to see if there is a bus lock
>>>>>> +	 * happened in guest.
>>>>>> +	 */
>>>>>> +	if (vmx->exit_reason.bus_lock_detected)
>>>>>> +		handle_bus_lock(vcpu);
>>>>>
>>>>> In case the ultimate goal is to have an exit to userspace on bus lock,
>>>>
>>>> I don't think we will need an exit to userspace on bus lock. See below.
>>>>
>>>>> the two ways to reach handle_bus_lock() are very different: in case
>>>>> we're handling EXIT_REASON_BUS_LOCK we can easily drop to userspace by
>>>>> returning 0 but what are we going to do in case of
>>>>> exit_reason.bus_lock_detected? The 'higher priority VM exit' may require
>>>>> exit to userspace too. So what's the plan? Maybe we can ignore the case
>>>>> when we're exiting to userspace for some other reason as this is slow
>>>>> already and force the exit otherwise?
>>>>
>>>>> And should we actually introduce
>>>>> the KVM_EXIT_BUS_LOCK and a capability to enable it here?
>>>>>
>>>>
>>>> Introducing KVM_EXIT_BUS_LOCK maybe help nothing. No matter
>>>> EXIT_REASON_BUS_LOCK or exit_reason.bus_lock_detected, the bus lock has
>>>> already happened. Exit to userspace cannot prevent bus lock, so what
>>>> userspace can do is recording and counting as what this patch does in
>>>> vcpu->stat.bus_locks.
>>>
>>> Exiting to userspace would allow to implement custom 'throttling'
>>> policies to mitigate the 'noisy neighbour' problem. The simplest would
>>> be to just inject some sleep time.
>>>
>>
>> So you want an exit to userspace for every bus lock and leave it all to
>> userspace. Yes, it's doable.
>>
> 
> In some cases we may not even want to have a VM exit: think
> e.g. real-time/partitioning case when even in case of bus lock we may
> not want to add additional latency just to count such events. 

For real-time case, the bus lock needs to be avoided at all, since bus 
lock takes many cycles and prevents others accessing memory. If no bus 
lock, then no bus lock vm exit to worry about. If bus lock, the latency 
requirement maybe cannot be met due to it.

> I'd
> suggest we make the new capability tri-state:
> - disabled (no vmexit, default)
> - stats only (what this patch does)
> - userspace exit
> But maybe this is an overkill, I'd like to hear what others think.

Yeah. Others' thought is very welcomed.

Besides, for how to throttle, KVM maybe has to take kernel policy into 
account. Since in the spec, there is another feature for bare metal to 
raise a #DB for bus lock. Native kernel likely implements some policy to 
restrict the rate the bus lock can happen. So qemu threads will have to 
follow that as well.

>> As you said, the exit_reason.bus_lock_detected case is the tricky one.
>> We cannot do the similar to extend vcpu->run->exit_reason, this breaks
>> ABI. Maybe we can extend the vcpu->run->flags to indicate bus lock
>> detected for the other exit reason?
> 
> This is likely the easiest solution.
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 2/2] KVM: VMX: Enable bus lock VM exit
  2020-07-01 14:49           ` Vitaly Kuznetsov
  2020-07-02  9:15             ` Xiaoyao Li
@ 2020-07-23  1:21             ` Sean Christopherson
  2020-07-27  4:38               ` Xiaoyao Li
  1 sibling, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2020-07-23  1:21 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Xiaoyao Li, Chenyi Qiang, kvm, linux-kernel, Paolo Bonzini,
	Wanpeng Li, Jim Mattson, Joerg Roedel

On Wed, Jul 01, 2020 at 04:49:49PM +0200, Vitaly Kuznetsov wrote:
> Xiaoyao Li <xiaoyao.li@intel.com> writes:
> > So you want an exit to userspace for every bus lock and leave it all to 
> > userspace. Yes, it's doable.
> 
> In some cases we may not even want to have a VM exit: think
> e.g. real-time/partitioning case when even in case of bus lock we may
> not want to add additional latency just to count such events.

Hmm, I suspect this isn't all that useful for real-time cases because they'd
probably want to prevent the split lock in the first place, e.g. would prefer
to use the #AC variant in fatal mode.  Of course, the availability of split
lock #AC is a whole other can of worms.

But anyways, I 100% agree that this needs either an off-by-default module
param or an opt-in per-VM capability.

> I'd suggest we make the new capability tri-state:
> - disabled (no vmexit, default)
> - stats only (what this patch does)
> - userspace exit
> But maybe this is an overkill, I'd like to hear what others think.

Userspace exit would also be interesting for debug.  Another throttling
option would be schedule() or cond_reched(), though that's probably getting
into overkill territory.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 2/2] KVM: VMX: Enable bus lock VM exit
  2020-07-23  1:21             ` Sean Christopherson
@ 2020-07-27  4:38               ` Xiaoyao Li
  2020-07-28 16:25                 ` Sean Christopherson
  0 siblings, 1 reply; 12+ messages in thread
From: Xiaoyao Li @ 2020-07-27  4:38 UTC (permalink / raw)
  To: Sean Christopherson, Vitaly Kuznetsov
  Cc: Chenyi Qiang, kvm, linux-kernel, Paolo Bonzini, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On 7/23/2020 9:21 AM, Sean Christopherson wrote:
> On Wed, Jul 01, 2020 at 04:49:49PM +0200, Vitaly Kuznetsov wrote:
>> Xiaoyao Li <xiaoyao.li@intel.com> writes:
>>> So you want an exit to userspace for every bus lock and leave it all to
>>> userspace. Yes, it's doable.
>>
>> In some cases we may not even want to have a VM exit: think
>> e.g. real-time/partitioning case when even in case of bus lock we may
>> not want to add additional latency just to count such events.
> 
> Hmm, I suspect this isn't all that useful for real-time cases because they'd
> probably want to prevent the split lock in the first place, e.g. would prefer
> to use the #AC variant in fatal mode.  Of course, the availability of split
> lock #AC is a whole other can of worms.
> 
> But anyways, I 100% agree that this needs either an off-by-default module
> param or an opt-in per-VM capability.
> 

Maybe on-by-default or an opt-out per-VM capability?
Turning it on introduces no overhead if no bus lock happens in guest but 
gives KVM the capability to track every potential bus lock. If user 
doesn't want the extra latency due to bus lock VM exit, it's better try 
to fix the bus lock, which also incurs high latency.

>> I'd suggest we make the new capability tri-state:
>> - disabled (no vmexit, default)
>> - stats only (what this patch does)
>> - userspace exit
>> But maybe this is an overkill, I'd like to hear what others think.
> 
> Userspace exit would also be interesting for debug.  Another throttling
> option would be schedule() or cond_reched(), though that's probably getting
> into overkill territory.
> 

We're going to leverage host's policy, i.e., calling 
handle_user_bus_lock(), for throttling, as proposed in 
https://lkml.kernel.org/r/1595021700-68460-1-git-send-email-fenghua.yu@intel.com


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 2/2] KVM: VMX: Enable bus lock VM exit
  2020-07-27  4:38               ` Xiaoyao Li
@ 2020-07-28 16:25                 ` Sean Christopherson
  0 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2020-07-28 16:25 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Vitaly Kuznetsov, Chenyi Qiang, kvm, linux-kernel, Paolo Bonzini,
	Wanpeng Li, Jim Mattson, Joerg Roedel

On Mon, Jul 27, 2020 at 12:38:53PM +0800, Xiaoyao Li wrote:
> On 7/23/2020 9:21 AM, Sean Christopherson wrote:
> >On Wed, Jul 01, 2020 at 04:49:49PM +0200, Vitaly Kuznetsov wrote:
> >>Xiaoyao Li <xiaoyao.li@intel.com> writes:
> >>>So you want an exit to userspace for every bus lock and leave it all to
> >>>userspace. Yes, it's doable.
> >>
> >>In some cases we may not even want to have a VM exit: think
> >>e.g. real-time/partitioning case when even in case of bus lock we may
> >>not want to add additional latency just to count such events.
> >
> >Hmm, I suspect this isn't all that useful for real-time cases because they'd
> >probably want to prevent the split lock in the first place, e.g. would prefer
> >to use the #AC variant in fatal mode.  Of course, the availability of split
> >lock #AC is a whole other can of worms.
> >
> >But anyways, I 100% agree that this needs either an off-by-default module
> >param or an opt-in per-VM capability.
> >
> 
> Maybe on-by-default or an opt-out per-VM capability?
> Turning it on introduces no overhead if no bus lock happens in guest but
> gives KVM the capability to track every potential bus lock. If user doesn't
> want the extra latency due to bus lock VM exit, it's better try to fix the
> bus lock, which also incurs high latency.

Except that I doubt the physical system owner and VM owner are the same
entity in the vast majority of KVM use cases.  So yeah, in a perfect world
the guest application that's causing bus locks would be fixed, but in
practice there is likely no sane way for the KVM owner to inform the guest
application owner that their application is broken, let alone fix said
application.

The caveat would be that we may need to enable this by default if the host
kernel policy mandates it.

> >>I'd suggest we make the new capability tri-state:
> >>- disabled (no vmexit, default)
> >>- stats only (what this patch does)
> >>- userspace exit
> >>But maybe this is an overkill, I'd like to hear what others think.
> >
> >Userspace exit would also be interesting for debug.  Another throttling
> >option would be schedule() or cond_reched(), though that's probably getting
> >into overkill territory.
> >
> 
> We're going to leverage host's policy, i.e., calling handle_user_bus_lock(),
> for throttling, as proposed in https://lkml.kernel.org/r/1595021700-68460-1-git-send-email-fenghua.yu@intel.com
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-07-28 16:25 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-28  8:53 [RFC 0/2] Add support for bus lock VM exit Chenyi Qiang
2020-06-28  8:53 ` [RFC 1/2] KVM: VMX: Convert vcpu_vmx.exit_reason to a union Chenyi Qiang
2020-06-28  8:53 ` [RFC 2/2] KVM: VMX: Enable bus lock VM exit Chenyi Qiang
2020-07-01  9:04   ` Vitaly Kuznetsov
2020-07-01  9:32     ` Xiaoyao Li
2020-07-01 12:44       ` Vitaly Kuznetsov
2020-07-01 14:12         ` Xiaoyao Li
2020-07-01 14:49           ` Vitaly Kuznetsov
2020-07-02  9:15             ` Xiaoyao Li
2020-07-23  1:21             ` Sean Christopherson
2020-07-27  4:38               ` Xiaoyao Li
2020-07-28 16:25                 ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).