* [PATCH v2 0/6] kvm/split_lock: Add feature split lock detection support in kvm
@ 2020-02-03 15:16 Xiaoyao Li
  2020-02-03 15:16 ` [PATCH v2 1/6] x86/split_lock: Add and export get_split_lock_detect_state() Xiaoyao Li
                   ` (5 more replies)
  0 siblings, 6 replies; 31+ messages in thread
From: Xiaoyao Li @ 2020-02-03 15:16 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski
  Cc: x86, kvm, linux-kernel, David Laight, Xiaoyao Li

This version adds virtualization of split lock detection for the guest
in patches 5 and 6.

Whether or not we advertise split lock detection to the guest, we have to
choose between not breaking old guests and preventing DoS attacks from
guests, since we cannot identify whether a guest is malicious.

Since sld_warn mode also allows userspace applications to do split locks,
we extend a similar policy to the guest: if the host is sld_warn, we allow
the guest to generate split locks by clearing the
MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit while the vcpu is running.

If the host is in sld_fatal mode and the guest doesn't set its
SPLIT_LOCK_DETECT bit, we forward the split-lock #AC to user space,
similar to sending SIGBUS.

Xiaoyao Li (6):
  x86/split_lock: Add and export get_split_lock_detect_state()
  x86/split_lock: Add and export split_lock_detect_set()
  kvm: x86: Emulate split-lock access as a write
  kvm: vmx: Extend VMX's #AC handling for split lock in guest
  kvm: x86: Emulate MSR IA32_CORE_CAPABILITIES
  x86: vmx: virtualize split lock detection

 arch/x86/include/asm/cpu.h      | 13 +++++
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kernel/cpu/intel.c     | 18 ++++--
 arch/x86/kvm/cpuid.c            |  5 +-
 arch/x86/kvm/vmx/vmx.c          | 98 ++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.h          |  4 ++
 arch/x86/kvm/x86.c              | 44 ++++++++++++++-
 7 files changed, 171 insertions(+), 12 deletions(-)

-- 
2.23.0



* [PATCH v2 1/6] x86/split_lock: Add and export get_split_lock_detect_state()
  2020-02-03 15:16 [PATCH v2 0/6] kvm/split_lock: Add feature split lock detection support in kvm Xiaoyao Li
@ 2020-02-03 15:16 ` Xiaoyao Li
  2020-02-03 21:45   ` Sean Christopherson
  2020-02-03 15:16 ` [PATCH v2 2/6] x86/split_lock: Add and export split_lock_detect_set() Xiaoyao Li
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 31+ messages in thread
From: Xiaoyao Li @ 2020-02-03 15:16 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski
  Cc: x86, kvm, linux-kernel, David Laight, Xiaoyao Li

get_split_lock_detect_state() will be used by the KVM module to retrieve
sld_state.
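
A minimal sketch of the intended consumer (a hypothetical KVM-side
check, not part of this patch):

	#include <stdbool.h>

	enum split_lock_detect_state { sld_off = 0, sld_warn, sld_fatal };

	/* Provided by arch/x86/kernel/cpu/intel.c in this patch. */
	extern enum split_lock_detect_state get_split_lock_detect_state(void);

	static bool host_sld_enabled(void)
	{
		return get_split_lock_detect_state() != sld_off;
	}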

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/include/asm/cpu.h  | 12 ++++++++++++
 arch/x86/kernel/cpu/intel.c | 12 ++++++------
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index ff6f3ca649b3..167d0539e0ad 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,11 +40,23 @@ int mwait_usable(const struct cpuinfo_x86 *);
 unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
+
+enum split_lock_detect_state {
+	sld_off = 0,
+	sld_warn,
+	sld_fatal,
+};
+
 #ifdef CONFIG_CPU_SUP_INTEL
+extern enum split_lock_detect_state get_split_lock_detect_state(void);
 extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
 #else
+static inline enum split_lock_detect_state get_split_lock_detect_state(void)
+{
+	return sld_off;
+}
 static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
 static inline void switch_to_sld(unsigned long tifn) {}
 static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index db3e745e5d47..a810cd022db5 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -33,12 +33,6 @@
 #include <asm/apic.h>
 #endif
 
-enum split_lock_detect_state {
-	sld_off = 0,
-	sld_warn,
-	sld_fatal,
-};
-
 /*
  * Default to sld_off because most systems do not support split lock detection
  * split_lock_setup() will switch this to sld_warn on systems that support
@@ -968,6 +962,12 @@ cpu_dev_register(intel_cpu_dev);
 #undef pr_fmt
 #define pr_fmt(fmt) "x86/split lock detection: " fmt
 
+enum split_lock_detect_state get_split_lock_detect_state(void)
+{
+	return sld_state;
+}
+EXPORT_SYMBOL_GPL(get_split_lock_detect_state);
+
 static const struct {
 	const char			*option;
 	enum split_lock_detect_state	state;
-- 
2.23.0



* [PATCH v2 2/6] x86/split_lock: Add and export split_lock_detect_set()
  2020-02-03 15:16 [PATCH v2 0/6] kvm/split_lock: Add feature split lock detection support in kvm Xiaoyao Li
  2020-02-03 15:16 ` [PATCH v2 1/6] x86/split_lock: Add and export get_split_lock_detect_state() Xiaoyao Li
@ 2020-02-03 15:16 ` Xiaoyao Li
  2020-02-03 15:16 ` [PATCH v2 3/6] kvm: x86: Emulate split-lock access as a write Xiaoyao Li
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 31+ messages in thread
From: Xiaoyao Li @ 2020-02-03 15:16 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski
  Cc: x86, kvm, linux-kernel, David Laight, Xiaoyao Li

Add and export split_lock_detect_set(), which will be used by the KVM
module to toggle the MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit and thereby
switch SLD on or off.
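
A sketch of the expected caller (mirroring the vcpu-run usage added
later in this series; the wrapper function is hypothetical):

	#include <stdbool.h>

	extern void split_lock_detect_set(bool on);	/* this patch */

	/* Tolerate an old guest's split lock by turning detection off
	 * across VM-entry and back on after VM-exit. */
	static void run_vcpu_without_sld(void)
	{
		split_lock_detect_set(false);
		/* ... enter the guest, run until VM-exit ... */
		split_lock_detect_set(true);
	}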

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/include/asm/cpu.h  | 1 +
 arch/x86/kernel/cpu/intel.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 167d0539e0ad..b46262afa6c1 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -52,6 +52,7 @@ extern enum split_lock_detect_state get_split_lock_detect_state(void);
 extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void split_lock_detect_set(bool on);
 #else
 static inline enum split_lock_detect_state get_split_lock_detect_state(void)
 {
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index a810cd022db5..44138dd64808 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1088,6 +1088,12 @@ void switch_to_sld(unsigned long tifn)
 	__sld_msr_set(!(tifn & _TIF_SLD));
 }
 
+void split_lock_detect_set(bool on)
+{
+	__sld_msr_set(on);
+}
+EXPORT_SYMBOL_GPL(split_lock_detect_set);
+
 #define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
 
 /*
-- 
2.23.0



* [PATCH v2 3/6] kvm: x86: Emulate split-lock access as a write
  2020-02-03 15:16 [PATCH v2 0/6] kvm/split_lock: Add feature split lock detection support in kvm Xiaoyao Li
  2020-02-03 15:16 ` [PATCH v2 1/6] x86/split_lock: Add and export get_split_lock_detect_state() Xiaoyao Li
  2020-02-03 15:16 ` [PATCH v2 2/6] x86/split_lock: Add and export split_lock_detect_set() Xiaoyao Li
@ 2020-02-03 15:16 ` Xiaoyao Li
  2020-02-03 20:54   ` Sean Christopherson
  2020-02-11 12:20   ` Paolo Bonzini
  2020-02-03 15:16 ` [PATCH v2 4/6] kvm: vmx: Extend VMX's #AC handling for split lock in guest Xiaoyao Li
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 31+ messages in thread
From: Xiaoyao Li @ 2020-02-03 15:16 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski
  Cc: x86, kvm, linux-kernel, David Laight, Xiaoyao Li

If split lock detection is enabled (warn/fatal), the #AC handler calls
die() when a split lock happens in the kernel.

A sane guest should never trigger emulation on a split-lock access, but
that cannot stop a malicious guest from doing so. So just emulate the
access as a write if it's a split-lock access, to avoid a malicious
guest polluting the kernel log.

A more detailed analysis can be found at:
https://lkml.kernel.org/r/20200131200134.GD18946@linux.intel.com
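
For intuition, the check added below flags an access whose offset within
a cache line plus its length spills past the end of the line. A small
standalone demonstration of the predicate (64-byte cache line assumed,
plain user-space C rather than the kernel code):

	#include <stdio.h>

	/* Same arithmetic as across_cache_line_access() in this patch. */
	static int crosses_line(unsigned long gpa, unsigned int bytes)
	{
		unsigned int line = 64;

		return (gpa & (line - 1)) + bytes > line;
	}

	int main(void)
	{
		printf("%d\n", crosses_line(0x1000, 8));  /* 0: line-aligned */
		printf("%d\n", crosses_line(0x103c, 4));  /* 0: ends exactly at the boundary */
		printf("%d\n", crosses_line(0x103e, 4));  /* 1: bytes 62..65 span two lines */
		return 0;
	}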

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/kvm/x86.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2d3be7f3ad67..821b7404c0fd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5847,6 +5847,13 @@ static int emulator_write_emulated(struct x86_emulate_ctxt *ctxt,
 	(cmpxchg64((u64 *)(ptr), *(u64 *)(old), *(u64 *)(new)) == *(u64 *)(old))
 #endif
 
+static inline bool across_cache_line_access(gpa_t gpa, unsigned int bytes)
+{
+	unsigned int cache_line_size = cache_line_size();
+
+	return (gpa & (cache_line_size - 1)) + bytes > cache_line_size;
+}
+
 static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 				     unsigned long addr,
 				     const void *old,
@@ -5873,6 +5880,10 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 	if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
 		goto emul_write;
 
+	if (get_split_lock_detect_state() != sld_off &&
+	    across_cache_line_access(gpa, bytes))
+		goto emul_write;
+
 	if (kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map))
 		goto emul_write;
 
-- 
2.23.0



* [PATCH v2 4/6] kvm: vmx: Extend VMX's #AC handling for split lock in guest
  2020-02-03 15:16 [PATCH v2 0/6] kvm/split_lock: Add feature split lock detection support in kvm Xiaoyao Li
                   ` (2 preceding siblings ...)
  2020-02-03 15:16 ` [PATCH v2 3/6] kvm: x86: Emulate split-lock access as a write Xiaoyao Li
@ 2020-02-03 15:16 ` Xiaoyao Li
  2020-02-03 21:14   ` Sean Christopherson
  2020-02-03 15:16 ` [PATCH v2 5/6] kvm: x86: Emulate MSR IA32_CORE_CAPABILITIES Xiaoyao Li
  2020-02-03 15:16 ` [PATCH v2 6/6] x86: vmx: virtualize split lock detection Xiaoyao Li
  5 siblings, 1 reply; 31+ messages in thread
From: Xiaoyao Li @ 2020-02-03 15:16 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski
  Cc: x86, kvm, linux-kernel, David Laight, Xiaoyao Li

There are two types of #AC that can be generated on Intel CPUs:
 1. legacy alignment check #AC;
 2. split lock #AC.

A legacy alignment check #AC can be injected into the guest if the guest
has enabled alignment checking.

When the host enables split lock detection, i.e., split_lock_detect !=
off, the guest will receive an unexpected #AC when a split lock happens,
since KVM doesn't virtualize this feature: the hardware value of the
MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit stays unchanged while the vcpu is
running.

Old guests lack a split-lock #AC handler and may have split lock bugs.
To let them survive split locks, apply a policy similar to the host's
split lock detect configuration:
 - host split lock detect is sld_warn:
   warn that a split lock happened in the guest, and disable split lock
   detection while the vcpu is running to allow the guest to continue.
 - host split lock detect is sld_fatal:
   forward the #AC to userspace, somewhat similar to sending SIGBUS.

Please note:
1. If sld_warn and SMT is enabled, a split lock in a guest vcpu disables
split lock detection on the sibling CPU thread while the vcpu is running.

2. When the host is sld_warn, allowing the guest to generate split locks
also opens the door for a malicious guest to mount a DoS attack. The same
holds in sld_warn mode for userspace applications.

3. To prevent DoS attacks from guests, the host must use sld_fatal mode.
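
For reference, a split lock is a LOCK-prefixed read-modify-write whose
operand straddles a cache-line boundary. A minimal user-space snippet
that would trigger one (a hypothetical demo, 64-byte cache line
assumed):

	#include <stdint.h>
	#include <stdlib.h>

	int main(void)
	{
		/* 64-byte-aligned buffer; offset 62 puts a u32 across two lines. */
		uint8_t *buf = aligned_alloc(64, 128);
		volatile uint32_t *p = (volatile uint32_t *)(buf + 62);

		*p = 0;
		/* Locked RMW crossing a cache-line boundary: a split lock.
		 * With split lock detection enabled this raises #AC. */
		__atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);

		free(buf);
		return 0;
	}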

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 48 +++++++++++++++++++++++++++++++++++++++---
 arch/x86/kvm/vmx/vmx.h |  3 +++
 2 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c475fa2aaae0..93e3370c5f84 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4233,6 +4233,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	vmx->msr_ia32_umwait_control = 0;
 
+	vmx->disable_split_lock_detect = false;
+
 	vcpu->arch.microcode_version = 0x100000000ULL;
 	vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
 	vmx->hv_deadline_tsc = -1;
@@ -4557,6 +4559,12 @@ static int handle_machine_check(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static inline bool guest_cpu_alignment_check_enabled(struct kvm_vcpu *vcpu)
+{
+	return vmx_get_cpl(vcpu) == 3 && kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
+	       (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
+}
+
 static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -4622,9 +4630,6 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		return handle_rmode_exception(vcpu, ex_no, error_code);
 
 	switch (ex_no) {
-	case AC_VECTOR:
-		kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
-		return 1;
 	case DB_VECTOR:
 		dr6 = vmcs_readl(EXIT_QUALIFICATION);
 		if (!(vcpu->guest_debug &
@@ -4653,6 +4658,33 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
 		kvm_run->debug.arch.exception = ex_no;
 		break;
+	case AC_VECTOR:
+		/*
+		 * Inject #AC back to guest only when legacy alignment check
+		 * enabled.
+		 * Otherwise, it must be an unexpected split-lock #AC for guest
+		 * since KVM keeps hardware SPLIT_LOCK_DETECT bit unchanged
+		 * when vcpu is running.
+		 *  - If sld_state == sld_warn, treat it similar as user space
+		 *    process that warn and allow it to continue running.
+		 *    In this case, set vmx->disable_split_lock_detect to
+		 *    true so that it will toggle MSR_TEST_CTRL.SPLIT_LOCK_DETECT
+		 *    bit during every following VM Entry and Exit;
+		 *  - If sld_state == sld_fatal, it forwards #AC to userspace,
+		 *    similar as sending SIGBUS.
+		 */
+		if (guest_cpu_alignment_check_enabled(vcpu) ||
+		    WARN_ON(get_split_lock_detect_state() == sld_off)) {
+			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
+			return 1;
+		}
+		if (get_split_lock_detect_state() == sld_warn) {
+			pr_warn("kvm: split lock #AC happened in %s [%d]\n",
+				current->comm, current->pid);
+			vmx->disable_split_lock_detect = true;
+			return 1;
+		}
+		/* fall through */
 	default:
 		kvm_run->exit_reason = KVM_EXIT_EXCEPTION;
 		kvm_run->ex.exception = ex_no;
@@ -6530,6 +6562,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	 */
 	x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0);
 
+	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
+	    unlikely(vmx->disable_split_lock_detect) &&
+	    !test_tsk_thread_flag(current, TIF_SLD))
+		split_lock_detect_set(false);
+
 	/* L1D Flush includes CPU buffer clear to mitigate MDS */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
 		vmx_l1d_flush(vcpu);
@@ -6564,6 +6601,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
 
+	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
+	    unlikely(vmx->disable_split_lock_detect) &&
+	    !test_tsk_thread_flag(current, TIF_SLD))
+		split_lock_detect_set(true);
+
 	/* All fields are clean at this point */
 	if (static_branch_unlikely(&enable_evmcs))
 		current_evmcs->hv_clean_fields |=
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 7f42cf3dcd70..912eba66c5d5 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -274,6 +274,9 @@ struct vcpu_vmx {
 
 	bool req_immediate_exit;
 
+	/* Disable split-lock detection when running the vCPU */
+	bool disable_split_lock_detect;
+
 	/* Support for PML */
 #define PML_ENTITY_NUM		512
 	struct page *pml_pg;
-- 
2.23.0



* [PATCH v2 5/6] kvm: x86: Emulate MSR IA32_CORE_CAPABILITIES
  2020-02-03 15:16 [PATCH v2 0/6] kvm/split_lock: Add feature split lock detection support in kvm Xiaoyao Li
                   ` (3 preceding siblings ...)
  2020-02-03 15:16 ` [PATCH v2 4/6] kvm: vmx: Extend VMX's #AC handling for split lock in guest Xiaoyao Li
@ 2020-02-03 15:16 ` Xiaoyao Li
  2020-02-03 21:43   ` Sean Christopherson
  2020-02-03 15:16 ` [PATCH v2 6/6] x86: vmx: virtualize split lock detection Xiaoyao Li
  5 siblings, 1 reply; 31+ messages in thread
From: Xiaoyao Li @ 2020-02-03 15:16 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski
  Cc: x86, kvm, linux-kernel, David Laight, Xiaoyao Li

Emulate MSR_IA32_CORE_CAPABILITIES in software and unconditionally
advertise its support to userspace. Like MSR_IA32_ARCH_CAPABILITIES, it
is a feature-enumerating MSR and can be fully emulated regardless of
hardware support. Existence of CORE_CAPABILITIES is enumerated via
CPUID.(EAX=7H,ECX=0):EDX[30].

Note, support for individual features enumerated via CORE_CAPABILITIES,
e.g., split lock detection, will be added in future patches.
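
For illustration, userspace can read such feature MSRs with KVM_GET_MSRS
on the /dev/kvm system fd (a rough sketch, assuming a kernel with the
MSR-based features API; 0xcf is the IA32_CORE_CAPABILITIES MSR index per
the SDM):

	#include <fcntl.h>
	#include <linux/kvm.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/ioctl.h>

	int main(void)
	{
		struct { struct kvm_msrs hdr; struct kvm_msr_entry entry; } m;
		int kvm = open("/dev/kvm", O_RDONLY);

		memset(&m, 0, sizeof(m));
		m.hdr.nmsrs = 1;
		m.entry.index = 0xcf;	/* MSR_IA32_CORE_CAPS */

		/* On the system fd, KVM_GET_MSRS returns feature MSR values;
		 * the return value is the number of MSRs read. */
		if (kvm < 0 || ioctl(kvm, KVM_GET_MSRS, &m.hdr) != 1)
			return 1;
		printf("IA32_CORE_CAPABILITIES = %#llx\n",
		       (unsigned long long)m.entry.data);
		return 0;
	}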

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c            |  5 +++--
 arch/x86/kvm/x86.c              | 22 ++++++++++++++++++++++
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 329d01c689b7..dc231240102f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -591,6 +591,7 @@ struct kvm_vcpu_arch {
 	u64 ia32_xss;
 	u64 microcode_version;
 	u64 arch_capabilities;
+	u64 core_capabilities;
 
 	/*
 	 * Paging state of the vcpu
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b1c469446b07..7282d04f3a6b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -409,10 +409,11 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
 		    boot_cpu_has(X86_FEATURE_AMD_SSBD))
 			entry->edx |= F(SPEC_CTRL_SSBD);
 		/*
-		 * We emulate ARCH_CAPABILITIES in software even
-		 * if the host doesn't support it.
+		 * ARCH_CAPABILITIES and CORE_CAPABILITIES are emulated in
+		 * software regardless of host support.
 		 */
 		entry->edx |= F(ARCH_CAPABILITIES);
+		entry->edx |= F(CORE_CAPABILITIES);
 		break;
 	case 1:
 		entry->eax &= kvm_cpuid_7_1_eax_x86_features;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 821b7404c0fd..a97a8f5dd1df 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1222,6 +1222,7 @@ static const u32 emulated_msrs_all[] = {
 	MSR_IA32_TSC_ADJUST,
 	MSR_IA32_TSCDEADLINE,
 	MSR_IA32_ARCH_CAPABILITIES,
+	MSR_IA32_CORE_CAPS,
 	MSR_IA32_MISC_ENABLE,
 	MSR_IA32_MCG_STATUS,
 	MSR_IA32_MCG_CTL,
@@ -1288,6 +1289,7 @@ static const u32 msr_based_features_all[] = {
 	MSR_F10H_DECFG,
 	MSR_IA32_UCODE_REV,
 	MSR_IA32_ARCH_CAPABILITIES,
+	MSR_IA32_CORE_CAPS,
 };
 
 static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all)];
@@ -1341,12 +1343,20 @@ static u64 kvm_get_arch_capabilities(void)
 	return data;
 }
 
+static u64 kvm_get_core_capabilities(void)
+{
+	return 0;
+}
+
 static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
 {
 	switch (msr->index) {
 	case MSR_IA32_ARCH_CAPABILITIES:
 		msr->data = kvm_get_arch_capabilities();
 		break;
+	case MSR_IA32_CORE_CAPS:
+		msr->data = kvm_get_core_capabilities();
+		break;
 	case MSR_IA32_UCODE_REV:
 		rdmsrl_safe(msr->index, &msr->data);
 		break;
@@ -2716,6 +2726,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		vcpu->arch.arch_capabilities = data;
 		break;
+	case MSR_IA32_CORE_CAPS:
+		if (!msr_info->host_initiated)
+			return 1;
+		vcpu->arch.core_capabilities = data;
+		break;
 	case MSR_EFER:
 		return set_efer(vcpu, msr_info);
 	case MSR_K7_HWCR:
@@ -3044,6 +3059,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		msr_info->data = vcpu->arch.arch_capabilities;
 		break;
+	case MSR_IA32_CORE_CAPS:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_CORE_CAPABILITIES))
+			return 1;
+		msr_info->data = vcpu->arch.core_capabilities;
+		break;
 	case MSR_IA32_POWER_CTL:
 		msr_info->data = vcpu->arch.msr_ia32_power_ctl;
 		break;
@@ -9288,6 +9309,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 		goto free_guest_fpu;
 
 	vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
+	vcpu->arch.core_capabilities = kvm_get_core_capabilities();
 	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
 	kvm_vcpu_mtrr_init(vcpu);
 	vcpu_load(vcpu);
-- 
2.23.0



* [PATCH v2 6/6] x86: vmx: virtualize split lock detection
  2020-02-03 15:16 [PATCH v2 0/6] kvm/split_lock: Add feature split lock detection support in kvm Xiaoyao Li
                   ` (4 preceding siblings ...)
  2020-02-03 15:16 ` [PATCH v2 5/6] kvm: x86: Emulate MSR IA32_CORE_CAPABILITIES Xiaoyao Li
@ 2020-02-03 15:16 ` Xiaoyao Li
  2020-02-03 15:58   ` Xiaoyao Li
                     ` (2 more replies)
  5 siblings, 3 replies; 31+ messages in thread
From: Xiaoyao Li @ 2020-02-03 15:16 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski
  Cc: x86, kvm, linux-kernel, David Laight, Xiaoyao Li

Because MSR_TEST_CTRL has per-core scope, i.e., sibling threads in the
same physical CPU core share the same MSR, split lock detection is only
advertised to the guest when SMT is disabled or unsupported, for
simplicity.

Only when the host is sld_off can the guest control the hardware value
of MSR_TEST_CTRL, i.e., KVM loads the guest's value into hardware while
the vcpu is running.

vmx->disable_split_lock_detect can be set to true after an unhandled
split-lock #AC in the guest, and only when the host is in sld_warn mode.
This is to avoid breaking old guests; of course, a malicious guest can
exploit it for a DoS attack.

To prevent DoS attacks from a malicious guest, the host must use
sld_fatal mode. When the host is sld_fatal, the hardware value of
MSR_TEST_CTRL.SPLIT_LOCK_DETECT is never cleared.

Below is a summary of how the guest behaves if SMT is off and it's a
Linux guest:

-----------------------------------------------------------------------
   Host | Guest | Guest behavior
-----------------------------------------------------------------------
1. off  |       | same as on bare metal
-----------------------------------------------------------------------
2. warn | off   | hardware bit set initially. Once a split lock happens,
        |       | KVM sets vmx->disable_split_lock_detect, which causes
        |       | the hardware bit to be cleared while the vcpu is
        |       | running. So it's the same as on bare metal.
-----------------------------------------------------------------------
3. warn | warn  | - user space: gets #AC on a split lock, then clears
        |       |   its MSR bit, but the hardware bit is not cleared;
        |       |   #AC again, which finally sets
        |       |   vmx->disable_split_lock_detect, causing the hardware
        |       |   bit to be cleared while the vcpu is running. After
        |       |   the userspace process finishes, the guest sets the
        |       |   vcpu's MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit, which
        |       |   sets vmx->disable_split_lock_detect back to false.
        |       |   So it's roughly the same as on bare metal.
        |       | - kernel: same as on bare metal.
-----------------------------------------------------------------------
4. warn | fatal | same as on bare metal
-----------------------------------------------------------------------
5. fatal| off   | #AC reported to userspace
-----------------------------------------------------------------------
6. fatal| warn  | - user space: gets #AC on a split lock, then clears
        |       |   its MSR bit, but the hardware bit is not cleared;
        |       |   #AC again, reported to userspace.
        |       | - kernel: same as on bare metal, calls die().
-----------------------------------------------------------------------
7. fatal| fatal | same as on bare metal
-----------------------------------------------------------------------
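
As context for the enumeration note in the patch below: a guest is
expected to discover the feature via CPUID.(EAX=7H,ECX=0):EDX[30]
(CORE_CAPABILITIES) and then bit 5 of the MSR. A rough sketch of that
flow (rdmsr() is a placeholder for the guest kernel's MSR accessor):

	#include <cpuid.h>
	#include <stdbool.h>

	#define CORE_CAPS_CPUID_BIT	(1u << 30)	/* CPUID.(7,0):EDX[30] */
	#define CORE_CAPS_SLD_BIT	(1ull << 5)	/* IA32_CORE_CAPS[5] */

	extern unsigned long long rdmsr(unsigned int index);	/* placeholder */

	static bool guest_has_split_lock_detect(void)
	{
		unsigned int eax, ebx, ecx, edx;

		if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
			return false;
		if (!(edx & CORE_CAPS_CPUID_BIT))
			return false;
		return rdmsr(0xcf) & CORE_CAPS_SLD_BIT;	/* IA32_CORE_CAPS */
	}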

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 72 +++++++++++++++++++++++++++++++++++-------
 arch/x86/kvm/vmx/vmx.h |  1 +
 arch/x86/kvm/x86.c     | 13 ++++++--
 3 files changed, 73 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 93e3370c5f84..a0c3f579ecb6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1781,6 +1781,26 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
 	}
 }
 
+/*
+ * Note: for guest, feature split lock detection can only be enumerated by
+ * MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT. The FMS enumeration is invalid.
+ */
+static inline bool guest_has_feature_split_lock_detect(struct kvm_vcpu *vcpu)
+{
+	return !!(vcpu->arch.core_capabilities &
+		  MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT);
+}
+
+static inline u64 vmx_msr_test_ctrl_valid_bits(struct kvm_vcpu *vcpu)
+{
+	u64 valid_bits = 0;
+
+	if (guest_has_feature_split_lock_detect(vcpu))
+		valid_bits |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+	return valid_bits;
+}
+
 /*
  * Reads an msr value (of 'msr_index') into 'pdata'.
  * Returns 0 on success, non-0 otherwise.
@@ -1793,6 +1813,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	u32 index;
 
 	switch (msr_info->index) {
+	case MSR_TEST_CTRL:
+		if (!msr_info->host_initiated &&
+		    !guest_has_feature_split_lock_detect(vcpu))
+			return 1;
+		msr_info->data = vmx->msr_test_ctrl;
+		break;
 #ifdef CONFIG_X86_64
 	case MSR_FS_BASE:
 		msr_info->data = vmcs_readl(GUEST_FS_BASE);
@@ -1934,6 +1960,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	u32 index;
 
 	switch (msr_index) {
+	case MSR_TEST_CTRL:
+		if (!msr_info->host_initiated &&
+		    (!guest_has_feature_split_lock_detect(vcpu) ||
+		     data & ~vmx_msr_test_ctrl_valid_bits(vcpu)))
+			return 1;
+		if (data & MSR_TEST_CTRL_SPLIT_LOCK_DETECT)
+			vmx->disable_split_lock_detect = false;
+		vmx->msr_test_ctrl = data;
+		break;
 	case MSR_EFER:
 		ret = kvm_set_msr_common(vcpu, msr_info);
 		break;
@@ -4233,6 +4268,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	vmx->msr_ia32_umwait_control = 0;
 
+	vmx->msr_test_ctrl = 0;
 	vmx->disable_split_lock_detect = false;
 
 	vcpu->arch.microcode_version = 0x100000000ULL;
@@ -4565,6 +4601,11 @@ static inline bool guest_cpu_alignment_check_enabled(struct kvm_vcpu *vcpu)
 	       (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
 }
 
+static inline bool guest_cpu_split_lock_detect_enabled(struct vcpu_vmx *vmx)
+{
+	return !!(vmx->msr_test_ctrl & MSR_TEST_CTRL_SPLIT_LOCK_DETECT);
+}
+
 static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -4660,8 +4701,8 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		break;
 	case AC_VECTOR:
 		/*
-		 * Inject #AC back to guest only when legacy alignment check
-		 * enabled.
+		 * Inject #AC back to guest only when guest is expecting it,
+		 * i.e., legacy alignment check or split lock #AC enabled.
 		 * Otherwise, it must be an unexpected split-lock #AC for guest
 		 * since KVM keeps hardware SPLIT_LOCK_DETECT bit unchanged
 		 * when vcpu is running.
@@ -4674,12 +4715,13 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		 *    similar as sending SIGBUS.
 		 */
 		if (guest_cpu_alignment_check_enabled(vcpu) ||
+		    guest_cpu_split_lock_detect_enabled(vmx) ||
 		    WARN_ON(get_split_lock_detect_state() == sld_off)) {
 			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
 			return 1;
 		}
 		if (get_split_lock_detect_state() == sld_warn) {
-			pr_warn("kvm: split lock #AC happened in %s [%d]\n",
+			pr_warn_ratelimited("kvm: split lock #AC happened in %s [%d]\n",
 				current->comm, current->pid);
 			vmx->disable_split_lock_detect = true;
 			return 1;
@@ -6491,6 +6533,7 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned long cr3, cr4;
+	bool host_sld_enabled, guest_sld_enabled;
 
 	/* Record the guest's net vcpu time for enforced NMI injections. */
 	if (unlikely(!enable_vnmi &&
@@ -6562,10 +6605,15 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	 */
 	x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0);
 
-	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
-	    unlikely(vmx->disable_split_lock_detect) &&
-	    !test_tsk_thread_flag(current, TIF_SLD))
-		split_lock_detect_set(false);
+	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
+		host_sld_enabled = get_split_lock_detect_state() &&
+				   !test_tsk_thread_flag(current, TIF_SLD);
+		guest_sld_enabled = guest_cpu_split_lock_detect_enabled(vmx);
+		if (host_sld_enabled && unlikely(vmx->disable_split_lock_detect))
+			split_lock_detect_set(false);
+		else if (!host_sld_enabled && guest_sld_enabled)
+			split_lock_detect_set(true);
+	}
 
 	/* L1D Flush includes CPU buffer clear to mitigate MDS */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
@@ -6601,10 +6649,12 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
 
-	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
-	    unlikely(vmx->disable_split_lock_detect) &&
-	    !test_tsk_thread_flag(current, TIF_SLD))
-		split_lock_detect_set(true);
+	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
+		if (host_sld_enabled && unlikely(vmx->disable_split_lock_detect))
+			split_lock_detect_set(true);
+		else if (!host_sld_enabled && guest_sld_enabled)
+			split_lock_detect_set(false);
+	}
 
 	/* All fields are clean at this point */
 	if (static_branch_unlikely(&enable_evmcs))
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 912eba66c5d5..c36c663f4bae 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -222,6 +222,7 @@ struct vcpu_vmx {
 #endif
 
 	u64		      spec_ctrl;
+	u64		      msr_test_ctrl;
 	u32		      msr_ia32_umwait_control;
 
 	u32 secondary_exec_control;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a97a8f5dd1df..56e799981d53 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1163,7 +1163,7 @@ static const u32 msrs_to_save_all[] = {
 #endif
 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
 	MSR_IA32_FEAT_CTL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
-	MSR_IA32_SPEC_CTRL,
+	MSR_IA32_SPEC_CTRL, MSR_TEST_CTRL,
 	MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
 	MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
 	MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
@@ -1345,7 +1345,12 @@ static u64 kvm_get_arch_capabilities(void)
 
 static u64 kvm_get_core_capabilities(void)
 {
-	return 0;
+	u64 data = 0;
+
+	if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) && !cpu_smt_possible())
+		data |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
+
+	return data;
 }
 
 static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
@@ -5259,6 +5264,10 @@ static void kvm_init_msr_list(void)
 		 * to the guests in some cases.
 		 */
 		switch (msrs_to_save_all[i]) {
+		case MSR_TEST_CTRL:
+			if (!(kvm_get_core_capabilities() &
+			      MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT))
+				continue;
 		case MSR_IA32_BNDCFGS:
 			if (!kvm_mpx_supported())
 				continue;
-- 
2.23.0



* Re: [PATCH v2 6/6] x86: vmx: virtualize split lock detection
  2020-02-03 15:16 ` [PATCH v2 6/6] x86: vmx: virtualize split lock detection Xiaoyao Li
@ 2020-02-03 15:58   ` Xiaoyao Li
  2020-02-03 18:52   ` Andy Lutomirski
  2020-02-03 21:42   ` Sean Christopherson
  2 siblings, 0 replies; 31+ messages in thread
From: Xiaoyao Li @ 2020-02-03 15:58 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski
  Cc: x86, kvm, linux-kernel, David Laight

On 2/3/2020 11:16 PM, Xiaoyao Li wrote:
[...]
> @@ -5259,6 +5264,10 @@ static void kvm_init_msr_list(void)
>   		 * to the guests in some cases.
>   		 */
>   		switch (msrs_to_save_all[i]) {
> +		case MSR_TEST_CTRL:
> +			if (!(kvm_get_core_capabilities() &
> +			      MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT))
> +				continue;
Sorry, I forgot the "break;" here.
>   		case MSR_IA32_BNDCFGS:
>   			if (!kvm_mpx_supported())
>   				continue;
> 



* Re: [PATCH v2 6/6] x86: vmx: virtualize split lock detection
  2020-02-03 15:16 ` [PATCH v2 6/6] x86: vmx: virtualize split lock detection Xiaoyao Li
  2020-02-03 15:58   ` Xiaoyao Li
@ 2020-02-03 18:52   ` Andy Lutomirski
  2020-02-03 21:42   ` Sean Christopherson
  2 siblings, 0 replies; 31+ messages in thread
From: Andy Lutomirski @ 2020-02-03 18:52 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, X86 ML, kvm list, LKML, David Laight

On Mon, Feb 3, 2020 at 7:21 AM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
> Because MSR_TEST_CTRL has per-core scope, i.e., sibling threads in the
> same physical CPU core share the same MSR, split lock detection is only
> advertised to the guest when SMT is disabled or unsupported, for
> simplicity.
>
> Only when the host is sld_off can the guest control the hardware value
> of MSR_TEST_CTRL, i.e., KVM loads the guest's value into hardware while
> the vcpu is running.
>
> vmx->disable_split_lock_detect can be set to true after an unhandled
> split-lock #AC in the guest, and only when the host is in sld_warn mode.
> This is to avoid breaking old guests; of course, a malicious guest can
> exploit it for a DoS attack.

Is this actually worthwhile?  This only applies when the host has
sld=off or sld=warn and has HT off.  I suspect that deployments
supporting migration will not want to use this, and multi-tenant
deployments won't want to use it for SLD-aware guests due to the DoS
risk.

--Andy


* Re: [PATCH v2 3/6] kvm: x86: Emulate split-lock access as a write
  2020-02-03 15:16 ` [PATCH v2 3/6] kvm: x86: Emulate split-lock access as a write Xiaoyao Li
@ 2020-02-03 20:54   ` Sean Christopherson
  2020-02-04  2:55     ` Xiaoyao Li
  2020-02-11 12:20   ` Paolo Bonzini
  1 sibling, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2020-02-03 20:54 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, x86, kvm, linux-kernel, David Laight

On Mon, Feb 03, 2020 at 11:16:05PM +0800, Xiaoyao Li wrote:
> If split lock detection is enabled (warn/fatal), the #AC handler calls
> die() when a split lock happens in the kernel.
> 
> A sane guest should never trigger emulation on a split-lock access, but
> that cannot stop a malicious guest from doing so. So just emulate the
> access as a write if it's a split-lock access, to avoid a malicious
> guest polluting the kernel log.
> 
> A more detailed analysis can be found at:
> https://lkml.kernel.org/r/20200131200134.GD18946@linux.intel.com
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  arch/x86/kvm/x86.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2d3be7f3ad67..821b7404c0fd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5847,6 +5847,13 @@ static int emulator_write_emulated(struct x86_emulate_ctxt *ctxt,
>  	(cmpxchg64((u64 *)(ptr), *(u64 *)(old), *(u64 *)(new)) == *(u64 *)(old))
>  #endif
>  
> +static inline bool across_cache_line_access(gpa_t gpa, unsigned int bytes)

s/across/split so as not to introduce another name.

> +{
> +	unsigned int cache_line_size = cache_line_size();
> +
> +	return (gpa & (cache_line_size - 1)) + bytes > cache_line_size;

I'd prefer to use the same logic as the page-split to avoid having to
reason about the correctness of two different algorithms.

> +}
> +
>  static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
>  				     unsigned long addr,
>  				     const void *old,
> @@ -5873,6 +5880,10 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
>  	if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
>  		goto emul_write;
>  
> +	if (get_split_lock_detect_state() != sld_off &&
> +	    across_cache_line_access(gpa, bytes))
> +		goto emul_write;

As an alternative to the above, the page/line splits can be handled in a
single check, e.g.

	page_line_mask = PAGE_MASK;
	if (is_split_lock_detect_enabled())
		page_line_mask = ~(cache_line_size() - 1);
	if (((gpa + bytes - 1) & page_line_mask) != (gpa & page_line_mask))
		goto emul_write;
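
(For what it's worth, with the complemented mask the two predicates
agree; a quick self-contained check, assuming a 64-byte line:

	#include <stdio.h>

	#define LINE 64UL	/* stand-in for cache_line_size() */

	static int split_v2(unsigned long gpa, unsigned int bytes)
	{
		return (gpa & (LINE - 1)) + bytes > LINE;
	}

	static int split_masked(unsigned long gpa, unsigned int bytes)
	{
		unsigned long mask = ~(LINE - 1);

		return ((gpa + bytes - 1) & mask) != (gpa & mask);
	}

	int main(void)
	{
		for (unsigned long gpa = 0; gpa < 256; gpa++)
			for (unsigned int bytes = 1; bytes <= 16; bytes++)
				if (split_v2(gpa, bytes) != split_masked(gpa, bytes))
					printf("mismatch at %#lx+%u\n", gpa, bytes);
		return 0;
	}

prints nothing.)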

> +
>  	if (kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map))
>  		goto emul_write;
>  
> -- 
> 2.23.0
> 


* Re: [PATCH v2 4/6] kvm: vmx: Extend VMX's #AC handling for split lock in guest
  2020-02-03 15:16 ` [PATCH v2 4/6] kvm: vmx: Extend VMX's #AC handling for split lock in guest Xiaoyao Li
@ 2020-02-03 21:14   ` Sean Christopherson
  2020-02-04  6:46     ` Xiaoyao Li
  0 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2020-02-03 21:14 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, x86, kvm, linux-kernel, David Laight

On Mon, Feb 03, 2020 at 11:16:06PM +0800, Xiaoyao Li wrote:
> There are two types of #AC that can be generated on Intel CPUs:
>  1. legacy alignment check #AC;
>  2. split lock #AC.
> 
> A legacy alignment check #AC can be injected into the guest if the guest
> has enabled alignment checking.
> 
> When the host enables split lock detection, i.e., split_lock_detect !=
> off, the guest will receive an unexpected #AC when a split lock happens,
> since KVM doesn't virtualize this feature: the hardware value of the
> MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit stays unchanged while the vcpu is
> running.
> 
> Old guests lack a split-lock #AC handler and may have split lock bugs.
> To let them survive split locks, apply a policy similar to the host's
> split lock detect configuration:
>  - host split lock detect is sld_warn:
>    warn that a split lock happened in the guest, and disable split lock
>    detection while the vcpu is running to allow the guest to continue.
>  - host split lock detect is sld_fatal:
>    forward the #AC to userspace, somewhat similar to sending SIGBUS.
> 
> Please note:
> 1. If sld_warn and SMT is enabled, a split lock in a guest vcpu disables
> split lock detection on the sibling CPU thread while the vcpu is running.
> 
> 2. When the host is sld_warn, allowing the guest to generate split locks
> also opens the door for a malicious guest to mount a DoS attack. The same
> holds in sld_warn mode for userspace applications.
> 
> 3. To prevent DoS attacks from guests, the host must use sld_fatal mode.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 48 +++++++++++++++++++++++++++++++++++++++---
>  arch/x86/kvm/vmx/vmx.h |  3 +++
>  2 files changed, 48 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index c475fa2aaae0..93e3370c5f84 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4233,6 +4233,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>  
>  	vmx->msr_ia32_umwait_control = 0;
>  
> +	vmx->disable_split_lock_detect = false;
> +

I see no reason to give special treatment to RESET/INIT, i.e. leave the
flag set.  vCPUs are zeroed on allocation.

>  	vcpu->arch.microcode_version = 0x100000000ULL;
>  	vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
>  	vmx->hv_deadline_tsc = -1;
> @@ -4557,6 +4559,12 @@ static int handle_machine_check(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>  
> +static inline bool guest_cpu_alignment_check_enabled(struct kvm_vcpu *vcpu)
> +{
> +	return vmx_get_cpl(vcpu) == 3 && kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
> +	       (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
> +}
> +
>  static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -4622,9 +4630,6 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>  		return handle_rmode_exception(vcpu, ex_no, error_code);
>  
>  	switch (ex_no) {
> -	case AC_VECTOR:
> -		kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
> -		return 1;
>  	case DB_VECTOR:
>  		dr6 = vmcs_readl(EXIT_QUALIFICATION);
>  		if (!(vcpu->guest_debug &
> @@ -4653,6 +4658,33 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>  		kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
>  		kvm_run->debug.arch.exception = ex_no;
>  		break;
> +	case AC_VECTOR:
> +		/*
> +		 * Inject #AC back to guest only when legacy alignment check
> +		 * enabled.
> +		 * Otherwise, it must be an unexpected split-lock #AC for guest
> +		 * since KVM keeps hardware SPLIT_LOCK_DETECT bit unchanged
> +		 * when vcpu is running.
> +		 *  - If sld_state == sld_warn, treat it similar as user space
> +		 *    process that warn and allow it to continue running.
> +		 *    In this case, set vmx->disable_split_lock_detect to
> +		 *    true so that it will toggle MSR_TEST_CTRL.SPLIT_LOCK_DETECT
> +		 *    bit during every following VM Entry and Exit;
> +		 *  - If sld_state == sld_fatal, it forwards #AC to userspace,
> +		 *    similar as sending SIGBUS.
> +		 */
> +		if (guest_cpu_alignment_check_enabled(vcpu) ||
> +		    WARN_ON(get_split_lock_detect_state() == sld_off)) {

Eh, I'd omit the WARN.  And invert the ordering to avoid multiple VMREADs
when SLD is disabled, which will be the common case.

> +			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
> +			return 1;
> +		}
> +		if (get_split_lock_detect_state() == sld_warn) {
> +			pr_warn("kvm: split lock #AC happened in %s [%d]\n",
> +				current->comm, current->pid);

Set TIF_SLD and the MSR bit, then __switch_to_xtra() will automatically
handle writing the MSR when necessary.

Even better would be to export handle_user_split_lock() and call that
directly.  The EFLAGS.AC logic in handle_user_split_lock() can be moved out
to do_alignment_check() to avoid that complication; arguably that should be
done in the initial SLD patch.

> +			vmx->disable_split_lock_detect = true;
> +			return 1;
> +		}
> +		/* fall through */
>  	default:
>  		kvm_run->exit_reason = KVM_EXIT_EXCEPTION;
>  		kvm_run->ex.exception = ex_no;
> @@ -6530,6 +6562,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  	 */
>  	x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0);
>  
> +	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
> +	    unlikely(vmx->disable_split_lock_detect) &&
> +	    !test_tsk_thread_flag(current, TIF_SLD))
> +		split_lock_detect_set(false);
> +
>  	/* L1D Flush includes CPU buffer clear to mitigate MDS */
>  	if (static_branch_unlikely(&vmx_l1d_should_flush))
>  		vmx_l1d_flush(vcpu);
> @@ -6564,6 +6601,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  
>  	x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
>  
> +	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
> +	    unlikely(vmx->disable_split_lock_detect) &&
> +	    !test_tsk_thread_flag(current, TIF_SLD))
> +		split_lock_detect_set(true);

Manually calling split_lock_detect_set() in vmx_vcpu_run() is unnecessary.
The MSR only needs to be written on the initial #AC; after that KVM can
rely on the stickiness of TIF_SLD to ensure the MSR is set correctly when
control transfers to/from this vCPU.

> +
>  	/* All fields are clean at this point */
>  	if (static_branch_unlikely(&enable_evmcs))
>  		current_evmcs->hv_clean_fields |=
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 7f42cf3dcd70..912eba66c5d5 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -274,6 +274,9 @@ struct vcpu_vmx {
>  
>  	bool req_immediate_exit;
>  
> +	/* Disable split-lock detection when running the vCPU */
> +	bool disable_split_lock_detect;
> +
>  	/* Support for PML */
>  #define PML_ENTITY_NUM		512
>  	struct page *pml_pg;
> -- 
> 2.23.0
> 


* Re: [PATCH v2 6/6] x86: vmx: virtualize split lock detection
  2020-02-03 15:16 ` [PATCH v2 6/6] x86: vmx: virtualize split lock detection Xiaoyao Li
  2020-02-03 15:58   ` Xiaoyao Li
  2020-02-03 18:52   ` Andy Lutomirski
@ 2020-02-03 21:42   ` Sean Christopherson
  2020-02-04  2:52     ` Xiaoyao Li
  2 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2020-02-03 21:42 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, x86, kvm, linux-kernel, David Laight

On Mon, Feb 03, 2020 at 11:16:08PM +0800, Xiaoyao Li wrote:
> Because MSR_TEST_CTRL has per-core scope, i.e., sibling threads in the
> same physical CPU core share the same MSR, split lock detection is only
> advertised to the guest when SMT is disabled or unsupported, for
> simplicity.
> 
> Only when the host is sld_off can the guest control the hardware value
> of MSR_TEST_CTRL, i.e., KVM loads the guest's value into hardware while
> the vcpu is running.
> 
> vmx->disable_split_lock_detect can be set to true after an unhandled
> split-lock #AC in the guest, and only when the host is in sld_warn mode.
> This is to avoid breaking old guests; of course, a malicious guest can
> exploit it for a DoS attack.
> 
> To prevent DoS attacks from a malicious guest, the host must use
> sld_fatal mode. When the host is sld_fatal, the hardware value of
> MSR_TEST_CTRL.SPLIT_LOCK_DETECT is never cleared.
> 
> Below is a summary of how the guest behaves if SMT is off and it's a
> Linux guest:
> 
> -----------------------------------------------------------------------
>    Host | Guest | Guest behavior
> -----------------------------------------------------------------------
> 1. off  |       | same as on bare metal
> -----------------------------------------------------------------------
> 2. warn | off   | hardware bit set initially. Once a split lock happens,
>         |       | KVM sets vmx->disable_split_lock_detect, which causes
>         |       | the hardware bit to be cleared while the vcpu is
>         |       | running. So it's the same as on bare metal.
> -----------------------------------------------------------------------
> 3. warn | warn  | - user space: gets #AC on a split lock, then clears
>         |       |   its MSR bit, but the hardware bit is not cleared;
>         |       |   #AC again, which finally sets
>         |       |   vmx->disable_split_lock_detect, causing the hardware
>         |       |   bit to be cleared while the vcpu is running. After
>         |       |   the userspace process finishes, the guest sets the
>         |       |   vcpu's MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit, which
>         |       |   sets vmx->disable_split_lock_detect back to false.
>         |       |   So it's roughly the same as on bare metal.
>         |       | - kernel: same as on bare metal.
> -----------------------------------------------------------------------
> 4. warn | fatal | same as on bare metal
> -----------------------------------------------------------------------
> 5. fatal| off   | #AC reported to userspace
> -----------------------------------------------------------------------
> 6. fatal| warn  | - user space: gets #AC on a split lock, then clears
>         |       |   its MSR bit, but the hardware bit is not cleared;
>         |       |   #AC again, reported to userspace.
>         |       | - kernel: same as on bare metal, calls die().
> -----------------------------------------------------------------------
> 7. fatal| fatal | same as on bare metal
> -----------------------------------------------------------------------

This table and half the changelog are unnecessary and confusing.  State that
SLD is exposed to the guest if and only if SLD is disabled in the host and
SMT is disabled (or the MSR is thread scoped), and leave it at that.

Reiterating everything that was implemented in previous patches does more
harm than good.
 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 72 +++++++++++++++++++++++++++++++++++-------
>  arch/x86/kvm/vmx/vmx.h |  1 +
>  arch/x86/kvm/x86.c     | 13 ++++++--
>  3 files changed, 73 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 93e3370c5f84..a0c3f579ecb6 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1781,6 +1781,26 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
>  	}
>  }
>  
> +/*
> + * Note: for guest, feature split lock detection can only be enumerated by
> + * MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT. The FMS enumeration is invalid.
> + */
> +static inline bool guest_has_feature_split_lock_detect(struct kvm_vcpu *vcpu)
> +{
> +	return !!(vcpu->arch.core_capabilities &
> +		  MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT);
> +}
> +
> +static inline u64 vmx_msr_test_ctrl_valid_bits(struct kvm_vcpu *vcpu)
> +{
> +	u64 valid_bits = 0;
> +
> +	if (guest_has_feature_split_lock_detect(vcpu))
> +		valid_bits |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> +
> +	return valid_bits;
> +}
> +
>  /*
>   * Reads an msr value (of 'msr_index') into 'pdata'.
>   * Returns 0 on success, non-0 otherwise.
> @@ -1793,6 +1813,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	u32 index;
>  
>  	switch (msr_info->index) {
> +	case MSR_TEST_CTRL:
> +		if (!msr_info->host_initiated &&
> +		    !guest_has_feature_split_lock_detect(vcpu))
> +			return 1;
> +		msr_info->data = vmx->msr_test_ctrl;
> +		break;
>  #ifdef CONFIG_X86_64
>  	case MSR_FS_BASE:
>  		msr_info->data = vmcs_readl(GUEST_FS_BASE);
> @@ -1934,6 +1960,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	u32 index;
>  
>  	switch (msr_index) {
> +	case MSR_TEST_CTRL:
> +		if (!msr_info->host_initiated &&
> +		    (!guest_has_feature_split_lock_detect(vcpu) ||
> +		     data & ~vmx_msr_test_ctrl_valid_bits(vcpu)))
> +			return 1;
> +		if (data & MSR_TEST_CTRL_SPLIT_LOCK_DETECT)
> +			vmx->disable_split_lock_detect = false;

Pretty sure disable_split_lock_detect won't exist, but if it does, don't
reuse it for emulating guest behavior.  Keep the two things separate, i.e.
use vmx->msr_test_ctrl to track guest state and use the disable_sld to
track when the feature has been disabled for an ignorant guest.

> +		vmx->msr_test_ctrl = data;
> +		break;
>  	case MSR_EFER:
>  		ret = kvm_set_msr_common(vcpu, msr_info);
>  		break;
> @@ -4233,6 +4268,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>  
>  	vmx->msr_ia32_umwait_control = 0;
>  
> +	vmx->msr_test_ctrl = 0;
>  	vmx->disable_split_lock_detect = false;
>  
>  	vcpu->arch.microcode_version = 0x100000000ULL;
> @@ -4565,6 +4601,11 @@ static inline bool guest_cpu_alignment_check_enabled(struct kvm_vcpu *vcpu)
>  	       (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
>  }
>  
> +static inline bool guest_cpu_split_lock_detect_enabled(struct vcpu_vmx *vmx)
> +{
> +	return !!(vmx->msr_test_ctrl & MSR_TEST_CTRL_SPLIT_LOCK_DETECT);

The "!!" isn't necessary.
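
I.e. the conversion to bool is implicit; trimmed helper (sketch):

static inline bool guest_cpu_split_lock_detect_enabled(struct vcpu_vmx *vmx)
{
	/* The implicit conversion to bool makes the "!!" redundant. */
	return vmx->msr_test_ctrl & MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
}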

> +}
> +
>  static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -4660,8 +4701,8 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>  		break;
>  	case AC_VECTOR:
>  		/*
> -		 * Inject #AC back to guest only when legacy alignment check
> -		 * enabled.
> +		 * Inject #AC back to guest only when guest is expecting it,
> +		 * i.e., legacy alignment check or split lock #AC enabled.
>  		 * Otherwise, it must be an unexpected split-lock #AC for guest
>  		 * since KVM keeps hardware SPLIT_LOCK_DETECT bit unchanged
>  		 * when vcpu is running.
> @@ -4674,12 +4715,13 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>  		 *    similar as sending SIGBUS.
>  		 */
>  		if (guest_cpu_alignment_check_enabled(vcpu) ||
> +		    guest_cpu_split_lock_detect_enabled(vmx) ||

Again, check for SLD before AC; it's significantly cheaper.
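
E.g., the same checks with the cheap MSR test done first (sketch):

	if (guest_cpu_split_lock_detect_enabled(vmx) ||
	    guest_cpu_alignment_check_enabled(vcpu) ||
	    WARN_ON(get_split_lock_detect_state() == sld_off)) {
		kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
		return 1;
	}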

>  		    WARN_ON(get_split_lock_detect_state() == sld_off)) {
>  			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
>  			return 1;
>  		}
>  		if (get_split_lock_detect_state() == sld_warn) {
> -			pr_warn("kvm: split lock #AC happened in %s [%d]\n",
> +			pr_warn_ratelimited("kvm: split lock #AC happened in %s [%d]\n",
>  				current->comm, current->pid);

Ratelimiting change belongs in the earlier path.  Moot point if this routes
through handle_user_split_lock().

>  			vmx->disable_split_lock_detect = true;
>  			return 1;
> @@ -6491,6 +6533,7 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	unsigned long cr3, cr4;
> +	bool host_sld_enabled, guest_sld_enabled;
>  
>  	/* Record the guest's net vcpu time for enforced NMI injections. */
>  	if (unlikely(!enable_vnmi &&
> @@ -6562,10 +6605,15 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  	 */
>  	x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0);
>  
> -	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
> -	    unlikely(vmx->disable_split_lock_detect) &&
> -	    !test_tsk_thread_flag(current, TIF_SLD))
> -		split_lock_detect_set(false);
> +	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
> +		host_sld_enabled = get_split_lock_detect_state() &&
> +				   !test_tsk_thread_flag(current, TIF_SLD);
> +		guest_sld_enabled = guest_cpu_split_lock_detect_enabled(vmx);
> +		if (host_sld_enabled && unlikely(vmx->disable_split_lock_detect))
> +			split_lock_detect_set(false);
> +		else if (!host_sld_enabled && guest_sld_enabled)
> +			split_lock_detect_set(true);

This will be massively simplified by letting TIF_SLD do the dirty work.
Since SLD will be exposed to the guest if and only if it's disabled in the
host, this becomes:

	if (static_cpu_has(...) && vmx->msr_test_control)
		wrmsrl(MSR_TEST_CTL,
		       this_cpu_read(msr_test_ctl_val) | vmx->msr_test_ctl);

	__vmx_vcpu_run();


        if (static_cpu_has(...) && vmx->msr_test_control)
                wrmsrl(MSR_TEST_CTL, this_cpu_read(msr_test_ctl_val));

> +	}
>  
>  	/* L1D Flush includes CPU buffer clear to mitigate MDS */
>  	if (static_branch_unlikely(&vmx_l1d_should_flush))
> @@ -6601,10 +6649,12 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  
>  	x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
>  
> -	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
> -	    unlikely(vmx->disable_split_lock_detect) &&
> -	    !test_tsk_thread_flag(current, TIF_SLD))
> -		split_lock_detect_set(true);
> +	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
> +		if (host_sld_enabled && unlikely(vmx->disable_split_lock_detect))
> +			split_lock_detect_set(true);
> +		else if (!host_sld_enabled && guest_sld_enabled)
> +			split_lock_detect_set(false);
> +	}
>  
>  	/* All fields are clean at this point */
>  	if (static_branch_unlikely(&enable_evmcs))
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 912eba66c5d5..c36c663f4bae 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -222,6 +222,7 @@ struct vcpu_vmx {
>  #endif
>  
>  	u64		      spec_ctrl;
> +	u64		      msr_test_ctrl;
>  	u32		      msr_ia32_umwait_control;
>  
>  	u32 secondary_exec_control;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a97a8f5dd1df..56e799981d53 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1163,7 +1163,7 @@ static const u32 msrs_to_save_all[] = {
>  #endif
>  	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
>  	MSR_IA32_FEAT_CTL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
> -	MSR_IA32_SPEC_CTRL,
> +	MSR_IA32_SPEC_CTRL, MSR_TEST_CTRL,
>  	MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
>  	MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
>  	MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
> @@ -1345,7 +1345,12 @@ static u64 kvm_get_arch_capabilities(void)
>  
>  static u64 kvm_get_core_capabilities(void)
>  {
> -	return 0;
> +	u64 data = 0;
> +
> +	if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) && !cpu_smt_possible())
> +		data |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;

And only if SLD is disabled, no?
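
I.e. something like (sketch; is_split_lock_detect_enabled() being the
boolean accessor suggested for patch 1):

	if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
	    !is_split_lock_detect_enabled() && !cpu_smt_possible())
		data |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;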

> +
> +	return data;
>  }
>  
>  static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
> @@ -5259,6 +5264,10 @@ static void kvm_init_msr_list(void)
>  		 * to the guests in some cases.
>  		 */
>  		switch (msrs_to_save_all[i]) {
> +		case MSR_TEST_CTRL:
> +			if (!(kvm_get_core_capabilities() &
> +			      MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT))
> +				continue;
>  		case MSR_IA32_BNDCFGS:
>  			if (!kvm_mpx_supported())
>  				continue;
> -- 
> 2.23.0
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 5/6] kvm: x86: Emulate MSR IA32_CORE_CAPABILITIES
  2020-02-03 15:16 ` [PATCH v2 5/6] kvm: x86: Emulate MSR IA32_CORE_CAPABILITIES Xiaoyao Li
@ 2020-02-03 21:43   ` Sean Christopherson
  2020-02-04  9:19     ` Xiaoyao Li
  0 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2020-02-03 21:43 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, x86, kvm, linux-kernel, David Laight

On Mon, Feb 03, 2020 at 11:16:07PM +0800, Xiaoyao Li wrote:
> Emulate MSR_IA32_CORE_CAPABILITIES in software and unconditionally
> advertise its support to userspace. Like MSR_IA32_ARCH_CAPABILITIES, it
> is a feature-enumerating MSR and can be fully emulated regardless of
> hardware support. Existence of CORE_CAPABILITIES is enumerated via
> CPUID.(EAX=7H,ECX=0):EDX[30].
> 
> Note, support for individual features enumerated via CORE_CAPABILITIES,
> e.g., split lock detection, will be added in future patches.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/cpuid.c            |  5 +++--
>  arch/x86/kvm/x86.c              | 22 ++++++++++++++++++++++
>  3 files changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 329d01c689b7..dc231240102f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -591,6 +591,7 @@ struct kvm_vcpu_arch {
>  	u64 ia32_xss;
>  	u64 microcode_version;
>  	u64 arch_capabilities;
> +	u64 core_capabilities;
>  
>  	/*
>  	 * Paging state of the vcpu
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index b1c469446b07..7282d04f3a6b 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -409,10 +409,11 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
>  		    boot_cpu_has(X86_FEATURE_AMD_SSBD))
>  			entry->edx |= F(SPEC_CTRL_SSBD);
>  		/*
> -		 * We emulate ARCH_CAPABILITIES in software even
> -		 * if the host doesn't support it.
> +		 * ARCH_CAPABILITIES and CORE_CAPABILITIES are emulated in
> +		 * software regardless of host support.
>  		 */
>  		entry->edx |= F(ARCH_CAPABILITIES);
> +		entry->edx |= F(CORE_CAPABILITIES);
>  		break;
>  	case 1:
>  		entry->eax &= kvm_cpuid_7_1_eax_x86_features;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 821b7404c0fd..a97a8f5dd1df 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1222,6 +1222,7 @@ static const u32 emulated_msrs_all[] = {
>  	MSR_IA32_TSC_ADJUST,
>  	MSR_IA32_TSCDEADLINE,
>  	MSR_IA32_ARCH_CAPABILITIES,
> +	MSR_IA32_CORE_CAPS,
>  	MSR_IA32_MISC_ENABLE,
>  	MSR_IA32_MCG_STATUS,
>  	MSR_IA32_MCG_CTL,
> @@ -1288,6 +1289,7 @@ static const u32 msr_based_features_all[] = {
>  	MSR_F10H_DECFG,
>  	MSR_IA32_UCODE_REV,
>  	MSR_IA32_ARCH_CAPABILITIES,
> +	MSR_IA32_CORE_CAPS,
>  };
>  
>  static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all)];
> @@ -1341,12 +1343,20 @@ static u64 kvm_get_arch_capabilities(void)
>  	return data;
>  }
>  
> +static u64 kvm_get_core_capabilities(void)
> +{
> +	return 0;
> +}
> +
>  static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
>  {
>  	switch (msr->index) {
>  	case MSR_IA32_ARCH_CAPABILITIES:
>  		msr->data = kvm_get_arch_capabilities();
>  		break;
> +	case MSR_IA32_CORE_CAPS:
> +		msr->data = kvm_get_core_capabilities();
> +		break;
>  	case MSR_IA32_UCODE_REV:
>  		rdmsrl_safe(msr->index, &msr->data);
>  		break;
> @@ -2716,6 +2726,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  			return 1;
>  		vcpu->arch.arch_capabilities = data;
>  		break;
> +	case MSR_IA32_CORE_CAPS:
> +		if (!msr_info->host_initiated)

Shouldn't @data be checked against kvm_get_core_capabilities()?
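
E.g. (sketch):

	case MSR_IA32_CORE_CAPS:
		if (!msr_info->host_initiated ||
		    (data & ~kvm_get_core_capabilities()))
			return 1;
		vcpu->arch.core_capabilities = data;
		break;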

> +			return 1;
> +		vcpu->arch.core_capabilities = data;
> +		break;
>  	case MSR_EFER:
>  		return set_efer(vcpu, msr_info);
>  	case MSR_K7_HWCR:
> @@ -3044,6 +3059,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  			return 1;
>  		msr_info->data = vcpu->arch.arch_capabilities;
>  		break;
> +	case MSR_IA32_CORE_CAPS:
> +		if (!msr_info->host_initiated &&
> +		    !guest_cpuid_has(vcpu, X86_FEATURE_CORE_CAPABILITIES))
> +			return 1;
> +		msr_info->data = vcpu->arch.core_capabilities;
> +		break;
>  	case MSR_IA32_POWER_CTL:
>  		msr_info->data = vcpu->arch.msr_ia32_power_ctl;
>  		break;
> @@ -9288,6 +9309,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  		goto free_guest_fpu;
>  
>  	vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
> +	vcpu->arch.core_capabilities = kvm_get_core_capabilities();
>  	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
>  	kvm_vcpu_mtrr_init(vcpu);
>  	vcpu_load(vcpu);
> -- 
> 2.23.0
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 1/6] x86/split_lock: Add and export get_split_lock_detect_state()
  2020-02-03 15:16 ` [PATCH v2 1/6] x86/split_lock: Add and export get_split_lock_detect_state() Xiaoyao Li
@ 2020-02-03 21:45   ` Sean Christopherson
  0 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2020-02-03 21:45 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, x86, kvm, linux-kernel, David Laight

On Mon, Feb 03, 2020 at 11:16:03PM +0800, Xiaoyao Li wrote:
> get_split_lock_detect_state() will be used by KVM module to get sld_state.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  arch/x86/include/asm/cpu.h  | 12 ++++++++++++
>  arch/x86/kernel/cpu/intel.c | 12 ++++++------
>  2 files changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
> index ff6f3ca649b3..167d0539e0ad 100644
> --- a/arch/x86/include/asm/cpu.h
> +++ b/arch/x86/include/asm/cpu.h
> @@ -40,11 +40,23 @@ int mwait_usable(const struct cpuinfo_x86 *);
>  unsigned int x86_family(unsigned int sig);
>  unsigned int x86_model(unsigned int sig);
>  unsigned int x86_stepping(unsigned int sig);
> +
> +enum split_lock_detect_state {
> +	sld_off = 0,
> +	sld_warn,
> +	sld_fatal,
> +};
> +
>  #ifdef CONFIG_CPU_SUP_INTEL
> +extern enum split_lock_detect_state get_split_lock_detect_state(void);
>  extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
>  extern void switch_to_sld(unsigned long tifn);
>  extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
>  #else
> +static inline enum split_lock_detect_state get_split_lock_detect_state(void)
> +{
> +	return sld_off;
> +}
>  static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
>  static inline void switch_to_sld(unsigned long tifn) {}
>  static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index db3e745e5d47..a810cd022db5 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -33,12 +33,6 @@
>  #include <asm/apic.h>
>  #endif
>  
> -enum split_lock_detect_state {
> -	sld_off = 0,
> -	sld_warn,
> -	sld_fatal,
> -};
> -
>  /*
>   * Default to sld_off because most systems do not support split lock detection
>   * split_lock_setup() will switch this to sld_warn on systems that support
> @@ -968,6 +962,12 @@ cpu_dev_register(intel_cpu_dev);
>  #undef pr_fmt
>  #define pr_fmt(fmt) "x86/split lock detection: " fmt
>  
> +enum split_lock_detect_state get_split_lock_detect_state(void)
> +{
> +	return sld_state;
> +}
> +EXPORT_SYMBOL_GPL(get_split_lock_detect_state);

I'm pretty sure KVM doesn't need to differentiate between warn and fatal if
its #AC interceptor is routed through handle_user_split_lock().  I.e. this
can return a boolean without exporting the enum.

bool is_split_lock_detect_enabled(void)
{
	return sld_state != sld_off;
}

> +
>  static const struct {
>  	const char			*option;
>  	enum split_lock_detect_state	state;
> -- 
> 2.23.0
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 6/6] x86: vmx: virtualize split lock detection
  2020-02-03 21:42   ` Sean Christopherson
@ 2020-02-04  2:52     ` Xiaoyao Li
  2020-02-04  5:35       ` Sean Christopherson
  0 siblings, 1 reply; 31+ messages in thread
From: Xiaoyao Li @ 2020-02-04  2:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, x86, kvm, linux-kernel, David Laight

On 2/4/2020 5:42 AM, Sean Christopherson wrote:
> On Mon, Feb 03, 2020 at 11:16:08PM +0800, Xiaoyao Li wrote:
>> Due to the fact that MSR_TEST_CTRL is per-core scoped, i.e., the sibling
>> threads in the same physical CPU core share the same MSR, split lock
>> detection is only advertised to the guest when SMT is disabled or
>> unsupported, for simplicity.
>>
>> Only when the host is sld_off can the guest control the hardware value of
>> MSR_TEST_CTL, i.e., KVM loads the guest's value into hardware while the
>> vcpu is running.
>>
>> The vmx->disable_split_lock_detect flag can be set to true after an
>> unhandled split-lock #AC in the guest, but only when the host is in
>> sld_warn mode. This avoids burning old guests; of course, a malicious
>> guest can exploit it for a DoS attack.
>>
>> To prevent a DoS attack from a malicious guest, the host must use
>> sld_fatal mode. When the host is sld_fatal, the hardware value of
>> MSR_TEST_CTL.SPLIT_LOCK_DETECT is never cleared.
>>
>> Below summarizing how guest behaves if SMT is off and it's a linux guest:
>>
>> -----------------------------------------------------------------------
>>     Host	| Guest | Guest behavior
>> -----------------------------------------------------------------------
>> 1. off	|	| same as in bare metal
>> -----------------------------------------------------------------------
>> 2. warn | off	| hardware bit set initially. Once split lock happens,
>> 	|	| it sets vmx->disable_split_lock_detect, which leads
>> 	|	| hardware bit to be cleared when vcpu is running
>>          |	| So, it's the same as in bare metal
>> 	---------------------------------------------------------------
>> 3.	| warn	| - user space: get #AC when split lock, then clear
>> 	|	|   MSR bit, but hardware bit is not cleared. #AC again,
>> 	|	|   finally sets vmx->disable_split_lock_detect, which
>> 	|	|   leads hardware bit to be cleared when vcpu is running;
>> 	|	|   After the userspace process finishes, it sets vcpu's
>> 	|	|   MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit, which causes
>> 	|	|   vmx->disable_split_lock_detect to be set false
>>          |	|   So it's somewhat the same as in bare metal
>>          |	| - kernel: same as in bare metal.
>> 	--------------------------------------------------------------
>> 4.	| fatal | same as in bare metal
>> ----------------------------------------------------------------------
>> 5. fatal| off   | #AC reported to userspace
>> 	--------------------------------------------------------------
>> 6.	| warn  | - user space: get #AC when split lock, then clear
>> 	|	|   MSR bit, but hardware bit is not cleared, #AC again,
>>          |	|   #AC reported to userspace
>>          |	| - kernel: same as in bare metal, call die();
>> 	-------------------------------------------------------------
>> 7.    	| fatal | same as in bare metal
>> ----------------------------------------------------------------------
> 
> This table and half the changelog are unnecessary and confusing.  State that
> SLD is exposed to the guest if and only if SLD is disabled in the host and
> SMT is disabled (or the MSR is thread scoped), and leave it at that.

Right, exposing SLD to the guest only when the host is sld_off makes
things much simpler. But this seems meaningful only for using a guest for
debugging or testing?

> Reiterating everything that was implemented in previous patches does more
> harm than good.

OK. Will remove them.

>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>>   arch/x86/kvm/vmx/vmx.c | 72 +++++++++++++++++++++++++++++++++++-------
>>   arch/x86/kvm/vmx/vmx.h |  1 +
>>   arch/x86/kvm/x86.c     | 13 ++++++--
>>   3 files changed, 73 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 93e3370c5f84..a0c3f579ecb6 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -1781,6 +1781,26 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
>>   	}
>>   }
>>   
>> +/*
>> + * Note: for guest, feature split lock detection can only be enumerated by
>> + * MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT. The FMS enumeration is invalid.
>> + */
>> +static inline bool guest_has_feature_split_lock_detect(struct kvm_vcpu *vcpu)
>> +{
>> +	return !!(vcpu->arch.core_capabilities &
>> +		  MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT);
>> +}
>> +
>> +static inline u64 vmx_msr_test_ctrl_valid_bits(struct kvm_vcpu *vcpu)
>> +{
>> +	u64 valid_bits = 0;
>> +
>> +	if (guest_has_feature_split_lock_detect(vcpu))
>> +		valid_bits |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
>> +
>> +	return valid_bits;
>> +}
>> +
>>   /*
>>    * Reads an msr value (of 'msr_index') into 'pdata'.
>>    * Returns 0 on success, non-0 otherwise.
>> @@ -1793,6 +1813,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   	u32 index;
>>   
>>   	switch (msr_info->index) {
>> +	case MSR_TEST_CTRL:
>> +		if (!msr_info->host_initiated &&
>> +		    !guest_has_feature_split_lock_detect(vcpu))
>> +			return 1;
>> +		msr_info->data = vmx->msr_test_ctrl;
>> +		break;
>>   #ifdef CONFIG_X86_64
>>   	case MSR_FS_BASE:
>>   		msr_info->data = vmcs_readl(GUEST_FS_BASE);
>> @@ -1934,6 +1960,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   	u32 index;
>>   
>>   	switch (msr_index) {
>> +	case MSR_TEST_CTRL:
>> +		if (!msr_info->host_initiated &&
>> +		    (!guest_has_feature_split_lock_detect(vcpu) ||
>> +		     data & ~vmx_msr_test_ctrl_valid_bits(vcpu)))
>> +			return 1;
>> +		if (data & MSR_TEST_CTRL_SPLIT_LOCK_DETECT)
>> +			vmx->disable_split_lock_detect = false;
> 
> Pretty sure disable_split_lock_detect won't exist, but if it does, don't
> reuse it for emulating guest behavior.  Keep the two things separate, i.e.
> use vmx->msr_test_ctrl to track guest state and use the disable_sld to
> track when the feature has been disabled for an ignorant guest.

My thought was for the case when both host and guest are sld_warn.
If there is a split lock in guest user space:
 1. The #AC is trapped in KVM and re-injected to the guest because the
guest's MSR bit is set;
 2. The guest clears its MSR bit, but the hardware bit is not cleared,
and the instruction is re-executed;
 3. The #AC is trapped again, vmx->disable_sld is set to true, KVM
VM-enters the guest with the hardware MSR bit cleared, and the
instruction is re-executed;
 4. After the guest user space application finishes or is scheduled out,
the guest sets its MSR bit; here we'd better clear vmx->disable_sld,
otherwise the hardware MSR bit stays cleared for this vcpu thread.

Also, this makes a difference for a guest user space application: when
it is scheduled out and then scheduled back in, the MSR bit is set again,
while on bare metal it stays cleared. That's why I use
pr_warn_ratelimited() in the #AC interceptor.

>> +		vmx->msr_test_ctrl = data;
>> +		break;
>>   	case MSR_EFER:
>>   		ret = kvm_set_msr_common(vcpu, msr_info);
>>   		break;
>> @@ -4233,6 +4268,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>>   
>>   	vmx->msr_ia32_umwait_control = 0;
>>   
>> +	vmx->msr_test_ctrl = 0;
>>   	vmx->disable_split_lock_detect = false;
>>   
>>   	vcpu->arch.microcode_version = 0x100000000ULL;
>> @@ -4565,6 +4601,11 @@ static inline bool guest_cpu_alignment_check_enabled(struct kvm_vcpu *vcpu)
>>   	       (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
>>   }
>>   
>> +static inline bool guest_cpu_split_lock_detect_enabled(struct vcpu_vmx *vmx)
>> +{
>> +	return !!(vmx->msr_test_ctrl & MSR_TEST_CTRL_SPLIT_LOCK_DETECT);
> 
> The "!!" isn't necessary.

Ok. will remove it.

>> +}
>> +
>>   static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>>   {
>>   	struct vcpu_vmx *vmx = to_vmx(vcpu);
>> @@ -4660,8 +4701,8 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>>   		break;
>>   	case AC_VECTOR:
>>   		/*
>> -		 * Inject #AC back to guest only when legacy alignment check
>> -		 * enabled.
>> +		 * Inject #AC back to guest only when guest is expecting it,
>> +		 * i.e., legacy alignment check or split lock #AC enabled.
>>   		 * Otherwise, it must be an unexpected split-lock #AC for guest
>>   		 * since KVM keeps hardware SPLIT_LOCK_DETECT bit unchanged
>>   		 * when vcpu is running.
>> @@ -4674,12 +4715,13 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>>   		 *    similar as sending SIGBUS.
>>   		 */
>>   		if (guest_cpu_alignment_check_enabled(vcpu) ||
>> +		    guest_cpu_split_lock_detect_enabled(vmx) ||
> 
> Again, check for SLD before AC, it's significantly cheaper.

OK. Thanks.

>>   		    WARN_ON(get_split_lock_detect_state() == sld_off)) {
>>   			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
>>   			return 1;
>>   		}
>>   		if (get_split_lock_detect_state() == sld_warn) {
>> -			pr_warn("kvm: split lock #AC happened in %s [%d]\n",
>> +			pr_warn_ratelimited("kvm: split lock #AC happened in %s [%d]\n",
>>   				current->comm, current->pid);
> 
> Ratelimiting change belongs in the earlier path.  Moot point if this routes
> through handle_user_split_lock().
> 
>>   			vmx->disable_split_lock_detect = true;
>>   			return 1;
>> @@ -6491,6 +6533,7 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>   {
>>   	struct vcpu_vmx *vmx = to_vmx(vcpu);
>>   	unsigned long cr3, cr4;
>> +	bool host_sld_enabled, guest_sld_enabled;
>>   
>>   	/* Record the guest's net vcpu time for enforced NMI injections. */
>>   	if (unlikely(!enable_vnmi &&
>> @@ -6562,10 +6605,15 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>   	 */
>>   	x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0);
>>   
>> -	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
>> -	    unlikely(vmx->disable_split_lock_detect) &&
>> -	    !test_tsk_thread_flag(current, TIF_SLD))
>> -		split_lock_detect_set(false);
>> +	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
>> +		host_sld_enabled = get_split_lock_detect_state() &&
>> +				   !test_tsk_thread_flag(current, TIF_SLD);
>> +		guest_sld_enabled = guest_cpu_split_lock_detect_enabled(vmx);
>> +		if (host_sld_enabled && unlikely(vmx->disable_split_lock_detect))
>> +			split_lock_detect_set(false);
>> +		else if (!host_sld_enabled && guest_sld_enabled)
>> +			split_lock_detect_set(true);
> 
> This will be massively simplified by letting TIF_SLD do the dirty work.
> Since SLD will be exposed to the guest if and only if it's disabled in the
> host, this becomes:
> 
> 	if (static_cpu_has(...) && vmx->msr_test_control)
> 		wrmsrl(MSR_TEST_CTL,
> 		       this_cpu_read(msr_test_ctl_val) | vmx->msr_test_ctl);
> 
> 	__vmx_vcpu_run();
> 
> 
>          if (static_cpu_has(...) && vmx->msr_test_control)
>                  wrmsrl(MSR_TEST_CTL, this_cpu_read(msr_test_ctl_val));
> 
>> +	}
>>   
>>   	/* L1D Flush includes CPU buffer clear to mitigate MDS */
>>   	if (static_branch_unlikely(&vmx_l1d_should_flush))
>> @@ -6601,10 +6649,12 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>   
>>   	x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
>>   
>> -	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
>> -	    unlikely(vmx->disable_split_lock_detect) &&
>> -	    !test_tsk_thread_flag(current, TIF_SLD))
>> -		split_lock_detect_set(true);
>> +	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
>> +		if (host_sld_enabled && unlikely(vmx->disable_split_lock_detect))
>> +			split_lock_detect_set(true);
>> +		else if (!host_sld_enabled && guest_sld_enabled)
>> +			split_lock_detect_set(false);
>> +	}
>>   
>>   	/* All fields are clean at this point */
>>   	if (static_branch_unlikely(&enable_evmcs))
>> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
>> index 912eba66c5d5..c36c663f4bae 100644
>> --- a/arch/x86/kvm/vmx/vmx.h
>> +++ b/arch/x86/kvm/vmx/vmx.h
>> @@ -222,6 +222,7 @@ struct vcpu_vmx {
>>   #endif
>>   
>>   	u64		      spec_ctrl;
>> +	u64		      msr_test_ctrl;
>>   	u32		      msr_ia32_umwait_control;
>>   
>>   	u32 secondary_exec_control;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index a97a8f5dd1df..56e799981d53 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1163,7 +1163,7 @@ static const u32 msrs_to_save_all[] = {
>>   #endif
>>   	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
>>   	MSR_IA32_FEAT_CTL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
>> -	MSR_IA32_SPEC_CTRL,
>> +	MSR_IA32_SPEC_CTRL, MSR_TEST_CTRL,
>>   	MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
>>   	MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
>>   	MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
>> @@ -1345,7 +1345,12 @@ static u64 kvm_get_arch_capabilities(void)
>>   
>>   static u64 kvm_get_core_capabilities(void)
>>   {
>> -	return 0;
>> +	u64 data = 0;
>> +
>> +	if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) && !cpu_smt_possible())
>> +		data |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
> 
> And only if SLD is disabled, no?
> 
>> +
>> +	return data;
>>   }
>>   
>>   static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
>> @@ -5259,6 +5264,10 @@ static void kvm_init_msr_list(void)
>>   		 * to the guests in some cases.
>>   		 */
>>   		switch (msrs_to_save_all[i]) {
>> +		case MSR_TEST_CTRL:
>> +			if (!(kvm_get_core_capabilities() &
>> +			      MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT))
>> +				continue;
>>   		case MSR_IA32_BNDCFGS:
>>   			if (!kvm_mpx_supported())
>>   				continue;
>> -- 
>> 2.23.0
>>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 3/6] kvm: x86: Emulate split-lock access as a write
  2020-02-03 20:54   ` Sean Christopherson
@ 2020-02-04  2:55     ` Xiaoyao Li
  0 siblings, 0 replies; 31+ messages in thread
From: Xiaoyao Li @ 2020-02-04  2:55 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, x86, kvm, linux-kernel, David Laight

On 2/4/2020 4:54 AM, Sean Christopherson wrote:
> On Mon, Feb 03, 2020 at 11:16:05PM +0800, Xiaoyao Li wrote:
>> If split lock detection is enabled (warn/fatal), the #AC handler calls
>> die() when a split lock happens in the kernel.
>>
>> A sane guest should never trigger emulation on a split-lock access, but
>> that cannot prevent a malicious guest from doing so. So just emulate the
>> access as a write if it's a split-lock access, to avoid a malicious
>> guest polluting the kernel log.
>>
>> More detail analysis can be found:
>> https://lkml.kernel.org/r/20200131200134.GD18946@linux.intel.com
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>>   arch/x86/kvm/x86.c | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 2d3be7f3ad67..821b7404c0fd 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -5847,6 +5847,13 @@ static int emulator_write_emulated(struct x86_emulate_ctxt *ctxt,
>>   	(cmpxchg64((u64 *)(ptr), *(u64 *)(old), *(u64 *)(new)) == *(u64 *)(old))
>>   #endif
>>   
>> +static inline bool across_cache_line_access(gpa_t gpa, unsigned int bytes)
> 
> s/across/split so as not to introduce another name.
> 
>> +{
>> +	unsigned int cache_line_size = cache_line_size();
>> +
>> +	return (gpa & (cache_line_size - 1)) + bytes > cache_line_size;
> 
> I'd prefer to use the same logic as the page-split to avoid having to
> reason about the correctness of two different algorithms.
> 
>> +}
>> +
>>   static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
>>   				     unsigned long addr,
>>   				     const void *old,
>> @@ -5873,6 +5880,10 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
>>   	if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
>>   		goto emul_write;
>>   
>> +	if (get_split_lock_detect_state() != sld_off &&
>> +	    across_cache_line_access(gpa, bytes))
>> +		goto emul_write;
> 
> As an alternative to the above, the page/line splits can be handled in a
> single check, e.g.
> 
> 	page_line_mask = PAGE_MASK;
> 	if (is_split_lock_detect_enabled())
> 		page_line_mask = cache_line_size() - 1;
> 	if (((gpa + bytes - 1) & page_line_mask) != (gpa & page_line_mask))
> 		goto emul_write;

It's better, will use your suggestion.
Thanks!

>> +
>>   	if (kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map))
>>   		goto emul_write;
>>   
>> -- 
>> 2.23.0
>>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 6/6] x86: vmx: virtualize split lock detection
  2020-02-04  2:52     ` Xiaoyao Li
@ 2020-02-04  5:35       ` Sean Christopherson
  0 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2020-02-04  5:35 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, x86, kvm, linux-kernel, David Laight

On Tue, Feb 04, 2020 at 10:52:01AM +0800, Xiaoyao Li wrote:
> On 2/4/2020 5:42 AM, Sean Christopherson wrote:
> >On Mon, Feb 03, 2020 at 11:16:08PM +0800, Xiaoyao Li wrote:
> >>
> >>Only when the host is sld_off can the guest control the hardware value of
> >>MSR_TEST_CTL, i.e., KVM loads the guest's value into hardware while the
> >>vcpu is running.

...

> Right, exposing SLD to the guest only when the host is sld_off makes
> things much simpler. But this seems meaningful only for using a guest
> for debugging or testing?

Ah, I misunderstood.  I thought the above quote was saying SLD would be
exposed to the guest if it's off in the host, i.e. intended only to reword
the changelog.

Per our offline discussion:

  sld_fatal - MSR_TEST_CTL.SLD is forced on and is sticky from the guest's
              perspective (so the guest can detect a forced fatal mode).

  sld_warn - SLD is exposed to the guest.  MSR_TEST_CTL.SLD is left on
             until an #AC is intercepted with MSR_TEST_CTL.SLD=0 in the
             guest, at which point normal sld_warn rules apply.  If a vCPU
             associated with the task does VM-Enter with MSR_TEST_CTL.SLD=1,
             TIF_SLD is reset and the cycle begins anew.

  sld_off - When set by the guest, MSR_TEST_CTL.SLD is set on VM-Entry
            and cleared on VM-Exit.

Side topic, this means we need more than is_split_lock_detect_enabled(),
but it's probably still a good idea to hide the enum, e.g. have
is_sld_enabled() and is_sld_fatal() wrappers.
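
E.g. (sketch, keeping the enum private to intel.c):

bool is_sld_enabled(void)
{
	return sld_state != sld_off;
}
EXPORT_SYMBOL_GPL(is_sld_enabled);

bool is_sld_fatal(void)
{
	return sld_state == sld_fatal;
}
EXPORT_SYMBOL_GPL(is_sld_fatal);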
 
> >Reiterating everything that was implemented in previous patches does more
> >harm than good.

...

> >>@@ -1934,6 +1960,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >>  	u32 index;
> >>  	switch (msr_index) {
> >>+	case MSR_TEST_CTRL:
> >>+		if (!msr_info->host_initiated &&
> >>+		    (!guest_has_feature_split_lock_detect(vcpu) ||
> >>+		     data & ~vmx_msr_test_ctrl_valid_bits(vcpu)))
> >>+			return 1;
> >>+		if (data & MSR_TEST_CTRL_SPLIT_LOCK_DETECT)
> >>+			vmx->disable_split_lock_detect = false;
> >
> >Pretty sure disable_split_lock_detect won't exist, but if it does, don't
> >reuse it for emulating guest behavior.  Keep the two things separate, i.e.
> >use vmx->msr_test_ctrl to track guest state and use the disable_sld to
> >track when the feature has been disabled for an ignorant guest.
> 
> My thought was for the case when both host and guest are sld_warn.
> If there is a split lock in guest user space:
>  1. The #AC is trapped in KVM and re-injected to the guest because the
> guest's MSR bit is set;
>  2. The guest clears its MSR bit, but the hardware bit is not cleared,
> and the instruction is re-executed;
>  3. The #AC is trapped again, vmx->disable_sld is set to true, KVM
> VM-enters the guest with the hardware MSR bit cleared, and the
> instruction is re-executed;
>  4. After the guest user space application finishes or is scheduled out,
> the guest sets its MSR bit; here we'd better clear vmx->disable_sld,
> otherwise the hardware MSR bit stays cleared for this vcpu thread.

Ya, all that works.  But I don't think KVM needs to context switch
MSR_TEST_CTRL in any mode except sld_off.  For sld_fatal, it's simply on.
For sld_warn, it's only disabled when TIF_SLD=1, i.e. after a warning #AC.

I suppose there's a corner case where userspace is multiplexing vCPUs on
tasks, in which case we could end up with TIF_SLD=1 and MSR_TEST_CTRL.SLD=1.
KVM still doesn't need a separate flag, e.g.:

        if (static_cpu_has(...) && vmx->msr_test_control) {
                if (test_thread_flag(TIF_SLD))
                        sld_turn_back_on();
                else if (!is_split_lock_detect_enabled())
                        wrmsrl(MSR_TEST_CTL,
                               this_cpu_read(msr_test_ctl_val) |
                               vmx->msr_test_ctl);
	}

        __vmx_vcpu_run();

        if (static_cpu_has(...) && vmx->msr_test_control &&
            !is_split_lock_detect_enabled())
                wrmsrl(MSR_TEST_CTL, this_cpu_read(msr_test_ctl_val));




> Also, this makes a difference for a guest user space application: when
> it is scheduled out and then scheduled back in, the MSR bit is set again,
> while on bare metal it stays cleared. That's why I use
> pr_warn_ratelimited() in the #AC interceptor.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 4/6] kvm: vmx: Extend VMX's #AC handding for split lock in guest
  2020-02-03 21:14   ` Sean Christopherson
@ 2020-02-04  6:46     ` Xiaoyao Li
  2020-02-10 21:30       ` Sean Christopherson
  0 siblings, 1 reply; 31+ messages in thread
From: Xiaoyao Li @ 2020-02-04  6:46 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, x86, kvm, linux-kernel, David Laight

On 2/4/2020 5:14 AM, Sean Christopherson wrote:
> On Mon, Feb 03, 2020 at 11:16:06PM +0800, Xiaoyao Li wrote:
>> There are two types of #AC that can be generated on Intel CPUs:
>>   1. legacy alignment check #AC;
>>   2. split lock #AC.
>>
>> A legacy alignment check #AC can be injected into the guest if the guest
>> has enabled alignment checking.
>>
>> When the host enables split lock detection, i.e., split_lock_detect !=
>> off, the guest will receive an unexpected #AC when a split lock happens,
>> since KVM doesn't virtualize this feature to the guest: the hardware
>> value of the MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit stays unchanged while
>> the vcpu is running.
>>
>> Old guests lack a split-lock #AC handler and may have split lock bugs.
>> To let them survive split locks, apply a policy similar to the host's
>> split lock detect configuration:
>>   - host split lock detect is sld_warn:
>>     warn that the split lock happened in the guest, and disable split
>>     lock detection while the vcpu is running to allow the guest to
>>     continue running.
>>   - host split lock detect is sld_fatal:
>>     forward the #AC to userspace, somewhat similar to sending SIGBUS.
>>
>> Please note:
>> 1. If sld_warn and SMT are enabled, a split lock in a guest's vcpu
>> disables split lock detection on the sibling CPU thread while the vcpu
>> is running.
>>
>> 2. When the host is sld_warn, guests are allowed to generate split
>> locks, which also opens the door for a malicious guest to mount a DoS
>> attack. Likewise, in sld_warn mode a userspace application can mount a
>> DoS attack.
>>
>> 3. To prevent DoS attacks from guests, the host must use sld_fatal mode.
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>>   arch/x86/kvm/vmx/vmx.c | 48 +++++++++++++++++++++++++++++++++++++++---
>>   arch/x86/kvm/vmx/vmx.h |  3 +++
>>   2 files changed, 48 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index c475fa2aaae0..93e3370c5f84 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -4233,6 +4233,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>>   
>>   	vmx->msr_ia32_umwait_control = 0;
>>   
>> +	vmx->disable_split_lock_detect = false;
>> +
> 
> I see no reason to give special treatment to RESET/INIT, i.e. leave the
> flag set.  vCPUs are zeroed on allocation.

So when the guest reboots, it doesn't need to be reset to false?
I am not clear on the difference between RESET and INIT, so I didn't
differentiate them into different cases with init_event.

>>   	vcpu->arch.microcode_version = 0x100000000ULL;
>>   	vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
>>   	vmx->hv_deadline_tsc = -1;
>> @@ -4557,6 +4559,12 @@ static int handle_machine_check(struct kvm_vcpu *vcpu)
>>   	return 1;
>>   }
>>   
>> +static inline bool guest_cpu_alignment_check_enabled(struct kvm_vcpu *vcpu)
>> +{
>> +	return vmx_get_cpl(vcpu) == 3 && kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
>> +	       (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
>> +}
>> +
>>   static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>>   {
>>   	struct vcpu_vmx *vmx = to_vmx(vcpu);
>> @@ -4622,9 +4630,6 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>>   		return handle_rmode_exception(vcpu, ex_no, error_code);
>>   
>>   	switch (ex_no) {
>> -	case AC_VECTOR:
>> -		kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
>> -		return 1;
>>   	case DB_VECTOR:
>>   		dr6 = vmcs_readl(EXIT_QUALIFICATION);
>>   		if (!(vcpu->guest_debug &
>> @@ -4653,6 +4658,33 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>>   		kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
>>   		kvm_run->debug.arch.exception = ex_no;
>>   		break;
>> +	case AC_VECTOR:
>> +		/*
>> +		 * Inject #AC back to guest only when legacy alignment check
>> +		 * enabled.
>> +		 * Otherwise, it must be an unexpected split-lock #AC for guest
>> +		 * since KVM keeps hardware SPLIT_LOCK_DETECT bit unchanged
>> +		 * when vcpu is running.
>> +		 *  - If sld_state == sld_warn, treat it similar as user space
>> +		 *    process that warn and allow it to continue running.
>> +		 *    In this case, setting vmx->diasble_split_lock_detect to
>> +		 *    true so that it will toggle MSR_TEST.SPLIT_LOCK_DETECT
>> +		 *    bit during every following VM Entry and Exit;
>> +		 *  - If sld_state == sld_fatal, it forwards #AC to userspace,
>> +		 *    similar as sending SIGBUS.
>> +		 */
>> +		if (guest_cpu_alignment_check_enabled(vcpu) ||
>> +		    WARN_ON(get_split_lock_detect_state() == sld_off)) {
> 
> Eh, I'd omit the WARN.  And invert the ordering to avoid multiple VMREADs
> when SLD is disabled, which will be the common case.
> 
>> +			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
>> +			return 1;
>> +		}
>> +		if (get_split_lock_detect_state() == sld_warn) {
>> +			pr_warn("kvm: split lock #AC happened in %s [%d]\n",
>> +				current->comm, current->pid);
> 
> Set TIF_SLD and the MSR bit, then __switch_to_xtra() will automatically
> handle writing the MSR when necessary.

Right, we can do this.

However, if TIF_SLD and __switch_to_xtra() are used to switch the MSR
bit, then once there is a split lock in the guest, TIF_SLD is set for the
vcpu thread, so we lose the ability to find and warn about split locks in
the user space thread, e.g., the QEMU vcpu thread, and also lose the
ability to find split locks in KVM.

If that's not a problem, I agree to use TIF_SLD.

> Even better would be to export handle_user_split_lock() and call that
> directly.  The EFLAGS.AC logic in handle_user_split_lock() can be moved out
> to do_alignment_check() to avoid that complication; arguably that should be
> done in the initial SLD patch.

The warning message of handle_user_split_lock() contains the RIP of the
userspace application. If we use it here, what RIP should we use? The
guest RIP of the faulting instruction?

>> +			vmx->disable_split_lock_detect = true;
>> +			return 1;
>> +		}
>> +		/* fall through*/
>>   	default:
>>   		kvm_run->exit_reason = KVM_EXIT_EXCEPTION;
>>   		kvm_run->ex.exception = ex_no;
>> @@ -6530,6 +6562,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>   	 */
>>   	x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0);
>>   
>> +	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
>> +	    unlikely(vmx->disable_split_lock_detect) &&
>> +	    !test_tsk_thread_flag(current, TIF_SLD))
>> +		split_lock_detect_set(false);
>> +
>>   	/* L1D Flush includes CPU buffer clear to mitigate MDS */
>>   	if (static_branch_unlikely(&vmx_l1d_should_flush))
>>   		vmx_l1d_flush(vcpu);
>> @@ -6564,6 +6601,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>   
>>   	x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
>>   
>> +	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
>> +	    unlikely(vmx->disable_split_lock_detect) &&
>> +	    !test_tsk_thread_flag(current, TIF_SLD))
>> +		split_lock_detect_set(true);
> 
> Manually calling split_lock_detect_set() in vmx_vcpu_run() is unnecessary.
> The MSR only needs to be written on the initial #AC, after that KVM can
> rely on the stickiness of TIF_SLD to ensure the MSR is set correctly when
> control transfer to/from this vCPU.
> 
>> +
>>   	/* All fields are clean at this point */
>>   	if (static_branch_unlikely(&enable_evmcs))
>>   		current_evmcs->hv_clean_fields |=
>> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
>> index 7f42cf3dcd70..912eba66c5d5 100644
>> --- a/arch/x86/kvm/vmx/vmx.h
>> +++ b/arch/x86/kvm/vmx/vmx.h
>> @@ -274,6 +274,9 @@ struct vcpu_vmx {
>>   
>>   	bool req_immediate_exit;
>>   
>> +	/* Disable split-lock detection when running the vCPU */
>> +	bool disable_split_lock_detect;
>> +
>>   	/* Support for PML */
>>   #define PML_ENTITY_NUM		512
>>   	struct page *pml_pg;
>> -- 
>> 2.23.0
>>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 5/6] kvm: x86: Emulate MSR IA32_CORE_CAPABILITIES
  2020-02-03 21:43   ` Sean Christopherson
@ 2020-02-04  9:19     ` Xiaoyao Li
  2020-02-04  9:37       ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Xiaoyao Li @ 2020-02-04  9:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, x86, kvm, linux-kernel, David Laight

On 2/4/2020 5:43 AM, Sean Christopherson wrote:
> On Mon, Feb 03, 2020 at 11:16:07PM +0800, Xiaoyao Li wrote:
>> Emulate MSR_IA32_CORE_CAPABILITIES in software and unconditionally
>> advertise its support to userspace. Like MSR_IA32_ARCH_CAPABILITIES, it
>> is a feature-enumerating MSR and can be fully emulated regardless of
>> hardware support. Existence of CORE_CAPABILITIES is enumerated via
>> CPUID.(EAX=7H,ECX=0):EDX[30].
>>
>> Note, support for individual features enumerated via CORE_CAPABILITIES,
>> e.g., split lock detection, will be added in future patches.
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>>   arch/x86/include/asm/kvm_host.h |  1 +
>>   arch/x86/kvm/cpuid.c            |  5 +++--
>>   arch/x86/kvm/x86.c              | 22 ++++++++++++++++++++++
>>   3 files changed, 26 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 329d01c689b7..dc231240102f 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -591,6 +591,7 @@ struct kvm_vcpu_arch {
>>   	u64 ia32_xss;
>>   	u64 microcode_version;
>>   	u64 arch_capabilities;
>> +	u64 core_capabilities;
>>   
>>   	/*
>>   	 * Paging state of the vcpu
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index b1c469446b07..7282d04f3a6b 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -409,10 +409,11 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
>>   		    boot_cpu_has(X86_FEATURE_AMD_SSBD))
>>   			entry->edx |= F(SPEC_CTRL_SSBD);
>>   		/*
>> -		 * We emulate ARCH_CAPABILITIES in software even
>> -		 * if the host doesn't support it.
>> +		 * ARCH_CAPABILITIES and CORE_CAPABILITIES are emulated in
>> +		 * software regardless of host support.
>>   		 */
>>   		entry->edx |= F(ARCH_CAPABILITIES);
>> +		entry->edx |= F(CORE_CAPABILITIES);
>>   		break;
>>   	case 1:
>>   		entry->eax &= kvm_cpuid_7_1_eax_x86_features;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 821b7404c0fd..a97a8f5dd1df 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1222,6 +1222,7 @@ static const u32 emulated_msrs_all[] = {
>>   	MSR_IA32_TSC_ADJUST,
>>   	MSR_IA32_TSCDEADLINE,
>>   	MSR_IA32_ARCH_CAPABILITIES,
>> +	MSR_IA32_CORE_CAPS,
>>   	MSR_IA32_MISC_ENABLE,
>>   	MSR_IA32_MCG_STATUS,
>>   	MSR_IA32_MCG_CTL,
>> @@ -1288,6 +1289,7 @@ static const u32 msr_based_features_all[] = {
>>   	MSR_F10H_DECFG,
>>   	MSR_IA32_UCODE_REV,
>>   	MSR_IA32_ARCH_CAPABILITIES,
>> +	MSR_IA32_CORE_CAPS,
>>   };
>>   
>>   static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all)];
>> @@ -1341,12 +1343,20 @@ static u64 kvm_get_arch_capabilities(void)
>>   	return data;
>>   }
>>   
>> +static u64 kvm_get_core_capabilities(void)
>> +{
>> +	return 0;
>> +}
>> +
>>   static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
>>   {
>>   	switch (msr->index) {
>>   	case MSR_IA32_ARCH_CAPABILITIES:
>>   		msr->data = kvm_get_arch_capabilities();
>>   		break;
>> +	case MSR_IA32_CORE_CAPS:
>> +		msr->data = kvm_get_core_capabilities();
>> +		break;
>>   	case MSR_IA32_UCODE_REV:
>>   		rdmsrl_safe(msr->index, &msr->data);
>>   		break;
>> @@ -2716,6 +2726,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   			return 1;
>>   		vcpu->arch.arch_capabilities = data;
>>   		break;
>> +	case MSR_IA32_CORE_CAPS:
>> +		if (!msr_info->host_initiated)
> 
> Shouldn't @data be checked against kvm_get_core_capabilities()?

Maybe it's for the case where userspace might have the ability to
emulate the SLD feature? And we usually let userspace set whatever it
wants, e.g., ARCH_CAPABILITIES.

Anyway, I have no objection to adding this check.

>> +			return 1;
>> +		vcpu->arch.core_capabilities = data;
>> +		break;
>>   	case MSR_EFER:
>>   		return set_efer(vcpu, msr_info);
>>   	case MSR_K7_HWCR:
>> @@ -3044,6 +3059,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   			return 1;
>>   		msr_info->data = vcpu->arch.arch_capabilities;
>>   		break;
>> +	case MSR_IA32_CORE_CAPS:
>> +		if (!msr_info->host_initiated &&
>> +		    !guest_cpuid_has(vcpu, X86_FEATURE_CORE_CAPABILITIES))
>> +			return 1;
>> +		msr_info->data = vcpu->arch.core_capabilities;
>> +		break;
>>   	case MSR_IA32_POWER_CTL:
>>   		msr_info->data = vcpu->arch.msr_ia32_power_ctl;
>>   		break;
>> @@ -9288,6 +9309,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>>   		goto free_guest_fpu;
>>   
>>   	vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
>> +	vcpu->arch.core_capabilities = kvm_get_core_capabilities();
>>   	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
>>   	kvm_vcpu_mtrr_init(vcpu);
>>   	vcpu_load(vcpu);
>> -- 
>> 2.23.0
>>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 5/6] kvm: x86: Emulate MSR IA32_CORE_CAPABILITIES
  2020-02-04  9:19     ` Xiaoyao Li
@ 2020-02-04  9:37       ` Peter Zijlstra
  2020-02-11  3:52         ` Andy Lutomirski
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2020-02-04  9:37 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, x86, kvm, linux-kernel,
	David Laight

On Tue, Feb 04, 2020 at 05:19:26PM +0800, Xiaoyao Li wrote:

> > > +	case MSR_IA32_CORE_CAPS:
> > > +		if (!msr_info->host_initiated)
> > 
> > Shouldn't @data be checked against kvm_get_core_capabilities()?
> 
> Maybe it's for the case where userspace might have the ability to
> emulate the SLD feature? And we usually let userspace set whatever it
> wants, e.g., ARCH_CAPABILITIES.

If the 'sq_misc.split_lock' event is sufficiently accurate, I suppose
the host could use that to emulate the feature at the cost of one
counter used.
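
Roughly (sketch only; assumes sq_misc.split_lock is raw event 0xf4 with
umask 0x10 on the target core, and split_lock_overflow() is a
hypothetical overflow handler):

	struct perf_event_attr attr = {
		.type		= PERF_TYPE_RAW,
		.size		= sizeof(attr),
		.config		= 0x10f4,	/* umask 0x10, event 0xf4 (assumed) */
		.sample_period	= 1,		/* fire on every split lock */
		.exclude_host	= 1,		/* count guest-only, if supported */
	};
	struct perf_event *event;

	event = perf_event_create_kernel_counter(&attr, cpu, NULL,
						 split_lock_overflow, vcpu);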

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 4/6] kvm: vmx: Extend VMX's #AC handding for split lock in guest
  2020-02-04  6:46     ` Xiaoyao Li
@ 2020-02-10 21:30       ` Sean Christopherson
  0 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2020-02-10 21:30 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, x86, kvm, linux-kernel, David Laight

On Tue, Feb 04, 2020 at 02:46:14PM +0800, Xiaoyao Li wrote:
> On 2/4/2020 5:14 AM, Sean Christopherson wrote:
> >>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> >>index c475fa2aaae0..93e3370c5f84 100644
> >>--- a/arch/x86/kvm/vmx/vmx.c
> >>+++ b/arch/x86/kvm/vmx/vmx.c
> >>@@ -4233,6 +4233,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> >>  	vmx->msr_ia32_umwait_control = 0;
> >>+	vmx->disable_split_lock_detect = false;
> >>+
> >
> >I see no reason to give special treatment to RESET/INIT, i.e. leave the
> >flag set.  vCPUs are zeroed on allocation.
> 
> So when the guest reboots, it doesn't need to be reset to false?

No.  KVM _could_ clear disable_split_lock_detect, but it's not required.
E.g. KVM could periodically clear disable_split_lock_detect irrespective
of RESET/INIT.

> I am not clear on the difference between RESET and INIT, so I didn't
> differentiate them into different cases with init_event.

...

> >>+			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
> >>+			return 1;
> >>+		}
> >>+		if (get_split_lock_detect_state() == sld_warn) {
> >>+			pr_warn("kvm: split lock #AC happened in %s [%d]\n",
> >>+				current->comm, current->pid);
> >
> >Set TIF_SLD and the MSR bit, then __switch_to_xtra() will automatically
> >handle writing the MSR when necessary.
> 
> Right, we can do this.
> 
> However, if TIF_SLD and __switch_to_xtra() are used to switch the MSR
> bit, then once there is a split lock in the guest, TIF_SLD is set for
> the vcpu thread, so we lose the ability to find and warn about split
> locks in the user space thread, e.g., the QEMU vcpu thread, and also
> lose the ability to find split locks in KVM.

Finding split locks in KVM is a non-issue; in the (hopefully unlikely)
event KVM ends up with a split lock bug, the odds of the bug being hit
*only* after a guest also hits a split-lock #AC are tiny.

Userspace is a different question.  My preference would be to keep KVM
simple and rely on TIF_SLD.

> If that's not a problem, I agree to use TIF_SLD.
> 
> >Even better would be to export handle_user_split_lock() and call that
> >directly.  The EFLAGS.AC logic in handle_user_split_lock() can be moved out
> >to do_alignment_check() to avoid that complication; arguably that should be
> >done in the initial SLD patch.
> 
> The warning message of handle_user_split_lock() contains the RIP of the
> userspace application. If we use it here, what RIP should we use? The
> guest RIP of the faulting instruction?

Yes, guest RIP.
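
E.g. (sketch, assuming handle_user_split_lock() is reworked to take a
raw RIP instead of a pt_regs, per the earlier suggestion):

	if (handle_user_split_lock(kvm_rip_read(vcpu), error_code))
		return 1;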

> >>+			vmx->disable_split_lock_detect = true;
> >>+			return 1;
> >>+		}
> >>+		/* fall through*/
> >>  	default:
> >>  		kvm_run->exit_reason = KVM_EXIT_EXCEPTION;
> >>  		kvm_run->ex.exception = ex_no;
> >>@@ -6530,6 +6562,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
> >>  	 */
> >>  	x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0);
> >>+	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
> >>+	    unlikely(vmx->disable_split_lock_detect) &&
> >>+	    !test_tsk_thread_flag(current, TIF_SLD))
> >>+		split_lock_detect_set(false);
> >>+
> >>  	/* L1D Flush includes CPU buffer clear to mitigate MDS */
> >>  	if (static_branch_unlikely(&vmx_l1d_should_flush))
> >>  		vmx_l1d_flush(vcpu);
> >>@@ -6564,6 +6601,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
> >>  	x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
> >>+	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
> >>+	    unlikely(vmx->disable_split_lock_detect) &&
> >>+	    !test_tsk_thread_flag(current, TIF_SLD))
> >>+		split_lock_detect_set(true);
> >
> >Manually calling split_lock_detect_set() in vmx_vcpu_run() is unnecessary.
> >The MSR only needs to be written on the initial #AC; after that KVM can
> >rely on the stickiness of TIF_SLD to ensure the MSR is set correctly when
> >control transfers to/from this vCPU.
> >
> >>+
> >>  	/* All fields are clean at this point */
> >>  	if (static_branch_unlikely(&enable_evmcs))
> >>  		current_evmcs->hv_clean_fields |=
> >>diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> >>index 7f42cf3dcd70..912eba66c5d5 100644
> >>--- a/arch/x86/kvm/vmx/vmx.h
> >>+++ b/arch/x86/kvm/vmx/vmx.h
> >>@@ -274,6 +274,9 @@ struct vcpu_vmx {
> >>  	bool req_immediate_exit;
> >>+	/* Disable split-lock detection when running the vCPU */
> >>+	bool disable_split_lock_detect;
> >>+
> >>  	/* Support for PML */
> >>  #define PML_ENTITY_NUM		512
> >>  	struct page *pml_pg;
> >>-- 
> >>2.23.0
> >>
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 5/6] kvm: x86: Emulate MSR IA32_CORE_CAPABILITIES
  2020-02-04  9:37       ` Peter Zijlstra
@ 2020-02-11  3:52         ` Andy Lutomirski
  2020-02-11 12:38           ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Andy Lutomirski @ 2020-02-11  3:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Xiaoyao Li, Sean Christopherson, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, X86 ML, kvm list, LKML,
	David Laight

On Tue, Feb 4, 2020 at 1:37 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Feb 04, 2020 at 05:19:26PM +0800, Xiaoyao Li wrote:
>
> > > > + case MSR_IA32_CORE_CAPS:
> > > > +         if (!msr_info->host_initiated)
> > >
> > > Shouldn't @data be checked against kvm_get_core_capabilities()?
> >
> > Maybe it's for the case where userspace has the ability to emulate the
> > SLD feature? And we usually let userspace set whatever it wants, e.g.,
> > ARCH_CAPABILITIES.
>
> If the 'sq_misc.split_lock' event is sufficiently accurate, I suppose
> the host could use that to emulate the feature at the cost of one
> counter.

I would be impressed if the event were to fire before executing the
offending split lock.  Wouldn't the best possible result be for it to
fire with RIP pointing to the *next* instruction?  This seems like it
could be quite confusing to a guest.
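
For reference, a minimal sketch of the kernel counter that suggestion
implies; the raw encoding 0x10f4 for SQ_MISC.SPLIT_LOCK and all names
below are assumptions, not code from this series:

	#include <linux/perf_event.h>

	static void split_lock_overflow(struct perf_event *event,
					struct perf_sample_data *data,
					struct pt_regs *regs)
	{
		/* Fires only after the split access executed, per the above. */
	}

	static struct perf_event *create_split_lock_counter(struct task_struct *tsk)
	{
		struct perf_event_attr attr = {
			.type		= PERF_TYPE_RAW,
			.size		= sizeof(attr),
			.config		= 0x10f4,	/* SQ_MISC.SPLIT_LOCK (assumed) */
			.sample_period	= 1,		/* overflow on every event */
		};

		/* Count for @tsk on whichever CPU it runs on. */
		return perf_event_create_kernel_counter(&attr, -1, tsk,
							split_lock_overflow, NULL);
	}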

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 3/6] kvm: x86: Emulate split-lock access as a write
  2020-02-03 15:16 ` [PATCH v2 3/6] kvm: x86: Emulate split-lock access as a write Xiaoyao Li
  2020-02-03 20:54   ` Sean Christopherson
@ 2020-02-11 12:20   ` Paolo Bonzini
  2020-02-11 13:22     ` Thomas Gleixner
  1 sibling, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2020-02-11 12:20 UTC (permalink / raw)
  To: Xiaoyao Li, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski
  Cc: x86, kvm, linux-kernel, David Laight

On 03/02/20 16:16, Xiaoyao Li wrote:
> A sane guest should never trigger emulation on a split-lock access, but
> it cannot prevent a malicious guest from doing this. So just emulate the
> access as a write if it's a split-lock access, to avoid a malicious
> guest polluting the kernel log.

Saying that anything doing a split lock access is malicious makes little
sense.

Split lock detection is essentially a debugging feature, there's a
reason why the MSR is called "TEST_CTL".  So you don't want to make the
corresponding behavior wrong.  Just kill the guest with a
KVM_INTERNAL_ERROR userspace exit so people will notice quickly and
either disable the feature or see if they can fix the guest.
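
Concretely, that is just an exit-to-userspace sketch along these lines
(reusing the emulation suberror here is an assumption, not a requirement):

	kvm_run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
	kvm_run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
	kvm_run->internal.ndata = 0;
	return 0;	/* return to userspace */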

Paolo


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 5/6] kvm: x86: Emulate MSR IA32_CORE_CAPABILITIES
  2020-02-11  3:52         ` Andy Lutomirski
@ 2020-02-11 12:38           ` Peter Zijlstra
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2020-02-11 12:38 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Xiaoyao Li, Sean Christopherson, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, X86 ML, kvm list, LKML,
	David Laight

On Mon, Feb 10, 2020 at 07:52:13PM -0800, Andy Lutomirski wrote:
> On Tue, Feb 4, 2020 at 1:37 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Tue, Feb 04, 2020 at 05:19:26PM +0800, Xiaoyao Li wrote:
> >
> > > > > + case MSR_IA32_CORE_CAPS:
> > > > > +         if (!msr_info->host_initiated)
> > > >
> > > > Shouldn't @data be checked against kvm_get_core_capabilities()?
> > >
> > > Maybe it's for the case where userspace has the ability to emulate the
> > > SLD feature? And we usually let userspace set whatever it wants, e.g.,
> > > ARCH_CAPABILITIES.
> >
> > If the 'sq_misc.split_lock' event is sufficiently accurate, I suppose
> > the host could use that to emulate the feature at the cost of one
> > counter.
> 
> I would be impressed if the event were to fire before executing the
> offending split lock.  Wouldn't the best possible result be for it to
> fire with RIP pointing to the *next* instruction?  This seems like it
> could be quite confusing to a guest.

True; and I see no indication the event is PEBS capable, so even that is
pushing it.

However, it's virt; isn't that confused by definition? ;-))

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 3/6] kvm: x86: Emulate split-lock access as a write
  2020-02-11 12:20   ` Paolo Bonzini
@ 2020-02-11 13:22     ` Thomas Gleixner
  2020-02-11 13:34       ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2020-02-11 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Xiaoyao Li, Sean Christopherson, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski
  Cc: x86, kvm, linux-kernel, David Laight

Paolo Bonzini <pbonzini@redhat.com> writes:
> On 03/02/20 16:16, Xiaoyao Li wrote:
>> A sane guest should never trigger emulation on a split-lock access, but
>> it cannot prevent a malicious guest from doing this. So just emulate the
>> access as a write if it's a split-lock access, to avoid a malicious
>> guest polluting the kernel log.
>
> Saying that anything doing a split lock access is malicious makes little
> sense.

Correct, but we also have to accept that split lock access can be used
in a malicious way, aka DoS.

> Split lock detection is essentially a debugging feature, there's a
> reason why the MSR is called "TEST_CTL".  So you don't want to make the

The fact that it ended up in MSR_TEST_CTL does not say anything. That's
where it ended up as it was hastily cobbled together for whatever
reason.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 3/6] kvm: x86: Emulate split-lock access as a write
  2020-02-11 13:22     ` Thomas Gleixner
@ 2020-02-11 13:34       ` Paolo Bonzini
  2020-02-11 14:02         ` Xiaoyao Li
  2020-02-27  0:11         ` Sean Christopherson
  0 siblings, 2 replies; 31+ messages in thread
From: Paolo Bonzini @ 2020-02-11 13:34 UTC (permalink / raw)
  To: Thomas Gleixner, Xiaoyao Li, Sean Christopherson, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski
  Cc: x86, kvm, linux-kernel, David Laight

On 11/02/20 14:22, Thomas Gleixner wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
>> On 03/02/20 16:16, Xiaoyao Li wrote:
>>> A sane guest should never trigger emulation on a split-lock access, but
>>> it cannot prevent a malicious guest from doing this. So just emulate the
>>> access as a write if it's a split-lock access, to avoid a malicious
>>> guest polluting the kernel log.
>>
>> Saying that anything doing a split lock access is malicious makes little
>> sense.
> 
> Correct, but we also have to accept that split lock access can be used
> in a malicious way, aka DoS.

Indeed, a more accurate emulation such as temporarily disabling
split-lock detection in the emulator would allow the guest to use split
lock access as a vehicle for DoS, but that's not what the commit message
says.  If it were only about polluting the kernel log, there's
printk_ratelimited for that.  (In fact, if we went for incorrect
emulation as in this patch, a rate-limited pr_warn would be a good idea).

It is much more convincing to say that since this is pretty much a
theoretical case, we can assume that it is only done with the purpose of
DoS-ing the host or something like that, and therefore we kill the guest.

>> Split lock detection is essentially a debugging feature, there's a
>> reason why the MSR is called "TEST_CTL".  So you don't want to make the
> 
> The fact that it ended up in MSR_TEST_CTL does not say anything. That's
> where it ended up as it was hastily cobbled together for whatever
> reason.

Or perhaps it was there all the time in test silicon or something like
that...  That would be a very plausible reason for all the quirks behind it.

Paolo


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 3/6] kvm: x86: Emulate split-lock access as a write
  2020-02-11 13:34       ` Paolo Bonzini
@ 2020-02-11 14:02         ` Xiaoyao Li
  2020-02-11 14:34           ` David Laight
  2020-02-27  0:11         ` Sean Christopherson
  1 sibling, 1 reply; 31+ messages in thread
From: Xiaoyao Li @ 2020-02-11 14:02 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner, Sean Christopherson, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski
  Cc: x86, kvm, linux-kernel, David Laight

On 2/11/2020 9:34 PM, Paolo Bonzini wrote:
> On 11/02/20 14:22, Thomas Gleixner wrote:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>>> On 03/02/20 16:16, Xiaoyao Li wrote:
>>>> A sane guest should never trigger emulation on a split-lock access, but
>>>> it cannot prevent a malicious guest from doing this. So just emulate the
>>>> access as a write if it's a split-lock access, to avoid a malicious
>>>> guest polluting the kernel log.
>>>
>>> Saying that anything doing a split lock access is malicious makes little
>>> sense.
>>
>> Correct, but we also have to accept that split lock access can be used
>> in a malicious way, aka DoS.
> 
> Indeed, a more accurate emulation such as temporarily disabling
> split-lock detection in the emulator would allow the guest to use split
> lock access as a vehicle for DoS, but that's not what the commit message
> says.  If it were only about polluting the kernel log, there's
> printk_ratelimited for that.  (In fact, if we went for incorrect
> emulation as in this patch, a rate-limited pr_warn would be a good idea).
> 
> It is much more convincing to say that since this is pretty much a
> theoretical case, we can assume that it is only done with the purpose of
> DoS-ing the host or something like that, and therefore we kill the guest.

So you think there is no need to emulate this feature and return #AC to
the guest?
Anyway, I'm fine with killing the guest.

BTW, can it really be used for DoS purposes by a malicious guest? Since
it's in the KVM emulator, it needs a VM-exit first, and won't the die()
in the kernel #AC handler kill KVM? (Actually I'm not clear what happens
to KVM after die().)

>>> Split lock detection is essentially a debugging feature, there's a
>>> reason why the MSR is called "TEST_CTL".  So you don't want to make the
>>
>> The fact that it ended up in MSR_TEST_CTL does not say anything. That's
>> where it ended up as it was hastily cobbled together for whatever
>> reason.
> 
> Or perhaps it was there all the time in test silicon or something like
> that...  That would be a very plausible reason for all the quirks behind it.

Alright, I don't know the history of TEST_CTRL. There is a bit 31 in it
which means "Disable LOCK# assertion for split locked access" when set.
Bit 31 has existed for a long time, but Linux seems not to use it, so I
guess it may be a testing-purpose bit.

However, when it comes to bit 29, the split lock #AC, the main purpose
is to prevent split locks, more than just debugging.

BTW, I guess the reason for putting it in MSR_TEST_CTRL is that it's
related to split locks, like bit 31.
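
For reference, the bits being discussed, in the style of the defines
from this series (bit 31 has no upstream define as far as I know, so
that comment is an assumption from the SDM wording):

	#define MSR_TEST_CTRL				0x00000033
	#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT	29
	#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT		BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
	/* Bit 31: "Disable LOCK# assertion for split locked access". */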

> Paolo
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH v2 3/6] kvm: x86: Emulate split-lock access as a write
  2020-02-11 14:02         ` Xiaoyao Li
@ 2020-02-11 14:34           ` David Laight
  0 siblings, 0 replies; 31+ messages in thread
From: David Laight @ 2020-02-11 14:34 UTC (permalink / raw)
  To: 'Xiaoyao Li',
	Paolo Bonzini, Thomas Gleixner, Sean Christopherson, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski
  Cc: x86, kvm, linux-kernel

From: Xiaoyao Li
> Sent: 11 February 2020 14:03
...
> Alright, I don't know the history of TEST_CTRL. There is a bit 31 in it
> which means "Disable LOCK# assertion for split locked access" when set.
> Bit 31 has existed for a long time, but Linux seems not to use it, so I
> guess it may be a testing-purpose bit.

My brain remembers something about some system just ignoring
locked accesses for misaligned transfers.
Trouble is it was probably nearly 30 years ago and there are
no details coming out of 'long term storage'.

It might be that some systems I had either set this bit
or acted as if it was set.

	David



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 3/6] kvm: x86: Emulate split-lock access as a write
  2020-02-11 13:34       ` Paolo Bonzini
  2020-02-11 14:02         ` Xiaoyao Li
@ 2020-02-27  0:11         ` Sean Christopherson
  2020-03-12 11:42           ` Xiaoyao Li
  1 sibling, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2020-02-27  0:11 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Thomas Gleixner, Xiaoyao Li, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, x86, kvm, linux-kernel, David Laight

On Tue, Feb 11, 2020 at 02:34:18PM +0100, Paolo Bonzini wrote:
> On 11/02/20 14:22, Thomas Gleixner wrote:
> > Paolo Bonzini <pbonzini@redhat.com> writes:
> >> On 03/02/20 16:16, Xiaoyao Li wrote:
> >>> A sane guest should never trigger emulation on a split-lock access, but
> >>> it cannot prevent a malicious guest from doing this. So just emulate the
> >>> access as a write if it's a split-lock access, to avoid a malicious
> >>> guest polluting the kernel log.
> >>
> >> Saying that anything doing a split lock access is malicious makes little
> >> sense.
> > 
> > Correct, but we also have to accept that split lock access can be used
> > in a malicious way, aka DoS.
> 
> Indeed, a more accurate emulation such as temporarily disabling
> split-lock detection in the emulator would allow the guest to use split
> lock access as a vehicle for DoS, but that's not what the commit message
> says.  If it were only about polluting the kernel log, there's
> printk_ratelimited for that.  (In fact, if we went for incorrect
> emulation as in this patch, a rate-limited pr_warn would be a good idea).
> 
> It is much more convincing to say that since this is pretty much a
> theoretical case, we can assume that it is only done with the purpose of
> DoS-ing the host or something like that, and therefore we kill the guest.

The problem with "kill the guest", and the reason I'd prefer to emulate the
split-lock as a write, is that killing the guest in this case is annoyingly
difficult.

Returning X86EMUL_UNHANDLEABLE / EMULATION_FAILED gets KVM to
handle_emulation_failure(), but handle_emulation_failure() will only "kill"
the guest if emulation failed in L1 CPL==0.  For all other modes, it will
inject a #UD and resume the guest.  KVM also injects a #UD for L1 CPL==0,
but that's the least annoying thing.

Adding a new emulation type isn't an option because this code can be
triggered through normal emulation.  A new return type could be added for
split-lock, but that's code I'd really rather not add, both from an Intel
perspective and a KVM maintenance perspective.  And, we'd still have the
conundrum of what to do if/when split-lock #AC is exposed to L1, e.g. in
that case, KVM should inject an #AC into L1, not kill the guest.  Again,
totally doable, but ugly and IMO an unnecessary maintenance burden.

I completely agree that poorly emulating the instruction from the (likely)
malicious guest is a hack, but it's a simple and easy to maintain hack.
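
Roughly, the hack boils down to tightening the emulator's existing
page-crossing check to cache-line granularity, e.g. something like this
in emulator_cmpxchg_emulated() (a sketch only; the exact placement and
form may differ from the patch):

	u64 page_line_mask = PAGE_MASK;

	if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
		page_line_mask = ~(cache_line_size() - 1);

	/* Emulate an atomic access that splits a cache line as a write. */
	if (((gpa + bytes - 1) & page_line_mask) != (gpa & page_line_mask))
		goto emul_write;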

> >> Split lock detection is essentially a debugging feature, there's a
> >> reason why the MSR is called "TEST_CTL".  So you don't want to make the
> > 
> > The fact that it ended up in MSR_TEST_CTL does not say anything. That's
> > where it ended up as it was hastily cobbled together for whatever
> > reason.
> 
> Or perhaps it was there all the time in test silicon or something like
> that...  That would be a very plausible reason for all the quirks behind it.
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 3/6] kvm: x86: Emulate split-lock access as a write
  2020-02-27  0:11         ` Sean Christopherson
@ 2020-03-12 11:42           ` Xiaoyao Li
  2020-03-12 15:00             ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Xiaoyao Li @ 2020-03-12 11:42 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	x86, kvm, linux-kernel, David Laight

On 2/27/2020 8:11 AM, Sean Christopherson wrote:
> On Tue, Feb 11, 2020 at 02:34:18PM +0100, Paolo Bonzini wrote:
>> On 11/02/20 14:22, Thomas Gleixner wrote:
>>> Paolo Bonzini <pbonzini@redhat.com> writes:
>>>> On 03/02/20 16:16, Xiaoyao Li wrote:
> >>>> A sane guest should never trigger emulation on a split-lock access, but
> >>>> it cannot prevent a malicious guest from doing this. So just emulate the
> >>>> access as a write if it's a split-lock access, to avoid a malicious
> >>>> guest polluting the kernel log.
>>>>
>>>> Saying that anything doing a split lock access is malicious makes little
>>>> sense.
>>>
>>> Correct, but we also have to accept that split lock access can be used
>>> in a malicious way, aka DoS.
>>
>> Indeed, a more accurate emulation such as temporarily disabling
>> split-lock detection in the emulator would allow the guest to use split
>> lock access as a vehicle for DoS, but that's not what the commit message
>> says.  If it were only about polluting the kernel log, there's
>> printk_ratelimited for that.  (In fact, if we went for incorrect
>> emulation as in this patch, a rate-limited pr_warn would be a good idea).
>>
>> It is much more convincing to say that since this is pretty much a
>> theoretical case, we can assume that it is only done with the purpose of
>> DoS-ing the host or something like that, and therefore we kill the guest.
> 
> The problem with "kill the guest", and the reason I'd prefer to emulate the
> split-lock as a write, is that killing the guest in this case is annoyingly
> difficult.
> 
> Returning X86EMUL_UNHANDLEABLE / EMULATION_FAILED gets KVM to
> handle_emulation_failure(), but handle_emulation_failure() will only "kill"
> the guest if emulation failed in L1 CPL==0.  For all other modes, it will
> inject a #UD and resume the guest.  KVM also injects a #UD for L1 CPL==0,
> but that's the least annoying thing.
> 
> Adding a new emulation type isn't an option because this code can be
> triggered through normal emulation.  A new return type could be added for
> split-lock, but that's code I'd really rather not add, both from an Intel
> perspective and a KVM maintenance perspective.  And, we'd still have the
> conundrum of what to do if/when split-lock #AC is exposed to L1, e.g. in
> that case, KVM should inject an #AC into L1, not kill the guest.  Again,
> totally doable, but ugly and IMO an unnecessary maintenance burden.
> 
> I completely agree that poorly emulating the instruction from the (likely)
> malicious guest is a hack, but it's a simple and easy to maintain hack.

Paolo,

What's your opinion about the above?

>>>> Split lock detection is essentially a debugging feature, there's a
>>>> reason why the MSR is called "TEST_CTL".  So you don't want to make the
>>>
>>> The fact that it ended up in MSR_TEST_CTL does not say anything. That's
>>> where it ended up as it was hastily cobbled together for whatever
>>> reason.
>>
>> Or perhaps it was there all the time in test silicon or something like
>> that...  That would be a very plausible reason for all the quirks behind it.
>>
>> Paolo
>>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 3/6] kvm: x86: Emulate split-lock access as a write
  2020-03-12 11:42           ` Xiaoyao Li
@ 2020-03-12 15:00             ` Paolo Bonzini
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2020-03-12 15:00 UTC (permalink / raw)
  To: Xiaoyao Li, Sean Christopherson
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	x86, kvm, linux-kernel, David Laight

On 12/03/20 12:42, Xiaoyao Li wrote:
>> I completely agree that poorly emulating the instruction from the 
>> (likely) malicious guest is a hack, but it's a simple and easy to
>> maintain hack.
> 
> What's your opinion about the above?

That's okay, I suppose.

Paolo


^ permalink raw reply	[flat|nested] 31+ messages in thread

Thread overview: 31+ messages
2020-02-03 15:16 [PATCH v2 0/6] kvm/split_lock: Add feature split lock detection support in kvm Xiaoyao Li
2020-02-03 15:16 ` [PATCH v2 1/6] x86/split_lock: Add and export get_split_lock_detect_state() Xiaoyao Li
2020-02-03 21:45   ` Sean Christopherson
2020-02-03 15:16 ` [PATCH v2 2/6] x86/split_lock: Add and export split_lock_detect_set() Xiaoyao Li
2020-02-03 15:16 ` [PATCH v2 3/6] kvm: x86: Emulate split-lock access as a write Xiaoyao Li
2020-02-03 20:54   ` Sean Christopherson
2020-02-04  2:55     ` Xiaoyao Li
2020-02-11 12:20   ` Paolo Bonzini
2020-02-11 13:22     ` Thomas Gleixner
2020-02-11 13:34       ` Paolo Bonzini
2020-02-11 14:02         ` Xiaoyao Li
2020-02-11 14:34           ` David Laight
2020-02-27  0:11         ` Sean Christopherson
2020-03-12 11:42           ` Xiaoyao Li
2020-03-12 15:00             ` Paolo Bonzini
2020-02-03 15:16 ` [PATCH v2 4/6] kvm: vmx: Extend VMX's #AC handling for split lock in guest Xiaoyao Li
2020-02-03 21:14   ` Sean Christopherson
2020-02-04  6:46     ` Xiaoyao Li
2020-02-10 21:30       ` Sean Christopherson
2020-02-03 15:16 ` [PATCH v2 5/6] kvm: x86: Emulate MSR IA32_CORE_CAPABILITIES Xiaoyao Li
2020-02-03 21:43   ` Sean Christopherson
2020-02-04  9:19     ` Xiaoyao Li
2020-02-04  9:37       ` Peter Zijlstra
2020-02-11  3:52         ` Andy Lutomirski
2020-02-11 12:38           ` Peter Zijlstra
2020-02-03 15:16 ` [PATCH v2 6/6] x86: vmx: virtualize split lock detection Xiaoyao Li
2020-02-03 15:58   ` Xiaoyao Li
2020-02-03 18:52   ` Andy Lutomirski
2020-02-03 21:42   ` Sean Christopherson
2020-02-04  2:52     ` Xiaoyao Li
2020-02-04  5:35       ` Sean Christopherson
