linux-kernel.vger.kernel.org archive mirror
* [PATCH v8 0/4] KVM: Add virtualization support of split lock detection
@ 2020-04-14  6:31 Xiaoyao Li
  2020-04-14  6:31 ` [PATCH v8 1/4] kvm: x86: Emulate MSR IA32_CORE_CAPABILITIES Xiaoyao Li
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Xiaoyao Li @ 2020-04-14  6:31 UTC (permalink / raw)
  To: Paolo Bonzini, kvm, Sean Christopherson, Thomas Gleixner
  Cc: linux-kernel, x86, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra, Arvind Sankar, Xiaoyao Li

This series aims to add the virtualization of split lock detection in
KVM.

Because the TEST_CTRL MSR is per-core scoped, split lock detection can be
exposed to the guest only when SMT is disabled or unsupported.

Changes in v8:
 - rebase to v5.7-rc1.
 - basic enabling of split lock detection already merged.
 - When the host is sld_warn and SMT is off (nosmt), load the guest's SLD
   bit while in KVM context, i.e., between vmx_prepare_switch_to_guest()
   and vmx_prepare_switch_to_host(), KVM uses the guest's SLD setting (a
   rough sketch follows below).
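
A rough sketch of that flow, using the helper names introduced in patch 4
(illustration only; see the patches for the real code):

	/* vcpu_enter_guest() -> prepare_guest_switch(), preemption disabled */
	void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
	{
		struct vcpu_vmx *vmx = to_vmx(vcpu);

		/* Remember the host's SLD setting ... */
		vmx->host_sld_on = !test_thread_flag(TIF_SLD);
		/* ... and load the guest's SLD value into MSR_TEST_CTRL. */
		vmx_update_sld(vcpu, guest_cpu_sld_on(vmx));
	}

	/* vcpu_put() -> vmx_vcpu_put(), i.e., when leaving KVM context */
	static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
	{
		/* Restore the host's SLD setting. */
		vmx_update_sld(&vmx->vcpu, vmx->host_sld_on);
	}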

Changes in v7:
https://lkml.kernel.org/r/20200325030924.132881-1-xiaoyao.li@intel.com
 - Only pick patch 1 and patch 2, and hold back the rest.
 - Update SLD bit on each processor based on sld_state.

Changes in v6:
https://lkml.kernel.org/r/20200324151859.31068-1-xiaoyao.li@intel.com
 - Drop the sld_not_exist flag and use X86_FEATURE_SPLIT_LOCK_DETECT to
   check whether split lock detection needs to be initialized. [tglx]
 - Use tglx's method to verify the existence of split lock detection.
 - Small optimization of sld_update_msr(): the default value of
   msr_test_ctrl_cache has the split_lock_detect bit cleared.
 - Drop patch 3 of v5, which introduced the kvm_only option. [tglx]
 - Rebase patches 4-8 onto kvm/queue.
 - Use the new kvm-cpu-cap to expose X86_FEATURE_CORE_CAPABILITIES in
   patch 6.

Changes in v5:
https://lkml.kernel.org/r/20200315050517.127446-1-xiaoyao.li@intel.com
 - Use the X86_FEATURE_SPLIT_LOCK_DETECT flag in KVM to ensure split lock
   detection is really supported.
 - Add and export SLD-related helper functions in the KVM patches that
   use them.

Xiaoyao Li (4):
  kvm: x86: Emulate MSR IA32_CORE_CAPABILITIES
  kvm: vmx: Enable MSR TEST_CTRL for guest
  x86/split_lock: Export sld_update_msr() and sld_state
  kvm: vmx: virtualize split lock detection

 arch/x86/include/asm/cpu.h      | 12 +++++
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kernel/cpu/intel.c     | 13 ++----
 arch/x86/kvm/cpuid.c            |  3 +-
 arch/x86/kvm/vmx/vmx.c          | 83 ++++++++++++++++++++++++++++++---
 arch/x86/kvm/vmx/vmx.h          |  2 +
 arch/x86/kvm/x86.c              | 35 +++++++++++++-
 7 files changed, 132 insertions(+), 17 deletions(-)

-- 
2.20.1



* [PATCH v8 1/4] kvm: x86: Emulate MSR IA32_CORE_CAPABILITIES
  2020-04-14  6:31 [PATCH v8 0/4] KVM: Add virtualization support of split lock detection Xiaoyao Li
@ 2020-04-14  6:31 ` Xiaoyao Li
  2020-04-14  6:31 ` [PATCH v8 2/4] kvm: vmx: Enable MSR TEST_CTRL for guest Xiaoyao Li
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Xiaoyao Li @ 2020-04-14  6:31 UTC (permalink / raw)
  To: Paolo Bonzini, kvm, Sean Christopherson, Thomas Gleixner
  Cc: linux-kernel, x86, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra, Arvind Sankar, Xiaoyao Li

Emulate MSR_IA32_CORE_CAPABILITIES in software and unconditionally
advertise its support to userspace. Like MSR_IA32_ARCH_CAPABILITIES, it
is a feature-enumerating MSR and can be fully emulated regardless of
hardware support.

Note, support for individual features enumerated via CORE_CAPABILITIES,
e.g., split lock detection, will be added in future patches.
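
For illustration, userspace could query the advertised value via the
system-scoped KVM_GET_MSRS ioctl, roughly like the sketch below (error
handling trimmed; the MSR index is open-coded since MSR_IA32_CORE_CAPS is
not exported in the uapi headers):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define MSR_IA32_CORE_CAPS	0xcf	/* value per msr-index.h */

int main(void)
{
	struct {
		struct kvm_msrs hdr;
		struct kvm_msr_entry entry;
	} msrs = {
		.hdr.nmsrs   = 1,
		.entry.index = MSR_IA32_CORE_CAPS,
	};
	int kvm = open("/dev/kvm", O_RDWR);

	if (kvm < 0 ||
	    ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_GET_MSR_FEATURES) <= 0)
		return 1;

	/* The system-scoped KVM_GET_MSRS reads MSR-based feature values. */
	if (ioctl(kvm, KVM_GET_MSRS, &msrs) != 1)
		return 1;

	printf("IA32_CORE_CAPABILITIES: 0x%llx\n",
	       (unsigned long long)msrs.entry.data);
	return 0;
}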

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c            |  3 ++-
 arch/x86/kvm/x86.c              | 22 ++++++++++++++++++++++
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 42a2d0d3984a..30aee4dd3760 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -597,6 +597,7 @@ struct kvm_vcpu_arch {
 	u64 ia32_xss;
 	u64 microcode_version;
 	u64 arch_capabilities;
+	u64 core_capabilities;
 
 	/*
 	 * Paging state of the vcpu
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 901cd1fdecd9..3f9c09a34ed4 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -341,9 +341,10 @@ void kvm_set_cpu_caps(void)
 		F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM)
 	);
 
-	/* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
+	/* Unconditionally advertise features that are emulated in software. */
 	kvm_cpu_cap_set(X86_FEATURE_TSC_ADJUST);
 	kvm_cpu_cap_set(X86_FEATURE_ARCH_CAPABILITIES);
+	kvm_cpu_cap_set(X86_FEATURE_CORE_CAPABILITIES);
 
 	if (boot_cpu_has(X86_FEATURE_IBPB) && boot_cpu_has(X86_FEATURE_IBRS))
 		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3bf2ecafd027..adfd4d74ea53 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1248,6 +1248,7 @@ static const u32 emulated_msrs_all[] = {
 	MSR_IA32_TSC_ADJUST,
 	MSR_IA32_TSCDEADLINE,
 	MSR_IA32_ARCH_CAPABILITIES,
+	MSR_IA32_CORE_CAPS,
 	MSR_IA32_MISC_ENABLE,
 	MSR_IA32_MCG_STATUS,
 	MSR_IA32_MCG_CTL,
@@ -1314,6 +1315,7 @@ static const u32 msr_based_features_all[] = {
 	MSR_F10H_DECFG,
 	MSR_IA32_UCODE_REV,
 	MSR_IA32_ARCH_CAPABILITIES,
+	MSR_IA32_CORE_CAPS,
 };
 
 static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all)];
@@ -1367,12 +1369,20 @@ static u64 kvm_get_arch_capabilities(void)
 	return data;
 }
 
+static u64 kvm_get_core_capabilities(void)
+{
+	return 0;
+}
+
 static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
 {
 	switch (msr->index) {
 	case MSR_IA32_ARCH_CAPABILITIES:
 		msr->data = kvm_get_arch_capabilities();
 		break;
+	case MSR_IA32_CORE_CAPS:
+		msr->data = kvm_get_core_capabilities();
+		break;
 	case MSR_IA32_UCODE_REV:
 		rdmsrl_safe(msr->index, &msr->data);
 		break;
@@ -2753,6 +2763,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		vcpu->arch.arch_capabilities = data;
 		break;
+	case MSR_IA32_CORE_CAPS:
+		if (!msr_info->host_initiated)
+			return 1;
+		vcpu->arch.core_capabilities = data;
+		break;
 	case MSR_EFER:
 		return set_efer(vcpu, msr_info);
 	case MSR_K7_HWCR:
@@ -3080,6 +3095,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		msr_info->data = vcpu->arch.arch_capabilities;
 		break;
+	case MSR_IA32_CORE_CAPS:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_CORE_CAPABILITIES))
+			return 1;
+		msr_info->data = vcpu->arch.core_capabilities;
+		break;
 	case MSR_IA32_POWER_CTL:
 		msr_info->data = vcpu->arch.msr_ia32_power_ctl;
 		break;
@@ -9378,6 +9399,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 		goto free_guest_fpu;
 
 	vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
+	vcpu->arch.core_capabilities = kvm_get_core_capabilities();
 	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
 	kvm_vcpu_mtrr_init(vcpu);
 	vcpu_load(vcpu);
-- 
2.20.1



* [PATCH v8 2/4] kvm: vmx: Enable MSR TEST_CTRL for guest
  2020-04-14  6:31 [PATCH v8 0/4] KVM: Add virtualization support of split lock detection Xiaoyao Li
  2020-04-14  6:31 ` [PATCH v8 1/4] kvm: x86: Emulate MSR IA32_CORE_CAPABILITIES Xiaoyao Li
@ 2020-04-14  6:31 ` Xiaoyao Li
  2020-04-14  6:31 ` [PATCH v8 3/4] x86/split_lock: Export sld_update_msr() and sld_state Xiaoyao Li
  2020-04-14  6:31 ` [PATCH v8 4/4] kvm: vmx: virtualize split lock detection Xiaoyao Li
  3 siblings, 0 replies; 15+ messages in thread
From: Xiaoyao Li @ 2020-04-14  6:31 UTC (permalink / raw)
  To: Paolo Bonzini, kvm, Sean Christopherson, Thomas Gleixner
  Cc: linux-kernel, x86, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra, Arvind Sankar, Xiaoyao Li

Unconditionally allow the guest to read MSR TEST_CTRL and to write zero
to it.

This matches the fact that most Intel CPUs support MSR TEST_CTRL, and it
also reduces the effort needed to handle wrmsr/rdmsr when split lock
detection is exposed to the guest in a future patch.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 83050977490c..ae394ed174cd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1789,6 +1789,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	u32 index;
 
 	switch (msr_info->index) {
+	case MSR_TEST_CTRL:
+		msr_info->data = 0;
+		break;
 #ifdef CONFIG_X86_64
 	case MSR_FS_BASE:
 		msr_info->data = vmcs_readl(GUEST_FS_BASE);
@@ -1942,6 +1945,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	u32 index;
 
 	switch (msr_index) {
+	case MSR_TEST_CTRL:
+		if (data)
+			return 1;
+
+		break;
 	case MSR_EFER:
 		ret = kvm_set_msr_common(vcpu, msr_info);
 		break;
-- 
2.20.1



* [PATCH v8 3/4] x86/split_lock: Export sld_update_msr() and sld_state
  2020-04-14  6:31 [PATCH v8 0/4] KVM: Add virtualization support of split lock detection Xiaoyao Li
  2020-04-14  6:31 ` [PATCH v8 1/4] kvm: x86: Emulate MSR IA32_CORE_CAPABILITIES Xiaoyao Li
  2020-04-14  6:31 ` [PATCH v8 2/4] kvm: vmx: Enable MSR TEST_CTRL for guest Xiaoyao Li
@ 2020-04-14  6:31 ` Xiaoyao Li
  2020-04-14  6:31 ` [PATCH v8 4/4] kvm: vmx: virtualize split lock detection Xiaoyao Li
  3 siblings, 0 replies; 15+ messages in thread
From: Xiaoyao Li @ 2020-04-14  6:31 UTC (permalink / raw)
  To: Paolo Bonzini, kvm, Sean Christopherson, Thomas Gleixner
  Cc: linux-kernel, x86, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra, Arvind Sankar, Xiaoyao Li

sld_update_msr() and sld_state will be used by KVM in a future patch to
add virtualization support for split lock detection.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/include/asm/cpu.h  | 12 ++++++++++++
 arch/x86/kernel/cpu/intel.c | 13 +++++--------
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index dd17c2da1af5..6c6528b3153e 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,12 +40,23 @@ int mwait_usable(const struct cpuinfo_x86 *);
 unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
+enum split_lock_detect_state {
+	sld_off = 0,
+	sld_warn,
+	sld_fatal,
+};
+
 #ifdef CONFIG_CPU_SUP_INTEL
+extern enum split_lock_detect_state sld_state __ro_after_init;
+
 extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
 extern bool handle_guest_split_lock(unsigned long ip);
+extern void sld_update_msr(bool on);
 #else
+#define sld_state sld_off
+
 static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
 static inline void switch_to_sld(unsigned long tifn) {}
 static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
@@ -57,5 +68,6 @@ static inline bool handle_guest_split_lock(unsigned long ip)
 {
 	return false;
 }
+static inline void sld_update_msr(bool on) {}
 #endif
 #endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index bf08d4508ecb..80d1c0c93c08 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -34,18 +34,14 @@
 #include <asm/apic.h>
 #endif
 
-enum split_lock_detect_state {
-	sld_off = 0,
-	sld_warn,
-	sld_fatal,
-};
-
 /*
  * Default to sld_off because most systems do not support split lock detection
  * split_lock_setup() will switch this to sld_warn on systems that support
  * split lock detect, unless there is a command line override.
  */
-static enum split_lock_detect_state sld_state __ro_after_init = sld_off;
+enum split_lock_detect_state sld_state __ro_after_init = sld_off;
+EXPORT_SYMBOL_GPL(sld_state);
+
 static u64 msr_test_ctrl_cache __ro_after_init;
 
 /*
@@ -1052,7 +1048,7 @@ static void __init split_lock_setup(void)
  * is not implemented as one thread could undo the setting of the other
  * thread immediately after dropping the lock anyway.
  */
-static void sld_update_msr(bool on)
+void sld_update_msr(bool on)
 {
 	u64 test_ctrl_val = msr_test_ctrl_cache;
 
@@ -1061,6 +1057,7 @@ static void sld_update_msr(bool on)
 
 	wrmsrl(MSR_TEST_CTRL, test_ctrl_val);
 }
+EXPORT_SYMBOL_GPL(sld_update_msr);
 
 static void split_lock_init(void)
 {
-- 
2.20.1



* [PATCH v8 4/4] kvm: vmx: virtualize split lock detection
  2020-04-14  6:31 [PATCH v8 0/4] KVM: Add virtualization support of split lock detection Xiaoyao Li
                   ` (2 preceding siblings ...)
  2020-04-14  6:31 ` [PATCH v8 3/4] x86/split_lock: Export sld_update_msr() and sld_state Xiaoyao Li
@ 2020-04-14  6:31 ` Xiaoyao Li
  2020-04-15 17:43   ` Thomas Gleixner
  2020-04-15 19:47   ` Thomas Gleixner
  3 siblings, 2 replies; 15+ messages in thread
From: Xiaoyao Li @ 2020-04-14  6:31 UTC (permalink / raw)
  To: Paolo Bonzini, kvm, Sean Christopherson, Thomas Gleixner
  Cc: linux-kernel, x86, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra, Arvind Sankar, Xiaoyao Li

Due to the fact that the TEST_CTRL MSR is per-core scoped, i.e., the
sibling threads in the same physical CPU core share the same MSR, only
advertise the split lock detection feature to the guest when SMT is
disabled or unsupported, for simplicity.

1) When the host sld_state is sld_off, split lock detection is
   unsupported/disabled and cannot be exposed to the guest.

2) When the host sld_state is sld_warn, split lock detection can be
   exposed to the guest if SMT is off (nosmt). Further, to avoid the
   potential MSR_TEST_CTRL.SLD toggling overhead on every VM-enter/-exit,
   load the guest's SLD setting while in KVM context.

3) When the host sld_state is sld_fatal, split lock detection can also be
   exposed to the guest if SMT is off (nosmt), but the feature is forced
   on for the guest, i.e., the hardware MSR_TEST_CTRL.SLD bit stays set
   even if the guest clears its SLD bit.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 79 +++++++++++++++++++++++++++++++++++++-----
 arch/x86/kvm/vmx/vmx.h |  2 ++
 arch/x86/kvm/x86.c     | 17 +++++++--
 3 files changed, 86 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ae394ed174cd..2077abe4edf9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1120,6 +1120,35 @@ void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
 	}
 }
 
+/*
+ * Note: for guest, feature split lock detection can only be enumerated through
+ * MSR_IA32_CORE_CAPABILITIES bit. The FMS enumeration is unsupported.
+ */
+static inline bool guest_cpu_has_feature_sld(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.core_capabilities &
+	       MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
+}
+
+static inline bool guest_cpu_sld_on(struct vcpu_vmx *vmx)
+{
+	return vmx->msr_test_ctrl & MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+}
+
+static inline void vmx_update_sld(struct kvm_vcpu *vcpu, bool on)
+{
+	/*
+	 * Toggle SLD if the guest wants it enabled but its been disabled for
+	 * the userspace VMM, and vice versa.  Note, TIF_SLD is true if SLD has
+	 * been turned off.  Yes, it's a terrible name.
+	 */
+	if (sld_state == sld_warn && guest_cpu_has_feature_sld(vcpu) &&
+	    on == test_thread_flag(TIF_SLD)) {
+		    sld_update_msr(on);
+		    update_thread_flag(TIF_SLD, !on);
+	}
+}
+
 void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -1188,6 +1217,10 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 #endif
 
 	vmx_set_host_fs_gs(host_state, fs_sel, gs_sel, fs_base, gs_base);
+
+	vmx->host_sld_on = !test_thread_flag(TIF_SLD);
+	vmx_update_sld(vcpu, guest_cpu_sld_on(vmx));
+
 	vmx->guest_state_loaded = true;
 }
 
@@ -1226,6 +1259,9 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
 	wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
 #endif
 	load_fixmap_gdt(raw_smp_processor_id());
+
+	vmx_update_sld(&vmx->vcpu, vmx->host_sld_on);
+
 	vmx->guest_state_loaded = false;
 	vmx->guest_msrs_ready = false;
 }
@@ -1777,6 +1813,16 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
 	}
 }
 
+static inline u64 vmx_msr_test_ctrl_valid_bits(struct kvm_vcpu *vcpu)
+{
+	u64 valid_bits = 0;
+
+	if (guest_cpu_has_feature_sld(vcpu))
+		valid_bits |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+	return valid_bits;
+}
+
 /*
  * Reads an msr value (of 'msr_index') into 'pdata'.
  * Returns 0 on success, non-0 otherwise.
@@ -1790,7 +1836,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 	switch (msr_info->index) {
 	case MSR_TEST_CTRL:
-		msr_info->data = 0;
+		msr_info->data = vmx->msr_test_ctrl;
 		break;
 #ifdef CONFIG_X86_64
 	case MSR_FS_BASE:
@@ -1946,9 +1992,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 	switch (msr_index) {
 	case MSR_TEST_CTRL:
-		if (data)
+		if (data & ~vmx_msr_test_ctrl_valid_bits(vcpu))
 			return 1;
 
+		vmx->msr_test_ctrl = data;
+
+		preempt_disable();
+		if (vmx->guest_state_loaded)
+			vmx_update_sld(vcpu, guest_cpu_sld_on(vmx));
+		preempt_enable();
 		break;
 	case MSR_EFER:
 		ret = kvm_set_msr_common(vcpu, msr_info);
@@ -4266,7 +4318,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	vmx->rmode.vm86_active = 0;
 	vmx->spec_ctrl = 0;
-
+	vmx->msr_test_ctrl = 0;
 	vmx->msr_ia32_umwait_control = 0;
 
 	vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
@@ -4596,24 +4648,33 @@ static int handle_machine_check(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static inline bool guest_cpu_alignment_check_enabled(struct kvm_vcpu *vcpu)
+{
+	return vmx_get_cpl(vcpu) == 3 && kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
+	       (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
+}
+
 /*
  * If the host has split lock detection disabled, then #AC is
  * unconditionally injected into the guest, which is the pre split lock
  * detection behaviour.
  *
  * If the host has split lock detection enabled then #AC is
- * only injected into the guest when:
- *  - Guest CPL == 3 (user mode)
- *  - Guest has #AC detection enabled in CR0
- *  - Guest EFLAGS has AC bit set
+ * injected into the guest when:
+ * 1) guest has alignment check enabled;
+ * or 2) guest has split lock detection enabled;
  */
 static inline bool guest_inject_ac(struct kvm_vcpu *vcpu)
 {
 	if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
 		return true;
 
-	return vmx_get_cpl(vcpu) == 3 && kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
-	       (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
+	/*
+	 * A split lock access must be an unaligned access, so we should check
+	 * guest_cpu_alignment_check_enabled() first.
+	 */
+	return guest_cpu_alignment_check_enabled(vcpu) ||
+	       guest_cpu_sld_on(to_vmx(vcpu));
 }
 
 static int handle_exception_nmi(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index aab9df55336e..b3c5be90b023 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -216,12 +216,14 @@ struct vcpu_vmx {
 	int                   nmsrs;
 	int                   save_nmsrs;
 	bool                  guest_msrs_ready;
+	bool                  host_sld_on;
 #ifdef CONFIG_X86_64
 	u64		      msr_host_kernel_gs_base;
 	u64		      msr_guest_kernel_gs_base;
 #endif
 
 	u64		      spec_ctrl;
+	u64                   msr_test_ctrl;
 	u32		      msr_ia32_umwait_control;
 
 	u32 secondary_exec_control;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index adfd4d74ea53..8c8f5ccfd98b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1189,7 +1189,7 @@ static const u32 msrs_to_save_all[] = {
 #endif
 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
 	MSR_IA32_FEAT_CTL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
-	MSR_IA32_SPEC_CTRL,
+	MSR_IA32_SPEC_CTRL, MSR_TEST_CTRL,
 	MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
 	MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
 	MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
@@ -1371,7 +1371,12 @@ static u64 kvm_get_arch_capabilities(void)
 
 static u64 kvm_get_core_capabilities(void)
 {
-	return 0;
+	u64 data = 0;
+
+	if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) && !cpu_smt_possible())
+		data |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
+
+	return data;
 }
 
 static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
@@ -2764,7 +2769,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vcpu->arch.arch_capabilities = data;
 		break;
 	case MSR_IA32_CORE_CAPS:
-		if (!msr_info->host_initiated)
+		if (!msr_info->host_initiated ||
+		    data & ~kvm_get_core_capabilities())
 			return 1;
 		vcpu->arch.core_capabilities = data;
 		break;
@@ -5243,6 +5249,11 @@ static void kvm_init_msr_list(void)
 		 * to the guests in some cases.
 		 */
 		switch (msrs_to_save_all[i]) {
+		case MSR_TEST_CTRL:
+			if (!(kvm_get_core_capabilities() &
+			      MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT))
+				continue;
+			break;
 		case MSR_IA32_BNDCFGS:
 			if (!kvm_mpx_supported())
 				continue;
-- 
2.20.1



* Re: [PATCH v8 4/4] kvm: vmx: virtualize split lock detection
  2020-04-14  6:31 ` [PATCH v8 4/4] kvm: vmx: virtualize split lock detection Xiaoyao Li
@ 2020-04-15 17:43   ` Thomas Gleixner
  2020-04-15 19:18     ` Sean Christopherson
  2020-04-15 19:47   ` Thomas Gleixner
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2020-04-15 17:43 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, kvm, Sean Christopherson
  Cc: linux-kernel, x86, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra, Arvind Sankar, Xiaoyao Li

Xiaoyao Li <xiaoyao.li@intel.com> writes:
> +/*
> + * Note: for guest, feature split lock detection can only be enumerated through
> + * MSR_IA32_CORE_CAPABILITIES bit. The FMS enumeration is unsupported.

That comment is confusing at best.

> + */
> +static inline bool guest_cpu_has_feature_sld(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.core_capabilities &
> +	       MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
> +}
> +
> +static inline bool guest_cpu_sld_on(struct vcpu_vmx *vmx)
> +{
> +	return vmx->msr_test_ctrl & MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> +}
> +
> +static inline void vmx_update_sld(struct kvm_vcpu *vcpu, bool on)
> +{
> +	/*
> +	 * Toggle SLD if the guest wants it enabled but its been disabled for
> +	 * the userspace VMM, and vice versa.  Note, TIF_SLD is true if SLD has
> +	 * been turned off.  Yes, it's a terrible name.

Instead of writing that useless blurb you could have written a patch
which changes TIF_SLD to TIF_SLD_OFF to make it clear.

> +	 */
> +	if (sld_state == sld_warn && guest_cpu_has_feature_sld(vcpu) &&
> +	    on == test_thread_flag(TIF_SLD)) {
> +		    sld_update_msr(on);
> +		    update_thread_flag(TIF_SLD, !on);

Of course you completely fail to explain why TIF_SLD needs to be fiddled
with.

> @@ -1188,6 +1217,10 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
>  #endif
>
> 	vmx_set_host_fs_gs(host_state, fs_sel, gs_sel, fs_base, gs_base);
> +
> +	vmx->host_sld_on = !test_thread_flag(TIF_SLD);

This inverted storage is non-intuitive. What's wrong with simply
reflecting the TIF_SLD state?

> +	vmx_update_sld(vcpu, guest_cpu_sld_on(vmx));
> +
>	vmx->guest_state_loaded = true;
> }
>
> @@ -1226,6 +1259,9 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
> 	wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
>  #endif
> 	load_fixmap_gdt(raw_smp_processor_id());
> +
> +	vmx_update_sld(&vmx->vcpu, vmx->host_sld_on);
> +

vmx_prepare_switch_to_guest() is called via:

kvm_arch_vcpu_ioctl_run()
  vcpu_run()
    vcpu_enter_guest()
      preempt_disable();
      kvm_x86_ops.prepare_guest_switch(vcpu);

but vmx_prepare_switch_to_host() is invoked at the very end of:

kvm_arch_vcpu_ioctl_run()
  .....
  vcpu_run()
  .....
  vcpu_put()
    vmx_vcpu_put()
      vmx_prepare_switch_to_host();

That asymmetry does not make any sense without an explanation.

What's even worse is that vmx_prepare_switch_to_host() is invoked with
preemption enabled, so MSR state and TIF_SLD state can get out of sync
on preemption/migration.

> @@ -1946,9 +1992,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 
> 	switch (msr_index) {
> 	case MSR_TEST_CTRL:
> -		if (data)
> +		if (data & ~vmx_msr_test_ctrl_valid_bits(vcpu))
> 			return 1;
> 
> +		vmx->msr_test_ctrl = data;
> +
> +		preempt_disable();

This preempt_disable/enable() lacks explanation as well.

> +		if (vmx->guest_state_loaded)
> +			vmx_update_sld(vcpu, guest_cpu_sld_on(vmx));
> +		preempt_enable();

How is updating msr_test_ctrl valid if this is invoked from the IOCTL,
i.e. host_initiated == true?

That said, I also hate the fact that you export both the low level MSR
function _and_ the state variable. Having all these details including the
TIF mangling in the VMX code is just wrong.

Thanks,

        tglx


* Re: [PATCH v8 4/4] kvm: vmx: virtualize split lock detection
  2020-04-15 17:43   ` Thomas Gleixner
@ 2020-04-15 19:18     ` Sean Christopherson
  2020-04-15 21:22       ` Thomas Gleixner
  0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2020-04-15 19:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Xiaoyao Li, Paolo Bonzini, kvm, linux-kernel, x86, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, Peter Zijlstra, Arvind Sankar

On Wed, Apr 15, 2020 at 07:43:22PM +0200, Thomas Gleixner wrote:
> Xiaoyao Li <xiaoyao.li@intel.com> writes:
> > +/*
> > + * Note: for guest, feature split lock detection can only be enumerated through
> > + * MSR_IA32_CORE_CAPABILITIES bit. The FMS enumeration is unsupported.
> 
> That comment is confusing at best.
> 
> > + */
> > +static inline bool guest_cpu_has_feature_sld(struct kvm_vcpu *vcpu)
> > +{
> > +	return vcpu->arch.core_capabilities &
> > +	       MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
> > +}
> > +
> > +static inline bool guest_cpu_sld_on(struct vcpu_vmx *vmx)
> > +{
> > +	return vmx->msr_test_ctrl & MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> > +}
> > +
> > +static inline void vmx_update_sld(struct kvm_vcpu *vcpu, bool on)
> > +{
> > +	/*
> > +	 * Toggle SLD if the guest wants it enabled but its been disabled for
> > +	 * the userspace VMM, and vice versa.  Note, TIF_SLD is true if SLD has
> > +	 * been turned off.  Yes, it's a terrible name.
> 
> Instead of writing that useless blurb you could have written a patch
> which changes TIF_SLD to TIF_SLD_OFF to make it clear.

Hah, that's my comment, though I must admit I didn't fully intend for the
editorial at the end to get submitted upstream.

Anyways, I _did_ point out that TIF_SLD is a terrible name[1][2], and my
feedback got ignored/overlooked.  I'd be more than happy to write a patch,
I didn't do so because I assumed that people wanted TIF_SLD as the name for
whatever reason.

[1] https://lkml.kernel.org/r/20191122184457.GA31235@linux.intel.com
[2] https://lkml.kernel.org/r/20200115225724.GA18268@linux.intel.com

> > +	 */
> > +	if (sld_state == sld_warn && guest_cpu_has_feature_sld(vcpu) &&
> > +	    on == test_thread_flag(TIF_SLD)) {
> > +		    sld_update_msr(on);
> > +		    update_thread_flag(TIF_SLD, !on);
> 
> Of course you completely fail to explain why TIF_SLD needs to be fiddled
> with.

Ya, that comment should be something like:

	* Toggle SLD if the guest wants it enabled but its been disabled for
	* the userspace VMM, and vice versa, so that the flag and MSR state
	* are consistent, i.e. its handling during task switches naturally does
	* the right thing if KVM is preempted with guest state loaded.

> > @@ -1188,6 +1217,10 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> >  #endif
> >
> > 	vmx_set_host_fs_gs(host_state, fs_sel, gs_sel, fs_base, gs_base);
> > +
> > +	vmx->host_sld_on = !test_thread_flag(TIF_SLD);
> 
> This inverted storage is non-intuitive. What's wrong with simply
> reflecting the TIF_SLD state?

So that the guest/host tracking use the same polarity, and IMO it makes
the restoration code more intuitive, e.g.:

	vmx_update_sld(&vmx->vcpu, vmx->host_sld_on);
vs
	vmx_update_sld(&vmx->vcpu, !vmx->host_tif_sld);

I.e. the inversion needs to happen somewhere.

> > +	vmx_update_sld(vcpu, guest_cpu_sld_on(vmx));
> > +
> >	vmx->guest_state_loaded = true;
> > }
> >
> > @@ -1226,6 +1259,9 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
> > 	wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
> >  #endif
> > 	load_fixmap_gdt(raw_smp_processor_id());
> > +
> > +	vmx_update_sld(&vmx->vcpu, vmx->host_sld_on);
> > +
> 
> vmx_prepare_switch_to_guest() is called via:
> 
> kvm_arch_vcpu_ioctl_run()
>   vcpu_run()
>     vcpu_enter_guest()
>       preempt_disable();
>       kvm_x86_ops.prepare_guest_switch(vcpu);
> 
> but vmx_prepare_switch_to_host() is invoked at the very end of:
> 
> kvm_arch_vcpu_ioctl_run()
>   .....
>   vcpu_run()
>   .....
>   vcpu_put()
>     vmx_vcpu_put()
>       vmx_prepare_switch_to_host();
> 
> That asymmetry does not make any sense without an explanation.

Deferring the "switch to host" until the vCPU is put allows KVM to keep
certain guest state loaded when staying in the vCPU run loop, e.g.
MSR_KERNEL_GS_BASE can be exposed to the guest without having to save and
restore it on every VM-Enter/VM-Exit.

I agree that all of KVM's state save/load trickery lacks documentation,
I'll put that on my todo list.
 
> What's even worse is that vmx_prepare_switch_to_host() is invoked with
> preemption enabled, so MSR state and TIF_SLD state can get out of sync
> on preemption/migration.

It shouldn't be (called with preemption enabled):

void vcpu_put(struct kvm_vcpu *vcpu)
{
	preempt_disable();
	kvm_arch_vcpu_put(vcpu); <-- leads to vmx_prepare_switch_to_host()
	preempt_notifier_unregister(&vcpu->preempt_notifier);
	__this_cpu_write(kvm_running_vcpu, NULL);
	preempt_enable();
}

> > @@ -1946,9 +1992,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > 
> > 	switch (msr_index) {
> > 	case MSR_TEST_CTRL:
> > -		if (data)
> > +		if (data & ~vmx_msr_test_ctrl_valid_bits(vcpu))
> > 			return 1;
> > 
> > +		vmx->msr_test_ctrl = data;
> > +
> > +		preempt_disable();
> 
> This preempt_disable/enable() lacks explanation as well.

Is an explanation still needed if it's made clear (somewhere) that
interacting with guest_state_loaded needs to be done with preemption
disabled?
 
> > +		if (vmx->guest_state_loaded)
> > +			vmx_update_sld(vcpu, guest_cpu_sld_on(vmx));
> > +		preempt_enable();
> 
> How is updating msr_test_ctrl valid if this is invoked from the IOCTL,
> i.e. host_initiated == true?

Not sure I understand the underlying question.  The host is always allowed
to manipulate guest state, including MSRs.

I'm pretty sure guest_state_loaded should always be false if host_initiated
is true, e.g. we could technically do a WARN on guest_state_loaded and
host_initiated, but the ioctl() is obviously not a hot path and nothing
will break if the assumption doesn't hold.

> That said, I also hate the fact that you export both the low level MSR
> function _and_ the state variable. Having all these details including the
> TIF mangling in the VMX code is just wrong.

I'm not a fan of exporting the low level state either, but IIRC trying to
hide the low level details while achieving the same resulting functionality
was even messier.

I don't see any way to avoid having KVM differentiate between sld_warn and
sld_fatal.  Even if KVM is able to virtualize SLD in sld_fatal mode, e.g.
by telling the guest it must not try to disable SLD, KVM would still need
to know the kernel is sld_fatal so that it can forward that information to
the guest.

It'd be possible to avoid mucking with TIF or exporting the MSR helper, but
that would require KVM to manually save/restore the MSR when KVM is
preempted with guest state loaded.  That probably wouldn't actually affect
performance for most use cases, but IMO it's not worth the extra headache
just to avoid exporting a helper.


* Re: [PATCH v8 4/4] kvm: vmx: virtualize split lock detection
  2020-04-14  6:31 ` [PATCH v8 4/4] kvm: vmx: virtualize split lock detection Xiaoyao Li
  2020-04-15 17:43   ` Thomas Gleixner
@ 2020-04-15 19:47   ` Thomas Gleixner
  2020-04-16  2:12     ` Xiaoyao Li
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2020-04-15 19:47 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, kvm, Sean Christopherson
  Cc: linux-kernel, x86, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra, Arvind Sankar, Xiaoyao Li

Xiaoyao Li <xiaoyao.li@intel.com> writes:

> Due to the fact that the TEST_CTRL MSR is per-core scoped, i.e., the
> sibling threads in the same physical CPU core share the same MSR, only
> advertise the split lock detection feature to the guest when SMT is
> disabled or unsupported, for simplicity.

That's not for simplicity. It's for correctness because you cannot
provide consistent state to a guest.

Thanks,

        tglx


* Re: [PATCH v8 4/4] kvm: vmx: virtualize split lock detection
  2020-04-15 19:18     ` Sean Christopherson
@ 2020-04-15 21:22       ` Thomas Gleixner
  2020-04-15 21:43         ` Sean Christopherson
  2020-04-16 12:34         ` Xiaoyao Li
  0 siblings, 2 replies; 15+ messages in thread
From: Thomas Gleixner @ 2020-04-15 21:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Xiaoyao Li, Paolo Bonzini, kvm, linux-kernel, x86, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, Peter Zijlstra, Arvind Sankar

Sean,

Sean Christopherson <sean.j.christopherson@intel.com> writes:
> On Wed, Apr 15, 2020 at 07:43:22PM +0200, Thomas Gleixner wrote:
>> Xiaoyao Li <xiaoyao.li@intel.com> writes:
>> > +static inline void vmx_update_sld(struct kvm_vcpu *vcpu, bool on)
>> > +{
>> > +	/*
>> > +	 * Toggle SLD if the guest wants it enabled but its been disabled for
>> > +	 * the userspace VMM, and vice versa.  Note, TIF_SLD is true if SLD has
>> > +	 * been turned off.  Yes, it's a terrible name.
>> 
>> Instead of writing that useless blurb you could have written a patch
>> which changes TIF_SLD to TIF_SLD_OFF to make it clear.
>
> Hah, that's my comment, though I must admit I didn't fully intend for the
> editorial at the end to get submitted upstream.
>
> Anyways, I _did_ point out that TIF_SLD is a terrible name[1][2], and my
> feedback got ignored/overlooked.  I'd be more than happy to write a
> patch, I didn't do so because I assumed that people wanted TIF_SLD as the name for
> whatever reason.

I somehow missed that in the maze of mails regarding this stuff. I've
already written a patch to rename it to TIF_SLD_DISABLED which is pretty
self explaining. But see below.

>> > +	 */
>> > +	if (sld_state == sld_warn && guest_cpu_has_feature_sld(vcpu) &&
>> > +	    on == test_thread_flag(TIF_SLD)) {
>> > +		    sld_update_msr(on);
>> > +		    update_thread_flag(TIF_SLD, !on);
>> 
>> Of course you completely fail to explain why TIF_SLD needs to be fiddled
>> with.
>
> Ya, that comment should be something like:
>
> 	* Toggle SLD if the guest wants it enabled but its been disabled for
> 	* the userspace VMM, and vice versa, so that the flag and MSR state
> 	* are consistent, i.e. its handling during task switches naturally does
> 	* the right thing if KVM is preempted with guest state loaded.

Something to that effect.

>> > @@ -1188,6 +1217,10 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
>> >  #endif
>> >
>> > 	vmx_set_host_fs_gs(host_state, fs_sel, gs_sel, fs_base, gs_base);
>> > +
>> > +	vmx->host_sld_on = !test_thread_flag(TIF_SLD);
>> 
>> This inverted storage is non-intuitive. What's wrong with simply
>> reflecting the TIF_SLD state?
>
> So that the guest/host tracking use the same polarity, and IMO it makes
> the restoration code more intuitive, e.g.:
>
> 	vmx_update_sld(&vmx->vcpu, vmx->host_sld_on);
> vs
> 	vmx_update_sld(&vmx->vcpu, !vmx->host_tif_sld);
>
> I.e. the inversion needs to happen somewhere.

Correct, but we can make it consistently use the 'disabled' convention.

I briefly thought about renaming the flag to TIF_SLD_ENABLED, set it by
default and update the 5 places where it is used. But that's
inconsistent as well simply because it does not make any sense to set
that flag when detection is not available or disabled on the command
line.

>> > @@ -1226,6 +1259,9 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
>> > 	wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
>> >  #endif
>> > 	load_fixmap_gdt(raw_smp_processor_id());
>> > +
>> > +	vmx_update_sld(&vmx->vcpu, vmx->host_sld_on);
>> > +
>> 
>> vmx_prepare_switch_to_guest() is called via:
>> 
>> kvm_arch_vcpu_ioctl_run()
>>   vcpu_run()
>>     vcpu_enter_guest()
>>       preempt_disable();
>>       kvm_x86_ops.prepare_guest_switch(vcpu);
>> 
>> but vmx_prepare_switch_to_host() is invoked at the very end of:
>> 
>> kvm_arch_vcpu_ioctl_run()
>>   .....
>>   vcpu_run()
>>   .....
>>   vcpu_put()
>>     vmx_vcpu_put()
>>       vmx_prepare_switch_to_host();
>> 
>> That asymmetry does not make any sense without an explanation.
>
> Deferring the "switch to host" until the vCPU is put allows KVM to keep
> certain guest state loaded when staying in the vCPU run loop, e.g.
> MSR_KERNEL_GS_BASE can be exposed to the guest without having to save and
> restore it on every VM-Enter/VM-Exit.

I know why this is done (after staring at the callchains for a while),
but 5 lines of explanation at least in the changelog would have saved my
time.

>> What's even worse is that vmx_prepare_switch_to_host() is invoked with
>> preemption enabled, so MSR state and TIF_SLD state can get out of sync
>> on preemption/migration.
>
> It shouldn't be (called with preemption enabled):
>
> void vcpu_put(struct kvm_vcpu *vcpu)
> {
> 	preempt_disable();
> 	kvm_arch_vcpu_put(vcpu); <-- leads to vmx_prepare_switch_to_host()
> 	preempt_notifier_unregister(&vcpu->preempt_notifier);
> 	__this_cpu_write(kvm_running_vcpu, NULL);
> 	preempt_enable();
> }

Ooops. How did I miss that?

>> > @@ -1946,9 +1992,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> > 
>> > 	switch (msr_index) {
>> > 	case MSR_TEST_CTRL:
>> > -		if (data)
>> > +		if (data & ~vmx_msr_test_ctrl_valid_bits(vcpu))
>> > 			return 1;
>> > 
>> > +		vmx->msr_test_ctrl = data;
>> > +
>> > +		preempt_disable();
>> 
>> This preempt_disable/enable() lacks explanation as well.
>
> Is an explanation still needed if it's made clear (somewhere) that
> interacting with guest_state_loaded needs to be done with preemption
> disabled?

Well, the thing is that finding that explanation somewhere is not always
trivial. Aside from that, this really is the wrong place to do that. That
wants to be inside a function which is invoked from here, and that
function should have the appropriate commentary.
  
>> > +		if (vmx->guest_state_loaded)
>> > +			vmx_update_sld(vcpu, guest_cpu_sld_on(vmx));
>> > +		preempt_enable();
>> 
>> How is updating msr_test_ctrl valid if this is invoked from the IOCTL,
>> i.e. host_initiated == true?
>
> Not sure I understand the underlying question.  The host is always allowed
> to manipulate guest state, including MSRs.

Fair enough.

> I'm pretty sure guest_state_loaded should always be false if host_initiated
> is true, e.g. we could technically do a WARN on guest_state_loaded and
> host_initiated, but the ioctl() is obviously not a hot path and nothing
> will break if the assumption doesn't hold.

You'd create inconsistent state because the guest internal state cache
is not updated, but if you can update it with !loaded then you can do
that anyway. Shrug.

>> That said, I also hate the fact that you export both the low level MSR
>> function _and_ the state variable. Having all these details including the
>> TIF mangling in the VMX code is just wrong.
>
> I'm not a fan of exporting the low level state either, but IIRC trying to
> hide the low level details while achieving the same resulting functionality
> was even messier.
>
> I don't see any way to avoid having KVM differentiate between sld_warn and
> sld_fatal.  Even if KVM is able to virtualize SLD in sld_fatal mode, e.g.
> by telling the guest it must not try to disable SLD, KVM would still need
> to know the kernel is sld_fatal so that it can forward that information to
> the guest.

Huch? There is absolutely zero code like that. The only place where
sld_state is used is:

+ static inline void vmx_update_sld(struct kvm_vcpu *vcpu, bool on)
+ {
+	if (sld_state == sld_warn && guest_cpu_has_feature_sld(vcpu) &&
+	    on == test_thread_flag(TIF_SLD)) {
+		    sld_update_msr(on);
+		    update_thread_flag(TIF_SLD, !on);
+	}

You might have some faint memories from the previous trainwrecks :)

The fatal mode emulation which is used in this patch set is simply that
the guest can 'write' to the MSR but it's not propagated to the real
MSR. It's just stored in the guest state. There is no way that you can
tell the guest that the MSR is there but fake.

The alternative solution is to prevent the exposure of SLD to the guest
in fatal mode. But that does not buy anything.

The detection is anyway incomplete. If the SLD #AC is raised in guest's
user mode and the guest has user #AC enabled then the exception is
injected into the guest unconditionally and independent of the host's
and guest's SLD state. That's entirely correct because a SLD #AC in user
mode is also a user mode alignment violation; it's not distinguishable.

You could of course analyse the offending instruction and check for a
lock prefix and a cache line overlap, but that still does not prevent
false positives. When the guest is non-malicious and has proper user #AC
handling in place then it would be wrong or at least very surprising to
kill it just because the detection code decided that it is a dangerous
split lock attempt.

So we can go with the proposed mode of allowing the write but not
propagating it. If the resulting split lock #AC originates from CPL != 3
then the guest will be killed with SIGBUS. If it originates from CPL ==
3 and the guest has user #AC disabled then it will be killed as well.

Thanks,

        tglx




* Re: [PATCH v8 4/4] kvm: vmx: virtualize split lock detection
  2020-04-15 21:22       ` Thomas Gleixner
@ 2020-04-15 21:43         ` Sean Christopherson
  2020-05-05  3:07           ` Sean Christopherson
  2020-04-16 12:34         ` Xiaoyao Li
  1 sibling, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2020-04-15 21:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Xiaoyao Li, Paolo Bonzini, kvm, linux-kernel, x86, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, Peter Zijlstra, Arvind Sankar

On Wed, Apr 15, 2020 at 11:22:11PM +0200, Thomas Gleixner wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> > I don't see any way to avoid having KVM differentiate between sld_warn and
> > sld_fatal.  Even if KVM is able to virtualize SLD in sld_fatal mode, e.g.
> > by telling the guest it must not try to disable SLD, KVM would still need
> > to know the kernel is sld_fatal so that it can forward that information to
> > the guest.
> 
> Huch? There is absolutely zero code like that. The only place where
> sld_state is used is:
> 
> + static inline void vmx_update_sld(struct kvm_vcpu *vcpu, bool on)
> + {
> +	if (sld_state == sld_warn && guest_cpu_has_feature_sld(vcpu) &&
> +	    on == test_thread_flag(TIF_SLD)) {
> +		    sld_update_msr(on);
> +		    update_thread_flag(TIF_SLD, !on);
> +	}
> 
> You might have some faint memories from the previous trainwrecks :)

Yeah, I was thinking SLD was only being exposed if the host is sld_warn.
I'll work with Xiaoyao to figure out a cleaner interface for this code.

> The fatal mode emulation which is used in this patch set is simply that
> the guest can 'write' to the MSR but it's not propagated to the real
> MSR. It's just stored in the guest state. There is no way that you can
> tell the guest that the MSR is there but fake.
>
> The alternative solution is to prevent the exposure of SLD to the guest
> in fatal mode. But that does not buy anything.
> 
> The detection is anyway incomplete. If the SLD #AC is raised in guest's
> user mode and the guest has user #AC enabled then the exception is
> injected into the guest unconditionally and independent of the host's
> and guest's SLD state. That's entirely correct because a SLD #AC in user
> mode is also a user mode alignment violation; it's not distinguishable.
> 
> You could of course analyse the offending instruction and check for a
> lock prefix and a cache line overlap, but that still does not prevent
> false positives. When the guest is non-malicious and has proper user #AC
> handling in place then it would be wrong or at least very surprising to
> kill it just because the detection code decided that it is a dangerous
> split lock attempt.
> 
> So we can go with the proposed mode of allowing the write but not
> propagating it. If the resulting split lock #AC originates from CPL != 3
> then the guest will be killed with SIGBUS. If it originates from CPL ==
> 3 and the guest has user #AC disabled then it will be killed as well.

An idea that's been floated around to avoid killing the guest on a CPL==3
split-lock #AC is to add a STICKY bit to MSR_TEST_CTRL that KVM can
virtualize to tell the guest that attempting to disable SLD is futile,
e.g. so that the guest can kill its misbehaving userspace apps instead of
trying to disable SLD and getting killed by the host.

I've discussed it with a few folks internally and it sounds like getting
such a bit added to the SDM would be doable, even if Intel never ships
hardware that supports the bit.  The thought is that getting the STICKY
bit architecturally defined would give KVM/Linux leverage to persuade guest
kernels to add support for the bit.  But anything that touches the SDM
doesn't exactly happen quickly :-/.
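
E.g., purely as an illustration (the bit name and position below are
invented for the sketch, nothing architectural), guest-side handling in
split_lock_setup() could look something like:

	/* Hypothetical bit, NOT defined in the SDM today. */
	#define MSR_TEST_CTRL_SLD_STICKY	BIT(30)

	u64 test_ctrl;

	rdmsrl(MSR_TEST_CTRL, test_ctrl);
	if (test_ctrl & MSR_TEST_CTRL_SLD_STICKY) {
		/*
		 * The hypervisor says SLD can't be turned off, so don't
		 * toggle it per task; behave like fatal mode and just
		 * kill offending user tasks on a split lock #AC.
		 */
		state = sld_fatal;
	}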


* Re: [PATCH v8 4/4] kvm: vmx: virtualize split lock detection
  2020-04-15 19:47   ` Thomas Gleixner
@ 2020-04-16  2:12     ` Xiaoyao Li
  0 siblings, 0 replies; 15+ messages in thread
From: Xiaoyao Li @ 2020-04-16  2:12 UTC (permalink / raw)
  To: Thomas Gleixner, Paolo Bonzini, kvm, Sean Christopherson
  Cc: linux-kernel, x86, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra, Arvind Sankar

On 4/16/2020 3:47 AM, Thomas Gleixner wrote:
> Xiaoyao Li <xiaoyao.li@intel.com> writes:
> 
>> Due to the fact that the TEST_CTRL MSR is per-core scoped, i.e., the
>> sibling threads in the same physical CPU core share the same MSR, only
>> advertise the split lock detection feature to the guest when SMT is
>> disabled or unsupported, for simplicity.
> 
> That's not for simplicity. It's for correctness because you cannot
> provide consistent state to a guest.
> 

I'll correct it.

Thanks!
-Xiaoyao



* Re: [PATCH v8 4/4] kvm: vmx: virtualize split lock detection
  2020-04-15 21:22       ` Thomas Gleixner
  2020-04-15 21:43         ` Sean Christopherson
@ 2020-04-16 12:34         ` Xiaoyao Li
  2020-04-16 13:33           ` Thomas Gleixner
  1 sibling, 1 reply; 15+ messages in thread
From: Xiaoyao Li @ 2020-04-16 12:34 UTC (permalink / raw)
  To: Thomas Gleixner, Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, x86, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, Peter Zijlstra, Arvind Sankar

On 4/16/2020 5:22 AM, Thomas Gleixner wrote:
> Sean,
> 
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>> On Wed, Apr 15, 2020 at 07:43:22PM +0200, Thomas Gleixner wrote:
>>> Xiaoyao Li <xiaoyao.li@intel.com> writes:
>>>> +static inline void vmx_update_sld(struct kvm_vcpu *vcpu, bool on)
>>>> +{
>>>> +	/*
>>>> +	 * Toggle SLD if the guest wants it enabled but its been disabled for
>>>> +	 * the userspace VMM, and vice versa.  Note, TIF_SLD is true if SLD has
>>>> +	 * been turned off.  Yes, it's a terrible name.
>>>
>>> Instead of writing that useless blurb you could have written a patch
>>> which changes TIF_SLD to TIF_SLD_OFF to make it clear.
>>
>> Hah, that's my comment, though I must admit I didn't fully intend for the
>> editorial at the end to get submitted upstream.
>>
>> Anyways, I _did_ point out that TIF_SLD is a terrible name[1][2], and my
>> feedback got ignored/overlooked.  I'd be more than happy to write a
>> patch, I didn't do so because I assumed that people wanted TIF_SLD as the name for
>> whatever reason.
> 
> I somehow missed that in the maze of mails regarding this stuff. I've
> already written a patch to rename it to TIF_SLD_DISABLED which is pretty
> self explaining. But see below.
> 
[...]
> 
>>>> @@ -1188,6 +1217,10 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
>>>>   #endif
>>>>
>>>> 	vmx_set_host_fs_gs(host_state, fs_sel, gs_sel, fs_base, gs_base);
>>>> +
>>>> +	vmx->host_sld_on = !test_thread_flag(TIF_SLD);
>>>
>>> This inverted storage is non-intuitive. What's wrong with simply
>>> reflecting the TIF_SLD state?
>>
>> So that the guest/host tracking use the same polarity, and IMO it makes
>> the restoration code more intuitive, e.g.:
>>
>> 	vmx_update_sld(&vmx->vcpu, vmx->host_sld_on);
>> vs
>> 	vmx_update_sld(&vmx->vcpu, !vmx->host_tif_sld);
>>
>> I.e. the inversion needs to happen somewhere.
> 
> Correct, but we can make it consistently use the 'disabled' convention.
> 
> I briefly thought about renaming the flag to TIF_SLD_ENABLED, set it by
> default and update the 5 places where it is used. But that's
> inconsistent as well simply because it does not make any sense to set
> that flag when detection is not available or disabled on the command
> line.
> 

Assuming you'll pick TIF_SLD_DISABLED, I guess we need to set this flag
by default when SLD is not available or is disabled on the command line,
for consistency?



* Re: [PATCH v8 4/4] kvm: vmx: virtualize split lock detection
  2020-04-16 12:34         ` Xiaoyao Li
@ 2020-04-16 13:33           ` Thomas Gleixner
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2020-04-16 13:33 UTC (permalink / raw)
  To: Xiaoyao Li, Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, x86, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, Peter Zijlstra, Arvind Sankar

Xiaoyao Li <xiaoyao.li@intel.com> writes:
> On 4/16/2020 5:22 AM, Thomas Gleixner wrote:
>> I briefly thought about renaming the flag to TIF_SLD_ENABLED, set it by
>> default and update the 5 places where it is used. But that's
>> inconsistent as well simply because it does not make any sense to set
>> that flag when detection is not available or disabled on the command
>> line.
>> 
>
> Assuming you'll pick TIF_SLD_DISABLED, I guess we need to set this flag
> by default when SLD is not available or is disabled on the command line,
> for consistency?

No, because nothing cares if SLD is off. There is no way to make this
fully consistent under all circumstances.

Thanks,

        tglx


* Re: [PATCH v8 4/4] kvm: vmx: virtualize split lock detection
  2020-04-15 21:43         ` Sean Christopherson
@ 2020-05-05  3:07           ` Sean Christopherson
  2020-05-06 12:12             ` Thomas Gleixner
  0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2020-05-05  3:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Xiaoyao Li, Paolo Bonzini, kvm, linux-kernel, x86, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, Peter Zijlstra, Arvind Sankar

On Wed, Apr 15, 2020 at 02:43:18PM -0700, Sean Christopherson wrote:
> On Wed, Apr 15, 2020 at 11:22:11PM +0200, Thomas Gleixner wrote:
> > Sean Christopherson <sean.j.christopherson@intel.com> writes:
> > > I don't see any way to avoid having KVM differentiate between sld_warn and
> > > sld_fatal.  Even if KVM is able to virtualize SLD in sld_fatal mode, e.g.
> > > by telling the guest it must not try to disable SLD, KVM would still need
> > > to know the kernel is sld_fatal so that it can forward that information to
> > > the guest.
> > 
> > Huch? There is absolutely zero code like that. The only place where
> > sld_state is used is:
> > 
> > + static inline void vmx_update_sld(struct kvm_vcpu *vcpu, bool on)
> > + {
> > +	if (sld_state == sld_warn && guest_cpu_has_feature_sld(vcpu) &&
> > +	    on == test_thread_flag(TIF_SLD)) {
> > +		    sld_update_msr(on);
> > +		    update_thread_flag(TIF_SLD, !on);
> > +	}
> > 
> > You might have some faint memories from the previous trainwrecks :)
> 
> Yeah, I was thinking SLD was only being exposed if the host is sld_warn.
> I'll work with Xiaoyao to figure out a cleaner interface for this code.

...

> > So we can go with the proposed mode of allowing the write but not
> > propagating it. If the resulting split lock #AC originates from CPL != 3
> > then the guest will be killed with SIGBUS. If it originates from CPL ==
> > 3 and the guest has user #AC disabled then it will be killed as well.
> 
> An idea that's been floated around to avoid killing the guest on a CPL==3
> split-lock #AC is to add a STICKY bit to MSR_TEST_CTRL that KVM can
> virtualize to tell the guest that attempting to disable SLD is futile,
> e.g. so that the guest can kill its misbehaving userspace apps instead of
> trying to disable SLD and getting killed by the host.

Circling back to this.  KVM needs access to sld_state in one form or another
if we want to add a KVM hint when the host is in fatal mode.  Three options
I've come up with:

  1. Bite the bullet and export sld_state.  

  2. Add an is_split_fatal_wrapper().  Ugly since it needs to be non-inline
     to avoid triggering (1).

  3. Add a synthetic feature flag, e.g. X86_FEATURE_SLD_FATAL, and drop
     sld_state altogether.

I like (3) because it requires the least amount of code when all is said
and done, doesn't require more exports, and as a bonus it'd probably be nice
for userspace to see sld_fatal in /proc/cpuinfo.
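
Roughly, (3) would boil down to something like this (sketch only; the
exact feature word/bit for the new synthetic X86_FEATURE_SLD_FATAL flag
is TBD):

	/* arch/x86/kernel/cpu/intel.c, in split_lock_setup(): */
	if (state == sld_fatal)
		setup_force_cpu_cap(X86_FEATURE_SLD_FATAL);

	/* KVM/VMX side, instead of peeking at sld_state: */
	static bool vmx_sld_fatal(void)
	{
		return boot_cpu_has(X86_FEATURE_SLD_FATAL);
	}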

Thoughts?


* Re: [PATCH v8 4/4] kvm: vmx: virtualize split lock detection
  2020-05-05  3:07           ` Sean Christopherson
@ 2020-05-06 12:12             ` Thomas Gleixner
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2020-05-06 12:12 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Xiaoyao Li, Paolo Bonzini, kvm, linux-kernel, x86, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, Peter Zijlstra, Arvind Sankar

Sean Christopherson <sean.j.christopherson@intel.com> writes:
> On Wed, Apr 15, 2020 at 02:43:18PM -0700, Sean Christopherson wrote:
>> On Wed, Apr 15, 2020 at 11:22:11PM +0200, Thomas Gleixner wrote:
>> > So we can go with the proposed mode of allowing the write but not
>> > propagating it. If the resulting split lock #AC originates from CPL != 3
>> > then the guest will be killed with SIGBUS. If it originates from CPL ==
>> > 3 and the guest has user #AC disabled then it will be killed as well.
>> 
>> An idea that's been floated around to avoid killing the guest on a CPL==3
>> split-lock #AC is to add a STICKY bit to MSR_TEST_CTRL that KVM can
>> virtualize to tell the guest that attempting to disable SLD is futile,
>> e.g. so that the guest can kill its misbehaving userspace apps instead of
>> trying to disable SLD and getting killed by the host.
>
> Circling back to this.  KVM needs access to sld_state in one form or another
> if we want to add a KVM hint when the host is in fatal mode.  Three options
> I've come up with:
>
>   1. Bite the bullet and export sld_state.  
>
>   2. Add an is_split_fatal_wrapper().  Ugly since it needs to be non-inline
>      to avoid triggering (1).
>
>   3. Add a synthetic feature flag, e.g. X86_FEATURE_SLD_FATAL, and drop
>      sld_state altogether.
>
> I like (3) because it requires the least amount of code when all is said
> and done, doesn't require more exports, and as a bonus it'd probably be nice
> for userspace to see sld_fatal in /proc/cpuinfo.

#3 makes sense and is elegant.

Thanks,

        tglx

