All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] Add support for IBRS & IBPB KVM support.
@ 2018-01-12  1:32 Ashok Raj
  2018-01-12  1:32 ` [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl Ashok Raj
                   ` (4 more replies)
  0 siblings, 5 replies; 81+ messages in thread
From: Ashok Raj @ 2018-01-12  1:32 UTC (permalink / raw)
  To: linux-kernel, Thomas Gleixner, Tim Chen, Andy Lutomirski,
	Linus Torvalds, Greg KH
  Cc: Ashok Raj, Dave Hansen, Andrea Arcangeli, Andi Kleen,
	Arjan Van De Ven, David Woodhouse, Peter Zijlstra, Dan Williams,
	Paolo Bonzini, Jun Nakajima, Asit Mallick

The following patches are based on v3 from Tim Chen

https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1582043.html

This patch set supports exposing MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD
for user space.

Thomas is steam blowing v3 :-).. but I didn't want to keep holding this
much longer for the rebase to be complete in tip/x86/pti.

Ashok Raj (4):
  x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl
  x86/ibrs: Add new helper macros to save/restore MSR_IA32_SPEC_CTRL
  x86/ibrs: Add direct access support for MSR_IA32_SPEC_CTRL
  x86/feature: Detect the x86 feature Indirect Branch Prediction Barrier

Paolo Bonzini (1):
  x86/svm: Direct access to MSR_IA32_SPEC_CTRL

 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/msr-index.h   |  3 +++
 arch/x86/include/asm/spec_ctrl.h   | 29 +++++++++++++++++++++-
 arch/x86/kernel/cpu/spec_ctrl.c    | 19 ++++++++++++++
 arch/x86/kvm/cpuid.c               |  3 ++-
 arch/x86/kvm/svm.c                 | 51 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx.c                 | 51 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c                 |  1 +
 8 files changed, 156 insertions(+), 2 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl
  2018-01-12  1:32 [PATCH 0/5] Add support for IBRS & IBPB KVM support Ashok Raj
@ 2018-01-12  1:32 ` Ashok Raj
  2018-01-12  1:41   ` Andy Lutomirski
                     ` (2 more replies)
  2018-01-12  1:32 ` [PATCH 2/5] x86/ibrs: Add new helper macros to save/restore MSR_IA32_SPEC_CTRL Ashok Raj
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 81+ messages in thread
From: Ashok Raj @ 2018-01-12  1:32 UTC (permalink / raw)
  To: linux-kernel, Thomas Gleixner, Tim Chen, Andy Lutomirski,
	Linus Torvalds, Greg KH
  Cc: Ashok Raj, Dave Hansen, Andrea Arcangeli, Andi Kleen,
	Arjan Van De Ven, David Woodhouse, Peter Zijlstra, Dan Williams,
	Paolo Bonzini, Jun Nakajima, Asit Mallick

- Remove including microcode.h, and use native macros from asm/msr.h
- added license header for spec_ctrl.c

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/include/asm/spec_ctrl.h | 17 ++++++++++++++++-
 arch/x86/kernel/cpu/spec_ctrl.c  |  1 +
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/spec_ctrl.h b/arch/x86/include/asm/spec_ctrl.h
index 948959b..2dfa31b 100644
--- a/arch/x86/include/asm/spec_ctrl.h
+++ b/arch/x86/include/asm/spec_ctrl.h
@@ -3,12 +3,27 @@
 #ifndef _ASM_X86_SPEC_CTRL_H
 #define _ASM_X86_SPEC_CTRL_H
 
-#include <asm/microcode.h>
+#include <asm/processor.h>
+#include <asm/msr.h>
 
 void spec_ctrl_scan_feature(struct cpuinfo_x86 *c);
 void spec_ctrl_unprotected_begin(void);
 void spec_ctrl_unprotected_end(void);
 
+static inline u64 native_rdmsrl(unsigned int msr)
+{
+	u64 val;
+
+	val = __rdmsr(msr);
+
+	return val;
+}
+
+static inline void native_wrmsrl(unsigned int msr, u64 val)
+{
+	__wrmsr(msr, (u32) (val & 0xffffffffULL), (u32) (val >> 32));
+}
+
 static inline void __disable_indirect_speculation(void)
 {
 	native_wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_ENABLE_IBRS);
diff --git a/arch/x86/kernel/cpu/spec_ctrl.c b/arch/x86/kernel/cpu/spec_ctrl.c
index 843b4e6..9e9d013 100644
--- a/arch/x86/kernel/cpu/spec_ctrl.c
+++ b/arch/x86/kernel/cpu/spec_ctrl.c
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 */
 #include <linux/string.h>
 
 #include <asm/spec_ctrl.h>
-- 
2.7.4

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH 2/5] x86/ibrs: Add new helper macros to save/restore MSR_IA32_SPEC_CTRL
  2018-01-12  1:32 [PATCH 0/5] Add support for IBRS & IBPB KVM support Ashok Raj
  2018-01-12  1:32 ` [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl Ashok Raj
@ 2018-01-12  1:32 ` Ashok Raj
  2018-01-12  1:32 ` [PATCH 3/5] x86/ibrs: Add direct access support for MSR_IA32_SPEC_CTRL Ashok Raj
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 81+ messages in thread
From: Ashok Raj @ 2018-01-12  1:32 UTC (permalink / raw)
  To: linux-kernel, Thomas Gleixner, Tim Chen, Andy Lutomirski,
	Linus Torvalds, Greg KH
  Cc: Ashok Raj, Dave Hansen, Andrea Arcangeli, Andi Kleen,
	Arjan Van De Ven, David Woodhouse, Peter Zijlstra, Dan Williams,
	Paolo Bonzini, Jun Nakajima, Asit Mallick

Add some helper macros to save/restore MSR_IA32_SPEC_CTRL.

Although we could use the spec_ctrl_unprotected_begin/end macros they seem
be bit unreadable for some uses.

spec_ctrl_get - read MSR_IA32_SPEC_CTRL to save
spec_ctrl_set - write value restore MSR_IA32_SPEC_CTRL
spec_ctrl_restriction_off - same as spec_ctrl_unprotected_begin
spec_ctrl_restriction_on - same as spec_ctrl_unprotected_end

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/include/asm/spec_ctrl.h | 12 ++++++++++++
 arch/x86/kernel/cpu/spec_ctrl.c  | 11 +++++++++++
 2 files changed, 23 insertions(+)

diff --git a/arch/x86/include/asm/spec_ctrl.h b/arch/x86/include/asm/spec_ctrl.h
index 2dfa31b..926feb2 100644
--- a/arch/x86/include/asm/spec_ctrl.h
+++ b/arch/x86/include/asm/spec_ctrl.h
@@ -9,6 +9,10 @@
 void spec_ctrl_scan_feature(struct cpuinfo_x86 *c);
 void spec_ctrl_unprotected_begin(void);
 void spec_ctrl_unprotected_end(void);
+void spec_ctrl_set(u64 val);
+
+#define spec_ctrl_restriction_on	spec_ctrl_unprotected_end
+#define spec_ctrl_restriction_off	spec_ctrl_unprotected_begin
 
 static inline u64 native_rdmsrl(unsigned int msr)
 {
@@ -34,4 +38,12 @@ static inline void __enable_indirect_speculation(void)
 	native_wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_DISABLE_IBRS);
 }
 
+static inline u64 spec_ctrl_get(void)
+{
+	u64 val;
+
+	val = native_rdmsrl(MSR_IA32_SPEC_CTRL);
+
+	return val;
+}
 #endif /* _ASM_X86_SPEC_CTRL_H */
diff --git a/arch/x86/kernel/cpu/spec_ctrl.c b/arch/x86/kernel/cpu/spec_ctrl.c
index 9e9d013..02fc630 100644
--- a/arch/x86/kernel/cpu/spec_ctrl.c
+++ b/arch/x86/kernel/cpu/spec_ctrl.c
@@ -47,3 +47,14 @@ void spec_ctrl_unprotected_end(void)
 		__disable_indirect_speculation();
 }
 EXPORT_SYMBOL_GPL(spec_ctrl_unprotected_end);
+
+void spec_ctrl_set(u64 val)
+{
+	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
+		if (!val) {
+			spec_ctrl_restriction_off();
+		} else
+			spec_ctrl_restriction_on();
+	}
+}
+EXPORT_SYMBOL(spec_ctrl_set);
-- 
2.7.4

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH 3/5] x86/ibrs: Add direct access support for MSR_IA32_SPEC_CTRL
  2018-01-12  1:32 [PATCH 0/5] Add support for IBRS & IBPB KVM support Ashok Raj
  2018-01-12  1:32 ` [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl Ashok Raj
  2018-01-12  1:32 ` [PATCH 2/5] x86/ibrs: Add new helper macros to save/restore MSR_IA32_SPEC_CTRL Ashok Raj
@ 2018-01-12  1:32 ` Ashok Raj
  2018-01-12  1:58   ` Dave Hansen
  2018-01-12  1:32 ` [PATCH 4/5] x86/svm: Direct access to MSR_IA32_SPEC_CTRL Ashok Raj
  2018-01-12  1:32 ` [PATCH 5/5] x86/feature: Detect the x86 feature Indirect Branch Prediction Barrier Ashok Raj
  4 siblings, 1 reply; 81+ messages in thread
From: Ashok Raj @ 2018-01-12  1:32 UTC (permalink / raw)
  To: linux-kernel, Thomas Gleixner, Tim Chen, Andy Lutomirski,
	Linus Torvalds, Greg KH
  Cc: Ashok Raj, Dave Hansen, Andrea Arcangeli, Andi Kleen,
	Arjan Van De Ven, David Woodhouse, Peter Zijlstra, Dan Williams,
	Paolo Bonzini, Jun Nakajima, Asit Mallick

Add direct access to MSR_IA32_SPEC_CTRL from a guest. Also save/restore
IBRS values during exits and guest resume path.

Rebasing based on Tim's patch

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kvm/cpuid.c |  3 ++-
 arch/x86/kvm/vmx.c   | 41 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..6fa81c7 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -70,6 +70,7 @@ u64 kvm_supported_xcr0(void)
 /* These are scattered features in cpufeatures.h. */
 #define KVM_CPUID_BIT_AVX512_4VNNIW     2
 #define KVM_CPUID_BIT_AVX512_4FMAPS     3
+#define KVM_CPUID_BIT_SPEC_CTRL        26
 #define KF(x) bit(KVM_CPUID_BIT_##x)
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -392,7 +393,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
-		KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+		KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | KF(SPEC_CTRL);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 62ee436..1913896 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -50,6 +50,7 @@
 #include <asm/apic.h>
 #include <asm/irq_remapping.h>
 #include <asm/mmu_context.h>
+#include <asm/spec_ctrl.h>
 
 #include "trace.h"
 #include "pmu.h"
@@ -579,6 +580,7 @@ struct vcpu_vmx {
 	u32 vm_entry_controls_shadow;
 	u32 vm_exit_controls_shadow;
 	u32 secondary_exec_control;
+	u64 spec_ctrl;
 
 	/*
 	 * loaded_vmcs points to the VMCS currently used in this vcpu. For a
@@ -3259,6 +3261,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC:
 		msr_info->data = guest_read_tsc(vcpu);
 		break;
+	case MSR_IA32_SPEC_CTRL:
+		msr_info->data = to_vmx(vcpu)->spec_ctrl;
+		break;
 	case MSR_IA32_SYSENTER_CS:
 		msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
 		break;
@@ -3366,6 +3371,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr_info);
 		break;
+	case MSR_IA32_SPEC_CTRL:
+		to_vmx(vcpu)->spec_ctrl = msr_info->data;
+		break;
 	case MSR_IA32_CR_PAT:
 		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
 			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -6790,6 +6798,13 @@ static __init int hardware_setup(void)
 		kvm_tsc_scaling_ratio_frac_bits = 48;
 	}
 
+	/*
+	 * If feature is available then setup MSR_IA32_SPEC_CTRL to be in
+	 * passthrough mode for the guest.
+	 */
+	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
+		vmx_disable_intercept_for_msr(MSR_IA32_SPEC_CTRL, false);
+
 	vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
 	vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
 	vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
@@ -9242,6 +9257,15 @@ static void vmx_arm_hv_timer(struct kvm_vcpu *vcpu)
 	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, delta_tsc);
 }
 
+static void save_guest_spec_ctrl(struct vcpu_vmx *vmx)
+{
+	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
+		vmx->spec_ctrl = spec_ctrl_get();
+		spec_ctrl_restriction_on();
+	} else
+		rmb();
+}
+
 static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -9298,6 +9322,21 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	vmx_arm_hv_timer(vcpu);
 
 	vmx->__launched = vmx->loaded_vmcs->launched;
+
+	/*
+	 * Just update whatever the value was set for the MSR in guest.
+	 * If this is unlaunched: Assume that initialized value is 0.
+	 * IRQ's also need to be disabled. If guest value is 0, an interrupt
+	 * could start running in unprotected mode (i.e with IBRS=0).
+	 */
+	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
+		/*
+		 * FIXME: lockdep_assert_irqs_disabled();
+		 */
+		WARN_ON_ONCE(!irqs_disabled());
+		spec_ctrl_set(vmx->spec_ctrl);
+	}
+
 	asm(
 		/* Store host registers */
 		"push %%" _ASM_DX "; push %%" _ASM_BP ";"
@@ -9403,6 +9442,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 #endif
 	      );
 
+	save_guest_spec_ctrl(vmx);
+
 	/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
 	if (debugctlmsr)
 		update_debugctlmsr(debugctlmsr);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 03869eb..9ffb9d6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1006,6 +1006,7 @@ static u32 msrs_to_save[] = {
 #endif
 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
 	MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+	MSR_IA32_SPEC_CTRL,
 };
 
 static unsigned num_msrs_to_save;
-- 
2.7.4

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH 4/5] x86/svm: Direct access to MSR_IA32_SPEC_CTRL
  2018-01-12  1:32 [PATCH 0/5] Add support for IBRS & IBPB KVM support Ashok Raj
                   ` (2 preceding siblings ...)
  2018-01-12  1:32 ` [PATCH 3/5] x86/ibrs: Add direct access support for MSR_IA32_SPEC_CTRL Ashok Raj
@ 2018-01-12  1:32 ` Ashok Raj
  2018-01-12  7:23   ` David Woodhouse
                     ` (2 more replies)
  2018-01-12  1:32 ` [PATCH 5/5] x86/feature: Detect the x86 feature Indirect Branch Prediction Barrier Ashok Raj
  4 siblings, 3 replies; 81+ messages in thread
From: Ashok Raj @ 2018-01-12  1:32 UTC (permalink / raw)
  To: linux-kernel, Thomas Gleixner, Tim Chen, Andy Lutomirski,
	Linus Torvalds, Greg KH
  Cc: Paolo Bonzini, Dave Hansen, Andrea Arcangeli, Andi Kleen,
	Arjan Van De Ven, David Woodhouse, Peter Zijlstra, Dan Williams,
	Jun Nakajima, Asit Mallick, Ashok Raj

From: Paolo Bonzini <pbonzini@redhat.com>

Direct access to MSR_IA32_SPEC_CTRL is important
for performance.  Allow load/store of MSR_IA32_SPEC_CTRL, restore guest
IBRS on VM entry and set restore host values on VM exit.
it yet).

TBD: need to check msr's can be passed through even if feature is not
emuerated by the CPU.

[Ashok: Modified to reuse V3 spec-ctrl patches from Tim]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kvm/svm.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0e68f0b..7c14471a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -183,6 +183,8 @@ struct vcpu_svm {
 		u64 gs_base;
 	} host;
 
+	u64 spec_ctrl;
+
 	u32 *msrpm;
 
 	ulong nmi_iret_rip;
@@ -248,6 +250,7 @@ static const struct svm_direct_access_msrs {
 	{ .index = MSR_CSTAR,				.always = true  },
 	{ .index = MSR_SYSCALL_MASK,			.always = true  },
 #endif
+	{ .index = MSR_IA32_SPEC_CTRL,          .always = true  },
 	{ .index = MSR_IA32_LASTBRANCHFROMIP,		.always = false },
 	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
 	{ .index = MSR_IA32_LASTINTFROMIP,		.always = false },
@@ -917,6 +920,9 @@ static void svm_vcpu_init_msrpm(u32 *msrpm)
 
 		set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
 	}
+
+	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
+		set_msr_interception(msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
 }
 
 static void add_msr_offset(u32 offset)
@@ -3576,6 +3582,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_VM_CR:
 		msr_info->data = svm->nested.vm_cr_msr;
 		break;
+	case MSR_IA32_SPEC_CTRL:
+		msr_info->data = svm->spec_ctrl;
+		break;
 	case MSR_IA32_UCODE_REV:
 		msr_info->data = 0x01000065;
 		break;
@@ -3724,6 +3733,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	case MSR_VM_IGNNE:
 		vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
 		break;
+	case MSR_IA32_SPEC_CTRL:
+		svm->spec_ctrl = data;
+		break;
 	case MSR_IA32_APICBASE:
 		if (kvm_vcpu_apicv_active(vcpu))
 			avic_update_vapic_bar(to_svm(vcpu), data);
@@ -4871,6 +4883,19 @@ static void svm_cancel_injection(struct kvm_vcpu *vcpu)
 	svm_complete_interrupts(svm);
 }
 
+
+/*
+ * Save guest value of spec_ctrl and also restore host value
+ */
+static void save_guest_spec_ctrl(struct vcpu_svm *svm)
+{
+	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
+		svm->spec_ctrl = spec_ctrl_get();
+		spec_ctrl_restriction_on();
+	} else
+		rmb();
+}
+
 static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4910,6 +4935,14 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	clgi();
 
+	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
+		/*
+		 * FIXME: lockdep_assert_irqs_disabled();
+		 */
+		WARN_ON_ONCE(!irqs_disabled());
+		spec_ctrl_set(svm->spec_ctrl);
+	}
+
 	local_irq_enable();
 
 	asm volatile (
@@ -4985,6 +5018,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 #endif
 		);
 
+	save_guest_spec_ctrl(svm);
+
 #ifdef CONFIG_X86_64
 	wrmsrl(MSR_GS_BASE, svm->host.gs_base);
 #else
-- 
2.7.4

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH 5/5] x86/feature: Detect the x86 feature Indirect Branch Prediction Barrier
  2018-01-12  1:32 [PATCH 0/5] Add support for IBRS & IBPB KVM support Ashok Raj
                   ` (3 preceding siblings ...)
  2018-01-12  1:32 ` [PATCH 4/5] x86/svm: Direct access to MSR_IA32_SPEC_CTRL Ashok Raj
@ 2018-01-12  1:32 ` Ashok Raj
  2018-01-12 10:08   ` Peter Zijlstra
                     ` (2 more replies)
  4 siblings, 3 replies; 81+ messages in thread
From: Ashok Raj @ 2018-01-12  1:32 UTC (permalink / raw)
  To: linux-kernel, Thomas Gleixner, Tim Chen, Andy Lutomirski,
	Linus Torvalds, Greg KH
  Cc: Ashok Raj, Dave Hansen, Andrea Arcangeli, Andi Kleen,
	Arjan Van De Ven, David Woodhouse, Peter Zijlstra, Dan Williams,
	Paolo Bonzini, Jun Nakajima, Asit Mallick

cpuid ax=0x7, return rdx bit 26 to indicate presence of both
IA32_SPEC_CTRL(MSR 0x48) and IA32_PRED_CMD(MSR 0x49)

BIT0: Indirect Branch Prediction Barrier

When this MSR is written with IBPB=1 it ensures that earlier code's behavior
doesn't control later indirect branch predictions.

Note this MSR is only writable and does not carry any state. Its a barrier
so the code should perform a wrmsr when the barrier is needed.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/msr-index.h   |  3 +++
 arch/x86/kernel/cpu/spec_ctrl.c    |  7 +++++++
 arch/x86/kvm/svm.c                 | 16 ++++++++++++++++
 arch/x86/kvm/vmx.c                 | 10 ++++++++++
 5 files changed, 37 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 624b58e..52f37fc 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -213,6 +213,7 @@
 #define X86_FEATURE_MBA			( 7*32+18) /* Memory Bandwidth Allocation */
 #define X86_FEATURE_SPEC_CTRL		( 7*32+19) /* Speculation Control */
 #define X86_FEATURE_SPEC_CTRL_IBRS	( 7*32+20) /* Speculation Control, use IBRS */
+#define X86_FEATURE_PRED_CMD	( 7*32+21) /* Indirect Branch Prediction Barrier */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW		( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3e1cb18..1888e19 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -46,6 +46,9 @@
 #define SPEC_CTRL_DISABLE_IBRS		(0 << 0)
 #define SPEC_CTRL_ENABLE_IBRS		(1 << 0)
 
+#define MSR_IA32_PRED_CMD		0x00000049
+#define FEATURE_SET_IBPB		(1<<0)
+
 #define MSR_IA32_PERFCTR0		0x000000c1
 #define MSR_IA32_PERFCTR1		0x000000c2
 #define MSR_FSB_FREQ			0x000000cd
diff --git a/arch/x86/kernel/cpu/spec_ctrl.c b/arch/x86/kernel/cpu/spec_ctrl.c
index 02fc630..6cfec19 100644
--- a/arch/x86/kernel/cpu/spec_ctrl.c
+++ b/arch/x86/kernel/cpu/spec_ctrl.c
@@ -15,6 +15,13 @@ void spec_ctrl_scan_feature(struct cpuinfo_x86 *c)
 			if (!c->cpu_index)
 				static_branch_enable(&spec_ctrl_dynamic_ibrs);
 		}
+		/*
+		 * For Intel CPU's this MSR is shared the same cpuid
+		 * enumeration. When MSR_IA32_SPEC_CTRL is present
+		 * MSR_IA32_SPEC_CTRL is also available
+		 * TBD: AMD might have a separate enumeration for each.
+		 */
+		set_cpu_cap(c, X86_FEATURE_PRED_CMD);
 	}
 }
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 7c14471a..36924c9 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -251,6 +251,7 @@ static const struct svm_direct_access_msrs {
 	{ .index = MSR_SYSCALL_MASK,			.always = true  },
 #endif
 	{ .index = MSR_IA32_SPEC_CTRL,          .always = true  },
+	{ .index = MSR_IA32_PRED_CMD,           .always = false },
 	{ .index = MSR_IA32_LASTBRANCHFROMIP,		.always = false },
 	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
 	{ .index = MSR_IA32_LASTINTFROMIP,		.always = false },
@@ -531,6 +532,7 @@ struct svm_cpu_data {
 	struct kvm_ldttss_desc *tss_desc;
 
 	struct page *save_area;
+	struct vmcb *current_vmcb;
 };
 
 static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
@@ -923,6 +925,8 @@ static void svm_vcpu_init_msrpm(u32 *msrpm)
 
 	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
 		set_msr_interception(msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
+	if (boot_cpu_has(X86_FEATURE_PRED_CMD))
+		set_msr_interception(msrpm, MSR_IA32_PRED_CMD, 1, 1);
 }
 
 static void add_msr_offset(u32 offset)
@@ -1711,11 +1715,18 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
+    /* 
+     * The VMCB could be recycled, causing a false negative in svm_vcpu_load;
+     * block speculative execution.
+     */
+	if (boot_cpu_has(X86_FEATURE_PRED_CMD))
+        native_wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
 }
 
 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
 	int i;
 
 	if (unlikely(cpu != vcpu->cpu)) {
@@ -1744,6 +1755,11 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (static_cpu_has(X86_FEATURE_RDTSCP))
 		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
 
+	if (sd->current_vmcb != svm->vmcb) {
+		sd->current_vmcb = svm->vmcb;
+		if (boot_cpu_has(X86_FEATURE_PRED_CMD))
+			native_wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
+	}
 	avic_vcpu_load(vcpu, cpu);
 }
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1913896..caeb9ff 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2280,6 +2280,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
 		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
 		vmcs_load(vmx->loaded_vmcs->vmcs);
+		if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
+			native_wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
 	}
 
 	if (!already_loaded) {
@@ -3837,6 +3839,12 @@ static void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
 	free_vmcs(loaded_vmcs->vmcs);
 	loaded_vmcs->vmcs = NULL;
 	WARN_ON(loaded_vmcs->shadow_vmcs != NULL);
+    /*
+     * The VMCS could be recycled, causing a false negative in vmx_vcpu_load
+     * block speculative execution.
+     */
+	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
+        native_wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
 }
 
 static void free_kvm_area(void)
@@ -6804,6 +6812,8 @@ static __init int hardware_setup(void)
 	 */
 	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
 		vmx_disable_intercept_for_msr(MSR_IA32_SPEC_CTRL, false);
+	if (boot_cpu_has(X86_FEATURE_PRED_CMD))
+		vmx_disable_intercept_for_msr(MSR_IA32_PRED_CMD, false);
 
 	vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
 	vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
-- 
2.7.4

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl
  2018-01-12  1:32 ` [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl Ashok Raj
@ 2018-01-12  1:41   ` Andy Lutomirski
  2018-01-12  1:52     ` Raj, Ashok
  2018-01-12  7:54   ` Greg KH
  2018-01-12 12:28   ` Borislav Petkov
  2 siblings, 1 reply; 81+ messages in thread
From: Andy Lutomirski @ 2018-01-12  1:41 UTC (permalink / raw)
  To: Ashok Raj
  Cc: LKML, Thomas Gleixner, Tim Chen, Andy Lutomirski, Linus Torvalds,
	Greg KH, Dave Hansen, Andrea Arcangeli, Andi Kleen,
	Arjan Van De Ven, David Woodhouse, Peter Zijlstra, Dan Williams,
	Paolo Bonzini, Jun Nakajima, Asit Mallick

On Thu, Jan 11, 2018 at 5:32 PM, Ashok Raj <ashok.raj@intel.com> wrote:
> - Remove including microcode.h, and use native macros from asm/msr.h
> - added license header for spec_ctrl.c
>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  arch/x86/include/asm/spec_ctrl.h | 17 ++++++++++++++++-
>  arch/x86/kernel/cpu/spec_ctrl.c  |  1 +
>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/spec_ctrl.h b/arch/x86/include/asm/spec_ctrl.h
> index 948959b..2dfa31b 100644
> --- a/arch/x86/include/asm/spec_ctrl.h
> +++ b/arch/x86/include/asm/spec_ctrl.h
> @@ -3,12 +3,27 @@
>  #ifndef _ASM_X86_SPEC_CTRL_H
>  #define _ASM_X86_SPEC_CTRL_H
>
> -#include <asm/microcode.h>
> +#include <asm/processor.h>
> +#include <asm/msr.h>
>
>  void spec_ctrl_scan_feature(struct cpuinfo_x86 *c);
>  void spec_ctrl_unprotected_begin(void);
>  void spec_ctrl_unprotected_end(void);
>
> +static inline u64 native_rdmsrl(unsigned int msr)
> +{
> +       u64 val;
> +
> +       val = __rdmsr(msr);
> +
> +       return val;
> +}

What's wrong with native_read_msr()?

> +
> +static inline void native_wrmsrl(unsigned int msr, u64 val)
> +{
> +       __wrmsr(msr, (u32) (val & 0xffffffffULL), (u32) (val >> 32));
> +}

What's wrong with just wrmsrl()?  If you really need a native helper
like this, please add it to arch/x86/asm/msr.h.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl
  2018-01-12  1:41   ` Andy Lutomirski
@ 2018-01-12  1:52     ` Raj, Ashok
  2018-01-12  2:20       ` Andy Lutomirski
  0 siblings, 1 reply; 81+ messages in thread
From: Raj, Ashok @ 2018-01-12  1:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, Thomas Gleixner, Tim Chen, Linus Torvalds, Greg KH,
	Dave Hansen, Andrea Arcangeli, Andi Kleen, Arjan Van De Ven,
	David Woodhouse, Peter Zijlstra, Dan Williams, Paolo Bonzini,
	Jun Nakajima, Asit Mallick, ashok.raj

On Thu, Jan 11, 2018 at 05:41:34PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 11, 2018 at 5:32 PM, Ashok Raj <ashok.raj@intel.com> wrote:
> > - Remove including microcode.h, and use native macros from asm/msr.h
> > - added license header for spec_ctrl.c
> >
> > Signed-off-by: Ashok Raj <ashok.raj@intel.com>

[snip]
> > +static inline u64 native_rdmsrl(unsigned int msr)
> > +{
> > +       u64 val;
> > +
> > +       val = __rdmsr(msr);
> > +
> > +       return val;
> > +}
> 
> What's wrong with native_read_msr()?

Yes, i think i should have added to msr.h. The names didn't read as a
pair, one was native_read_msr, wrmsrl could be taken over when paravirt is
defined? 

> 
> > +
> > +static inline void native_wrmsrl(unsigned int msr, u64 val)
> > +{
> > +       __wrmsr(msr, (u32) (val & 0xffffffffULL), (u32) (val >> 32));
> > +}
> 
> What's wrong with just wrmsrl()?  If you really need a native helper
> like this, please add it to arch/x86/asm/msr.h.

I should have done that.. will move these to msr.h instead.

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 3/5] x86/ibrs: Add direct access support for MSR_IA32_SPEC_CTRL
  2018-01-12  1:32 ` [PATCH 3/5] x86/ibrs: Add direct access support for MSR_IA32_SPEC_CTRL Ashok Raj
@ 2018-01-12  1:58   ` Dave Hansen
  2018-01-12  3:14     ` Raj, Ashok
  2018-01-12  9:51     ` Peter Zijlstra
  0 siblings, 2 replies; 81+ messages in thread
From: Dave Hansen @ 2018-01-12  1:58 UTC (permalink / raw)
  To: Ashok Raj, linux-kernel, Thomas Gleixner, Tim Chen,
	Andy Lutomirski, Linus Torvalds, Greg KH
  Cc: Andrea Arcangeli, Andi Kleen, Arjan Van De Ven, David Woodhouse,
	Peter Zijlstra, Dan Williams, Paolo Bonzini, Jun Nakajima,
	Asit Mallick

On 01/11/2018 05:32 PM, Ashok Raj wrote:
> +static void save_guest_spec_ctrl(struct vcpu_vmx *vmx)
> +{
> +	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
> +		vmx->spec_ctrl = spec_ctrl_get();
> +		spec_ctrl_restriction_on();
> +	} else
> +		rmb();
> +}

Does this need to be "ifence()"?  Better yet, do we just need to create
a helper for boot_cpu_has(X86_FEATURE_SPEC_CTRL) that does the barrier?

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl
  2018-01-12  1:52     ` Raj, Ashok
@ 2018-01-12  2:20       ` Andy Lutomirski
  2018-01-12  3:01         ` Raj, Ashok
  0 siblings, 1 reply; 81+ messages in thread
From: Andy Lutomirski @ 2018-01-12  2:20 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: Andy Lutomirski, LKML, Thomas Gleixner, Tim Chen, Linus Torvalds,
	Greg KH, Dave Hansen, Andrea Arcangeli, Andi Kleen,
	Arjan Van De Ven, David Woodhouse, Peter Zijlstra, Dan Williams,
	Paolo Bonzini, Jun Nakajima, Asit Mallick

On Thu, Jan 11, 2018 at 5:52 PM, Raj, Ashok <ashok.raj@intel.com> wrote:
> On Thu, Jan 11, 2018 at 05:41:34PM -0800, Andy Lutomirski wrote:
>> On Thu, Jan 11, 2018 at 5:32 PM, Ashok Raj <ashok.raj@intel.com> wrote:
>> > - Remove including microcode.h, and use native macros from asm/msr.h
>> > - added license header for spec_ctrl.c
>> >
>> > Signed-off-by: Ashok Raj <ashok.raj@intel.com>
>
> [snip]
>> > +static inline u64 native_rdmsrl(unsigned int msr)
>> > +{
>> > +       u64 val;
>> > +
>> > +       val = __rdmsr(msr);
>> > +
>> > +       return val;
>> > +}
>>
>> What's wrong with native_read_msr()?
>
> Yes, i think i should have added to msr.h. The names didn't read as a
> pair, one was native_read_msr, wrmsrl could be taken over when paravirt is
> defined?

Why do you need to override paravirt?

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl
  2018-01-12  2:20       ` Andy Lutomirski
@ 2018-01-12  3:01         ` Raj, Ashok
  2018-01-12  5:03           ` Dave Hansen
  2018-01-13  6:19           ` Andy Lutomirski
  0 siblings, 2 replies; 81+ messages in thread
From: Raj, Ashok @ 2018-01-12  3:01 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, Thomas Gleixner, Tim Chen, Linus Torvalds, Greg KH,
	Dave Hansen, Andrea Arcangeli, Andi Kleen, Arjan Van De Ven,
	David Woodhouse, Peter Zijlstra, Dan Williams, Paolo Bonzini,
	Jun Nakajima, Asit Mallick, ashok.raj

On Thu, Jan 11, 2018 at 06:20:13PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 11, 2018 at 5:52 PM, Raj, Ashok <ashok.raj@intel.com> wrote:
> >>
> >> What's wrong with native_read_msr()?
> >
> > Yes, i think i should have added to msr.h. The names didn't read as a
> > pair, one was native_read_msr, wrmsrl could be taken over when paravirt is
> > defined?
> 
> Why do you need to override paravirt?

The idea was since these MSR's are passed through we shouldn't need to 
handle them any differently. Also its best to do this as soon as possible
and avoid longer paths to get this barrier to hardware.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 3/5] x86/ibrs: Add direct access support for MSR_IA32_SPEC_CTRL
  2018-01-12  1:58   ` Dave Hansen
@ 2018-01-12  3:14     ` Raj, Ashok
  2018-01-12  9:51     ` Peter Zijlstra
  1 sibling, 0 replies; 81+ messages in thread
From: Raj, Ashok @ 2018-01-12  3:14 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, Thomas Gleixner, Tim Chen, Andy Lutomirski,
	Linus Torvalds, Greg KH, Andrea Arcangeli, Andi Kleen,
	Arjan Van De Ven, David Woodhouse, Peter Zijlstra, Dan Williams,
	Paolo Bonzini, Jun Nakajima, Asit Mallick, Ashok Raj

On Thu, Jan 11, 2018 at 05:58:11PM -0800, Dave Hansen wrote:
> On 01/11/2018 05:32 PM, Ashok Raj wrote:
> > +static void save_guest_spec_ctrl(struct vcpu_vmx *vmx)
> > +{
> > +	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
> > +		vmx->spec_ctrl = spec_ctrl_get();
> > +		spec_ctrl_restriction_on();
> > +	} else
> > +		rmb();
> > +}
> 
> Does this need to be "ifence()"?  Better yet, do we just need to create
> a helper for boot_cpu_has(X86_FEATURE_SPEC_CTRL) that does the barrier?

Yes... Didn't keep track of ifence() evolution :-)..

We could do a helper, will look into other uses and see we can make find a 
common way to comprehend usages like above.

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl
  2018-01-12  3:01         ` Raj, Ashok
@ 2018-01-12  5:03           ` Dave Hansen
  2018-01-12 16:28             ` Josh Poimboeuf
                               ` (2 more replies)
  2018-01-13  6:19           ` Andy Lutomirski
  1 sibling, 3 replies; 81+ messages in thread
From: Dave Hansen @ 2018-01-12  5:03 UTC (permalink / raw)
  To: Raj, Ashok, Andy Lutomirski
  Cc: LKML, Thomas Gleixner, Tim Chen, Linus Torvalds, Greg KH,
	Andrea Arcangeli, Andi Kleen, Arjan Van De Ven, David Woodhouse,
	Peter Zijlstra, Dan Williams, Paolo Bonzini, Jun Nakajima,
	Asit Mallick

On 01/11/2018 07:01 PM, Raj, Ashok wrote:
> On Thu, Jan 11, 2018 at 06:20:13PM -0800, Andy Lutomirski wrote:
>> On Thu, Jan 11, 2018 at 5:52 PM, Raj, Ashok <ashok.raj@intel.com> wrote:
>>>>
>>>> What's wrong with native_read_msr()?
>>>
>>> Yes, i think i should have added to msr.h. The names didn't read as a
>>> pair, one was native_read_msr, wrmsrl could be taken over when paravirt is
>>> defined?
>>
>> Why do you need to override paravirt?
> 
> The idea was since these MSR's are passed through we shouldn't need to 
> handle them any differently. Also its best to do this as soon as possible
> and avoid longer paths to get this barrier to hardware.

We were also worried about the indirect calls that are part of the
paravirt interfaces when retpolines are not in place.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 4/5] x86/svm: Direct access to MSR_IA32_SPEC_CTRL
  2018-01-12  1:32 ` [PATCH 4/5] x86/svm: Direct access to MSR_IA32_SPEC_CTRL Ashok Raj
@ 2018-01-12  7:23   ` David Woodhouse
  2018-01-12  9:58     ` Peter Zijlstra
  2018-01-12 12:38   ` Paolo Bonzini
  2018-01-12 15:14   ` Tom Lendacky
  2 siblings, 1 reply; 81+ messages in thread
From: David Woodhouse @ 2018-01-12  7:23 UTC (permalink / raw)
  To: Ashok Raj, linux-kernel, Thomas Gleixner, Tim Chen,
	Andy Lutomirski, Linus Torvalds, Greg KH
  Cc: Paolo Bonzini, Dave Hansen, Andrea Arcangeli, Andi Kleen,
	Arjan Van De Ven, Peter Zijlstra, Dan Williams, Jun Nakajima,
	Asit Mallick

[-- Attachment #1: Type: text/plain, Size: 925 bytes --]

On Thu, 2018-01-11 at 17:32 -0800, Ashok Raj wrote:
> 
> @@ -4910,6 +4935,14 @@ static void svm_vcpu_run(struct kvm_vcpu
> *vcpu)
>  
>         clgi();
>  
> +       if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
> +               /*
> +                * FIXME: lockdep_assert_irqs_disabled();
> +                */
> +               WARN_ON_ONCE(!irqs_disabled());
> +               spec_ctrl_set(svm->spec_ctrl);
> +       }
> +
>         local_irq_enable();
>  

Same comments here as we've had previously. If you do this without an
'else lfence' then you need a comment showing that you've proved it's
safe.

And I don't think even using static_cpu_has() is good enough. We don't
already "rely" on that for anything but optimisations, AFAICT. Turning
a missed GCC optimisation into a security hole is not a good idea.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl
  2018-01-12  1:32 ` [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl Ashok Raj
  2018-01-12  1:41   ` Andy Lutomirski
@ 2018-01-12  7:54   ` Greg KH
  2018-01-12 12:28   ` Borislav Petkov
  2 siblings, 0 replies; 81+ messages in thread
From: Greg KH @ 2018-01-12  7:54 UTC (permalink / raw)
  To: Ashok Raj
  Cc: linux-kernel, Thomas Gleixner, Tim Chen, Andy Lutomirski,
	Linus Torvalds, Dave Hansen, Andrea Arcangeli, Andi Kleen,
	Arjan Van De Ven, David Woodhouse, Peter Zijlstra, Dan Williams,
	Paolo Bonzini, Jun Nakajima, Asit Mallick

On Thu, Jan 11, 2018 at 05:32:15PM -0800, Ashok Raj wrote:
> - Remove including microcode.h, and use native macros from asm/msr.h
> - added license header for spec_ctrl.c

Worst changlog ever :(

Why are you touching spec_ctrl.c in this patch?  How does it belong here
in this series?

Come on, you know better than this...

greg k-h

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 3/5] x86/ibrs: Add direct access support for MSR_IA32_SPEC_CTRL
  2018-01-12  1:58   ` Dave Hansen
  2018-01-12  3:14     ` Raj, Ashok
@ 2018-01-12  9:51     ` Peter Zijlstra
  2018-01-12 10:09       ` David Woodhouse
  1 sibling, 1 reply; 81+ messages in thread
From: Peter Zijlstra @ 2018-01-12  9:51 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Ashok Raj, linux-kernel, Thomas Gleixner, Tim Chen,
	Andy Lutomirski, Linus Torvalds, Greg KH, Andrea Arcangeli,
	Andi Kleen, Arjan Van De Ven, David Woodhouse, Dan Williams,
	Paolo Bonzini, Jun Nakajima, Asit Mallick

On Thu, Jan 11, 2018 at 05:58:11PM -0800, Dave Hansen wrote:
> On 01/11/2018 05:32 PM, Ashok Raj wrote:
> > +static void save_guest_spec_ctrl(struct vcpu_vmx *vmx)
> > +{
> > +	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
> > +		vmx->spec_ctrl = spec_ctrl_get();
> > +		spec_ctrl_restriction_on();
> > +	} else
> > +		rmb();
> > +}
> 
> Does this need to be "ifence()"?  Better yet, do we just need to create
> a helper for boot_cpu_has(X86_FEATURE_SPEC_CTRL) that does the barrier?

static_cpu_has() + hard asm-goto requirement. Please drop all the above
nonsense on the floor hard.

Let backporters sort out whever they need, don't introduce crap like
that upstream.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 4/5] x86/svm: Direct access to MSR_IA32_SPEC_CTRL
  2018-01-12  7:23   ` David Woodhouse
@ 2018-01-12  9:58     ` Peter Zijlstra
  2018-01-12 10:13       ` David Woodhouse
  0 siblings, 1 reply; 81+ messages in thread
From: Peter Zijlstra @ 2018-01-12  9:58 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Ashok Raj, linux-kernel, Thomas Gleixner, Tim Chen,
	Andy Lutomirski, Linus Torvalds, Greg KH, Paolo Bonzini,
	Dave Hansen, Andrea Arcangeli, Andi Kleen, Arjan Van De Ven,
	Dan Williams, Jun Nakajima, Asit Mallick

On Fri, Jan 12, 2018 at 07:23:53AM +0000, David Woodhouse wrote:
> On Thu, 2018-01-11 at 17:32 -0800, Ashok Raj wrote:
> > 
> > @@ -4910,6 +4935,14 @@ static void svm_vcpu_run(struct kvm_vcpu
> > *vcpu)
> >  
> >         clgi();
> >  
> > +       if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
> > +               /*
> > +                * FIXME: lockdep_assert_irqs_disabled();
> > +                */
> > +               WARN_ON_ONCE(!irqs_disabled());
> > +               spec_ctrl_set(svm->spec_ctrl);
> > +       }
> > +
> >         local_irq_enable();
> >  
> 
> Same comments here as we've had previously. If you do this without an
> 'else lfence' then you need a comment showing that you've proved it's
> safe.
> 
> And I don't think even using static_cpu_has() is good enough. We don't
> already "rely" on that for anything but optimisations, AFAICT. Turning
> a missed GCC optimisation into a security hole is not a good idea.

I disagree, and if you worry about that, we should write a testcase. But
we rely on GCC for correct code generation in lots of places, this isn't
different.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 5/5] x86/feature: Detect the x86 feature Indirect Branch Prediction Barrier
  2018-01-12  1:32 ` [PATCH 5/5] x86/feature: Detect the x86 feature Indirect Branch Prediction Barrier Ashok Raj
@ 2018-01-12 10:08   ` Peter Zijlstra
  2018-01-12 12:32   ` Borislav Petkov
  2018-01-12 15:31   ` Tom Lendacky
  2 siblings, 0 replies; 81+ messages in thread
From: Peter Zijlstra @ 2018-01-12 10:08 UTC (permalink / raw)
  To: Ashok Raj
  Cc: linux-kernel, Thomas Gleixner, Tim Chen, Andy Lutomirski,
	Linus Torvalds, Greg KH, Dave Hansen, Andrea Arcangeli,
	Andi Kleen, Arjan Van De Ven, David Woodhouse, Dan Williams,
	Paolo Bonzini, Jun Nakajima, Asit Mallick

On Thu, Jan 11, 2018 at 05:32:19PM -0800, Ashok Raj wrote:
> @@ -1711,11 +1715,18 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
>  	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
>  	kvm_vcpu_uninit(vcpu);
>  	kmem_cache_free(kvm_vcpu_cache, svm);
> +    /* 
> +     * The VMCB could be recycled, causing a false negative in svm_vcpu_load;
> +     * block speculative execution.
> +     */
> +	if (boot_cpu_has(X86_FEATURE_PRED_CMD))
> +        native_wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
>  }

> @@ -3837,6 +3839,12 @@ static void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
>  	free_vmcs(loaded_vmcs->vmcs);
>  	loaded_vmcs->vmcs = NULL;
>  	WARN_ON(loaded_vmcs->shadow_vmcs != NULL);
> +    /*
> +     * The VMCS could be recycled, causing a false negative in vmx_vcpu_load
> +     * block speculative execution.
> +     */
> +	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
> +        native_wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
>  }

Whitespace damage.

Also, why not introduce a helper like:

static inline flush_ibpb(void)
{
	if (static_cpu_has(X86_FEATURE_SPEC_CTRL))
		native_write_msr(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
}

?

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 3/5] x86/ibrs: Add direct access support for MSR_IA32_SPEC_CTRL
  2018-01-12  9:51     ` Peter Zijlstra
@ 2018-01-12 10:09       ` David Woodhouse
  2018-01-15 13:45         ` Peter Zijlstra
  0 siblings, 1 reply; 81+ messages in thread
From: David Woodhouse @ 2018-01-12 10:09 UTC (permalink / raw)
  To: Peter Zijlstra, Dave Hansen
  Cc: Ashok Raj, linux-kernel, Thomas Gleixner, Tim Chen,
	Andy Lutomirski, Linus Torvalds, Greg KH, Andrea Arcangeli,
	Andi Kleen, Arjan Van De Ven, Dan Williams, Paolo Bonzini,
	Jun Nakajima, Asit Mallick

[-- Attachment #1: Type: text/plain, Size: 2021 bytes --]

On Fri, 2018-01-12 at 10:51 +0100, Peter Zijlstra wrote:
> On Thu, Jan 11, 2018 at 05:58:11PM -0800, Dave Hansen wrote:
> > On 01/11/2018 05:32 PM, Ashok Raj wrote:
> > > +static void save_guest_spec_ctrl(struct vcpu_vmx *vmx)
> > > +{
> > > +   if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
> > > +           vmx->spec_ctrl = spec_ctrl_get();
> > > +           spec_ctrl_restriction_on();
> > > +   } else
> > > +           rmb();
> > > +}
> > 
> > Does this need to be "ifence()"?  Better yet, do we just need to create
> > a helper for boot_cpu_has(X86_FEATURE_SPEC_CTRL) that does the barrier?
> 
> static_cpu_has() + hard asm-goto requirement. Please drop all the above
> nonsense on the floor hard.

Peter, NO!

static_cpu_has() + asm-goto is NOT SUFFICIENT.

It's still *possible* for a missed optimisation in GCC to still leave
us with a conditional branch around the wrmsr, letting the CPU
speculate around it too.

Come on, Peter, we've learned this lesson long and hard since the 1990s
when every GCC update would break some stupid¹ shit we did. Don't
regress. WE DO NOT RELY ON UNGUARANTEED BEHAVIOUR OF THE COMPILER.

Devise a sanity check which will force a build-fail if GCC ever misses
the optimisation, and that's fine. (Or point me to an existing way that
it's guaranteed to fail, if such is true already. Don't just ignore the
objection.)

Do not just ASSUME that GCC will always and forever manage to make that
optimisation in every case.

-- 
dwmw2


¹ In our defence, back then there was often no *other* way to do some
  of the things we did, except the "stupid" way. These days, it's much
  easier to add features to GCC and elicit guarantees. Although I'm
  getting rather pissed off at them bikeshedding the retpoline patches
  *so* hard they can't even agree on a command-line option and the name
  of the thunk symbol, regardless of the internal implementation
  details we don't give a shit about.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 4/5] x86/svm: Direct access to MSR_IA32_SPEC_CTRL
  2018-01-12  9:58     ` Peter Zijlstra
@ 2018-01-12 10:13       ` David Woodhouse
  0 siblings, 0 replies; 81+ messages in thread
From: David Woodhouse @ 2018-01-12 10:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ashok Raj, linux-kernel, Thomas Gleixner, Tim Chen,
	Andy Lutomirski, Linus Torvalds, Greg KH, Paolo Bonzini,
	Dave Hansen, Andrea Arcangeli, Andi Kleen, Arjan Van De Ven,
	Dan Williams, Jun Nakajima, Asit Mallick

[-- Attachment #1: Type: text/plain, Size: 314 bytes --]

On Fri, 2018-01-12 at 10:58 +0100, Peter Zijlstra wrote:
> I disagree, and if you worry about that, we should write a testcase. But
> we rely on GCC for correct code generation in lots of places, this isn't
> different.

It's different because it's not a *correctness* issue... unless we let
you make it one.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl
  2018-01-12  1:32 ` [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl Ashok Raj
  2018-01-12  1:41   ` Andy Lutomirski
  2018-01-12  7:54   ` Greg KH
@ 2018-01-12 12:28   ` Borislav Petkov
  2 siblings, 0 replies; 81+ messages in thread
From: Borislav Petkov @ 2018-01-12 12:28 UTC (permalink / raw)
  To: Ashok Raj
  Cc: linux-kernel, Thomas Gleixner, Tim Chen, Andy Lutomirski,
	Linus Torvalds, Greg KH, Dave Hansen, Andrea Arcangeli,
	Andi Kleen, Arjan Van De Ven, David Woodhouse, Peter Zijlstra,
	Dan Williams, Paolo Bonzini, Jun Nakajima, Asit Mallick

On Thu, Jan 11, 2018 at 05:32:15PM -0800, Ashok Raj wrote:
> - Remove including microcode.h, and use native macros from asm/msr.h
> - added license header for spec_ctrl.c
> 
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  arch/x86/include/asm/spec_ctrl.h | 17 ++++++++++++++++-
>  arch/x86/kernel/cpu/spec_ctrl.c  |  1 +
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/spec_ctrl.h b/arch/x86/include/asm/spec_ctrl.h
> index 948959b..2dfa31b 100644
> --- a/arch/x86/include/asm/spec_ctrl.h
> +++ b/arch/x86/include/asm/spec_ctrl.h
> @@ -3,12 +3,27 @@
>  #ifndef _ASM_X86_SPEC_CTRL_H
>  #define _ASM_X86_SPEC_CTRL_H
>  
> -#include <asm/microcode.h>
> +#include <asm/processor.h>
> +#include <asm/msr.h>
>  
>  void spec_ctrl_scan_feature(struct cpuinfo_x86 *c);
>  void spec_ctrl_unprotected_begin(void);
>  void spec_ctrl_unprotected_end(void);
>  
> +static inline u64 native_rdmsrl(unsigned int msr)
> +{
> +	u64 val;
> +
> +	val = __rdmsr(msr);
> +
> +	return val;
> +}

There's no need to add new ones but simply lift the ones from
microcode.h and use them.

No duplication of *MSR functions pls.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 5/5] x86/feature: Detect the x86 feature Indirect Branch Prediction Barrier
  2018-01-12  1:32 ` [PATCH 5/5] x86/feature: Detect the x86 feature Indirect Branch Prediction Barrier Ashok Raj
  2018-01-12 10:08   ` Peter Zijlstra
@ 2018-01-12 12:32   ` Borislav Petkov
  2018-01-12 12:39     ` Woodhouse, David
  2018-01-12 15:31   ` Tom Lendacky
  2 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2018-01-12 12:32 UTC (permalink / raw)
  To: Ashok Raj
  Cc: linux-kernel, Thomas Gleixner, Tim Chen, Andy Lutomirski,
	Linus Torvalds, Greg KH, Dave Hansen, Andrea Arcangeli,
	Andi Kleen, Arjan Van De Ven, David Woodhouse, Peter Zijlstra,
	Dan Williams, Paolo Bonzini, Jun Nakajima, Asit Mallick

On Thu, Jan 11, 2018 at 05:32:19PM -0800, Ashok Raj wrote:
> cpuid ax=0x7, return rdx bit 26 to indicate presence of both
> IA32_SPEC_CTRL(MSR 0x48) and IA32_PRED_CMD(MSR 0x49)

So why do we need two X86_FEATURE flags then?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 4/5] x86/svm: Direct access to MSR_IA32_SPEC_CTRL
  2018-01-12  1:32 ` [PATCH 4/5] x86/svm: Direct access to MSR_IA32_SPEC_CTRL Ashok Raj
  2018-01-12  7:23   ` David Woodhouse
@ 2018-01-12 12:38   ` Paolo Bonzini
  2018-01-12 15:14   ` Tom Lendacky
  2 siblings, 0 replies; 81+ messages in thread
From: Paolo Bonzini @ 2018-01-12 12:38 UTC (permalink / raw)
  To: Ashok Raj, linux-kernel, Thomas Gleixner, Tim Chen,
	Andy Lutomirski, Linus Torvalds, Greg KH
  Cc: Dave Hansen, Andrea Arcangeli, Andi Kleen, Arjan Van De Ven,
	David Woodhouse, Peter Zijlstra, Dan Williams, Jun Nakajima,
	Asit Mallick

On 12/01/2018 02:32, Ashok Raj wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
> 
> Direct access to MSR_IA32_SPEC_CTRL is important
> for performance.  Allow load/store of MSR_IA32_SPEC_CTRL, restore guest
> IBRS on VM entry and set restore host values on VM exit.
> it yet).
> 
> TBD: need to check msr's can be passed through even if feature is not
> emuerated by the CPU.
> 
> [Ashok: Modified to reuse V3 spec-ctrl patches from Tim]
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  arch/x86/kvm/svm.c | 35 +++++++++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)

If you want to do this, please include the vmx.c part as well...

Paolo

> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 0e68f0b..7c14471a 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -183,6 +183,8 @@ struct vcpu_svm {
>  		u64 gs_base;
>  	} host;
>  
> +	u64 spec_ctrl;
> +
>  	u32 *msrpm;
>  
>  	ulong nmi_iret_rip;
> @@ -248,6 +250,7 @@ static const struct svm_direct_access_msrs {
>  	{ .index = MSR_CSTAR,				.always = true  },
>  	{ .index = MSR_SYSCALL_MASK,			.always = true  },
>  #endif
> +	{ .index = MSR_IA32_SPEC_CTRL,          .always = true  },
>  	{ .index = MSR_IA32_LASTBRANCHFROMIP,		.always = false },
>  	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
>  	{ .index = MSR_IA32_LASTINTFROMIP,		.always = false },
> @@ -917,6 +920,9 @@ static void svm_vcpu_init_msrpm(u32 *msrpm)
>  
>  		set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
>  	}
> +
> +	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
> +		set_msr_interception(msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
>  }
>  
>  static void add_msr_offset(u32 offset)
> @@ -3576,6 +3582,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_VM_CR:
>  		msr_info->data = svm->nested.vm_cr_msr;
>  		break;
> +	case MSR_IA32_SPEC_CTRL:
> +		msr_info->data = svm->spec_ctrl;
> +		break;
>  	case MSR_IA32_UCODE_REV:
>  		msr_info->data = 0x01000065;
>  		break;
> @@ -3724,6 +3733,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>  	case MSR_VM_IGNNE:
>  		vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
>  		break;
> +	case MSR_IA32_SPEC_CTRL:
> +		svm->spec_ctrl = data;
> +		break;
>  	case MSR_IA32_APICBASE:
>  		if (kvm_vcpu_apicv_active(vcpu))
>  			avic_update_vapic_bar(to_svm(vcpu), data);
> @@ -4871,6 +4883,19 @@ static void svm_cancel_injection(struct kvm_vcpu *vcpu)
>  	svm_complete_interrupts(svm);
>  }
>  
> +
> +/*
> + * Save guest value of spec_ctrl and also restore host value
> + */
> +static void save_guest_spec_ctrl(struct vcpu_svm *svm)
> +{
> +	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
> +		svm->spec_ctrl = spec_ctrl_get();
> +		spec_ctrl_restriction_on();
> +	} else
> +		rmb();
> +}
> +
>  static void svm_vcpu_run(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -4910,6 +4935,14 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
>  
>  	clgi();
>  
> +	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
> +		/*
> +		 * FIXME: lockdep_assert_irqs_disabled();
> +		 */
> +		WARN_ON_ONCE(!irqs_disabled());
> +		spec_ctrl_set(svm->spec_ctrl);
> +	}
> +
>  	local_irq_enable();
>  
>  	asm volatile (
> @@ -4985,6 +5018,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
>  #endif
>  		);
>  
> +	save_guest_spec_ctrl(svm);
> +
>  #ifdef CONFIG_X86_64
>  	wrmsrl(MSR_GS_BASE, svm->host.gs_base);
>  #else
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 5/5] x86/feature: Detect the x86 feature Indirect Branch Prediction Barrier
  2018-01-12 12:32   ` Borislav Petkov
@ 2018-01-12 12:39     ` Woodhouse, David
  2018-01-12 15:21       ` Tom Lendacky
  0 siblings, 1 reply; 81+ messages in thread
From: Woodhouse, David @ 2018-01-12 12:39 UTC (permalink / raw)
  To: Borislav Petkov, Ashok Raj
  Cc: linux-kernel, Thomas Gleixner, Tim Chen, Andy Lutomirski,
	Linus Torvalds, Greg KH, Dave Hansen, Andrea Arcangeli,
	Andi Kleen, Arjan Van De Ven, Peter Zijlstra, Dan Williams,
	Paolo Bonzini, Jun Nakajima, Asit Mallick

[-- Attachment #1: Type: text/plain, Size: 356 bytes --]

On Fri, 2018-01-12 at 13:32 +0100, Borislav Petkov wrote:
> On Thu, Jan 11, 2018 at 05:32:19PM -0800, Ashok Raj wrote:
> > cpuid ax=0x7, return rdx bit 26 to indicate presence of both
> > IA32_SPEC_CTRL(MSR 0x48) and IA32_PRED_CMD(MSR 0x49)
> 
> So why do we need two X86_FEATURE flags then?

AMD has only the latter and enumerates them differently.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5210 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 4/5] x86/svm: Direct access to MSR_IA32_SPEC_CTRL
  2018-01-12  1:32 ` [PATCH 4/5] x86/svm: Direct access to MSR_IA32_SPEC_CTRL Ashok Raj
  2018-01-12  7:23   ` David Woodhouse
  2018-01-12 12:38   ` Paolo Bonzini
@ 2018-01-12 15:14   ` Tom Lendacky
  2 siblings, 0 replies; 81+ messages in thread
From: Tom Lendacky @ 2018-01-12 15:14 UTC (permalink / raw)
  To: Ashok Raj, linux-kernel, Thomas Gleixner, Tim Chen,
	Andy Lutomirski, Linus Torvalds, Greg KH
  Cc: Paolo Bonzini, Dave Hansen, Andrea Arcangeli, Andi Kleen,
	Arjan Van De Ven, David Woodhouse, Peter Zijlstra, Dan Williams,
	Jun Nakajima, Asit Mallick

On 1/11/2018 7:32 PM, Ashok Raj wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
> 
> Direct access to MSR_IA32_SPEC_CTRL is important
> for performance.  Allow load/store of MSR_IA32_SPEC_CTRL, restore guest
> IBRS on VM entry and set restore host values on VM exit.
> it yet).
> 
> TBD: need to check msr's can be passed through even if feature is not
> emuerated by the CPU.
> 
> [Ashok: Modified to reuse V3 spec-ctrl patches from Tim]
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  arch/x86/kvm/svm.c | 35 +++++++++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 0e68f0b..7c14471a 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -183,6 +183,8 @@ struct vcpu_svm {
>  		u64 gs_base;
>  	} host;
>  
> +	u64 spec_ctrl;
> +
>  	u32 *msrpm;
>  
>  	ulong nmi_iret_rip;
> @@ -248,6 +250,7 @@ static const struct svm_direct_access_msrs {
>  	{ .index = MSR_CSTAR,				.always = true  },
>  	{ .index = MSR_SYSCALL_MASK,			.always = true  },
>  #endif
> +	{ .index = MSR_IA32_SPEC_CTRL,          .always = true  },
>  	{ .index = MSR_IA32_LASTBRANCHFROMIP,		.always = false },
>  	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
>  	{ .index = MSR_IA32_LASTINTFROMIP,		.always = false },
> @@ -917,6 +920,9 @@ static void svm_vcpu_init_msrpm(u32 *msrpm)
>  
>  		set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
>  	}
> +
> +	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
> +		set_msr_interception(msrpm, MSR_IA32_SPEC_CTRL, 1, 1);

This isn't needed.  The entry in the direct_access_msrs will do this in
the loop above.

Thanks,
Tom

>  }
>  
>  static void add_msr_offset(u32 offset)
> @@ -3576,6 +3582,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_VM_CR:
>  		msr_info->data = svm->nested.vm_cr_msr;
>  		break;
> +	case MSR_IA32_SPEC_CTRL:
> +		msr_info->data = svm->spec_ctrl;
> +		break;
>  	case MSR_IA32_UCODE_REV:
>  		msr_info->data = 0x01000065;
>  		break;
> @@ -3724,6 +3733,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>  	case MSR_VM_IGNNE:
>  		vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
>  		break;
> +	case MSR_IA32_SPEC_CTRL:
> +		svm->spec_ctrl = data;
> +		break;
>  	case MSR_IA32_APICBASE:
>  		if (kvm_vcpu_apicv_active(vcpu))
>  			avic_update_vapic_bar(to_svm(vcpu), data);
> @@ -4871,6 +4883,19 @@ static void svm_cancel_injection(struct kvm_vcpu *vcpu)
>  	svm_complete_interrupts(svm);
>  }
>  
> +
> +/*
> + * Save guest value of spec_ctrl and also restore host value
> + */
> +static void save_guest_spec_ctrl(struct vcpu_svm *svm)
> +{
> +	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
> +		svm->spec_ctrl = spec_ctrl_get();
> +		spec_ctrl_restriction_on();
> +	} else
> +		rmb();
> +}
> +
>  static void svm_vcpu_run(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -4910,6 +4935,14 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
>  
>  	clgi();
>  
> +	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
> +		/*
> +		 * FIXME: lockdep_assert_irqs_disabled();
> +		 */
> +		WARN_ON_ONCE(!irqs_disabled());
> +		spec_ctrl_set(svm->spec_ctrl);
> +	}
> +
>  	local_irq_enable();
>  
>  	asm volatile (
> @@ -4985,6 +5018,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
>  #endif
>  		);
>  
> +	save_guest_spec_ctrl(svm);
> +
>  #ifdef CONFIG_X86_64
>  	wrmsrl(MSR_GS_BASE, svm->host.gs_base);
>  #else
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 5/5] x86/feature: Detect the x86 feature Indirect Branch Prediction Barrier
  2018-01-12 12:39     ` Woodhouse, David
@ 2018-01-12 15:21       ` Tom Lendacky
  0 siblings, 0 replies; 81+ messages in thread
From: Tom Lendacky @ 2018-01-12 15:21 UTC (permalink / raw)
  To: Woodhouse, David, Borislav Petkov, Ashok Raj
  Cc: linux-kernel, Thomas Gleixner, Tim Chen, Andy Lutomirski,
	Linus Torvalds, Greg KH, Dave Hansen, Andrea Arcangeli,
	Andi Kleen, Arjan Van De Ven, Peter Zijlstra, Dan Williams,
	Paolo Bonzini, Jun Nakajima, Asit Mallick

On 1/12/2018 6:39 AM, Woodhouse, David wrote:
> On Fri, 2018-01-12 at 13:32 +0100, Borislav Petkov wrote:
>> On Thu, Jan 11, 2018 at 05:32:19PM -0800, Ashok Raj wrote:
>>> cpuid ax=0x7, return rdx bit 26 to indicate presence of both
>>> IA32_SPEC_CTRL(MSR 0x48) and IA32_PRED_CMD(MSR 0x49)
>>
>> So why do we need two X86_FEATURE flags then?
> 
> AMD has only the latter and enumerates them differently.

Correct.  Both 0x48 and 0x49 are tied to the same cpuid bit.  AMD has
a separate cpuid bit for 0x49 (IBPB) alone.

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 5/5] x86/feature: Detect the x86 feature Indirect Branch Prediction Barrier
  2018-01-12  1:32 ` [PATCH 5/5] x86/feature: Detect the x86 feature Indirect Branch Prediction Barrier Ashok Raj
  2018-01-12 10:08   ` Peter Zijlstra
  2018-01-12 12:32   ` Borislav Petkov
@ 2018-01-12 15:31   ` Tom Lendacky
  2018-01-12 15:36     ` Woodhouse, David
  2 siblings, 1 reply; 81+ messages in thread
From: Tom Lendacky @ 2018-01-12 15:31 UTC (permalink / raw)
  To: Ashok Raj, linux-kernel, Thomas Gleixner, Tim Chen,
	Andy Lutomirski, Linus Torvalds, Greg KH
  Cc: Dave Hansen, Andrea Arcangeli, Andi Kleen, Arjan Van De Ven,
	David Woodhouse, Peter Zijlstra, Dan Williams, Paolo Bonzini,
	Jun Nakajima, Asit Mallick

On 1/11/2018 7:32 PM, Ashok Raj wrote:
> cpuid ax=0x7, return rdx bit 26 to indicate presence of both
> IA32_SPEC_CTRL(MSR 0x48) and IA32_PRED_CMD(MSR 0x49)
> 
> BIT0: Indirect Branch Prediction Barrier
> 
> When this MSR is written with IBPB=1 it ensures that earlier code's behavior
> doesn't control later indirect branch predictions.
> 
> Note this MSR is only writable and does not carry any state. Its a barrier
> so the code should perform a wrmsr when the barrier is needed.
> 
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  arch/x86/include/asm/cpufeatures.h |  1 +
>  arch/x86/include/asm/msr-index.h   |  3 +++
>  arch/x86/kernel/cpu/spec_ctrl.c    |  7 +++++++
>  arch/x86/kvm/svm.c                 | 16 ++++++++++++++++
>  arch/x86/kvm/vmx.c                 | 10 ++++++++++
>  5 files changed, 37 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 624b58e..52f37fc 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -213,6 +213,7 @@
>  #define X86_FEATURE_MBA			( 7*32+18) /* Memory Bandwidth Allocation */
>  #define X86_FEATURE_SPEC_CTRL		( 7*32+19) /* Speculation Control */
>  #define X86_FEATURE_SPEC_CTRL_IBRS	( 7*32+20) /* Speculation Control, use IBRS */
> +#define X86_FEATURE_PRED_CMD	( 7*32+21) /* Indirect Branch Prediction Barrier */
>  
>  /* Virtualization flags: Linux defined, word 8 */
>  #define X86_FEATURE_TPR_SHADOW		( 8*32+ 0) /* Intel TPR Shadow */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 3e1cb18..1888e19 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -46,6 +46,9 @@
>  #define SPEC_CTRL_DISABLE_IBRS		(0 << 0)
>  #define SPEC_CTRL_ENABLE_IBRS		(1 << 0)
>  
> +#define MSR_IA32_PRED_CMD		0x00000049
> +#define FEATURE_SET_IBPB		(1<<0)
> +
>  #define MSR_IA32_PERFCTR0		0x000000c1
>  #define MSR_IA32_PERFCTR1		0x000000c2
>  #define MSR_FSB_FREQ			0x000000cd
> diff --git a/arch/x86/kernel/cpu/spec_ctrl.c b/arch/x86/kernel/cpu/spec_ctrl.c
> index 02fc630..6cfec19 100644
> --- a/arch/x86/kernel/cpu/spec_ctrl.c
> +++ b/arch/x86/kernel/cpu/spec_ctrl.c
> @@ -15,6 +15,13 @@ void spec_ctrl_scan_feature(struct cpuinfo_x86 *c)
>  			if (!c->cpu_index)
>  				static_branch_enable(&spec_ctrl_dynamic_ibrs);
>  		}
> +		/*
> +		 * For Intel CPU's this MSR is shared the same cpuid
> +		 * enumeration. When MSR_IA32_SPEC_CTRL is present
> +		 * MSR_IA32_SPEC_CTRL is also available
> +		 * TBD: AMD might have a separate enumeration for each.

AMD will follow the specification that if cpuid ax=0x7, return rdx[26]
is set, it will indicate both MSR registers and features are supported.

But AMD also has a separate bit for IBPB (X86_FEATURE_PRED_CMD) alone.
As all of the IBRS/IBPB stuff happens, that patch will follow.

> +		 */
> +		set_cpu_cap(c, X86_FEATURE_PRED_CMD);>  	}
>  }
>  
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 7c14471a..36924c9 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -251,6 +251,7 @@ static const struct svm_direct_access_msrs {
>  	{ .index = MSR_SYSCALL_MASK,			.always = true  },
>  #endif
>  	{ .index = MSR_IA32_SPEC_CTRL,          .always = true  },
> +	{ .index = MSR_IA32_PRED_CMD,           .always = false },

This should be .always = true

>  	{ .index = MSR_IA32_LASTBRANCHFROMIP,		.always = false },
>  	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
>  	{ .index = MSR_IA32_LASTINTFROMIP,		.always = false },
> @@ -531,6 +532,7 @@ struct svm_cpu_data {
>  	struct kvm_ldttss_desc *tss_desc;
>  
>  	struct page *save_area;
> +	struct vmcb *current_vmcb;
>  };
>  
>  static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
> @@ -923,6 +925,8 @@ static void svm_vcpu_init_msrpm(u32 *msrpm)
>  
>  	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
>  		set_msr_interception(msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
> +	if (boot_cpu_has(X86_FEATURE_PRED_CMD))
> +		set_msr_interception(msrpm, MSR_IA32_PRED_CMD, 1, 1);

Similar to the comment about SPEC_CTRL, this should be removed as it will
be covered by the loop.

>  }
>  
>  static void add_msr_offset(u32 offset)
> @@ -1711,11 +1715,18 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
>  	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
>  	kvm_vcpu_uninit(vcpu);
>  	kmem_cache_free(kvm_vcpu_cache, svm);
> +    /* 
> +     * The VMCB could be recycled, causing a false negative in svm_vcpu_load;
> +     * block speculative execution.
> +     */
> +	if (boot_cpu_has(X86_FEATURE_PRED_CMD))
> +        native_wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
>  }
>  
>  static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> +	struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
>  	int i;
>  
>  	if (unlikely(cpu != vcpu->cpu)) {
> @@ -1744,6 +1755,11 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	if (static_cpu_has(X86_FEATURE_RDTSCP))
>  		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>  
> +	if (sd->current_vmcb != svm->vmcb) {
> +		sd->current_vmcb = svm->vmcb;
> +		if (boot_cpu_has(X86_FEATURE_PRED_CMD))
> +			native_wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
> +	}
>  	avic_vcpu_load(vcpu, cpu);
>  }
>  
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 1913896..caeb9ff 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2280,6 +2280,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
>  		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
>  		vmcs_load(vmx->loaded_vmcs->vmcs);
> +		if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))

This should probably use X86_FEATURE_PRED_CMD.

> +			native_wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);>  	}
>  
>  	if (!already_loaded) {
> @@ -3837,6 +3839,12 @@ static void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
>  	free_vmcs(loaded_vmcs->vmcs);
>  	loaded_vmcs->vmcs = NULL;
>  	WARN_ON(loaded_vmcs->shadow_vmcs != NULL);
> +    /*
> +     * The VMCS could be recycled, causing a false negative in vmx_vcpu_load
> +     * block speculative execution.
> +     */
> +	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))

Again, X86_FEATURE_PRED_CMD.

> +        native_wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
>  }
>  
>  static void free_kvm_area(void)
> @@ -6804,6 +6812,8 @@ static __init int hardware_setup(void)
>  	 */
>  	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
>  		vmx_disable_intercept_for_msr(MSR_IA32_SPEC_CTRL, false);
> +	if (boot_cpu_has(X86_FEATURE_PRED_CMD))
> +		vmx_disable_intercept_for_msr(MSR_IA32_PRED_CMD, false);
>  
>  	vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
>  	vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 5/5] x86/feature: Detect the x86 feature Indirect Branch Prediction Barrier
  2018-01-12 15:31   ` Tom Lendacky
@ 2018-01-12 15:36     ` Woodhouse, David
  2018-01-12 17:06       ` Tom Lendacky
  0 siblings, 1 reply; 81+ messages in thread
From: Woodhouse, David @ 2018-01-12 15:36 UTC (permalink / raw)
  To: Tom Lendacky, Ashok Raj, linux-kernel, Thomas Gleixner, Tim Chen,
	Andy Lutomirski, Linus Torvalds, Greg KH
  Cc: Dave Hansen, Andrea Arcangeli, Andi Kleen, Arjan Van De Ven,
	Peter Zijlstra, Dan Williams, Paolo Bonzini, Jun Nakajima,
	Asit Mallick

[-- Attachment #1: Type: text/plain, Size: 477 bytes --]

On Fri, 2018-01-12 at 09:31 -0600, Tom Lendacky wrote:
> 
> AMD will follow the specification that if cpuid ax=0x7, return rdx[26]
> is set, it will indicate both MSR registers and features are supported.
> 
> But AMD also has a separate bit for IBPB (X86_FEATURE_PRED_CMD) alone.
> As all of the IBRS/IBPB stuff happens, that patch will follow.

Please let's roll it into the patch set. I don't want Intel posting
deliberately AMD-ignoring patches. Sort it out, guys.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5210 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl
  2018-01-12  5:03           ` Dave Hansen
@ 2018-01-12 16:28             ` Josh Poimboeuf
  2018-01-12 16:28             ` Woodhouse, David
  2018-01-13  6:20             ` Andy Lutomirski
  2 siblings, 0 replies; 81+ messages in thread
From: Josh Poimboeuf @ 2018-01-12 16:28 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Raj, Ashok, Andy Lutomirski, LKML, Thomas Gleixner, Tim Chen,
	Linus Torvalds, Greg KH, Andrea Arcangeli, Andi Kleen,
	Arjan Van De Ven, David Woodhouse, Peter Zijlstra, Dan Williams,
	Paolo Bonzini, Jun Nakajima, Asit Mallick

On Thu, Jan 11, 2018 at 09:03:56PM -0800, Dave Hansen wrote:
> On 01/11/2018 07:01 PM, Raj, Ashok wrote:
> > On Thu, Jan 11, 2018 at 06:20:13PM -0800, Andy Lutomirski wrote:
> >> On Thu, Jan 11, 2018 at 5:52 PM, Raj, Ashok <ashok.raj@intel.com> wrote:
> >>>>
> >>>> What's wrong with native_read_msr()?
> >>>
> >>> Yes, i think i should have added to msr.h. The names didn't read as a
> >>> pair, one was native_read_msr, wrmsrl could be taken over when paravirt is
> >>> defined?
> >>
> >> Why do you need to override paravirt?
> > 
> > The idea was since these MSR's are passed through we shouldn't need to 
> > handle them any differently. Also its best to do this as soon as possible
> > and avoid longer paths to get this barrier to hardware.
> 
> We were also worried about the indirect calls that are part of the
> paravirt interfaces when retpolines are not in place.

But aren't those patched with direct calls during boot?

-- 
Josh

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl
  2018-01-12  5:03           ` Dave Hansen
  2018-01-12 16:28             ` Josh Poimboeuf
@ 2018-01-12 16:28             ` Woodhouse, David
  2018-01-13  6:20             ` Andy Lutomirski
  2 siblings, 0 replies; 81+ messages in thread
From: Woodhouse, David @ 2018-01-12 16:28 UTC (permalink / raw)
  To: Dave Hansen, Raj, Ashok, Andy Lutomirski
  Cc: LKML, Thomas Gleixner, Tim Chen, Linus Torvalds, Greg KH,
	Andrea Arcangeli, Andi Kleen, Arjan Van De Ven, Peter Zijlstra,
	Dan Williams, Paolo Bonzini, Jun Nakajima, Asit Mallick

[-- Attachment #1: Type: text/plain, Size: 1172 bytes --]

On Thu, 2018-01-11 at 21:03 -0800, Dave Hansen wrote:
> On 01/11/2018 07:01 PM, Raj, Ashok wrote:
> > On Thu, Jan 11, 2018 at 06:20:13PM -0800, Andy Lutomirski wrote:
> >> On Thu, Jan 11, 2018 at 5:52 PM, Raj, Ashok <ashok.raj@intel.com> wrote:
> >>>>
> >>>> What's wrong with native_read_msr()?
> >>>
> >>> Yes, i think i should have added to msr.h. The names didn't read as a
> >>> pair, one was native_read_msr, wrmsrl could be taken over when paravirt is
> >>> defined?
> >>
> >> Why do you need to override paravirt?
> > 
> > The idea was since these MSR's are passed through we shouldn't need to 
> > handle them any differently. Also its best to do this as soon as possible
> > and avoid longer paths to get this barrier to hardware.
> 
> We were also worried about the indirect calls that are part of the
> paravirt interfaces when retpolines are not in place.

I have repeatedly been told that these go away, and are converted to
direct branches by the time the kernel is running userspace.

I suppose I really ought to validate that with qemu -d in_asm or
something. I couldn't see it in the code... but I haven't looked that
hard yet.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5210 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 5/5] x86/feature: Detect the x86 feature Indirect Branch Prediction Barrier
  2018-01-12 15:36     ` Woodhouse, David
@ 2018-01-12 17:06       ` Tom Lendacky
  0 siblings, 0 replies; 81+ messages in thread
From: Tom Lendacky @ 2018-01-12 17:06 UTC (permalink / raw)
  To: Woodhouse, David, linux-kernel, tim.c.chen, ashok.raj, torvalds,
	tglx, luto, gregkh
  Cc: arjan.van.de.ven, peterz, ak, dan.j.williams, aarcange, pbonzini,
	dave.hansen, jun.nakajima, asit.k.mallick

On 1/12/2018 9:36 AM, Woodhouse, David wrote:
> On Fri, 2018-01-12 at 09:31 -0600, Tom Lendacky wrote:
>>
>> AMD will follow the specification that if cpuid ax=0x7, return rdx[26]
>> is set, it will indicate both MSR registers and features are supported.
>>
>> But AMD also has a separate bit for IBPB (X86_FEATURE_PRED_CMD) alone.
>> As all of the IBRS/IBPB stuff happens, that patch will follow.
> 
> Please let's roll it into the patch set. I don't want Intel posting
> deliberately AMD-ignoring patches. Sort it out, guys.
> 

Based on the current patches, here is what it should be for the
standalone IBPB support:

x86/cpu: Detect standalone IBPB support

From: Tom Lendacky <thomas.lendacky@amd.com>

Add support to detect standalone IBPB feature support.  This feature is
indicated as follows:

  CPUID EAX=0x80000008, ECX=0x00 return EBX[12] indicates support for
  IBPB

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/cpufeatures.h |    1 +
 arch/x86/kernel/cpu/spec_ctrl.c    |    9 +++++----
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 52f37fc..33f0215 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -273,6 +273,7 @@
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
 #define X86_FEATURE_IRPERF		(13*32+ 1) /* Instructions Retired Count */
 #define X86_FEATURE_XSAVEERPTR		(13*32+ 2) /* Always save/restore FP error pointers */
+#define X86_FEATURE_IBPB		(13*32+12) /* Indirect Branch Prediction Barrier */
 
 /* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */
 #define X86_FEATURE_DTHERM		(14*32+ 0) /* Digital Thermal Sensor */
diff --git a/arch/x86/kernel/cpu/spec_ctrl.c b/arch/x86/kernel/cpu/spec_ctrl.c
index 6cfec19..1aadd73 100644
--- a/arch/x86/kernel/cpu/spec_ctrl.c
+++ b/arch/x86/kernel/cpu/spec_ctrl.c
@@ -16,12 +16,13 @@ void spec_ctrl_scan_feature(struct cpuinfo_x86 *c)
 				static_branch_enable(&spec_ctrl_dynamic_ibrs);
 		}
 		/*
-		 * For Intel CPU's this MSR is shared the same cpuid
-		 * enumeration. When MSR_IA32_SPEC_CTRL is present
-		 * MSR_IA32_SPEC_CTRL is also available
-		 * TBD: AMD might have a separate enumeration for each.
+		 * The PRED_CMD MSR is shared with the cpuid enumeration
+		 * for SPEC_CTRL.  When MSR_IA32_SPEC_CTRL is present,
+		 * then MSR_IA32_PRED_CMD is, too.
 		 */
 		set_cpu_cap(c, X86_FEATURE_PRED_CMD);
+	} else if (boot_cpu_has(X86_FEATURE_IBPB)) {
+		set_cpu_cap(c, X86_FEATURE_PRED_CMD);
 	}
 }
 


> 
> 
> Amazon Web Services UK Limited. Registered in England and Wales with
> registration number 08650665 and which has its registered office at 60
> Holborn Viaduct, London EC1A 2FD, United Kingdom.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl
  2018-01-12  3:01         ` Raj, Ashok
  2018-01-12  5:03           ` Dave Hansen
@ 2018-01-13  6:19           ` Andy Lutomirski
  1 sibling, 0 replies; 81+ messages in thread
From: Andy Lutomirski @ 2018-01-13  6:19 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: Andy Lutomirski, LKML, Thomas Gleixner, Tim Chen, Linus Torvalds,
	Greg KH, Dave Hansen, Andrea Arcangeli, Andi Kleen,
	Arjan Van De Ven, David Woodhouse, Peter Zijlstra, Dan Williams,
	Paolo Bonzini, Jun Nakajima, Asit Mallick

On Thu, Jan 11, 2018 at 7:01 PM, Raj, Ashok <ashok.raj@intel.com> wrote:
> On Thu, Jan 11, 2018 at 06:20:13PM -0800, Andy Lutomirski wrote:
>> On Thu, Jan 11, 2018 at 5:52 PM, Raj, Ashok <ashok.raj@intel.com> wrote:
>> >>
>> >> What's wrong with native_read_msr()?
>> >
>> > Yes, i think i should have added to msr.h. The names didn't read as a
>> > pair, one was native_read_msr, wrmsrl could be taken over when paravirt is
>> > defined?
>>
>> Why do you need to override paravirt?
>
> The idea was since these MSR's are passed through we shouldn't need to
> handle them any differently. Also its best to do this as soon as possible
> and avoid longer paths to get this barrier to hardware.
>

It should end up with essentially the same instructions at runtime.
I'm generally in favor of using the less weird code when it will work.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl
  2018-01-12  5:03           ` Dave Hansen
  2018-01-12 16:28             ` Josh Poimboeuf
  2018-01-12 16:28             ` Woodhouse, David
@ 2018-01-13  6:20             ` Andy Lutomirski
  2018-01-13 13:52               ` Van De Ven, Arjan
  2 siblings, 1 reply; 81+ messages in thread
From: Andy Lutomirski @ 2018-01-13  6:20 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Raj, Ashok, Andy Lutomirski, LKML, Thomas Gleixner, Tim Chen,
	Linus Torvalds, Greg KH, Andrea Arcangeli, Andi Kleen,
	Arjan Van De Ven, David Woodhouse, Peter Zijlstra, Dan Williams,
	Paolo Bonzini, Jun Nakajima, Asit Mallick

On Thu, Jan 11, 2018 at 9:03 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> On 01/11/2018 07:01 PM, Raj, Ashok wrote:
>> On Thu, Jan 11, 2018 at 06:20:13PM -0800, Andy Lutomirski wrote:
>>> On Thu, Jan 11, 2018 at 5:52 PM, Raj, Ashok <ashok.raj@intel.com> wrote:
>>>>>
>>>>> What's wrong with native_read_msr()?
>>>>
>>>> Yes, i think i should have added to msr.h. The names didn't read as a
>>>> pair, one was native_read_msr, wrmsrl could be taken over when paravirt is
>>>> defined?
>>>
>>> Why do you need to override paravirt?
>>
>> The idea was since these MSR's are passed through we shouldn't need to
>> handle them any differently. Also its best to do this as soon as possible
>> and avoid longer paths to get this barrier to hardware.
>
> We were also worried about the indirect calls that are part of the
> paravirt interfaces when retpolines are not in place.
>

How could those possibly be any worse than any other indirect call in
the kernel?

^ permalink raw reply	[flat|nested] 81+ messages in thread

* RE: [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl
  2018-01-13  6:20             ` Andy Lutomirski
@ 2018-01-13 13:52               ` Van De Ven, Arjan
  2018-01-13 15:20                 ` Andy Lutomirski
  0 siblings, 1 reply; 81+ messages in thread
From: Van De Ven, Arjan @ 2018-01-13 13:52 UTC (permalink / raw)
  To: Andy Lutomirski, Hansen, Dave
  Cc: Raj, Ashok, LKML, Thomas Gleixner, Tim Chen, Linus Torvalds,
	Greg KH, Andrea Arcangeli, Andi Kleen, David Woodhouse,
	Peter Zijlstra, Williams, Dan J, Paolo Bonzini, Nakajima, Jun,
	Mallick, Asit K


> > We were also worried about the indirect calls that are part of the
> > paravirt interfaces when retpolines are not in place.
> >
> 
> How could those possibly be any worse than any other indirect call in
> the kernel?

they're worse if they happen before you write the MSR that then protects them?

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl
  2018-01-13 13:52               ` Van De Ven, Arjan
@ 2018-01-13 15:20                 ` Andy Lutomirski
  0 siblings, 0 replies; 81+ messages in thread
From: Andy Lutomirski @ 2018-01-13 15:20 UTC (permalink / raw)
  To: Van De Ven, Arjan
  Cc: Andy Lutomirski, Hansen, Dave, Raj, Ashok, LKML, Thomas Gleixner,
	Tim Chen, Linus Torvalds, Greg KH, Andrea Arcangeli, Andi Kleen,
	David Woodhouse, Peter Zijlstra, Williams, Dan J, Paolo Bonzini,
	Nakajima, Jun, Mallick, Asit K



> On Jan 13, 2018, at 5:52 AM, Van De Ven, Arjan <arjan.van.de.ven@intel.com> wrote:
> 
> 
>>> We were also worried about the indirect calls that are part of the
>>> paravirt interfaces when retpolines are not in place.
>>> 
>> 
>> How could those possibly be any worse than any other indirect call in
>> the kernel?
> 
> they're worse if they happen before you write the MSR that then protects them?

I haven't looked at the latest IBRS code, but I can see only two ways to do it:

1. Write the MSRs from asm.  Get exactly what you expect.

2. Write them from C.  Trust the compiler to be sane.  Failure to optimize asm goto correctly or failure of the paravirt code to patch correctly is very low on the list of things to worry about.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 3/5] x86/ibrs: Add direct access support for MSR_IA32_SPEC_CTRL
  2018-01-12 10:09       ` David Woodhouse
@ 2018-01-15 13:45         ` Peter Zijlstra
  2018-01-15 13:59           ` David Woodhouse
  0 siblings, 1 reply; 81+ messages in thread
From: Peter Zijlstra @ 2018-01-15 13:45 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Dave Hansen, Ashok Raj, linux-kernel, Thomas Gleixner, Tim Chen,
	Andy Lutomirski, Linus Torvalds, Greg KH, Andrea Arcangeli,
	Andi Kleen, Arjan Van De Ven, Dan Williams, Paolo Bonzini,
	Jun Nakajima, Asit Mallick

On Fri, Jan 12, 2018 at 10:09:08AM +0000, David Woodhouse wrote:
> static_cpu_has() + asm-goto is NOT SUFFICIENT.
> 
> It's still *possible* for a missed optimisation in GCC to still leave
> us with a conditional branch around the wrmsr, letting the CPU
> speculate around it too.

OK, so GCC would have to be bloody retarded to mess this up; but would
something like the below work for you?

The usage is like:

  if (static_branch_unlikely(key)) {
	arch_static_assert();
	stuff();
  }

And then objtool will fail things if the first instruction into that
branch is not immediately after a NOP/JMP patch site (on either the NOP
or the JMP+disp side of things).

---
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 8c0de4282659..6a1a893145ca 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -62,6 +62,15 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
 	return true;
 }
 
+static __always_inline void arch_static_assert(void)
+{
+	asm volatile ("1:\n\t"
+		      ".pushsection .discard.jump_assert, \"aw\" \n\t"
+		      _ASM_ALIGN  "\n\t"
+		      _ASM_PTR "1b \n\t"
+		      ".popsection \n\t");
+}
+
 #ifdef CONFIG_X86_64
 typedef u64 jump_label_t;
 #else
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index f40d46e24bcc..657bfc706bb6 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -687,8 +687,17 @@ static int handle_jump_alt(struct objtool_file *file,
 			   struct instruction *orig_insn,
 			   struct instruction **new_insn)
 {
-	if (orig_insn->type == INSN_NOP)
+	struct instruction *next_insn = list_next_entry(orig_insn, list);
+
+	if (orig_insn->type == INSN_NOP) {
+		/*
+		 * If orig_insn is a NOP, then new_insn is the branch target
+		 * for when it would've been a JMP.
+		 */
+		next_insn->br_static = true;
+		(*new_insn)->br_static = true;
 		return 0;
+	}
 
 	if (orig_insn->type != INSN_JUMP_UNCONDITIONAL) {
 		WARN_FUNC("unsupported instruction at jump label",
@@ -696,7 +705,16 @@ static int handle_jump_alt(struct objtool_file *file,
 		return -1;
 	}
 
-	*new_insn = list_next_entry(orig_insn, list);
+	/*
+	 * Otherwise, orig_insn is a JMP and it will have orig_insn->jump_dest.
+	 * In this case we'll effectively NOP the alt by pointing new_insn at
+	 * next_insn.
+	 */
+	orig_insn->jump_dest->br_static = true;
+	next_insn->br_static = true;
+
+	*new_insn = next_insn;
+
 	return 0;
 }
 
@@ -1067,6 +1085,50 @@ static int read_unwind_hints(struct objtool_file *file)
 	return 0;
 }
 
+static int read_jump_assertions(struct objtool_file *file)
+{
+	struct section *sec, *relasec;
+	struct instruction *insn;
+	struct rela *rela;
+	int i;
+
+	sec = find_section_by_name(file->elf, ".discard.jump_assert");
+	if (!sec)
+		return 0;
+
+	relasec = sec->rela;
+	if (!relasec) {
+		WARN("missing .rela.discard.jump_assert section");
+		return -1;
+	}
+
+	if (sec->len % sizeof(unsigned long)) {
+		WARN("jump_assert size mismatch: %d %ld", sec->len, sizeof(unsigned long));
+		return -1;
+	}
+
+	for (i = 0; i < sec->len / sizeof(unsigned long); i++) {
+		rela = find_rela_by_dest(sec, i * sizeof(unsigned long));
+		if (!rela) {
+			WARN("can't find rela for jump_assert[%d]", i);
+			return -1;
+		}
+
+		insn = find_insn(file, rela->sym->sec, rela->addend);
+		if (!insn) {
+			WARN("can't find insn for jump_assert[%d]", i);
+			return -1;
+		}
+
+		if (!insn->br_static) {
+			WARN_FUNC("static assert FAIL", insn->sec, insn->offset);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
 static int decode_sections(struct objtool_file *file)
 {
 	int ret;
@@ -1105,6 +1167,10 @@ static int decode_sections(struct objtool_file *file)
 	if (ret)
 		return ret;
 
+	ret = read_jump_assertions(file);
+	if (ret)
+		return ret;
+
 	return 0;
 }
 
diff --git a/tools/objtool/check.h b/tools/objtool/check.h
index dbadb304a410..12e0a3cf0350 100644
--- a/tools/objtool/check.h
+++ b/tools/objtool/check.h
@@ -45,6 +46,7 @@ struct instruction {
 	unsigned char type;
 	unsigned long immediate;
 	bool alt_group, visited, dead_end, ignore, hint, save, restore, ignore_alts;
+	bool br_static;
 	struct symbol *call_dest;
 	struct instruction *jump_dest;
 	struct list_head alts;

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 3/5] x86/ibrs: Add direct access support for MSR_IA32_SPEC_CTRL
  2018-01-15 13:45         ` Peter Zijlstra
@ 2018-01-15 13:59           ` David Woodhouse
  2018-01-15 14:45             ` Peter Zijlstra
  0 siblings, 1 reply; 81+ messages in thread
From: David Woodhouse @ 2018-01-15 13:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Hansen, Ashok Raj, linux-kernel, Thomas Gleixner, Tim Chen,
	Andy Lutomirski, Linus Torvalds, Greg KH, Andrea Arcangeli,
	Andi Kleen, Arjan Van De Ven, Dan Williams, Paolo Bonzini,
	Jun Nakajima, Asit Mallick

[-- Attachment #1: Type: text/plain, Size: 1401 bytes --]

On Mon, 2018-01-15 at 14:45 +0100, Peter Zijlstra wrote:
> On Fri, Jan 12, 2018 at 10:09:08AM +0000, David Woodhouse wrote:
> > static_cpu_has() + asm-goto is NOT SUFFICIENT.
> > 
> > It's still *possible* for a missed optimisation in GCC to still leave
> > us with a conditional branch around the wrmsr, letting the CPU
> > speculate around it too.
> 
> OK, so GCC would have to be bloody retarded to mess this up;

Like *that's* never happened before? In corner cases where it just gets
confused and certain optimisations go out the window?

>  but would something like the below work for you?
> 
> The usage is like:
> 
>   if (static_branch_unlikely(key)) {
>         arch_static_assert();
>         stuff();
>   }
> 
> And then objtool will fail things if the first instruction into that
> branch is not immediately after a NOP/JMP patch site (on either the NOP
> or the JMP+disp side of things).

That seems reasonable; thanks. Bonus points if you can make the
arch_static_assert happen() automatically with vile tricks like

#define IF_FEATURE(ftr) if (static_cpu_has(ftr)) arch_static_assert, 

So then it just becomes

   IF_FEATURE(key) {
       stuff();
   }

There might not be a sane way to do that though. And it's OK to have to
manually annotate the call sites where this is for correctness and not
purely optimisation.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 3/5] x86/ibrs: Add direct access support for MSR_IA32_SPEC_CTRL
  2018-01-15 13:59           ` David Woodhouse
@ 2018-01-15 14:45             ` Peter Zijlstra
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Zijlstra @ 2018-01-15 14:45 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Dave Hansen, Ashok Raj, linux-kernel, Thomas Gleixner, Tim Chen,
	Andy Lutomirski, Linus Torvalds, Greg KH, Andrea Arcangeli,
	Andi Kleen, Arjan Van De Ven, Dan Williams, Paolo Bonzini,
	Jun Nakajima, Asit Mallick

On Mon, Jan 15, 2018 at 02:59:22PM +0100, David Woodhouse wrote:
> #define IF_FEATURE(ftr) if (static_cpu_has(ftr)) arch_static_assert, 
> 
>    IF_FEATURE(key) {
>        stuff();
>    }
> 
> There might not be a sane way to do that though. And it's OK to have to
> manually annotate the call sites where this is for correctness and not
> purely optimisation.

Something like:

#define if_static_likely(_key) \
	if (static_branch_likely(_key) && (arch_static_assert(), true))

should work I think. The thing about static_cpu_has() is that it doesn't
use jump_labels but alternatives. I could of course also construct an
assert for that, but it needs different things.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v6 0/5] KVM: Expose speculation control feature to guests
@ 2018-02-01 21:59 ` KarimAllah Ahmed
  0 siblings, 0 replies; 81+ messages in thread
From: KarimAllah Ahmed @ 2018-02-01 21:59 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: KarimAllah Ahmed, Andi Kleen, Andrea Arcangeli, Andy Lutomirski,
	Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov,
	Dan Williams, Dave Hansen, David Woodhouse, Greg Kroah-Hartman,
	H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan,
	Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky

Add direct access to speculation control MSRs for KVM guests. This allows the
guest to protect itself against Spectre V2 using IBRS+IBPB instead of a
retpoline+IBPB based approach.

It also exposes the ARCH_CAPABILITIES MSR which is used by Intel processors to
indicate RDCL_NO and IBRS_ALL.

Keep in mind that the SVM part of the patch is unchanged this time. Mostly to
get feedback/confirmation about the nested handling for VMX first, once this is
done I will update SVM as well.

v6:
- Do not penalize (save/restore IBRS) all L2 guests when anyone of them starts
  using the SPEC_CTRL.

v5:
- svm: add PRED_CMD and SPEC_CTRL to direct_access_msrs list.
- vmx: check also for X86_FEATURE_SPEC_CTRL for msr reads and writes.
- vmx: Use MSR_TYPE_W instead of MSR_TYPE_R for the nested IBPB MSR
- rewrite commit message for IBPB patch [2/5] (Ashok)

v4:
- Add IBRS passthrough for SVM (5/5).
- Handle nested guests properly.
- expose F(IBRS) in kvm_cpuid_8000_0008_ebx_x86_features

Ashok Raj (1):
  KVM: x86: Add IBPB support

KarimAllah Ahmed (4):
  KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX
  KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL

 arch/x86/kvm/cpuid.c |  22 ++++--
 arch/x86/kvm/cpuid.h |   1 +
 arch/x86/kvm/svm.c   |  87 +++++++++++++++++++++++
 arch/x86/kvm/vmx.c   | 196 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c   |   1 +
 5 files changed, 299 insertions(+), 8 deletions(-)

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org

-- 
2.7.4

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v6 0/5] KVM: Expose speculation control feature to guests
@ 2018-02-01 21:59 ` KarimAllah Ahmed
  0 siblings, 0 replies; 81+ messages in thread
From: KarimAllah Ahmed @ 2018-02-01 21:59 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: KarimAllah Ahmed, Andi Kleen, Andrea Arcangeli, Andy Lutomirski,
	Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov,
	Dan Williams, Dave Hansen, David Woodhouse, Greg Kroah-Hartman,
	H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan,
	Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds,
	Masami Hiramatsu

Add direct access to speculation control MSRs for KVM guests. This allows the
guest to protect itself against Spectre V2 using IBRS+IBPB instead of a
retpoline+IBPB based approach.

It also exposes the ARCH_CAPABILITIES MSR which is used by Intel processors to
indicate RDCL_NO and IBRS_ALL.

Keep in mind that the SVM part of the patch is unchanged this time. Mostly to
get feedback/confirmation about the nested handling for VMX first, once this is
done I will update SVM as well.

v6:
- Do not penalize (save/restore IBRS) all L2 guests when anyone of them starts
  using the SPEC_CTRL.

v5:
- svm: add PRED_CMD and SPEC_CTRL to direct_access_msrs list.
- vmx: check also for X86_FEATURE_SPEC_CTRL for msr reads and writes.
- vmx: Use MSR_TYPE_W instead of MSR_TYPE_R for the nested IBPB MSR
- rewrite commit message for IBPB patch [2/5] (Ashok)

v4:
- Add IBRS passthrough for SVM (5/5).
- Handle nested guests properly.
- expose F(IBRS) in kvm_cpuid_8000_0008_ebx_x86_features

Ashok Raj (1):
  KVM: x86: Add IBPB support

KarimAllah Ahmed (4):
  KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX
  KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL

 arch/x86/kvm/cpuid.c |  22 ++++--
 arch/x86/kvm/cpuid.h |   1 +
 arch/x86/kvm/svm.c   |  87 +++++++++++++++++++++++
 arch/x86/kvm/vmx.c   | 196 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c   |   1 +
 5 files changed, 299 insertions(+), 8 deletions(-)

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org

-- 
2.7.4

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v6 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX
  2018-02-01 21:59 ` KarimAllah Ahmed
  (?)
@ 2018-02-01 21:59 ` KarimAllah Ahmed
  2018-02-02 17:37   ` Jim Mattson
  2018-02-03 22:50   ` [tip:x86/pti] KVM/x86: " tip-bot for KarimAllah Ahmed
  -1 siblings, 2 replies; 81+ messages in thread
From: KarimAllah Ahmed @ 2018-02-01 21:59 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: KarimAllah Ahmed, Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, David Woodhouse

[dwmw2: Stop using KF() for bits in it, too]
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kvm/cpuid.c | 8 +++-----
 arch/x86/kvm/cpuid.h | 1 +
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..c0eb337 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -67,9 +67,7 @@ u64 kvm_supported_xcr0(void)
 
 #define F(x) bit(X86_FEATURE_##x)
 
-/* These are scattered features in cpufeatures.h. */
-#define KVM_CPUID_BIT_AVX512_4VNNIW     2
-#define KVM_CPUID_BIT_AVX512_4FMAPS     3
+/* For scattered features from cpufeatures.h; we currently expose none */
 #define KF(x) bit(KVM_CPUID_BIT_##x)
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -392,7 +390,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
-		KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+		F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
@@ -477,7 +475,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
 				entry->ecx &= ~F(PKU);
 			entry->edx &= kvm_cpuid_7_0_edx_x86_features;
-			entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
+			cpuid_mask(&entry->edx, CPUID_7_EDX);
 		} else {
 			entry->ebx = 0;
 			entry->ecx = 0;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index c2cea66..9a327d5 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX},
 	[CPUID_7_ECX]         = {         7, 0, CPUID_ECX},
 	[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
+	[CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
 };
 
 static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)
-- 
2.7.4

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-01 21:59 ` KarimAllah Ahmed
  (?)
  (?)
@ 2018-02-01 21:59 ` KarimAllah Ahmed
  2018-02-02 17:49   ` Konrad Rzeszutek Wilk
                     ` (3 more replies)
  -1 siblings, 4 replies; 81+ messages in thread
From: KarimAllah Ahmed @ 2018-02-01 21:59 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Ashok Raj, Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Paolo Bonzini, Peter Zijlstra, David Woodhouse, KarimAllah Ahmed

From: Ashok Raj <ashok.raj@intel.com>

The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
control mechanism. It keeps earlier branches from influencing
later ones.

Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
It's a command that ensures predicted branch targets aren't used after
the barrier. Although IBRS and IBPB are enumerated by the same CPUID
enumeration, IBPB is very different.

IBPB helps mitigate against three potential attacks:

* Mitigate guests from being attacked by other guests.
  - This is addressed by issing IBPB when we do a guest switch.

* Mitigate attacks from guest/ring3->host/ring3.
  These would require a IBPB during context switch in host, or after
  VMEXIT. The host process has two ways to mitigate
  - Either it can be compiled with retpoline
  - If its going through context switch, and has set !dumpable then
    there is a IBPB in that path.
    (Tim's patch: https://patchwork.kernel.org/patch/10192871)
  - The case where after a VMEXIT you return back to Qemu might make
    Qemu attackable from guest when Qemu isn't compiled with retpoline.
  There are issues reported when doing IBPB on every VMEXIT that resulted
  in some tsc calibration woes in guest.

* Mitigate guest/ring0->host/ring0 attacks.
  When host kernel is using retpoline it is safe against these attacks.
  If host kernel isn't using retpoline we might need to do a IBPB flush on
  every VMEXIT.

Even when using retpoline for indirect calls, in certain conditions 'ret'
can use the BTB on Skylake-era CPUs. There are other mitigations
available like RSB stuffing/clearing.

* IBPB is issued only for SVM during svm_free_vcpu().
  VMX has a vmclear and SVM doesn't.  Follow discussion here:
  https://lkml.org/lkml/2018/1/15/146

Please refer to the following spec for more details on the enumeration
and control.

Refer here to get documentation about mitigations.

https://software.intel.com/en-us/side-channel-security-support

[peterz: rebase and changelog rewrite]
[karahmed: - rebase
           - vmx: expose PRED_CMD if guest has it in CPUID
           - svm: only pass through IBPB if guest has it in CPUID
           - vmx: support !cpu_has_vmx_msr_bitmap()]
           - vmx: support nested]
[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
        PRED_CMD is a write-only MSR]

Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
---
v6:
- introduce msr_write_intercepted_l01

v5:
- Use MSR_TYPE_W instead of MSR_TYPE_R for the MSR.
- Always merge the bitmaps unconditionally.
- Add PRED_CMD to direct_access_msrs.
- Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
- rewrite the commit message (from ashok.raj@)
---
 arch/x86/kvm/cpuid.c | 11 +++++++-
 arch/x86/kvm/svm.c   | 28 ++++++++++++++++++
 arch/x86/kvm/vmx.c   | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 116 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c0eb337..033004d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
 		0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
 
+	/* cpuid 0x80000008.ebx */
+	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
+		F(IBPB);
+
 	/* cpuid 0xC0000001.edx */
 	const u32 kvm_cpuid_C000_0001_edx_x86_features =
 		F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
@@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		if (!g_phys_as)
 			g_phys_as = phys_as;
 		entry->eax = g_phys_as | (virt_as << 8);
-		entry->ebx = entry->edx = 0;
+		entry->edx = 0;
+		/* IBPB isn't necessarily present in hardware cpuid */
+		if (boot_cpu_has(X86_FEATURE_IBPB))
+			entry->ebx |= F(IBPB);
+		entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
+		cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
 		break;
 	}
 	case 0x80000019:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f40d0da..254eefb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -249,6 +249,7 @@ static const struct svm_direct_access_msrs {
 	{ .index = MSR_CSTAR,				.always = true  },
 	{ .index = MSR_SYSCALL_MASK,			.always = true  },
 #endif
+	{ .index = MSR_IA32_PRED_CMD,			.always = false },
 	{ .index = MSR_IA32_LASTBRANCHFROMIP,		.always = false },
 	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
 	{ .index = MSR_IA32_LASTINTFROMIP,		.always = false },
@@ -529,6 +530,7 @@ struct svm_cpu_data {
 	struct kvm_ldttss_desc *tss_desc;
 
 	struct page *save_area;
+	struct vmcb *current_vmcb;
 };
 
 static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
@@ -1703,11 +1705,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
+	/*
+	 * The vmcb page can be recycled, causing a false negative in
+	 * svm_vcpu_load(). So do a full IBPB now.
+	 */
+	indirect_branch_prediction_barrier();
 }
 
 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
 	int i;
 
 	if (unlikely(cpu != vcpu->cpu)) {
@@ -1736,6 +1744,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (static_cpu_has(X86_FEATURE_RDTSCP))
 		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
 
+	if (sd->current_vmcb != svm->vmcb) {
+		sd->current_vmcb = svm->vmcb;
+		indirect_branch_prediction_barrier();
+	}
 	avic_vcpu_load(vcpu, cpu);
 }
 
@@ -3684,6 +3696,22 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr);
 		break;
+	case MSR_IA32_PRED_CMD:
+		if (!msr->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
+			return 1;
+
+		if (data & ~PRED_CMD_IBPB)
+			return 1;
+
+		if (!data)
+			break;
+
+		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+		if (is_guest_mode(vcpu))
+			break;
+		set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
+		break;
 	case MSR_STAR:
 		svm->vmcb->save.star = data;
 		break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d46a61b..263eb1f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -592,6 +592,7 @@ struct vcpu_vmx {
 	u64 		      msr_host_kernel_gs_base;
 	u64 		      msr_guest_kernel_gs_base;
 #endif
+
 	u32 vm_entry_controls_shadow;
 	u32 vm_exit_controls_shadow;
 	u32 secondary_exec_control;
@@ -936,6 +937,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
 static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
 					    u16 error_code);
 static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
+static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
+							  u32 msr, int type);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -1907,6 +1910,29 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
 	vmcs_write32(EXCEPTION_BITMAP, eb);
 }
 
+/*
+ * Check if MSR is intercepted for L01 MSR bitmap.
+ */
+static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
+{
+	unsigned long *msr_bitmap;
+	int f = sizeof(unsigned long);
+
+	if (!cpu_has_vmx_msr_bitmap())
+		return true;
+
+	msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
+
+	if (msr <= 0x1fff) {
+		return !!test_bit(msr, msr_bitmap + 0x800 / f);
+	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
+		msr &= 0x1fff;
+		return !!test_bit(msr, msr_bitmap + 0xc00 / f);
+	}
+
+	return true;
+}
+
 static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx,
 		unsigned long entry, unsigned long exit)
 {
@@ -2285,6 +2311,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
 		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
 		vmcs_load(vmx->loaded_vmcs->vmcs);
+		indirect_branch_prediction_barrier();
 	}
 
 	if (!already_loaded) {
@@ -3342,6 +3369,34 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr_info);
 		break;
+	case MSR_IA32_PRED_CMD:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+			return 1;
+
+		if (data & ~PRED_CMD_IBPB)
+			return 1;
+
+		if (!data)
+			break;
+
+		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+
+		/*
+		 * For non-nested:
+		 * When it's written (to non-zero) for the first time, pass
+		 * it through.
+		 *
+		 * For nested:
+		 * The handling of the MSR bitmap for L2 guests is done in
+		 * nested_vmx_merge_msr_bitmap. We should not touch the
+		 * vmcs02.msr_bitmap here since it gets completely overwritten
+		 * in the merging.
+		 */
+		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
+					      MSR_TYPE_W);
+		break;
 	case MSR_IA32_CR_PAT:
 		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
 			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -10044,9 +10099,23 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 	struct page *page;
 	unsigned long *msr_bitmap_l1;
 	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
+	/*
+	 * pred_cmd is trying to verify two things:
+	 *
+	 * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
+	 *    ensures that we do not accidentally generate an L02 MSR bitmap
+	 *    from the L12 MSR bitmap that is too permissive.
+	 * 2. That L1 or L2s have actually used the MSR. This avoids
+	 *    unnecessarily merging of the bitmap if the MSR is unused. This
+	 *    works properly because we only update the L01 MSR bitmap lazily.
+	 *    So even if L0 should pass L1 these MSRs, the L01 bitmap is only
+	 *    updated to reflect this when L1 (or its L2s) actually write to
+	 *    the MSR.
+	 */
+	bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
 
-	/* This shortcut is ok because we support only x2APIC MSRs so far. */
-	if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
+	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
+	    !pred_cmd)
 		return false;
 
 	page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
@@ -10079,6 +10148,13 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 				MSR_TYPE_W);
 		}
 	}
+
+	if (pred_cmd)
+		nested_vmx_disable_intercept_for_msr(
+					msr_bitmap_l1, msr_bitmap_l0,
+					MSR_IA32_PRED_CMD,
+					MSR_TYPE_W);
+
 	kunmap(page);
 	kvm_release_page_clean(page);
 
-- 
2.7.4

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  2018-02-01 21:59 ` KarimAllah Ahmed
                   ` (2 preceding siblings ...)
  (?)
@ 2018-02-01 21:59 ` KarimAllah Ahmed
  2018-02-02 10:53   ` Darren Kenny
                     ` (2 more replies)
  -1 siblings, 3 replies; 81+ messages in thread
From: KarimAllah Ahmed @ 2018-02-01 21:59 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: KarimAllah Ahmed, Asit Mallick, Dave Hansen, Arjan Van De Ven,
	Tim Chen, Linus Torvalds, Andrea Arcangeli, Andi Kleen,
	Thomas Gleixner, Dan Williams, Jun Nakajima, Andy Lutomirski,
	Greg KH, Paolo Bonzini, Ashok Raj, David Woodhouse

Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO
(bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the
contents will come directly from the hardware, but user-space can still
override it.

[dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]

Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kvm/cpuid.c |  2 +-
 arch/x86/kvm/vmx.c   | 15 +++++++++++++++
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 033004d..1909635 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
-		F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
+		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 263eb1f..b13314a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -593,6 +593,8 @@ struct vcpu_vmx {
 	u64 		      msr_guest_kernel_gs_base;
 #endif
 
+	u64 		      arch_capabilities;
+
 	u32 vm_entry_controls_shadow;
 	u32 vm_exit_controls_shadow;
 	u32 secondary_exec_control;
@@ -3262,6 +3264,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC:
 		msr_info->data = guest_read_tsc(vcpu);
 		break;
+	case MSR_IA32_ARCH_CAPABILITIES:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
+			return 1;
+		msr_info->data = to_vmx(vcpu)->arch_capabilities;
+		break;
 	case MSR_IA32_SYSENTER_CS:
 		msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
 		break;
@@ -3397,6 +3405,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
 					      MSR_TYPE_W);
 		break;
+	case MSR_IA32_ARCH_CAPABILITIES:
+		if (!msr_info->host_initiated)
+			return 1;
+		vmx->arch_capabilities = data;
+		break;
 	case MSR_IA32_CR_PAT:
 		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
 			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -5659,6 +5672,8 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
 		++vmx->nmsrs;
 	}
 
+	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
+		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
 
 	vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c53298d..4ec142e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1009,6 +1009,7 @@ static u32 msrs_to_save[] = {
 #endif
 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
 	MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+	MSR_IA32_ARCH_CAPABILITIES
 };
 
 static unsigned num_msrs_to_save;
-- 
2.7.4

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-01 21:59 ` KarimAllah Ahmed
                   ` (3 preceding siblings ...)
  (?)
@ 2018-02-01 21:59 ` KarimAllah Ahmed
  2018-02-02 11:03   ` Darren Kenny
                     ` (4 more replies)
  -1 siblings, 5 replies; 81+ messages in thread
From: KarimAllah Ahmed @ 2018-02-01 21:59 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: KarimAllah Ahmed, Asit Mallick, Arjan Van De Ven, Dave Hansen,
	Andi Kleen, Andrea Arcangeli, Linus Torvalds, Tim Chen,
	Thomas Gleixner, Dan Williams, Jun Nakajima, Paolo Bonzini,
	David Woodhouse, Greg KH, Andy Lutomirski, Ashok Raj

[ Based on a patch from Ashok Raj <ashok.raj@intel.com> ]

Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
be using a retpoline+IBPB based approach.

To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
guests that do not actually use the MSR, only start saving and restoring
when a non-zero is written to it.

No attempt is made to handle STIBP here, intentionally. Filtering STIBP
may be added in a future patch, which may require trapping all writes
if we don't want to pass it through directly to the guest.

[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]

Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
v6:
- got rid of save_spec_ctrl_on_exit
- introduce msr_write_intercepted
v5:
- Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
v4:
- Add IBRS to kvm_cpuid_8000_0008_ebx_x86_features
- Handling nested guests
v3:
- Save/restore manually
- Fix CPUID handling
- Fix a copy & paste error in the name of SPEC_CTRL MSR in
  disable_intercept.
- support !cpu_has_vmx_msr_bitmap()
v2:
- remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
- special case writing '0' in SPEC_CTRL to avoid confusing live-migration
  when the instance never used the MSR (dwmw@).
- depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
- add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).
---
 arch/x86/kvm/cpuid.c |   9 +++--
 arch/x86/kvm/vmx.c   | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c   |   2 +-
 3 files changed, 110 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1909635..13f5d42 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 0x80000008.ebx */
 	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
-		F(IBPB);
+		F(IBPB) | F(IBRS);
 
 	/* cpuid 0xC0000001.edx */
 	const u32 kvm_cpuid_C000_0001_edx_x86_features =
@@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
-		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
+		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
+		F(ARCH_CAPABILITIES);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
@@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			g_phys_as = phys_as;
 		entry->eax = g_phys_as | (virt_as << 8);
 		entry->edx = 0;
-		/* IBPB isn't necessarily present in hardware cpuid */
+		/* IBRS and IBPB aren't necessarily present in hardware cpuid */
 		if (boot_cpu_has(X86_FEATURE_IBPB))
 			entry->ebx |= F(IBPB);
+		if (boot_cpu_has(X86_FEATURE_IBRS))
+			entry->ebx |= F(IBRS);
 		entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
 		cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
 		break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b13314a..5d8a6a91 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -594,6 +594,7 @@ struct vcpu_vmx {
 #endif
 
 	u64 		      arch_capabilities;
+	u64 		      spec_ctrl;
 
 	u32 vm_entry_controls_shadow;
 	u32 vm_exit_controls_shadow;
@@ -1913,6 +1914,29 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
 }
 
 /*
+ * Check if MSR is intercepted for currently loaded MSR bitmap.
+ */
+static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
+{
+	unsigned long *msr_bitmap;
+	int f = sizeof(unsigned long);
+
+	if (!cpu_has_vmx_msr_bitmap())
+		return true;
+
+	msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
+
+	if (msr <= 0x1fff) {
+		return !!test_bit(msr, msr_bitmap + 0x800 / f);
+	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
+		msr &= 0x1fff;
+		return !!test_bit(msr, msr_bitmap + 0xc00 / f);
+	}
+
+	return true;
+}
+
+/*
  * Check if MSR is intercepted for L01 MSR bitmap.
  */
 static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
@@ -3264,6 +3288,14 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC:
 		msr_info->data = guest_read_tsc(vcpu);
 		break;
+	case MSR_IA32_SPEC_CTRL:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+			return 1;
+
+		msr_info->data = to_vmx(vcpu)->spec_ctrl;
+		break;
 	case MSR_IA32_ARCH_CAPABILITIES:
 		if (!msr_info->host_initiated &&
 		    !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
@@ -3377,6 +3409,37 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr_info);
 		break;
+	case MSR_IA32_SPEC_CTRL:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+			return 1;
+
+		/* The STIBP bit doesn't fault even if it's not advertised */
+		if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
+			return 1;
+
+		vmx->spec_ctrl = data;
+
+		if (!data)
+			break;
+
+		/*
+		 * For non-nested:
+		 * When it's written (to non-zero) for the first time, pass
+		 * it through.
+		 *
+		 * For nested:
+		 * The handling of the MSR bitmap for L2 guests is done in
+		 * nested_vmx_merge_msr_bitmap. We should not touch the
+		 * vmcs02.msr_bitmap here since it gets completely overwritten
+		 * in the merging. We update the vmcs01 here for L1 as well
+		 * since it will end up touching the MSR anyway now.
+		 */
+		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
+					      MSR_IA32_SPEC_CTRL,
+					      MSR_TYPE_RW);
+		break;
 	case MSR_IA32_PRED_CMD:
 		if (!msr_info->host_initiated &&
 		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
@@ -5702,6 +5765,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	u64 cr0;
 
 	vmx->rmode.vm86_active = 0;
+	vmx->spec_ctrl = 0;
 
 	vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
 	kvm_set_cr8(vcpu, 0);
@@ -9373,6 +9437,15 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	vmx_arm_hv_timer(vcpu);
 
+	/*
+	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
+	 * it's non-zero. Since vmentry is serialising on affected CPUs, there
+	 * is no need to worry about the conditional branch over the wrmsr
+	 * being speculatively taken.
+	 */
+	if (vmx->spec_ctrl)
+		wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
+
 	vmx->__launched = vmx->loaded_vmcs->launched;
 	asm(
 		/* Store host registers */
@@ -9491,6 +9564,27 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 #endif
 	      );
 
+	/*
+	 * We do not use IBRS in the kernel. If this vCPU has used the
+	 * SPEC_CTRL MSR it may have left it on; save the value and
+	 * turn it off. This is much more efficient than blindly adding
+	 * it to the atomic save/restore list. Especially as the former
+	 * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
+	 *
+	 * For non-nested case:
+	 * If the L01 MSR bitmap does not intercept the MSR, then we need to
+	 * save it.
+	 *
+	 * For nested case:
+	 * If the L02 MSR bitmap does not intercept the MSR, then we need to
+	 * save it.
+	 */
+	if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
+		rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
+
+	if (vmx->spec_ctrl)
+		wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+
 	/* Eliminate branch target predictions from guest mode */
 	vmexit_fill_RSB();
 
@@ -10115,7 +10209,7 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 	unsigned long *msr_bitmap_l1;
 	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
 	/*
-	 * pred_cmd is trying to verify two things:
+	 * pred_cmd & spec_ctrl are trying to verify two things:
 	 *
 	 * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
 	 *    ensures that we do not accidentally generate an L02 MSR bitmap
@@ -10128,9 +10222,10 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 	 *    the MSR.
 	 */
 	bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
+	bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
 
 	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
-	    !pred_cmd)
+	    !pred_cmd && !spec_ctrl)
 		return false;
 
 	page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
@@ -10164,6 +10259,12 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 		}
 	}
 
+	if (spec_ctrl)
+		nested_vmx_disable_intercept_for_msr(
+					msr_bitmap_l1, msr_bitmap_l0,
+					MSR_IA32_SPEC_CTRL,
+					MSR_TYPE_R | MSR_TYPE_W);
+
 	if (pred_cmd)
 		nested_vmx_disable_intercept_for_msr(
 					msr_bitmap_l1, msr_bitmap_l0,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4ec142e..ac38143 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1009,7 +1009,7 @@ static u32 msrs_to_save[] = {
 #endif
 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
 	MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
-	MSR_IA32_ARCH_CAPABILITIES
+	MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES
 };
 
 static unsigned num_msrs_to_save;
-- 
2.7.4

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v6 5/5] KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-01 21:59 ` KarimAllah Ahmed
                   ` (4 preceding siblings ...)
  (?)
@ 2018-02-01 21:59 ` KarimAllah Ahmed
  2018-02-02 11:06   ` Darren Kenny
  2018-02-02 18:02   ` Konrad Rzeszutek Wilk
  -1 siblings, 2 replies; 81+ messages in thread
From: KarimAllah Ahmed @ 2018-02-01 21:59 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: KarimAllah Ahmed, Asit Mallick, Arjan Van De Ven, Dave Hansen,
	Andi Kleen, Andrea Arcangeli, Linus Torvalds, Tim Chen,
	Thomas Gleixner, Dan Williams, Jun Nakajima, Paolo Bonzini,
	David Woodhouse, Greg KH, Andy Lutomirski, Ashok Raj

[ Based on a patch from Paolo Bonzini <pbonzini@redhat.com> ]

... basically doing exactly what we do for VMX:

- Passthrough SPEC_CTRL to guests (if enabled in guest CPUID)
- Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
  actually used it.

Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
v5:
- Add SPEC_CTRL to direct_access_msrs.
---
 arch/x86/kvm/svm.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 254eefb..c6ab343 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -184,6 +184,9 @@ struct vcpu_svm {
 		u64 gs_base;
 	} host;
 
+	u64 spec_ctrl;
+	bool save_spec_ctrl_on_exit;
+
 	u32 *msrpm;
 
 	ulong nmi_iret_rip;
@@ -249,6 +252,7 @@ static const struct svm_direct_access_msrs {
 	{ .index = MSR_CSTAR,				.always = true  },
 	{ .index = MSR_SYSCALL_MASK,			.always = true  },
 #endif
+	{ .index = MSR_IA32_SPEC_CTRL,			.always = false },
 	{ .index = MSR_IA32_PRED_CMD,			.always = false },
 	{ .index = MSR_IA32_LASTBRANCHFROMIP,		.always = false },
 	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
@@ -1584,6 +1588,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	u32 dummy;
 	u32 eax = 1;
 
+	svm->spec_ctrl = 0;
+
 	if (!init_event) {
 		svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
 					   MSR_IA32_APICBASE_ENABLE;
@@ -3605,6 +3611,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_VM_CR:
 		msr_info->data = svm->nested.vm_cr_msr;
 		break;
+	case MSR_IA32_SPEC_CTRL:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+			return 1;
+
+		msr_info->data = svm->spec_ctrl;
+		break;
 	case MSR_IA32_UCODE_REV:
 		msr_info->data = 0x01000065;
 		break;
@@ -3696,6 +3709,30 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr);
 		break;
+	case MSR_IA32_SPEC_CTRL:
+		if (!msr->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+			return 1;
+
+		/* The STIBP bit doesn't fault even if it's not advertised */
+		if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
+			return 1;
+
+		svm->spec_ctrl = data;
+
+		/*
+		 * When it's written (to non-zero) for the first time, pass
+		 * it through. This means we don't have to take the perf
+		 * hit of saving it on vmexit for the common case of guests
+		 * that don't use it.
+		 */
+		if (data && !svm->save_spec_ctrl_on_exit) {
+			svm->save_spec_ctrl_on_exit = true;
+			if (is_guest_mode(vcpu))
+				break;
+			set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
+		}
+		break;
 	case MSR_IA32_PRED_CMD:
 		if (!msr->host_initiated &&
 		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
@@ -4964,6 +5001,15 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	local_irq_enable();
 
+	/*
+	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
+	 * it's non-zero. Since vmentry is serialising on affected CPUs, there
+	 * is no need to worry about the conditional branch over the wrmsr
+	 * being speculatively taken.
+	 */
+	if (svm->spec_ctrl)
+		wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
+
 	asm volatile (
 		"push %%" _ASM_BP "; \n\t"
 		"mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t"
@@ -5056,6 +5102,19 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 #endif
 		);
 
+	/*
+	 * We do not use IBRS in the kernel. If this vCPU has used the
+	 * SPEC_CTRL MSR it may have left it on; save the value and
+	 * turn it off. This is much more efficient than blindly adding
+	 * it to the atomic save/restore list. Especially as the former
+	 * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
+	 */
+	if (svm->save_spec_ctrl_on_exit)
+		rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
+
+	if (svm->spec_ctrl)
+		wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+
 	/* Eliminate branch target predictions from guest mode */
 	vmexit_fill_RSB();
 
-- 
2.7.4

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  2018-02-01 21:59 ` [PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES KarimAllah Ahmed
@ 2018-02-02 10:53   ` Darren Kenny
  2018-02-02 17:35     ` Jim Mattson
  2018-02-02 17:51   ` Konrad Rzeszutek Wilk
  2018-02-03 22:51   ` [tip:x86/pti] KVM/VMX: " tip-bot for KarimAllah Ahmed
  2 siblings, 1 reply; 81+ messages in thread
From: Darren Kenny @ 2018-02-02 10:53 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Asit Mallick, Dave Hansen,
	Arjan Van De Ven, Tim Chen, Linus Torvalds, Andrea Arcangeli,
	Andi Kleen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Andy Lutomirski, Greg KH, Paolo Bonzini, Ashok Raj,
	David Woodhouse

On Thu, Feb 01, 2018 at 10:59:44PM +0100, KarimAllah Ahmed wrote:
>Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO
>(bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the
>contents will come directly from the hardware, but user-space can still
>override it.
>
>[dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]
>
>Cc: Asit Mallick <asit.k.mallick@intel.com>
>Cc: Dave Hansen <dave.hansen@intel.com>
>Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
>Cc: Tim Chen <tim.c.chen@linux.intel.com>
>Cc: Linus Torvalds <torvalds@linux-foundation.org>
>Cc: Andrea Arcangeli <aarcange@redhat.com>
>Cc: Andi Kleen <ak@linux.intel.com>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Dan Williams <dan.j.williams@intel.com>
>Cc: Jun Nakajima <jun.nakajima@intel.com>
>Cc: Andy Lutomirski <luto@kernel.org>
>Cc: Greg KH <gregkh@linuxfoundation.org>
>Cc: Paolo Bonzini <pbonzini@redhat.com>
>Cc: Ashok Raj <ashok.raj@intel.com>
>Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
>Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

>---
> arch/x86/kvm/cpuid.c |  2 +-
> arch/x86/kvm/vmx.c   | 15 +++++++++++++++
> arch/x86/kvm/x86.c   |  1 +
> 3 files changed, 17 insertions(+), 1 deletion(-)
>
>diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>index 033004d..1909635 100644
>--- a/arch/x86/kvm/cpuid.c
>+++ b/arch/x86/kvm/cpuid.c
>@@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>
> 	/* cpuid 7.0.edx*/
> 	const u32 kvm_cpuid_7_0_edx_x86_features =
>-		F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
>+		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
>
> 	/* all calls to cpuid_count() should be made on the same cpu */
> 	get_cpu();
>diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>index 263eb1f..b13314a 100644
>--- a/arch/x86/kvm/vmx.c
>+++ b/arch/x86/kvm/vmx.c
>@@ -593,6 +593,8 @@ struct vcpu_vmx {
> 	u64 		      msr_guest_kernel_gs_base;
> #endif
>
>+	u64 		      arch_capabilities;
>+
> 	u32 vm_entry_controls_shadow;
> 	u32 vm_exit_controls_shadow;
> 	u32 secondary_exec_control;
>@@ -3262,6 +3264,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 	case MSR_IA32_TSC:
> 		msr_info->data = guest_read_tsc(vcpu);
> 		break;
>+	case MSR_IA32_ARCH_CAPABILITIES:
>+		if (!msr_info->host_initiated &&
>+		    !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
>+			return 1;
>+		msr_info->data = to_vmx(vcpu)->arch_capabilities;
>+		break;
> 	case MSR_IA32_SYSENTER_CS:
> 		msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
> 		break;
>@@ -3397,6 +3405,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
> 					      MSR_TYPE_W);
> 		break;
>+	case MSR_IA32_ARCH_CAPABILITIES:
>+		if (!msr_info->host_initiated)
>+			return 1;
>+		vmx->arch_capabilities = data;
>+		break;
> 	case MSR_IA32_CR_PAT:
> 		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
> 			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
>@@ -5659,6 +5672,8 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
> 		++vmx->nmsrs;
> 	}
>
>+	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
>+		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
>
> 	vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
>
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index c53298d..4ec142e 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -1009,6 +1009,7 @@ static u32 msrs_to_save[] = {
> #endif
> 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
> 	MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
>+	MSR_IA32_ARCH_CAPABILITIES
> };
>
> static unsigned num_msrs_to_save;
>-- 
>2.7.4
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-01 21:59 ` [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL KarimAllah Ahmed
@ 2018-02-02 11:03   ` Darren Kenny
  2018-02-02 11:27     ` David Woodhouse
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 81+ messages in thread
From: Darren Kenny @ 2018-02-02 11:03 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Asit Mallick, Arjan Van De Ven,
	Dave Hansen, Andi Kleen, Andrea Arcangeli, Linus Torvalds,
	Tim Chen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Paolo Bonzini, David Woodhouse, Greg KH, Andy Lutomirski,
	Ashok Raj

On Thu, Feb 01, 2018 at 10:59:45PM +0100, KarimAllah Ahmed wrote:
>[ Based on a patch from Ashok Raj <ashok.raj@intel.com> ]
>
>Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
>guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
>be using a retpoline+IBPB based approach.
>
>To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
>guests that do not actually use the MSR, only start saving and restoring
>when a non-zero is written to it.
>
>No attempt is made to handle STIBP here, intentionally. Filtering STIBP
>may be added in a future patch, which may require trapping all writes
>if we don't want to pass it through directly to the guest.
>
>[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]
>
>Cc: Asit Mallick <asit.k.mallick@intel.com>
>Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
>Cc: Dave Hansen <dave.hansen@intel.com>
>Cc: Andi Kleen <ak@linux.intel.com>
>Cc: Andrea Arcangeli <aarcange@redhat.com>
>Cc: Linus Torvalds <torvalds@linux-foundation.org>
>Cc: Tim Chen <tim.c.chen@linux.intel.com>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Dan Williams <dan.j.williams@intel.com>
>Cc: Jun Nakajima <jun.nakajima@intel.com>
>Cc: Paolo Bonzini <pbonzini@redhat.com>
>Cc: David Woodhouse <dwmw@amazon.co.uk>
>Cc: Greg KH <gregkh@linuxfoundation.org>
>Cc: Andy Lutomirski <luto@kernel.org>
>Cc: Ashok Raj <ashok.raj@intel.com>
>Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
>Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

>---
>v6:
>- got rid of save_spec_ctrl_on_exit
>- introduce msr_write_intercepted
>v5:
>- Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
>v4:
>- Add IBRS to kvm_cpuid_8000_0008_ebx_x86_features
>- Handling nested guests
>v3:
>- Save/restore manually
>- Fix CPUID handling
>- Fix a copy & paste error in the name of SPEC_CTRL MSR in
>  disable_intercept.
>- support !cpu_has_vmx_msr_bitmap()
>v2:
>- remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
>- special case writing '0' in SPEC_CTRL to avoid confusing live-migration
>  when the instance never used the MSR (dwmw@).
>- depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
>- add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).
>---
> arch/x86/kvm/cpuid.c |   9 +++--
> arch/x86/kvm/vmx.c   | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> arch/x86/kvm/x86.c   |   2 +-
> 3 files changed, 110 insertions(+), 6 deletions(-)
>
>diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>index 1909635..13f5d42 100644
>--- a/arch/x86/kvm/cpuid.c
>+++ b/arch/x86/kvm/cpuid.c
>@@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>
> 	/* cpuid 0x80000008.ebx */
> 	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
>-		F(IBPB);
>+		F(IBPB) | F(IBRS);
>
> 	/* cpuid 0xC0000001.edx */
> 	const u32 kvm_cpuid_C000_0001_edx_x86_features =
>@@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>
> 	/* cpuid 7.0.edx*/
> 	const u32 kvm_cpuid_7_0_edx_x86_features =
>-		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
>+		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
>+		F(ARCH_CAPABILITIES);
>
> 	/* all calls to cpuid_count() should be made on the same cpu */
> 	get_cpu();
>@@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> 			g_phys_as = phys_as;
> 		entry->eax = g_phys_as | (virt_as << 8);
> 		entry->edx = 0;
>-		/* IBPB isn't necessarily present in hardware cpuid */
>+		/* IBRS and IBPB aren't necessarily present in hardware cpuid */
> 		if (boot_cpu_has(X86_FEATURE_IBPB))
> 			entry->ebx |= F(IBPB);
>+		if (boot_cpu_has(X86_FEATURE_IBRS))
>+			entry->ebx |= F(IBRS);
> 		entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
> 		cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
> 		break;
>diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>index b13314a..5d8a6a91 100644
>--- a/arch/x86/kvm/vmx.c
>+++ b/arch/x86/kvm/vmx.c
>@@ -594,6 +594,7 @@ struct vcpu_vmx {
> #endif
>
> 	u64 		      arch_capabilities;
>+	u64 		      spec_ctrl;
>
> 	u32 vm_entry_controls_shadow;
> 	u32 vm_exit_controls_shadow;
>@@ -1913,6 +1914,29 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
> }
>
> /*
>+ * Check if MSR is intercepted for currently loaded MSR bitmap.
>+ */
>+static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
>+{
>+	unsigned long *msr_bitmap;
>+	int f = sizeof(unsigned long);
>+
>+	if (!cpu_has_vmx_msr_bitmap())
>+		return true;
>+
>+	msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
>+
>+	if (msr <= 0x1fff) {
>+		return !!test_bit(msr, msr_bitmap + 0x800 / f);
>+	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
>+		msr &= 0x1fff;
>+		return !!test_bit(msr, msr_bitmap + 0xc00 / f);
>+	}
>+
>+	return true;
>+}
>+
>+/*
>  * Check if MSR is intercepted for L01 MSR bitmap.
>  */
> static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
>@@ -3264,6 +3288,14 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 	case MSR_IA32_TSC:
> 		msr_info->data = guest_read_tsc(vcpu);
> 		break;
>+	case MSR_IA32_SPEC_CTRL:
>+		if (!msr_info->host_initiated &&
>+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
>+		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
>+			return 1;
>+
>+		msr_info->data = to_vmx(vcpu)->spec_ctrl;
>+		break;
> 	case MSR_IA32_ARCH_CAPABILITIES:
> 		if (!msr_info->host_initiated &&
> 		    !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
>@@ -3377,6 +3409,37 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 	case MSR_IA32_TSC:
> 		kvm_write_tsc(vcpu, msr_info);
> 		break;
>+	case MSR_IA32_SPEC_CTRL:
>+		if (!msr_info->host_initiated &&
>+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
>+		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
>+			return 1;
>+
>+		/* The STIBP bit doesn't fault even if it's not advertised */
>+		if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
>+			return 1;
>+
>+		vmx->spec_ctrl = data;
>+
>+		if (!data)
>+			break;
>+
>+		/*
>+		 * For non-nested:
>+		 * When it's written (to non-zero) for the first time, pass
>+		 * it through.
>+		 *
>+		 * For nested:
>+		 * The handling of the MSR bitmap for L2 guests is done in
>+		 * nested_vmx_merge_msr_bitmap. We should not touch the
>+		 * vmcs02.msr_bitmap here since it gets completely overwritten
>+		 * in the merging. We update the vmcs01 here for L1 as well
>+		 * since it will end up touching the MSR anyway now.
>+		 */
>+		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
>+					      MSR_IA32_SPEC_CTRL,
>+					      MSR_TYPE_RW);
>+		break;
> 	case MSR_IA32_PRED_CMD:
> 		if (!msr_info->host_initiated &&
> 		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
>@@ -5702,6 +5765,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> 	u64 cr0;
>
> 	vmx->rmode.vm86_active = 0;
>+	vmx->spec_ctrl = 0;
>
> 	vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
> 	kvm_set_cr8(vcpu, 0);
>@@ -9373,6 +9437,15 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
>
> 	vmx_arm_hv_timer(vcpu);
>
>+	/*
>+	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
>+	 * it's non-zero. Since vmentry is serialising on affected CPUs, there
>+	 * is no need to worry about the conditional branch over the wrmsr
>+	 * being speculatively taken.
>+	 */
>+	if (vmx->spec_ctrl)
>+		wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
>+
> 	vmx->__launched = vmx->loaded_vmcs->launched;
> 	asm(
> 		/* Store host registers */
>@@ -9491,6 +9564,27 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
> #endif
> 	      );
>
>+	/*
>+	 * We do not use IBRS in the kernel. If this vCPU has used the
>+	 * SPEC_CTRL MSR it may have left it on; save the value and
>+	 * turn it off. This is much more efficient than blindly adding
>+	 * it to the atomic save/restore list. Especially as the former
>+	 * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
>+	 *
>+	 * For non-nested case:
>+	 * If the L01 MSR bitmap does not intercept the MSR, then we need to
>+	 * save it.
>+	 *
>+	 * For nested case:
>+	 * If the L02 MSR bitmap does not intercept the MSR, then we need to
>+	 * save it.
>+	 */
>+	if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
>+		rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
>+
>+	if (vmx->spec_ctrl)
>+		wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>+
> 	/* Eliminate branch target predictions from guest mode */
> 	vmexit_fill_RSB();
>
>@@ -10115,7 +10209,7 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
> 	unsigned long *msr_bitmap_l1;
> 	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
> 	/*
>-	 * pred_cmd is trying to verify two things:
>+	 * pred_cmd & spec_ctrl are trying to verify two things:
> 	 *
> 	 * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
> 	 *    ensures that we do not accidentally generate an L02 MSR bitmap
>@@ -10128,9 +10222,10 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
> 	 *    the MSR.
> 	 */
> 	bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
>+	bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
>
> 	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
>-	    !pred_cmd)
>+	    !pred_cmd && !spec_ctrl)
> 		return false;
>
> 	page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
>@@ -10164,6 +10259,12 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
> 		}
> 	}
>
>+	if (spec_ctrl)
>+		nested_vmx_disable_intercept_for_msr(
>+					msr_bitmap_l1, msr_bitmap_l0,
>+					MSR_IA32_SPEC_CTRL,
>+					MSR_TYPE_R | MSR_TYPE_W);
>+
> 	if (pred_cmd)
> 		nested_vmx_disable_intercept_for_msr(
> 					msr_bitmap_l1, msr_bitmap_l0,
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index 4ec142e..ac38143 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -1009,7 +1009,7 @@ static u32 msrs_to_save[] = {
> #endif
> 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
> 	MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
>-	MSR_IA32_ARCH_CAPABILITIES
>+	MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES
> };
>
> static unsigned num_msrs_to_save;
>-- 
>2.7.4
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 5/5] KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-01 21:59 ` [PATCH v6 5/5] KVM: SVM: " KarimAllah Ahmed
@ 2018-02-02 11:06   ` Darren Kenny
  2018-02-02 18:02   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 81+ messages in thread
From: Darren Kenny @ 2018-02-02 11:06 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Asit Mallick, Arjan Van De Ven,
	Dave Hansen, Andi Kleen, Andrea Arcangeli, Linus Torvalds,
	Tim Chen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Paolo Bonzini, David Woodhouse, Greg KH, Andy Lutomirski,
	Ashok Raj

On Thu, Feb 01, 2018 at 10:59:46PM +0100, KarimAllah Ahmed wrote:
>[ Based on a patch from Paolo Bonzini <pbonzini@redhat.com> ]
>
>... basically doing exactly what we do for VMX:
>
>- Passthrough SPEC_CTRL to guests (if enabled in guest CPUID)
>- Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
>  actually used it.
>
>Cc: Asit Mallick <asit.k.mallick@intel.com>
>Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
>Cc: Dave Hansen <dave.hansen@intel.com>
>Cc: Andi Kleen <ak@linux.intel.com>
>Cc: Andrea Arcangeli <aarcange@redhat.com>
>Cc: Linus Torvalds <torvalds@linux-foundation.org>
>Cc: Tim Chen <tim.c.chen@linux.intel.com>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Dan Williams <dan.j.williams@intel.com>
>Cc: Jun Nakajima <jun.nakajima@intel.com>
>Cc: Paolo Bonzini <pbonzini@redhat.com>
>Cc: David Woodhouse <dwmw@amazon.co.uk>
>Cc: Greg KH <gregkh@linuxfoundation.org>
>Cc: Andy Lutomirski <luto@kernel.org>
>Cc: Ashok Raj <ashok.raj@intel.com>
>Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
>Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

>---
>v5:
>- Add SPEC_CTRL to direct_access_msrs.
>---
> arch/x86/kvm/svm.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 59 insertions(+)
>
>diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>index 254eefb..c6ab343 100644
>--- a/arch/x86/kvm/svm.c
>+++ b/arch/x86/kvm/svm.c
>@@ -184,6 +184,9 @@ struct vcpu_svm {
> 		u64 gs_base;
> 	} host;
>
>+	u64 spec_ctrl;
>+	bool save_spec_ctrl_on_exit;
>+
> 	u32 *msrpm;
>
> 	ulong nmi_iret_rip;
>@@ -249,6 +252,7 @@ static const struct svm_direct_access_msrs {
> 	{ .index = MSR_CSTAR,				.always = true  },
> 	{ .index = MSR_SYSCALL_MASK,			.always = true  },
> #endif
>+	{ .index = MSR_IA32_SPEC_CTRL,			.always = false },
> 	{ .index = MSR_IA32_PRED_CMD,			.always = false },
> 	{ .index = MSR_IA32_LASTBRANCHFROMIP,		.always = false },
> 	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
>@@ -1584,6 +1588,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> 	u32 dummy;
> 	u32 eax = 1;
>
>+	svm->spec_ctrl = 0;
>+
> 	if (!init_event) {
> 		svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
> 					   MSR_IA32_APICBASE_ENABLE;
>@@ -3605,6 +3611,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 	case MSR_VM_CR:
> 		msr_info->data = svm->nested.vm_cr_msr;
> 		break;
>+	case MSR_IA32_SPEC_CTRL:
>+		if (!msr_info->host_initiated &&
>+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
>+			return 1;
>+
>+		msr_info->data = svm->spec_ctrl;
>+		break;
> 	case MSR_IA32_UCODE_REV:
> 		msr_info->data = 0x01000065;
> 		break;
>@@ -3696,6 +3709,30 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> 	case MSR_IA32_TSC:
> 		kvm_write_tsc(vcpu, msr);
> 		break;
>+	case MSR_IA32_SPEC_CTRL:
>+		if (!msr->host_initiated &&
>+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
>+			return 1;
>+
>+		/* The STIBP bit doesn't fault even if it's not advertised */
>+		if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
>+			return 1;
>+
>+		svm->spec_ctrl = data;
>+
>+		/*
>+		 * When it's written (to non-zero) for the first time, pass
>+		 * it through. This means we don't have to take the perf
>+		 * hit of saving it on vmexit for the common case of guests
>+		 * that don't use it.
>+		 */
>+		if (data && !svm->save_spec_ctrl_on_exit) {
>+			svm->save_spec_ctrl_on_exit = true;
>+			if (is_guest_mode(vcpu))
>+				break;
>+			set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
>+		}
>+		break;
> 	case MSR_IA32_PRED_CMD:
> 		if (!msr->host_initiated &&
> 		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
>@@ -4964,6 +5001,15 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
>
> 	local_irq_enable();
>
>+	/*
>+	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
>+	 * it's non-zero. Since vmentry is serialising on affected CPUs, there
>+	 * is no need to worry about the conditional branch over the wrmsr
>+	 * being speculatively taken.
>+	 */
>+	if (svm->spec_ctrl)
>+		wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
>+
> 	asm volatile (
> 		"push %%" _ASM_BP "; \n\t"
> 		"mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t"
>@@ -5056,6 +5102,19 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
> #endif
> 		);
>
>+	/*
>+	 * We do not use IBRS in the kernel. If this vCPU has used the
>+	 * SPEC_CTRL MSR it may have left it on; save the value and
>+	 * turn it off. This is much more efficient than blindly adding
>+	 * it to the atomic save/restore list. Especially as the former
>+	 * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
>+	 */
>+	if (svm->save_spec_ctrl_on_exit)
>+		rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
>+
>+	if (svm->spec_ctrl)
>+		wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>+
> 	/* Eliminate branch target predictions from guest mode */
> 	vmexit_fill_RSB();
>
>-- 
>2.7.4
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-01 21:59 ` [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL KarimAllah Ahmed
@ 2018-02-02 11:27     ` David Woodhouse
  2018-02-02 11:27     ` David Woodhouse
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 81+ messages in thread
From: David Woodhouse @ 2018-02-02 11:27 UTC (permalink / raw)
  To: KarimAllah Ahmed, kvm, linux-kernel, x86
  Cc: Asit Mallick, Arjan Van De Ven, Dave Hansen, Andi Kleen,
	Andrea Arcangeli, Linus Torvalds, Tim Chen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Paolo Bonzini, Greg KH,
	Andy Lutomirski, Ashok Raj

[-- Attachment #1: Type: text/plain, Size: 2634 bytes --]

On Thu, 2018-02-01 at 22:59 +0100, KarimAllah Ahmed wrote:

> [ Based on a patch from Ashok Raj <ashok.raj@intel.com> ]
> 
> Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
> guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
> be using a retpoline+IBPB based approach.
> 
> To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
> guests that do not actually use the MSR, only start saving and restoring
> when a non-zero is written to it.
> 
> No attempt is made to handle STIBP here, intentionally. Filtering STIBP
> may be added in a future patch, which may require trapping all writes
> if we don't want to pass it through directly to the guest.
> 
> [dwmw2: Clean up CPUID bits, save/restore manually, handle reset]
> 
> Cc: Asit Mallick <asit.k.mallick@intel.com>
> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> v6:
> - got rid of save_spec_ctrl_on_exit
> - introduce msr_write_intercepted
> v5:
> - Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
> v4:
> - Add IBRS to kvm_cpuid_8000_0008_ebx_x86_features
> - Handling nested guests
> v3:
> - Save/restore manually
> - Fix CPUID handling
> - Fix a copy & paste error in the name of SPEC_CTRL MSR in
>   disable_intercept.
> - support !cpu_has_vmx_msr_bitmap()
> v2:
> - remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
> - special case writing '0' in SPEC_CTRL to avoid confusing live-migration
>   when the instance never used the MSR (dwmw@).
> - depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
> - add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).

This looks very good to me now, and the comments are helpful. Thank you
for your persistence with getting the details right. If we make the SVM
one look like this, as you mentioned, I think we ought finally be ready
to merge it.

Good work ;)

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
@ 2018-02-02 11:27     ` David Woodhouse
  0 siblings, 0 replies; 81+ messages in thread
From: David Woodhouse @ 2018-02-02 11:27 UTC (permalink / raw)
  To: KarimAllah Ahmed, kvm, linux-kernel, x86
  Cc: Asit Mallick, Arjan Van De Ven, Dave Hansen, Andi Kleen,
	Andrea Arcangeli, Linus Torvalds, Tim Chen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Paolo Bonzini, Greg KH,
	Andy Lutomirski, Ashok Raj

[-- Attachment #1: Type: text/plain, Size: 2634 bytes --]

On Thu, 2018-02-01 at 22:59 +0100, KarimAllah Ahmed wrote:

> [ Based on a patch from Ashok Raj <ashok.raj@intel.com> ]
> 
> Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
> guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
> be using a retpoline+IBPB based approach.
> 
> To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
> guests that do not actually use the MSR, only start saving and restoring
> when a non-zero is written to it.
> 
> No attempt is made to handle STIBP here, intentionally. Filtering STIBP
> may be added in a future patch, which may require trapping all writes
> if we don't want to pass it through directly to the guest.
> 
> [dwmw2: Clean up CPUID bits, save/restore manually, handle reset]
> 
> Cc: Asit Mallick <asit.k.mallick@intel.com>
> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> v6:
> - got rid of save_spec_ctrl_on_exit
> - introduce msr_write_intercepted
> v5:
> - Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
> v4:
> - Add IBRS to kvm_cpuid_8000_0008_ebx_x86_features
> - Handling nested guests
> v3:
> - Save/restore manually
> - Fix CPUID handling
> - Fix a copy & paste error in the name of SPEC_CTRL MSR in
>   disable_intercept.
> - support !cpu_has_vmx_msr_bitmap()
> v2:
> - remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
> - special case writing '0' in SPEC_CTRL to avoid confusing live-migration
>   when the instance never used the MSR (dwmw@).
> - depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
> - add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).

This looks very good to me now, and the comments are helpful. Thank you
for your persistence with getting the details right. If we make the SVM
one look like this, as you mentioned, I think we ought finally be ready
to merge it.

Good work ;)

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  2018-02-02 10:53   ` Darren Kenny
@ 2018-02-02 17:35     ` Jim Mattson
  0 siblings, 0 replies; 81+ messages in thread
From: Jim Mattson @ 2018-02-02 17:35 UTC (permalink / raw)
  To: KarimAllah Ahmed, kvm list, LKML, the arch/x86 maintainers,
	Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Paolo Bonzini, Ashok Raj, David Woodhouse

On Fri, Feb 2, 2018 at 2:53 AM, Darren Kenny <darren.kenny@oracle.com> wrote:
> On Thu, Feb 01, 2018 at 10:59:44PM +0100, KarimAllah Ahmed wrote:
>>
>> Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO
>> (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the
>> contents will come directly from the hardware, but user-space can still
>> override it.
>>
>> [dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]
>>
>> Cc: Asit Mallick <asit.k.mallick@intel.com>
>> Cc: Dave Hansen <dave.hansen@intel.com>
>> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
>> Cc: Tim Chen <tim.c.chen@linux.intel.com>
>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Cc: Andi Kleen <ak@linux.intel.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Jun Nakajima <jun.nakajima@intel.com>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Greg KH <gregkh@linuxfoundation.org>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Ashok Raj <ashok.raj@intel.com>
>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>
>
> Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

Reviewed-by: Jim Mattson <jmattson@google.com>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX
  2018-02-01 21:59 ` [PATCH v6 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX KarimAllah Ahmed
@ 2018-02-02 17:37   ` Jim Mattson
  2018-02-03 22:50   ` [tip:x86/pti] KVM/x86: " tip-bot for KarimAllah Ahmed
  1 sibling, 0 replies; 81+ messages in thread
From: Jim Mattson @ 2018-02-02 17:37 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm list, LKML, the arch/x86 maintainers, Paolo Bonzini,
	Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, David Woodhouse

On Thu, Feb 1, 2018 at 1:59 PM, KarimAllah Ahmed <karahmed@amazon.de> wrote:
> [dwmw2: Stop using KF() for bits in it, too]
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

Reviewed-by: Jim Mattson <jmattson@google.com>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-01 21:59 ` [PATCH v6 2/5] KVM: x86: Add IBPB support KarimAllah Ahmed
@ 2018-02-02 17:49   ` Konrad Rzeszutek Wilk
  2018-02-02 18:02       ` David Woodhouse
  2018-02-03 22:50   ` [tip:x86/pti] KVM/x86: " tip-bot for Ashok Raj
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 81+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-02-02 17:49 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Ashok Raj, Asit Mallick, Dave Hansen,
	Arjan Van De Ven, Tim Chen, Linus Torvalds, Andrea Arcangeli,
	Andi Kleen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Andy Lutomirski, Greg KH, Paolo Bonzini, Peter Zijlstra,
	David Woodhouse

On Thu, Feb 01, 2018 at 10:59:43PM +0100, KarimAllah Ahmed wrote:
> From: Ashok Raj <ashok.raj@intel.com>
> 
> The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
> control mechanism. It keeps earlier branches from influencing
> later ones.
> 
> Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
> It's a command that ensures predicted branch targets aren't used after
> the barrier. Although IBRS and IBPB are enumerated by the same CPUID
> enumeration, IBPB is very different.
> 
> IBPB helps mitigate against three potential attacks:
> 
> * Mitigate guests from being attacked by other guests.
>   - This is addressed by issing IBPB when we do a guest switch.
> 
> * Mitigate attacks from guest/ring3->host/ring3.
>   These would require a IBPB during context switch in host, or after
>   VMEXIT. The host process has two ways to mitigate
>   - Either it can be compiled with retpoline
>   - If its going through context switch, and has set !dumpable then
>     there is a IBPB in that path.
>     (Tim's patch: https://patchwork.kernel.org/patch/10192871)
>   - The case where after a VMEXIT you return back to Qemu might make
>     Qemu attackable from guest when Qemu isn't compiled with retpoline.
>   There are issues reported when doing IBPB on every VMEXIT that resulted
>   in some tsc calibration woes in guest.
> 
> * Mitigate guest/ring0->host/ring0 attacks.
>   When host kernel is using retpoline it is safe against these attacks.
>   If host kernel isn't using retpoline we might need to do a IBPB flush on
>   every VMEXIT.
> 
> Even when using retpoline for indirect calls, in certain conditions 'ret'
> can use the BTB on Skylake-era CPUs. There are other mitigations
> available like RSB stuffing/clearing.
> 
> * IBPB is issued only for SVM during svm_free_vcpu().
>   VMX has a vmclear and SVM doesn't.  Follow discussion here:
>   https://lkml.org/lkml/2018/1/15/146
> 
> Please refer to the following spec for more details on the enumeration
> and control.
> 
> Refer here to get documentation about mitigations.
> 
> https://software.intel.com/en-us/side-channel-security-support
> 
> [peterz: rebase and changelog rewrite]
> [karahmed: - rebase
>            - vmx: expose PRED_CMD if guest has it in CPUID
>            - svm: only pass through IBPB if guest has it in CPUID
>            - vmx: support !cpu_has_vmx_msr_bitmap()]
>            - vmx: support nested]
> [dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
>         PRED_CMD is a write-only MSR]
> 
> Cc: Asit Mallick <asit.k.mallick@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

with some small nits.
> ---
> v6:
> - introduce msr_write_intercepted_l01
> 
> v5:
> - Use MSR_TYPE_W instead of MSR_TYPE_R for the MSR.
> - Always merge the bitmaps unconditionally.
> - Add PRED_CMD to direct_access_msrs.
> - Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
> - rewrite the commit message (from ashok.raj@)
> ---
>  arch/x86/kvm/cpuid.c | 11 +++++++-
>  arch/x86/kvm/svm.c   | 28 ++++++++++++++++++
>  arch/x86/kvm/vmx.c   | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 116 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index c0eb337..033004d 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  		F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
>  		0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
>  
> +	/* cpuid 0x80000008.ebx */
> +	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
> +		F(IBPB);
> +
>  	/* cpuid 0xC0000001.edx */
>  	const u32 kvm_cpuid_C000_0001_edx_x86_features =
>  		F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
> @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  		if (!g_phys_as)
>  			g_phys_as = phys_as;
>  		entry->eax = g_phys_as | (virt_as << 8);
> -		entry->ebx = entry->edx = 0;
> +		entry->edx = 0;
> +		/* IBPB isn't necessarily present in hardware cpuid */

It is with x86/pti nowadays. I think you can remove that comment.

..snip..
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index d46a61b..263eb1f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -592,6 +592,7 @@ struct vcpu_vmx {
>  	u64 		      msr_host_kernel_gs_base;
>  	u64 		      msr_guest_kernel_gs_base;
>  #endif
> +

Spurious..

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  2018-02-01 21:59 ` [PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES KarimAllah Ahmed
  2018-02-02 10:53   ` Darren Kenny
@ 2018-02-02 17:51   ` Konrad Rzeszutek Wilk
  2018-02-03 22:51   ` [tip:x86/pti] KVM/VMX: " tip-bot for KarimAllah Ahmed
  2 siblings, 0 replies; 81+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-02-02 17:51 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Asit Mallick, Dave Hansen,
	Arjan Van De Ven, Tim Chen, Linus Torvalds, Andrea Arcangeli,
	Andi Kleen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Andy Lutomirski, Greg KH, Paolo Bonzini, Ashok Raj,
	David Woodhouse

On Thu, Feb 01, 2018 at 10:59:44PM +0100, KarimAllah Ahmed wrote:
> Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO
> (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the
> contents will come directly from the hardware, but user-space can still
> override it.
> 
> [dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]
> 
> Cc: Asit Mallick <asit.k.mallick@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Ashok Raj <ashok.raj@intel.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-01 21:59 ` [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL KarimAllah Ahmed
  2018-02-02 11:03   ` Darren Kenny
  2018-02-02 11:27     ` David Woodhouse
@ 2018-02-02 17:53   ` Konrad Rzeszutek Wilk
  2018-02-02 18:05     ` David Woodhouse
  2018-02-02 17:57   ` Jim Mattson
  2018-02-03 22:51   ` [tip:x86/pti] KVM/VMX: " tip-bot for KarimAllah Ahmed
  4 siblings, 1 reply; 81+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-02-02 17:53 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Asit Mallick, Arjan Van De Ven,
	Dave Hansen, Andi Kleen, Andrea Arcangeli, Linus Torvalds,
	Tim Chen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Paolo Bonzini, David Woodhouse, Greg KH, Andy Lutomirski,
	Ashok Raj

.snip..
> @@ -1913,6 +1914,29 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
>  }
>  
>  /*
> + * Check if MSR is intercepted for currently loaded MSR bitmap.
> + */
> +static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
> +{
> +	unsigned long *msr_bitmap;
> +	int f = sizeof(unsigned long);

unsigned int
> +
> +	if (!cpu_has_vmx_msr_bitmap())
> +		return true;
> +
> +	msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
> +
> +	if (msr <= 0x1fff) {
> +		return !!test_bit(msr, msr_bitmap + 0x800 / f);
> +	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
> +		msr &= 0x1fff;
> +		return !!test_bit(msr, msr_bitmap + 0xc00 / f);
> +	}
> +
> +	return true;
> +}

with that:

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-01 21:59 ` [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL KarimAllah Ahmed
                     ` (2 preceding siblings ...)
  2018-02-02 17:53   ` Konrad Rzeszutek Wilk
@ 2018-02-02 17:57   ` Jim Mattson
  2018-02-03 22:51   ` [tip:x86/pti] KVM/VMX: " tip-bot for KarimAllah Ahmed
  4 siblings, 0 replies; 81+ messages in thread
From: Jim Mattson @ 2018-02-02 17:57 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm list, LKML, the arch/x86 maintainers, Asit Mallick,
	Arjan Van De Ven, Dave Hansen, Andi Kleen, Andrea Arcangeli,
	Linus Torvalds, Tim Chen, Thomas Gleixner, Dan Williams,
	Jun Nakajima, Paolo Bonzini, David Woodhouse, Greg KH,
	Andy Lutomirski, Ashok Raj

On Thu, Feb 1, 2018 at 1:59 PM, KarimAllah Ahmed <karahmed@amazon.de> wrote:
> [ Based on a patch from Ashok Raj <ashok.raj@intel.com> ]
>
> Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
> guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
> be using a retpoline+IBPB based approach.
>
> To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
> guests that do not actually use the MSR, only start saving and restoring
> when a non-zero is written to it.
>
> No attempt is made to handle STIBP here, intentionally. Filtering STIBP
> may be added in a future patch, which may require trapping all writes
> if we don't want to pass it through directly to the guest.
>
> [dwmw2: Clean up CPUID bits, save/restore manually, handle reset]
>
> Cc: Asit Mallick <asit.k.mallick@intel.com>
> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

Reviewed-by: Jim Mattson <jmattson@google.com>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-02 17:49   ` Konrad Rzeszutek Wilk
@ 2018-02-02 18:02       ` David Woodhouse
  0 siblings, 0 replies; 81+ messages in thread
From: David Woodhouse @ 2018-02-02 18:02 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Ashok Raj, Asit Mallick, Dave Hansen,
	Arjan Van De Ven, Tim Chen, Linus Torvalds, Andrea Arcangeli,
	Andi Kleen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Andy Lutomirski, Greg KH, Paolo Bonzini, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 1478 bytes --]

On Fri, 2018-02-02 at 12:49 -0500, Konrad Rzeszutek Wilk wrote:
> > @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct
> > kvm_cpuid_entry2 *entry, u32 function,
> >                 if (!g_phys_as)
> >                         g_phys_as = phys_as;
> >                 entry->eax = g_phys_as | (virt_as << 8);
> > -               entry->ebx = entry->edx = 0;
> > +               entry->edx = 0;
> > +               /* IBPB isn't necessarily present in hardware cpuid>  */
> > +               if (boot_cpu_has(X86_FEATURE_IBPB))
> > +                       entry->ebx |= F(IBPB);
> > +               entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
> > +               cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
> 
> It is with x86/pti nowadays. I think you can remove that comment.

In this code we use the actual CPUID instruction, then filter stuff out
of it (with &= kvm_cpuid_XXX_x86_features and then cpuid_mask() to turn
off any bits which were otherwise present in the hardware and *would*
have been supported by KVM, but which the kernel has decided to pretend
are not present.

Nothing would *set* the IBPB bit though, since that's a "virtual" bit
on Intel hardware. The comment explains why we have that |= F(IBPB),
and if the comment wasn't true, we wouldn't need that code either.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
@ 2018-02-02 18:02       ` David Woodhouse
  0 siblings, 0 replies; 81+ messages in thread
From: David Woodhouse @ 2018-02-02 18:02 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Ashok Raj, Asit Mallick, Dave Hansen,
	Arjan Van De Ven, Tim Chen, Linus Torvalds, Andrea Arcangeli,
	Andi Kleen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Andy Lutomirski, Greg KH, Paolo Bonzini, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 1478 bytes --]

On Fri, 2018-02-02 at 12:49 -0500, Konrad Rzeszutek Wilk wrote:
> > @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct
> > kvm_cpuid_entry2 *entry, u32 function,
> >                 if (!g_phys_as)
> >                         g_phys_as = phys_as;
> >                 entry->eax = g_phys_as | (virt_as << 8);
> > -               entry->ebx = entry->edx = 0;
> > +               entry->edx = 0;
> > +               /* IBPB isn't necessarily present in hardware cpuid>  */
> > +               if (boot_cpu_has(X86_FEATURE_IBPB))
> > +                       entry->ebx |= F(IBPB);
> > +               entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
> > +               cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
> 
> It is with x86/pti nowadays. I think you can remove that comment.

In this code we use the actual CPUID instruction, then filter stuff out
of it (with &= kvm_cpuid_XXX_x86_features and then cpuid_mask() to turn
off any bits which were otherwise present in the hardware and *would*
have been supported by KVM, but which the kernel has decided to pretend
are not present.

Nothing would *set* the IBPB bit though, since that's a "virtual" bit
on Intel hardware. The comment explains why we have that |= F(IBPB),
and if the comment wasn't true, we wouldn't need that code either.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 5/5] KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-01 21:59 ` [PATCH v6 5/5] KVM: SVM: " KarimAllah Ahmed
  2018-02-02 11:06   ` Darren Kenny
@ 2018-02-02 18:02   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 81+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-02-02 18:02 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Asit Mallick, Arjan Van De Ven,
	Dave Hansen, Andi Kleen, Andrea Arcangeli, Linus Torvalds,
	Tim Chen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Paolo Bonzini, David Woodhouse, Greg KH, Andy Lutomirski,
	Ashok Raj

On Thu, Feb 01, 2018 at 10:59:46PM +0100, KarimAllah Ahmed wrote:
> [ Based on a patch from Paolo Bonzini <pbonzini@redhat.com> ]
> 
> ... basically doing exactly what we do for VMX:
> 
> - Passthrough SPEC_CTRL to guests (if enabled in guest CPUID)
> - Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
>   actually used it.
> 
> Cc: Asit Mallick <asit.k.mallick@intel.com>
> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> +	{ .index = MSR_IA32_SPEC_CTRL,			.always = false },


This .always = [false|true] field keeps throwing me off.

So glad: https://www.spinics.net/lists/kvm/msg161606.html explains it better.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-02 17:53   ` Konrad Rzeszutek Wilk
@ 2018-02-02 18:05     ` David Woodhouse
  2018-02-02 18:19       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 81+ messages in thread
From: David Woodhouse @ 2018-02-02 18:05 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, KarimAllah Ahmed
  Cc: kvm, linux-kernel, x86, Asit Mallick, Arjan Van De Ven,
	Dave Hansen, Andi Kleen, Andrea Arcangeli, Linus Torvalds,
	Tim Chen, Thomas Gleixner, Dan Williams, Jun Nakajima,
	Paolo Bonzini, Greg KH, Andy Lutomirski, Ashok Raj

[-- Attachment #1: Type: text/plain, Size: 837 bytes --]

On Fri, 2018-02-02 at 12:53 -0500, Konrad Rzeszutek Wilk wrote:
> .snip..
> > 
> > @@ -1913,6 +1914,29 @@ static void update_exception_bitmap(struct
> > kvm_vcpu *vcpu)
> >  }
> >  
> >  /*
> > + * Check if MSR is intercepted for currently loaded MSR bitmap.
> > + */
> > +static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
> > +{
> > +	unsigned long *msr_bitmap;
> > +	int f = sizeof(unsigned long);
>
> unsigned int

$ grep -n 'f = sizeof' vmx.c
1934:	int f = sizeof(unsigned long);
5013:	int f = sizeof(unsigned long);
5048:	int f = sizeof(unsigned long);
5097:	int f = sizeof(unsigned long);

It sucks enough that we're doing this stuff repeatedly, and it's a
prime candidate for cleaning up, but I wasn't going to send Karim off
to bikeshed that today. Let's at least keep it consistent.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-02 18:05     ` David Woodhouse
@ 2018-02-02 18:19       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 81+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-02-02 18:19 UTC (permalink / raw)
  To: David Woodhouse
  Cc: KarimAllah Ahmed, kvm, linux-kernel, x86, Asit Mallick,
	Arjan Van De Ven, Dave Hansen, Andi Kleen, Andrea Arcangeli,
	Linus Torvalds, Tim Chen, Thomas Gleixner, Dan Williams,
	Jun Nakajima, Paolo Bonzini, Greg KH, Andy Lutomirski, Ashok Raj

On Fri, Feb 02, 2018 at 06:05:54PM +0000, David Woodhouse wrote:
> On Fri, 2018-02-02 at 12:53 -0500, Konrad Rzeszutek Wilk wrote:
> > .snip..
> > > 
> > > @@ -1913,6 +1914,29 @@ static void update_exception_bitmap(struct
> > > kvm_vcpu *vcpu)
> > >  }
> > >  
> > >  /*
> > > + * Check if MSR is intercepted for currently loaded MSR bitmap.
> > > + */
> > > +static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
> > > +{
> > > +	unsigned long *msr_bitmap;
> > > +	int f = sizeof(unsigned long);
> >
> > unsigned int
> 
> $ grep -n 'f = sizeof' vmx.c
> 1934:	int f = sizeof(unsigned long);
> 5013:	int f = sizeof(unsigned long);
> 5048:	int f = sizeof(unsigned long);
> 5097:	int f = sizeof(unsigned long);
> 
> It sucks enough that we're doing this stuff repeatedly, and it's a
> prime candidate for cleaning up, but I wasn't going to send Karim off
> to bikeshed that today. Let's at least keep it consistent.

Sure.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-02 18:02       ` David Woodhouse
  (?)
@ 2018-02-02 19:56       ` Konrad Rzeszutek Wilk
  2018-02-02 20:16           ` David Woodhouse
  -1 siblings, 1 reply; 81+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-02-02 19:56 UTC (permalink / raw)
  To: David Woodhouse
  Cc: KarimAllah Ahmed, kvm, linux-kernel, x86, Ashok Raj,
	Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Paolo Bonzini, Peter Zijlstra

On Fri, Feb 02, 2018 at 06:02:24PM +0000, David Woodhouse wrote:
> On Fri, 2018-02-02 at 12:49 -0500, Konrad Rzeszutek Wilk wrote:
> > > @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct
> > > kvm_cpuid_entry2 *entry, u32 function,
> > >                 if (!g_phys_as)
> > >                         g_phys_as = phys_as;
> > >                 entry->eax = g_phys_as | (virt_as << 8);
> > > -               entry->ebx = entry->edx = 0;
> > > +               entry->edx = 0;
> > > +               /* IBPB isn't necessarily present in hardware cpuid>  */
> > > +               if (boot_cpu_has(X86_FEATURE_IBPB))
> > > +                       entry->ebx |= F(IBPB);
> > > +               entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
> > > +               cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
> > 
> > It is with x86/pti nowadays. I think you can remove that comment.
> 
> In this code we use the actual CPUID instruction, then filter stuff out
> of it (with &= kvm_cpuid_XXX_x86_features and then cpuid_mask() to turn
> off any bits which were otherwise present in the hardware and *would*
> have been supported by KVM, but which the kernel has decided to pretend
> are not present.
> 
> Nothing would *set* the IBPB bit though, since that's a "virtual" bit
> on Intel hardware. The comment explains why we have that |= F(IBPB),
> and if the comment wasn't true, we wouldn't need that code either.

But this seems wrong. That is on Intel CPUs we will advertise on
AMD leaf that the IBPB feature is available.

Shouldn't we just check to see if the machine is AMD before advertising
this bit?

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-02 19:56       ` Konrad Rzeszutek Wilk
@ 2018-02-02 20:16           ` David Woodhouse
  0 siblings, 0 replies; 81+ messages in thread
From: David Woodhouse @ 2018-02-02 20:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: KarimAllah Ahmed, kvm, linux-kernel, x86, Ashok Raj,
	Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Paolo Bonzini, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 2099 bytes --]



On Fri, 2018-02-02 at 14:56 -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Feb 02, 2018 at 06:02:24PM +0000, David Woodhouse wrote:
> > 
> > On Fri, 2018-02-02 at 12:49 -0500, Konrad Rzeszutek Wilk wrote:
> > > 
> > > > 
> > > > @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct
> > > > kvm_cpuid_entry2 *entry, u32 function,
> > > >                  if (!g_phys_as)
> > > >                          g_phys_as = phys_as;
> > > >                  entry->eax = g_phys_as | (virt_as << 8);
> > > > -               entry->ebx = entry->edx = 0;
> > > > +               entry->edx = 0;
> > > > +               /* IBPB isn't necessarily present in hardware cpuid>  */
> > > > +               if (boot_cpu_has(X86_FEATURE_IBPB))
> > > > +                       entry->ebx |= F(IBPB);
> > > > +               entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
> > > > +               cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
> > > It is with x86/pti nowadays. I think you can remove that comment.
> > In this code we use the actual CPUID instruction, then filter stuff out
> > of it (with &= kvm_cpuid_XXX_x86_features and then cpuid_mask() to turn
> > off any bits which were otherwise present in the hardware and *would*
> > have been supported by KVM, but which the kernel has decided to pretend
> > are not present.
> > 
> > Nothing would *set* the IBPB bit though, since that's a "virtual" bit
> > on Intel hardware. The comment explains why we have that |= F(IBPB),
> > and if the comment wasn't true, we wouldn't need that code either.
>
> But this seems wrong. That is on Intel CPUs we will advertise on
> AMD leaf that the IBPB feature is available.
> 
> Shouldn't we just check to see if the machine is AMD before advertising
> this bit?

No. The AMD feature bits give us more fine-grained support for exposing
IBPB or IBRS alone, so we expose those bits on Intel too.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
@ 2018-02-02 20:16           ` David Woodhouse
  0 siblings, 0 replies; 81+ messages in thread
From: David Woodhouse @ 2018-02-02 20:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: KarimAllah Ahmed, kvm, linux-kernel, x86, Ashok Raj,
	Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Paolo Bonzini, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 2099 bytes --]



On Fri, 2018-02-02 at 14:56 -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Feb 02, 2018 at 06:02:24PM +0000, David Woodhouse wrote:
> > 
> > On Fri, 2018-02-02 at 12:49 -0500, Konrad Rzeszutek Wilk wrote:
> > > 
> > > > 
> > > > @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct
> > > > kvm_cpuid_entry2 *entry, u32 function,
> > > >                  if (!g_phys_as)
> > > >                          g_phys_as = phys_as;
> > > >                  entry->eax = g_phys_as | (virt_as << 8);
> > > > -               entry->ebx = entry->edx = 0;
> > > > +               entry->edx = 0;
> > > > +               /* IBPB isn't necessarily present in hardware cpuid>  */
> > > > +               if (boot_cpu_has(X86_FEATURE_IBPB))
> > > > +                       entry->ebx |= F(IBPB);
> > > > +               entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
> > > > +               cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
> > > It is with x86/pti nowadays. I think you can remove that comment.
> > In this code we use the actual CPUID instruction, then filter stuff out
> > of it (with &= kvm_cpuid_XXX_x86_features and then cpuid_mask() to turn
> > off any bits which were otherwise present in the hardware and *would*
> > have been supported by KVM, but which the kernel has decided to pretend
> > are not present.
> > 
> > Nothing would *set* the IBPB bit though, since that's a "virtual" bit
> > on Intel hardware. The comment explains why we have that |= F(IBPB),
> > and if the comment wasn't true, we wouldn't need that code either.
>
> But this seems wrong. That is on Intel CPUs we will advertise on
> AMD leaf that the IBPB feature is available.
> 
> Shouldn't we just check to see if the machine is AMD before advertising
> this bit?

No. The AMD feature bits give us more fine-grained support for exposing
IBPB or IBRS alone, so we expose those bits on Intel too.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-02 20:16           ` David Woodhouse
  (?)
@ 2018-02-02 20:28           ` Konrad Rzeszutek Wilk
  2018-02-02 20:31               ` David Woodhouse
                               ` (2 more replies)
  -1 siblings, 3 replies; 81+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-02-02 20:28 UTC (permalink / raw)
  To: David Woodhouse
  Cc: KarimAllah Ahmed, kvm, linux-kernel, x86, Ashok Raj,
	Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Paolo Bonzini, Peter Zijlstra

On Fri, Feb 02, 2018 at 08:16:15PM +0000, David Woodhouse wrote:
> 
> 
> On Fri, 2018-02-02 at 14:56 -0500, Konrad Rzeszutek Wilk wrote:
> > On Fri, Feb 02, 2018 at 06:02:24PM +0000, David Woodhouse wrote:
> > > 
> > > On Fri, 2018-02-02 at 12:49 -0500, Konrad Rzeszutek Wilk wrote:
> > > > 
> > > > > 
> > > > > @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct
> > > > > kvm_cpuid_entry2 *entry, u32 function,
> > > > >                  if (!g_phys_as)
> > > > >                          g_phys_as = phys_as;
> > > > >                  entry->eax = g_phys_as | (virt_as << 8);
> > > > > -               entry->ebx = entry->edx = 0;
> > > > > +               entry->edx = 0;
> > > > > +               /* IBPB isn't necessarily present in hardware cpuid>  */
> > > > > +               if (boot_cpu_has(X86_FEATURE_IBPB))
> > > > > +                       entry->ebx |= F(IBPB);
> > > > > +               entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
> > > > > +               cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
> > > > It is with x86/pti nowadays. I think you can remove that comment.
> > > In this code we use the actual CPUID instruction, then filter stuff out
> > > of it (with &= kvm_cpuid_XXX_x86_features and then cpuid_mask() to turn
> > > off any bits which were otherwise present in the hardware and *would*
> > > have been supported by KVM, but which the kernel has decided to pretend
> > > are not present.
> > > 
> > > Nothing would *set* the IBPB bit though, since that's a "virtual" bit
> > > on Intel hardware. The comment explains why we have that |= F(IBPB),
> > > and if the comment wasn't true, we wouldn't need that code either.
> >
> > But this seems wrong. That is on Intel CPUs we will advertise on
> > AMD leaf that the IBPB feature is available.
> > 
> > Shouldn't we just check to see if the machine is AMD before advertising
> > this bit?
> 
> No. The AMD feature bits give us more fine-grained support for exposing
> IBPB or IBRS alone, so we expose those bits on Intel too.

But but.. that runs smack against the idea of exposing a platform that
is as close to emulating the real hardware as possible.

As in I would never expect an Intel CPU to expose the IBPB on the 0x8000_0008
leaf. Hence KVM (nor any hypervisor) should not do it either.

Unless Intel is doing it? Did I miss a new spec update?

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-02 20:28           ` Konrad Rzeszutek Wilk
@ 2018-02-02 20:31               ` David Woodhouse
  2018-02-02 20:52             ` Alan Cox
  2018-02-05 19:24             ` Paolo Bonzini
  2 siblings, 0 replies; 81+ messages in thread
From: David Woodhouse @ 2018-02-02 20:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: KarimAllah Ahmed, kvm, linux-kernel, x86, Ashok Raj,
	Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Paolo Bonzini, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 664 bytes --]

On Fri, 2018-02-02 at 15:28 -0500, Konrad Rzeszutek Wilk wrote:
> 
> > 
> > No. The AMD feature bits give us more fine-grained support for exposing
> > IBPB or IBRS alone, so we expose those bits on Intel too.
> 
> But but.. that runs smack against the idea of exposing a platform that
> is as close to emulating the real hardware as possible.
> 
> As in I would never expect an Intel CPU to expose the IBPB on the 0x8000_0008
> leaf. Hence KVM (nor any hypervisor) should not do it either.
> 
> Unless Intel is doing it? Did I miss a new spec update?

Are you telling me there's no way you can infer from CPUID that you're
running in a hypervisor?

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
@ 2018-02-02 20:31               ` David Woodhouse
  0 siblings, 0 replies; 81+ messages in thread
From: David Woodhouse @ 2018-02-02 20:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: KarimAllah Ahmed, kvm, linux-kernel, x86, Ashok Raj,
	Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Paolo Bonzini, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 664 bytes --]

On Fri, 2018-02-02 at 15:28 -0500, Konrad Rzeszutek Wilk wrote:
> 
> > 
> > No. The AMD feature bits give us more fine-grained support for exposing
> > IBPB or IBRS alone, so we expose those bits on Intel too.
> 
> But but.. that runs smack against the idea of exposing a platform that
> is as close to emulating the real hardware as possible.
> 
> As in I would never expect an Intel CPU to expose the IBPB on the 0x8000_0008
> leaf. Hence KVM (nor any hypervisor) should not do it either.
> 
> Unless Intel is doing it? Did I miss a new spec update?

Are you telling me there's no way you can infer from CPUID that you're
running in a hypervisor?

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-02 20:31               ` David Woodhouse
  (?)
@ 2018-02-02 20:52               ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 81+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-02-02 20:52 UTC (permalink / raw)
  To: David Woodhouse
  Cc: KarimAllah Ahmed, kvm, linux-kernel, x86, Ashok Raj,
	Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Paolo Bonzini, Peter Zijlstra

On Fri, Feb 02, 2018 at 08:31:27PM +0000, David Woodhouse wrote:
> On Fri, 2018-02-02 at 15:28 -0500, Konrad Rzeszutek Wilk wrote:
> > 
> > > 
> > > No. The AMD feature bits give us more fine-grained support for exposing
> > > IBPB or IBRS alone, so we expose those bits on Intel too.
> > 
> > But but.. that runs smack against the idea of exposing a platform that
> > is as close to emulating the real hardware as possible.
> > 
> > As in I would never expect an Intel CPU to expose the IBPB on the 0x8000_0008
> > leaf. Hence KVM (nor any hypervisor) should not do it either.
> > 
> > Unless Intel is doing it? Did I miss a new spec update?
> 
> Are you telling me there's no way you can infer from CPUID that you're
> running in a hypervisor?

That is not what I am saying. The CPUIDs 0x40000000 ... 0x400000ff
are reserved for hypervisor usage. The SDM is pretty clear about it.

The Intel SDM and the AMD equivalant are pretty clear about what the
other leafs should have on its platform.

[5 minutes later]

And I am eating my words here. 

CPUID.80000008 shows how MAXPHYSADDR is used (on the Intel SDM).

Never mind the noise.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-02 20:28           ` Konrad Rzeszutek Wilk
  2018-02-02 20:31               ` David Woodhouse
@ 2018-02-02 20:52             ` Alan Cox
  2018-02-05 19:22               ` Paolo Bonzini
  2018-02-05 19:24             ` Paolo Bonzini
  2 siblings, 1 reply; 81+ messages in thread
From: Alan Cox @ 2018-02-02 20:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: David Woodhouse, KarimAllah Ahmed, kvm, linux-kernel, x86,
	Ashok Raj, Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Paolo Bonzini, Peter Zijlstra

> > No. The AMD feature bits give us more fine-grained support for exposing
> > IBPB or IBRS alone, so we expose those bits on Intel too.  
> 
> But but.. that runs smack against the idea of exposing a platform that
> is as close to emulating the real hardware as possible.

Agreed, and it's asking for problems in the future if for example Intel
or another non AMD vendor did ever use that leaf for something different.

Now whether there ought to be an MSR range every vendor agrees is never
implemented so software can use it is an interesting discussion.

Alan

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [tip:x86/pti] KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX
  2018-02-01 21:59 ` [PATCH v6 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX KarimAllah Ahmed
  2018-02-02 17:37   ` Jim Mattson
@ 2018-02-03 22:50   ` tip-bot for KarimAllah Ahmed
  1 sibling, 0 replies; 81+ messages in thread
From: tip-bot for KarimAllah Ahmed @ 2018-02-03 22:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, pbonzini, konrad.wilk, jmattson, dwmw, linux-kernel, tglx,
	mingo, rkrcmar, karahmed

Commit-ID:  b7b27aa011a1df42728d1768fc181d9ce69e6911
Gitweb:     https://git.kernel.org/tip/b7b27aa011a1df42728d1768fc181d9ce69e6911
Author:     KarimAllah Ahmed <karahmed@amazon.de>
AuthorDate: Thu, 1 Feb 2018 22:59:42 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 3 Feb 2018 23:06:51 +0100

KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX

[dwmw2: Stop using KF() for bits in it, too]
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Link: https://lkml.kernel.org/r/1517522386-18410-2-git-send-email-karahmed@amazon.de

---
 arch/x86/kvm/cpuid.c | 8 +++-----
 arch/x86/kvm/cpuid.h | 1 +
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..c0eb337 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -67,9 +67,7 @@ u64 kvm_supported_xcr0(void)
 
 #define F(x) bit(X86_FEATURE_##x)
 
-/* These are scattered features in cpufeatures.h. */
-#define KVM_CPUID_BIT_AVX512_4VNNIW     2
-#define KVM_CPUID_BIT_AVX512_4FMAPS     3
+/* For scattered features from cpufeatures.h; we currently expose none */
 #define KF(x) bit(KVM_CPUID_BIT_##x)
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -392,7 +390,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
-		KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+		F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
@@ -477,7 +475,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
 				entry->ecx &= ~F(PKU);
 			entry->edx &= kvm_cpuid_7_0_edx_x86_features;
-			entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
+			cpuid_mask(&entry->edx, CPUID_7_EDX);
 		} else {
 			entry->ebx = 0;
 			entry->ecx = 0;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index c2cea66..9a327d5 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX},
 	[CPUID_7_ECX]         = {         7, 0, CPUID_ECX},
 	[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
+	[CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
 };
 
 static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [tip:x86/pti] KVM/x86: Add IBPB support
  2018-02-01 21:59 ` [PATCH v6 2/5] KVM: x86: Add IBPB support KarimAllah Ahmed
  2018-02-02 17:49   ` Konrad Rzeszutek Wilk
@ 2018-02-03 22:50   ` tip-bot for Ashok Raj
  2018-02-16  3:44   ` [PATCH v6 2/5] KVM: x86: " Jim Mattson
  2018-05-03  1:27   ` Wanpeng Li
  3 siblings, 0 replies; 81+ messages in thread
From: tip-bot for Ashok Raj @ 2018-02-03 22:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, asit.k.mallick, luto, ashok.raj, gregkh,
	arjan.van.de.ven, peterz, pbonzini, dwmw, dan.j.williams, hpa,
	jun.nakajima, ak, konrad.wilk, aarcange, karahmed, dave.hansen,
	torvalds, tim.c.chen, tglx, mingo

Commit-ID:  15d45071523d89b3fb7372e2135fbd72f6af9506
Gitweb:     https://git.kernel.org/tip/15d45071523d89b3fb7372e2135fbd72f6af9506
Author:     Ashok Raj <ashok.raj@intel.com>
AuthorDate: Thu, 1 Feb 2018 22:59:43 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 3 Feb 2018 23:06:51 +0100

KVM/x86: Add IBPB support

The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
control mechanism. It keeps earlier branches from influencing
later ones.

Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
It's a command that ensures predicted branch targets aren't used after
the barrier. Although IBRS and IBPB are enumerated by the same CPUID
enumeration, IBPB is very different.

IBPB helps mitigate against three potential attacks:

* Mitigate guests from being attacked by other guests.
  - This is addressed by issing IBPB when we do a guest switch.

* Mitigate attacks from guest/ring3->host/ring3.
  These would require a IBPB during context switch in host, or after
  VMEXIT. The host process has two ways to mitigate
  - Either it can be compiled with retpoline
  - If its going through context switch, and has set !dumpable then
    there is a IBPB in that path.
    (Tim's patch: https://patchwork.kernel.org/patch/10192871)
  - The case where after a VMEXIT you return back to Qemu might make
    Qemu attackable from guest when Qemu isn't compiled with retpoline.
  There are issues reported when doing IBPB on every VMEXIT that resulted
  in some tsc calibration woes in guest.

* Mitigate guest/ring0->host/ring0 attacks.
  When host kernel is using retpoline it is safe against these attacks.
  If host kernel isn't using retpoline we might need to do a IBPB flush on
  every VMEXIT.

Even when using retpoline for indirect calls, in certain conditions 'ret'
can use the BTB on Skylake-era CPUs. There are other mitigations
available like RSB stuffing/clearing.

* IBPB is issued only for SVM during svm_free_vcpu().
  VMX has a vmclear and SVM doesn't.  Follow discussion here:
  https://lkml.org/lkml/2018/1/15/146

Please refer to the following spec for more details on the enumeration
and control.

Refer here to get documentation about mitigations.

https://software.intel.com/en-us/side-channel-security-support

[peterz: rebase and changelog rewrite]
[karahmed: - rebase
           - vmx: expose PRED_CMD if guest has it in CPUID
           - svm: only pass through IBPB if guest has it in CPUID
           - vmx: support !cpu_has_vmx_msr_bitmap()]
           - vmx: support nested]
[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
        PRED_CMD is a write-only MSR]

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: kvm@vger.kernel.org
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com
Link: https://lkml.kernel.org/r/1517522386-18410-3-git-send-email-karahmed@amazon.de

---
 arch/x86/kvm/cpuid.c | 11 +++++++-
 arch/x86/kvm/svm.c   | 28 ++++++++++++++++++
 arch/x86/kvm/vmx.c   | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 116 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c0eb337..033004d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
 		0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
 
+	/* cpuid 0x80000008.ebx */
+	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
+		F(IBPB);
+
 	/* cpuid 0xC0000001.edx */
 	const u32 kvm_cpuid_C000_0001_edx_x86_features =
 		F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
@@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		if (!g_phys_as)
 			g_phys_as = phys_as;
 		entry->eax = g_phys_as | (virt_as << 8);
-		entry->ebx = entry->edx = 0;
+		entry->edx = 0;
+		/* IBPB isn't necessarily present in hardware cpuid */
+		if (boot_cpu_has(X86_FEATURE_IBPB))
+			entry->ebx |= F(IBPB);
+		entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
+		cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
 		break;
 	}
 	case 0x80000019:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f40d0da..254eefb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -249,6 +249,7 @@ static const struct svm_direct_access_msrs {
 	{ .index = MSR_CSTAR,				.always = true  },
 	{ .index = MSR_SYSCALL_MASK,			.always = true  },
 #endif
+	{ .index = MSR_IA32_PRED_CMD,			.always = false },
 	{ .index = MSR_IA32_LASTBRANCHFROMIP,		.always = false },
 	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
 	{ .index = MSR_IA32_LASTINTFROMIP,		.always = false },
@@ -529,6 +530,7 @@ struct svm_cpu_data {
 	struct kvm_ldttss_desc *tss_desc;
 
 	struct page *save_area;
+	struct vmcb *current_vmcb;
 };
 
 static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
@@ -1703,11 +1705,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
+	/*
+	 * The vmcb page can be recycled, causing a false negative in
+	 * svm_vcpu_load(). So do a full IBPB now.
+	 */
+	indirect_branch_prediction_barrier();
 }
 
 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
 	int i;
 
 	if (unlikely(cpu != vcpu->cpu)) {
@@ -1736,6 +1744,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (static_cpu_has(X86_FEATURE_RDTSCP))
 		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
 
+	if (sd->current_vmcb != svm->vmcb) {
+		sd->current_vmcb = svm->vmcb;
+		indirect_branch_prediction_barrier();
+	}
 	avic_vcpu_load(vcpu, cpu);
 }
 
@@ -3684,6 +3696,22 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr);
 		break;
+	case MSR_IA32_PRED_CMD:
+		if (!msr->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
+			return 1;
+
+		if (data & ~PRED_CMD_IBPB)
+			return 1;
+
+		if (!data)
+			break;
+
+		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+		if (is_guest_mode(vcpu))
+			break;
+		set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
+		break;
 	case MSR_STAR:
 		svm->vmcb->save.star = data;
 		break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6ef2a7b..73acdcf 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -593,6 +593,7 @@ struct vcpu_vmx {
 	u64 		      msr_host_kernel_gs_base;
 	u64 		      msr_guest_kernel_gs_base;
 #endif
+
 	u32 vm_entry_controls_shadow;
 	u32 vm_exit_controls_shadow;
 	u32 secondary_exec_control;
@@ -934,6 +935,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
 static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
 					    u16 error_code);
 static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
+static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
+							  u32 msr, int type);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -1905,6 +1908,29 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
 	vmcs_write32(EXCEPTION_BITMAP, eb);
 }
 
+/*
+ * Check if MSR is intercepted for L01 MSR bitmap.
+ */
+static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
+{
+	unsigned long *msr_bitmap;
+	int f = sizeof(unsigned long);
+
+	if (!cpu_has_vmx_msr_bitmap())
+		return true;
+
+	msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
+
+	if (msr <= 0x1fff) {
+		return !!test_bit(msr, msr_bitmap + 0x800 / f);
+	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
+		msr &= 0x1fff;
+		return !!test_bit(msr, msr_bitmap + 0xc00 / f);
+	}
+
+	return true;
+}
+
 static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx,
 		unsigned long entry, unsigned long exit)
 {
@@ -2283,6 +2309,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
 		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
 		vmcs_load(vmx->loaded_vmcs->vmcs);
+		indirect_branch_prediction_barrier();
 	}
 
 	if (!already_loaded) {
@@ -3340,6 +3367,34 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr_info);
 		break;
+	case MSR_IA32_PRED_CMD:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+			return 1;
+
+		if (data & ~PRED_CMD_IBPB)
+			return 1;
+
+		if (!data)
+			break;
+
+		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+
+		/*
+		 * For non-nested:
+		 * When it's written (to non-zero) for the first time, pass
+		 * it through.
+		 *
+		 * For nested:
+		 * The handling of the MSR bitmap for L2 guests is done in
+		 * nested_vmx_merge_msr_bitmap. We should not touch the
+		 * vmcs02.msr_bitmap here since it gets completely overwritten
+		 * in the merging.
+		 */
+		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
+					      MSR_TYPE_W);
+		break;
 	case MSR_IA32_CR_PAT:
 		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
 			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -10042,9 +10097,23 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 	struct page *page;
 	unsigned long *msr_bitmap_l1;
 	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
+	/*
+	 * pred_cmd is trying to verify two things:
+	 *
+	 * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
+	 *    ensures that we do not accidentally generate an L02 MSR bitmap
+	 *    from the L12 MSR bitmap that is too permissive.
+	 * 2. That L1 or L2s have actually used the MSR. This avoids
+	 *    unnecessarily merging of the bitmap if the MSR is unused. This
+	 *    works properly because we only update the L01 MSR bitmap lazily.
+	 *    So even if L0 should pass L1 these MSRs, the L01 bitmap is only
+	 *    updated to reflect this when L1 (or its L2s) actually write to
+	 *    the MSR.
+	 */
+	bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
 
-	/* This shortcut is ok because we support only x2APIC MSRs so far. */
-	if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
+	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
+	    !pred_cmd)
 		return false;
 
 	page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
@@ -10077,6 +10146,13 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 				MSR_TYPE_W);
 		}
 	}
+
+	if (pred_cmd)
+		nested_vmx_disable_intercept_for_msr(
+					msr_bitmap_l1, msr_bitmap_l0,
+					MSR_IA32_PRED_CMD,
+					MSR_TYPE_W);
+
 	kunmap(page);
 	kvm_release_page_clean(page);
 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [tip:x86/pti] KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  2018-02-01 21:59 ` [PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES KarimAllah Ahmed
  2018-02-02 10:53   ` Darren Kenny
  2018-02-02 17:51   ` Konrad Rzeszutek Wilk
@ 2018-02-03 22:51   ` tip-bot for KarimAllah Ahmed
  2 siblings, 0 replies; 81+ messages in thread
From: tip-bot for KarimAllah Ahmed @ 2018-02-03 22:51 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: darren.kenny, jun.nakajima, ak, gregkh, konrad.wilk, jmattson,
	ashok.raj, tim.c.chen, aarcange, karahmed, tglx, hpa,
	linux-kernel, luto, torvalds, dave.hansen, pbonzini,
	arjan.van.de.ven, mingo, asit.k.mallick, dan.j.williams, dwmw

Commit-ID:  28c1c9fabf48d6ad596273a11c46e0d0da3e14cd
Gitweb:     https://git.kernel.org/tip/28c1c9fabf48d6ad596273a11c46e0d0da3e14cd
Author:     KarimAllah Ahmed <karahmed@amazon.de>
AuthorDate: Thu, 1 Feb 2018 22:59:44 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 3 Feb 2018 23:06:52 +0100

KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES

Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO
(bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the
contents will come directly from the hardware, but user-space can still
override it.

[dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: kvm@vger.kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Link: https://lkml.kernel.org/r/1517522386-18410-4-git-send-email-karahmed@amazon.de

---
 arch/x86/kvm/cpuid.c |  2 +-
 arch/x86/kvm/vmx.c   | 15 +++++++++++++++
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 033004d..1909635 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
-		F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
+		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 73acdcf..e5f75eb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -594,6 +594,8 @@ struct vcpu_vmx {
 	u64 		      msr_guest_kernel_gs_base;
 #endif
 
+	u64 		      arch_capabilities;
+
 	u32 vm_entry_controls_shadow;
 	u32 vm_exit_controls_shadow;
 	u32 secondary_exec_control;
@@ -3260,6 +3262,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC:
 		msr_info->data = guest_read_tsc(vcpu);
 		break;
+	case MSR_IA32_ARCH_CAPABILITIES:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
+			return 1;
+		msr_info->data = to_vmx(vcpu)->arch_capabilities;
+		break;
 	case MSR_IA32_SYSENTER_CS:
 		msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
 		break;
@@ -3395,6 +3403,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
 					      MSR_TYPE_W);
 		break;
+	case MSR_IA32_ARCH_CAPABILITIES:
+		if (!msr_info->host_initiated)
+			return 1;
+		vmx->arch_capabilities = data;
+		break;
 	case MSR_IA32_CR_PAT:
 		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
 			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -5657,6 +5670,8 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
 		++vmx->nmsrs;
 	}
 
+	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
+		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
 
 	vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c53298d..4ec142e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1009,6 +1009,7 @@ static u32 msrs_to_save[] = {
 #endif
 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
 	MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+	MSR_IA32_ARCH_CAPABILITIES
 };
 
 static unsigned num_msrs_to_save;

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [tip:x86/pti] KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-02-01 21:59 ` [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL KarimAllah Ahmed
                     ` (3 preceding siblings ...)
  2018-02-02 17:57   ` Jim Mattson
@ 2018-02-03 22:51   ` tip-bot for KarimAllah Ahmed
  4 siblings, 0 replies; 81+ messages in thread
From: tip-bot for KarimAllah Ahmed @ 2018-02-03 22:51 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave.hansen, ak, ashok.raj, darren.kenny, pbonzini, torvalds,
	konrad.wilk, dwmw, jun.nakajima, dan.j.williams, linux-kernel,
	mingo, arjan.van.de.ven, tglx, gregkh, jmattson, karahmed, luto,
	asit.k.mallick, hpa, aarcange, tim.c.chen

Commit-ID:  d28b387fb74da95d69d2615732f50cceb38e9a4d
Gitweb:     https://git.kernel.org/tip/d28b387fb74da95d69d2615732f50cceb38e9a4d
Author:     KarimAllah Ahmed <karahmed@amazon.de>
AuthorDate: Thu, 1 Feb 2018 22:59:45 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 3 Feb 2018 23:06:52 +0100

KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL

[ Based on a patch from Ashok Raj <ashok.raj@intel.com> ]

Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
be using a retpoline+IBPB based approach.

To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
guests that do not actually use the MSR, only start saving and restoring
when a non-zero is written to it.

No attempt is made to handle STIBP here, intentionally. Filtering STIBP
may be added in a future patch, which may require trapping all writes
if we don't want to pass it through directly to the guest.

[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: kvm@vger.kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Link: https://lkml.kernel.org/r/1517522386-18410-5-git-send-email-karahmed@amazon.de

---
 arch/x86/kvm/cpuid.c |   9 +++--
 arch/x86/kvm/vmx.c   | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c   |   2 +-
 3 files changed, 110 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1909635..13f5d42 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 0x80000008.ebx */
 	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
-		F(IBPB);
+		F(IBPB) | F(IBRS);
 
 	/* cpuid 0xC0000001.edx */
 	const u32 kvm_cpuid_C000_0001_edx_x86_features =
@@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
-		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
+		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
+		F(ARCH_CAPABILITIES);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
@@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			g_phys_as = phys_as;
 		entry->eax = g_phys_as | (virt_as << 8);
 		entry->edx = 0;
-		/* IBPB isn't necessarily present in hardware cpuid */
+		/* IBRS and IBPB aren't necessarily present in hardware cpuid */
 		if (boot_cpu_has(X86_FEATURE_IBPB))
 			entry->ebx |= F(IBPB);
+		if (boot_cpu_has(X86_FEATURE_IBRS))
+			entry->ebx |= F(IBRS);
 		entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
 		cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
 		break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e5f75eb..bee4c49 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -595,6 +595,7 @@ struct vcpu_vmx {
 #endif
 
 	u64 		      arch_capabilities;
+	u64 		      spec_ctrl;
 
 	u32 vm_entry_controls_shadow;
 	u32 vm_exit_controls_shadow;
@@ -1911,6 +1912,29 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
 }
 
 /*
+ * Check if MSR is intercepted for currently loaded MSR bitmap.
+ */
+static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
+{
+	unsigned long *msr_bitmap;
+	int f = sizeof(unsigned long);
+
+	if (!cpu_has_vmx_msr_bitmap())
+		return true;
+
+	msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
+
+	if (msr <= 0x1fff) {
+		return !!test_bit(msr, msr_bitmap + 0x800 / f);
+	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
+		msr &= 0x1fff;
+		return !!test_bit(msr, msr_bitmap + 0xc00 / f);
+	}
+
+	return true;
+}
+
+/*
  * Check if MSR is intercepted for L01 MSR bitmap.
  */
 static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
@@ -3262,6 +3286,14 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC:
 		msr_info->data = guest_read_tsc(vcpu);
 		break;
+	case MSR_IA32_SPEC_CTRL:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+			return 1;
+
+		msr_info->data = to_vmx(vcpu)->spec_ctrl;
+		break;
 	case MSR_IA32_ARCH_CAPABILITIES:
 		if (!msr_info->host_initiated &&
 		    !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
@@ -3375,6 +3407,37 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr_info);
 		break;
+	case MSR_IA32_SPEC_CTRL:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+			return 1;
+
+		/* The STIBP bit doesn't fault even if it's not advertised */
+		if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
+			return 1;
+
+		vmx->spec_ctrl = data;
+
+		if (!data)
+			break;
+
+		/*
+		 * For non-nested:
+		 * When it's written (to non-zero) for the first time, pass
+		 * it through.
+		 *
+		 * For nested:
+		 * The handling of the MSR bitmap for L2 guests is done in
+		 * nested_vmx_merge_msr_bitmap. We should not touch the
+		 * vmcs02.msr_bitmap here since it gets completely overwritten
+		 * in the merging. We update the vmcs01 here for L1 as well
+		 * since it will end up touching the MSR anyway now.
+		 */
+		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
+					      MSR_IA32_SPEC_CTRL,
+					      MSR_TYPE_RW);
+		break;
 	case MSR_IA32_PRED_CMD:
 		if (!msr_info->host_initiated &&
 		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
@@ -5700,6 +5763,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	u64 cr0;
 
 	vmx->rmode.vm86_active = 0;
+	vmx->spec_ctrl = 0;
 
 	vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
 	kvm_set_cr8(vcpu, 0);
@@ -9371,6 +9435,15 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	vmx_arm_hv_timer(vcpu);
 
+	/*
+	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
+	 * it's non-zero. Since vmentry is serialising on affected CPUs, there
+	 * is no need to worry about the conditional branch over the wrmsr
+	 * being speculatively taken.
+	 */
+	if (vmx->spec_ctrl)
+		wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
+
 	vmx->__launched = vmx->loaded_vmcs->launched;
 	asm(
 		/* Store host registers */
@@ -9489,6 +9562,27 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 #endif
 	      );
 
+	/*
+	 * We do not use IBRS in the kernel. If this vCPU has used the
+	 * SPEC_CTRL MSR it may have left it on; save the value and
+	 * turn it off. This is much more efficient than blindly adding
+	 * it to the atomic save/restore list. Especially as the former
+	 * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
+	 *
+	 * For non-nested case:
+	 * If the L01 MSR bitmap does not intercept the MSR, then we need to
+	 * save it.
+	 *
+	 * For nested case:
+	 * If the L02 MSR bitmap does not intercept the MSR, then we need to
+	 * save it.
+	 */
+	if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
+		rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
+
+	if (vmx->spec_ctrl)
+		wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+
 	/* Eliminate branch target predictions from guest mode */
 	vmexit_fill_RSB();
 
@@ -10113,7 +10207,7 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 	unsigned long *msr_bitmap_l1;
 	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
 	/*
-	 * pred_cmd is trying to verify two things:
+	 * pred_cmd & spec_ctrl are trying to verify two things:
 	 *
 	 * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
 	 *    ensures that we do not accidentally generate an L02 MSR bitmap
@@ -10126,9 +10220,10 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 	 *    the MSR.
 	 */
 	bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
+	bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
 
 	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
-	    !pred_cmd)
+	    !pred_cmd && !spec_ctrl)
 		return false;
 
 	page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
@@ -10162,6 +10257,12 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 		}
 	}
 
+	if (spec_ctrl)
+		nested_vmx_disable_intercept_for_msr(
+					msr_bitmap_l1, msr_bitmap_l0,
+					MSR_IA32_SPEC_CTRL,
+					MSR_TYPE_R | MSR_TYPE_W);
+
 	if (pred_cmd)
 		nested_vmx_disable_intercept_for_msr(
 					msr_bitmap_l1, msr_bitmap_l0,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4ec142e..ac38143 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1009,7 +1009,7 @@ static u32 msrs_to_save[] = {
 #endif
 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
 	MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
-	MSR_IA32_ARCH_CAPABILITIES
+	MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES
 };
 
 static unsigned num_msrs_to_save;

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-02 20:52             ` Alan Cox
@ 2018-02-05 19:22               ` Paolo Bonzini
  0 siblings, 0 replies; 81+ messages in thread
From: Paolo Bonzini @ 2018-02-05 19:22 UTC (permalink / raw)
  To: Alan Cox, Konrad Rzeszutek Wilk
  Cc: David Woodhouse, KarimAllah Ahmed, kvm, linux-kernel, x86,
	Ashok Raj, Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Peter Zijlstra

On 02/02/2018 21:52, Alan Cox wrote:
>>> No. The AMD feature bits give us more fine-grained support for exposing
>>> IBPB or IBRS alone, so we expose those bits on Intel too.  
>> But but.. that runs smack against the idea of exposing a platform that
>> is as close to emulating the real hardware as possible.
> Agreed, and it's asking for problems in the future if for example Intel
> or another non AMD vendor did ever use that leaf for something different.

Leaves starting at 0 are reserved to Intel; leaves starting at
0x80000000 are reserved to AMD.

0x40000000 to 0x400000FF (some will say 0x4FFFFFFF) are reserved to
hypervisors.

> Now whether there ought to be an MSR range every vendor agrees is never
> implemented so software can use it is an interesting discussion.

For MSRs there is no explicit indication, but traditionally Intel is
using numbers based at 0 and AMD is using numbers based at 0xC0000000.

Furthermore, the manuals for virtualization extensions tell you that
Intel isn't planning to go beyond 0x1FFF, and AMD is planning to use
only 0xC0000000-0xC0001FFF and 0xC0010000-0xC0011FFF.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-02 20:28           ` Konrad Rzeszutek Wilk
  2018-02-02 20:31               ` David Woodhouse
  2018-02-02 20:52             ` Alan Cox
@ 2018-02-05 19:24             ` Paolo Bonzini
  2 siblings, 0 replies; 81+ messages in thread
From: Paolo Bonzini @ 2018-02-05 19:24 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, David Woodhouse
  Cc: KarimAllah Ahmed, kvm, linux-kernel, x86, Ashok Raj,
	Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Peter Zijlstra

On 02/02/2018 21:28, Konrad Rzeszutek Wilk wrote:
>>>> Nothing would *set* the IBPB bit though, since that's a "virtual" bit
>>>> on Intel hardware. The comment explains why we have that |= F(IBPB),
>>>> and if the comment wasn't true, we wouldn't need that code either.
>>> But this seems wrong. That is on Intel CPUs we will advertise on
>>> AMD leaf that the IBPB feature is available.
>>>
>>> Shouldn't we just check to see if the machine is AMD before advertising
>>> this bit?
>> No. The AMD feature bits give us more fine-grained support for exposing
>> IBPB or IBRS alone, so we expose those bits on Intel too.
> But but.. that runs smack against the idea of exposing a platform that
> is as close to emulating the real hardware as possible.
> 
> As in I would never expect an Intel CPU to expose the IBPB on the 0x8000_0008
> leaf. Hence KVM (nor any hypervisor) should not do it either.

This is KVM_GET_*SUPPORTED*_CPUID.  The actual CPUID bits that are
exposed (and also which CPUID leafs are there, even though this one is
present in both Intel and AMD) are determined by userspace.

Paolo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-01 21:59 ` [PATCH v6 2/5] KVM: x86: Add IBPB support KarimAllah Ahmed
  2018-02-02 17:49   ` Konrad Rzeszutek Wilk
  2018-02-03 22:50   ` [tip:x86/pti] KVM/x86: " tip-bot for Ashok Raj
@ 2018-02-16  3:44   ` Jim Mattson
  2018-02-16  4:22     ` Andi Kleen
  2018-05-03  1:27   ` Wanpeng Li
  3 siblings, 1 reply; 81+ messages in thread
From: Jim Mattson @ 2018-02-16  3:44 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm list, LKML, the arch/x86 maintainers, Ashok Raj,
	Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Paolo Bonzini, Peter Zijlstra, David Woodhouse

On Thu, Feb 1, 2018 at 1:59 PM, KarimAllah Ahmed <karahmed@amazon.de> wrote:

> @@ -3684,6 +3696,22 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>         case MSR_IA32_TSC:
>                 kvm_write_tsc(vcpu, msr);
>                 break;
> +       case MSR_IA32_PRED_CMD:
> +               if (!msr->host_initiated &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
> +                       return 1;
> +
> +               if (data & ~PRED_CMD_IBPB)
> +                       return 1;
> +
> +               if (!data)
> +                       break;
> +
> +               wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);

Should this be wrmsrl_safe? I don't see where we've verified host
support of this MSR.

> @@ -3342,6 +3369,34 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>         case MSR_IA32_TSC:
>                 kvm_write_tsc(vcpu, msr_info);
>                 break;
> +       case MSR_IA32_PRED_CMD:
> +               if (!msr_info->host_initiated &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
> +                       return 1;
> +
> +               if (data & ~PRED_CMD_IBPB)
> +                       return 1;
> +
> +               if (!data)
> +                       break;
> +
> +               wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);

And here too...wrmsrl_safe?

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-16  3:44   ` [PATCH v6 2/5] KVM: x86: " Jim Mattson
@ 2018-02-16  4:22     ` Andi Kleen
  0 siblings, 0 replies; 81+ messages in thread
From: Andi Kleen @ 2018-02-16  4:22 UTC (permalink / raw)
  To: Jim Mattson
  Cc: KarimAllah Ahmed, kvm list, LKML, the arch/x86 maintainers,
	Ashok Raj, Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Thomas Gleixner, Dan Williams,
	Jun Nakajima, Andy Lutomirski, Greg KH, Paolo Bonzini,
	Peter Zijlstra, David Woodhouse

> > +               wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
> 
> Should this be wrmsrl_safe? I don't see where we've verified host
> support of this MSR.

In mainline all wrmsr are wrmsrl_safe now.

-Andi

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-02-01 21:59 ` [PATCH v6 2/5] KVM: x86: Add IBPB support KarimAllah Ahmed
                     ` (2 preceding siblings ...)
  2018-02-16  3:44   ` [PATCH v6 2/5] KVM: x86: " Jim Mattson
@ 2018-05-03  1:27   ` Wanpeng Li
  2018-05-03  9:19     ` Paolo Bonzini
  3 siblings, 1 reply; 81+ messages in thread
From: Wanpeng Li @ 2018-05-03  1:27 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm, LKML, the arch/x86 maintainers, Ashok Raj, Asit Mallick,
	Dave Hansen, Arjan Van De Ven, Tim Chen, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Thomas Gleixner, Dan Williams,
	Jun Nakajima, Andy Lutomirski, Greg KH, Paolo Bonzini,
	Peter Zijlstra, David Woodhouse

Hi Ashok,
2018-02-02 5:59 GMT+08:00 KarimAllah Ahmed <karahmed@amazon.de>:
> From: Ashok Raj <ashok.raj@intel.com>
>
> The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
> control mechanism. It keeps earlier branches from influencing
> later ones.
>
> Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
> It's a command that ensures predicted branch targets aren't used after
> the barrier. Although IBRS and IBPB are enumerated by the same CPUID
> enumeration, IBPB is very different.
>
> IBPB helps mitigate against three potential attacks:
>
> * Mitigate guests from being attacked by other guests.
>   - This is addressed by issing IBPB when we do a guest switch.
>
> * Mitigate attacks from guest/ring3->host/ring3.
>   These would require a IBPB during context switch in host, or after
>   VMEXIT. The host process has two ways to mitigate
>   - Either it can be compiled with retpoline
>   - If its going through context switch, and has set !dumpable then
>     there is a IBPB in that path.
>     (Tim's patch: https://patchwork.kernel.org/patch/10192871)
>   - The case where after a VMEXIT you return back to Qemu might make
>     Qemu attackable from guest when Qemu isn't compiled with retpoline.
>   There are issues reported when doing IBPB on every VMEXIT that resulted
>   in some tsc calibration woes in guest.
>
> * Mitigate guest/ring0->host/ring0 attacks.
>   When host kernel is using retpoline it is safe against these attacks.
>   If host kernel isn't using retpoline we might need to do a IBPB flush on
>   every VMEXIT.
>

So for 1) guest->guest attacks 2) guest/ring3->host/ring3 attacks 3)
guest/ring0->host/ring0 attacks, if IBPB is enough to protect these
three scenarios and retpoline is not needed?

Regards,
Wanpeng Li

> Even when using retpoline for indirect calls, in certain conditions 'ret'
> can use the BTB on Skylake-era CPUs. There are other mitigations
> available like RSB stuffing/clearing.
>
> * IBPB is issued only for SVM during svm_free_vcpu().
>   VMX has a vmclear and SVM doesn't.  Follow discussion here:
>   https://lkml.org/lkml/2018/1/15/146
>
> Please refer to the following spec for more details on the enumeration
> and control.
>
> Refer here to get documentation about mitigations.
>
> https://software.intel.com/en-us/side-channel-security-support
>
> [peterz: rebase and changelog rewrite]
> [karahmed: - rebase
>            - vmx: expose PRED_CMD if guest has it in CPUID
>            - svm: only pass through IBPB if guest has it in CPUID
>            - vmx: support !cpu_has_vmx_msr_bitmap()]
>            - vmx: support nested]
> [dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
>         PRED_CMD is a write-only MSR]
>
> Cc: Asit Mallick <asit.k.mallick@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
> ---
> v6:
> - introduce msr_write_intercepted_l01
>
> v5:
> - Use MSR_TYPE_W instead of MSR_TYPE_R for the MSR.
> - Always merge the bitmaps unconditionally.
> - Add PRED_CMD to direct_access_msrs.
> - Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
> - rewrite the commit message (from ashok.raj@)
> ---
>  arch/x86/kvm/cpuid.c | 11 +++++++-
>  arch/x86/kvm/svm.c   | 28 ++++++++++++++++++
>  arch/x86/kvm/vmx.c   | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 116 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index c0eb337..033004d 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>                 F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
>                 0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
>
> +       /* cpuid 0x80000008.ebx */
> +       const u32 kvm_cpuid_8000_0008_ebx_x86_features =
> +               F(IBPB);
> +
>         /* cpuid 0xC0000001.edx */
>         const u32 kvm_cpuid_C000_0001_edx_x86_features =
>                 F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
> @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>                 if (!g_phys_as)
>                         g_phys_as = phys_as;
>                 entry->eax = g_phys_as | (virt_as << 8);
> -               entry->ebx = entry->edx = 0;
> +               entry->edx = 0;
> +               /* IBPB isn't necessarily present in hardware cpuid */
> +               if (boot_cpu_has(X86_FEATURE_IBPB))
> +                       entry->ebx |= F(IBPB);
> +               entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
> +               cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
>                 break;
>         }
>         case 0x80000019:
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index f40d0da..254eefb 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -249,6 +249,7 @@ static const struct svm_direct_access_msrs {
>         { .index = MSR_CSTAR,                           .always = true  },
>         { .index = MSR_SYSCALL_MASK,                    .always = true  },
>  #endif
> +       { .index = MSR_IA32_PRED_CMD,                   .always = false },
>         { .index = MSR_IA32_LASTBRANCHFROMIP,           .always = false },
>         { .index = MSR_IA32_LASTBRANCHTOIP,             .always = false },
>         { .index = MSR_IA32_LASTINTFROMIP,              .always = false },
> @@ -529,6 +530,7 @@ struct svm_cpu_data {
>         struct kvm_ldttss_desc *tss_desc;
>
>         struct page *save_area;
> +       struct vmcb *current_vmcb;
>  };
>
>  static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
> @@ -1703,11 +1705,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
>         __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
>         kvm_vcpu_uninit(vcpu);
>         kmem_cache_free(kvm_vcpu_cache, svm);
> +       /*
> +        * The vmcb page can be recycled, causing a false negative in
> +        * svm_vcpu_load(). So do a full IBPB now.
> +        */
> +       indirect_branch_prediction_barrier();
>  }
>
>  static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
>         struct vcpu_svm *svm = to_svm(vcpu);
> +       struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
>         int i;
>
>         if (unlikely(cpu != vcpu->cpu)) {
> @@ -1736,6 +1744,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>         if (static_cpu_has(X86_FEATURE_RDTSCP))
>                 wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>
> +       if (sd->current_vmcb != svm->vmcb) {
> +               sd->current_vmcb = svm->vmcb;
> +               indirect_branch_prediction_barrier();
> +       }
>         avic_vcpu_load(vcpu, cpu);
>  }
>
> @@ -3684,6 +3696,22 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>         case MSR_IA32_TSC:
>                 kvm_write_tsc(vcpu, msr);
>                 break;
> +       case MSR_IA32_PRED_CMD:
> +               if (!msr->host_initiated &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
> +                       return 1;
> +
> +               if (data & ~PRED_CMD_IBPB)
> +                       return 1;
> +
> +               if (!data)
> +                       break;
> +
> +               wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
> +               if (is_guest_mode(vcpu))
> +                       break;
> +               set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
> +               break;
>         case MSR_STAR:
>                 svm->vmcb->save.star = data;
>                 break;
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index d46a61b..263eb1f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -592,6 +592,7 @@ struct vcpu_vmx {
>         u64                   msr_host_kernel_gs_base;
>         u64                   msr_guest_kernel_gs_base;
>  #endif
> +
>         u32 vm_entry_controls_shadow;
>         u32 vm_exit_controls_shadow;
>         u32 secondary_exec_control;
> @@ -936,6 +937,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
>  static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
>                                             u16 error_code);
>  static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
> +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
> +                                                         u32 msr, int type);
>
>  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
>  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> @@ -1907,6 +1910,29 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
>         vmcs_write32(EXCEPTION_BITMAP, eb);
>  }
>
> +/*
> + * Check if MSR is intercepted for L01 MSR bitmap.
> + */
> +static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
> +{
> +       unsigned long *msr_bitmap;
> +       int f = sizeof(unsigned long);
> +
> +       if (!cpu_has_vmx_msr_bitmap())
> +               return true;
> +
> +       msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
> +
> +       if (msr <= 0x1fff) {
> +               return !!test_bit(msr, msr_bitmap + 0x800 / f);
> +       } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
> +               msr &= 0x1fff;
> +               return !!test_bit(msr, msr_bitmap + 0xc00 / f);
> +       }
> +
> +       return true;
> +}
> +
>  static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx,
>                 unsigned long entry, unsigned long exit)
>  {
> @@ -2285,6 +2311,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>         if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
>                 per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
>                 vmcs_load(vmx->loaded_vmcs->vmcs);
> +               indirect_branch_prediction_barrier();
>         }
>
>         if (!already_loaded) {
> @@ -3342,6 +3369,34 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>         case MSR_IA32_TSC:
>                 kvm_write_tsc(vcpu, msr_info);
>                 break;
> +       case MSR_IA32_PRED_CMD:
> +               if (!msr_info->host_initiated &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
> +                       return 1;
> +
> +               if (data & ~PRED_CMD_IBPB)
> +                       return 1;
> +
> +               if (!data)
> +                       break;
> +
> +               wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
> +
> +               /*
> +                * For non-nested:
> +                * When it's written (to non-zero) for the first time, pass
> +                * it through.
> +                *
> +                * For nested:
> +                * The handling of the MSR bitmap for L2 guests is done in
> +                * nested_vmx_merge_msr_bitmap. We should not touch the
> +                * vmcs02.msr_bitmap here since it gets completely overwritten
> +                * in the merging.
> +                */
> +               vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
> +                                             MSR_TYPE_W);
> +               break;
>         case MSR_IA32_CR_PAT:
>                 if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
>                         if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
> @@ -10044,9 +10099,23 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
>         struct page *page;
>         unsigned long *msr_bitmap_l1;
>         unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
> +       /*
> +        * pred_cmd is trying to verify two things:
> +        *
> +        * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
> +        *    ensures that we do not accidentally generate an L02 MSR bitmap
> +        *    from the L12 MSR bitmap that is too permissive.
> +        * 2. That L1 or L2s have actually used the MSR. This avoids
> +        *    unnecessarily merging of the bitmap if the MSR is unused. This
> +        *    works properly because we only update the L01 MSR bitmap lazily.
> +        *    So even if L0 should pass L1 these MSRs, the L01 bitmap is only
> +        *    updated to reflect this when L1 (or its L2s) actually write to
> +        *    the MSR.
> +        */
> +       bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
>
> -       /* This shortcut is ok because we support only x2APIC MSRs so far. */
> -       if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
> +       if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
> +           !pred_cmd)
>                 return false;
>
>         page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
> @@ -10079,6 +10148,13 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
>                                 MSR_TYPE_W);
>                 }
>         }
> +
> +       if (pred_cmd)
> +               nested_vmx_disable_intercept_for_msr(
> +                                       msr_bitmap_l1, msr_bitmap_l0,
> +                                       MSR_IA32_PRED_CMD,
> +                                       MSR_TYPE_W);
> +
>         kunmap(page);
>         kvm_release_page_clean(page);
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-05-03  1:27   ` Wanpeng Li
@ 2018-05-03  9:19     ` Paolo Bonzini
  2018-05-03 12:01       ` Wanpeng Li
  2018-05-03 12:46       ` Tian, Kevin
  0 siblings, 2 replies; 81+ messages in thread
From: Paolo Bonzini @ 2018-05-03  9:19 UTC (permalink / raw)
  To: Wanpeng Li, KarimAllah Ahmed
  Cc: kvm, LKML, the arch/x86 maintainers, Ashok Raj, Asit Mallick,
	Dave Hansen, Arjan Van De Ven, Tim Chen, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Thomas Gleixner, Dan Williams,
	Jun Nakajima, Andy Lutomirski, Greg KH, Peter Zijlstra,
	David Woodhouse

On 03/05/2018 03:27, Wanpeng Li wrote:
> So for 1) guest->guest attacks 2) guest/ring3->host/ring3 attacks 3)
> guest/ring0->host/ring0 attacks, if IBPB is enough to protect these
> three scenarios and retpoline is not needed?

In theory yes, in practice if you want to do that IBPB is much more
expensive than retpolines, because you'd need an IBPB on vmexit or a
cache flush on vmentry.

Paolo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-05-03  9:19     ` Paolo Bonzini
@ 2018-05-03 12:01       ` Wanpeng Li
  2018-05-03 12:46       ` Tian, Kevin
  1 sibling, 0 replies; 81+ messages in thread
From: Wanpeng Li @ 2018-05-03 12:01 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: KarimAllah Ahmed, kvm, LKML, the arch/x86 maintainers, Ashok Raj,
	Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen,
	Linus Torvalds, Andrea Arcangeli, Andi Kleen, Thomas Gleixner,
	Dan Williams, Jun Nakajima, Andy Lutomirski, Greg KH,
	Peter Zijlstra, David Woodhouse

2018-05-03 17:19 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> On 03/05/2018 03:27, Wanpeng Li wrote:
>> So for 1) guest->guest attacks 2) guest/ring3->host/ring3 attacks 3)
>> guest/ring0->host/ring0 attacks, if IBPB is enough to protect these
>> three scenarios and retpoline is not needed?
>
> In theory yes, in practice if you want to do that IBPB is much more
> expensive than retpolines, because you'd need an IBPB on vmexit or a
> cache flush on vmentry.

https://lkml.org/lkml/2018/1/4/615 Retpoline is not recommended on
Skylake, so we need to pay the penalty for IBPB flush on each vmexit I
think.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 81+ messages in thread

* RE: [PATCH v6 2/5] KVM: x86: Add IBPB support
  2018-05-03  9:19     ` Paolo Bonzini
  2018-05-03 12:01       ` Wanpeng Li
@ 2018-05-03 12:46       ` Tian, Kevin
  1 sibling, 0 replies; 81+ messages in thread
From: Tian, Kevin @ 2018-05-03 12:46 UTC (permalink / raw)
  To: Paolo Bonzini, Wanpeng Li, KarimAllah Ahmed
  Cc: kvm, LKML, the arch/x86 maintainers, Raj, Ashok, Mallick, Asit K,
	Hansen, Dave, Van De Ven, Arjan, Tim Chen, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Thomas Gleixner, Williams, Dan J,
	Nakajima, Jun, Andy Lutomirski, Greg KH, Peter Zijlstra,
	David Woodhouse

> From: Paolo Bonzini
> Sent: Thursday, May 3, 2018 5:20 PM
> 
> On 03/05/2018 03:27, Wanpeng Li wrote:
> > So for 1) guest->guest attacks 2) guest/ring3->host/ring3 attacks 3)
> > guest/ring0->host/ring0 attacks, if IBPB is enough to protect these
> > three scenarios and retpoline is not needed?
> 
> In theory yes, in practice if you want to do that IBPB is much more
> expensive than retpolines, because you'd need an IBPB on vmexit or a
> cache flush on vmentry.
> 

yes if HT is disabled. otherwise IBPB alone is not sufficient since it's 
just one-time effect while poison from sibling thread can happen 
anytime. in latter case retpoline or IBRS is expected to use with
IBPB in conjunction as a full mitigation.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 81+ messages in thread

end of thread, other threads:[~2018-05-03 12:46 UTC | newest]

Thread overview: 81+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-01 21:59 [PATCH v6 0/5] KVM: Expose speculation control feature to guests KarimAllah Ahmed
2018-02-01 21:59 ` KarimAllah Ahmed
2018-02-01 21:59 ` [PATCH v6 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX KarimAllah Ahmed
2018-02-02 17:37   ` Jim Mattson
2018-02-03 22:50   ` [tip:x86/pti] KVM/x86: " tip-bot for KarimAllah Ahmed
2018-02-01 21:59 ` [PATCH v6 2/5] KVM: x86: Add IBPB support KarimAllah Ahmed
2018-02-02 17:49   ` Konrad Rzeszutek Wilk
2018-02-02 18:02     ` David Woodhouse
2018-02-02 18:02       ` David Woodhouse
2018-02-02 19:56       ` Konrad Rzeszutek Wilk
2018-02-02 20:16         ` David Woodhouse
2018-02-02 20:16           ` David Woodhouse
2018-02-02 20:28           ` Konrad Rzeszutek Wilk
2018-02-02 20:31             ` David Woodhouse
2018-02-02 20:31               ` David Woodhouse
2018-02-02 20:52               ` Konrad Rzeszutek Wilk
2018-02-02 20:52             ` Alan Cox
2018-02-05 19:22               ` Paolo Bonzini
2018-02-05 19:24             ` Paolo Bonzini
2018-02-03 22:50   ` [tip:x86/pti] KVM/x86: " tip-bot for Ashok Raj
2018-02-16  3:44   ` [PATCH v6 2/5] KVM: x86: " Jim Mattson
2018-02-16  4:22     ` Andi Kleen
2018-05-03  1:27   ` Wanpeng Li
2018-05-03  9:19     ` Paolo Bonzini
2018-05-03 12:01       ` Wanpeng Li
2018-05-03 12:46       ` Tian, Kevin
2018-02-01 21:59 ` [PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES KarimAllah Ahmed
2018-02-02 10:53   ` Darren Kenny
2018-02-02 17:35     ` Jim Mattson
2018-02-02 17:51   ` Konrad Rzeszutek Wilk
2018-02-03 22:51   ` [tip:x86/pti] KVM/VMX: " tip-bot for KarimAllah Ahmed
2018-02-01 21:59 ` [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL KarimAllah Ahmed
2018-02-02 11:03   ` Darren Kenny
2018-02-02 11:27   ` David Woodhouse
2018-02-02 11:27     ` David Woodhouse
2018-02-02 17:53   ` Konrad Rzeszutek Wilk
2018-02-02 18:05     ` David Woodhouse
2018-02-02 18:19       ` Konrad Rzeszutek Wilk
2018-02-02 17:57   ` Jim Mattson
2018-02-03 22:51   ` [tip:x86/pti] KVM/VMX: " tip-bot for KarimAllah Ahmed
2018-02-01 21:59 ` [PATCH v6 5/5] KVM: SVM: " KarimAllah Ahmed
2018-02-02 11:06   ` Darren Kenny
2018-02-02 18:02   ` Konrad Rzeszutek Wilk
  -- strict thread matches above, loose matches on Subject: below --
2018-01-12  1:32 [PATCH 0/5] Add support for IBRS & IBPB KVM support Ashok Raj
2018-01-12  1:32 ` [PATCH 1/5] x86/ibrs: Introduce native_rdmsrl, and native_wrmsrl Ashok Raj
2018-01-12  1:41   ` Andy Lutomirski
2018-01-12  1:52     ` Raj, Ashok
2018-01-12  2:20       ` Andy Lutomirski
2018-01-12  3:01         ` Raj, Ashok
2018-01-12  5:03           ` Dave Hansen
2018-01-12 16:28             ` Josh Poimboeuf
2018-01-12 16:28             ` Woodhouse, David
2018-01-13  6:20             ` Andy Lutomirski
2018-01-13 13:52               ` Van De Ven, Arjan
2018-01-13 15:20                 ` Andy Lutomirski
2018-01-13  6:19           ` Andy Lutomirski
2018-01-12  7:54   ` Greg KH
2018-01-12 12:28   ` Borislav Petkov
2018-01-12  1:32 ` [PATCH 2/5] x86/ibrs: Add new helper macros to save/restore MSR_IA32_SPEC_CTRL Ashok Raj
2018-01-12  1:32 ` [PATCH 3/5] x86/ibrs: Add direct access support for MSR_IA32_SPEC_CTRL Ashok Raj
2018-01-12  1:58   ` Dave Hansen
2018-01-12  3:14     ` Raj, Ashok
2018-01-12  9:51     ` Peter Zijlstra
2018-01-12 10:09       ` David Woodhouse
2018-01-15 13:45         ` Peter Zijlstra
2018-01-15 13:59           ` David Woodhouse
2018-01-15 14:45             ` Peter Zijlstra
2018-01-12  1:32 ` [PATCH 4/5] x86/svm: Direct access to MSR_IA32_SPEC_CTRL Ashok Raj
2018-01-12  7:23   ` David Woodhouse
2018-01-12  9:58     ` Peter Zijlstra
2018-01-12 10:13       ` David Woodhouse
2018-01-12 12:38   ` Paolo Bonzini
2018-01-12 15:14   ` Tom Lendacky
2018-01-12  1:32 ` [PATCH 5/5] x86/feature: Detect the x86 feature Indirect Branch Prediction Barrier Ashok Raj
2018-01-12 10:08   ` Peter Zijlstra
2018-01-12 12:32   ` Borislav Petkov
2018-01-12 12:39     ` Woodhouse, David
2018-01-12 15:21       ` Tom Lendacky
2018-01-12 15:31   ` Tom Lendacky
2018-01-12 15:36     ` Woodhouse, David
2018-01-12 17:06       ` Tom Lendacky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.